Bug 56914 - Samba (AD DC) restart may fail due to network sockets still being occupied by left-over processes - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-6-errata
Assigned To: Arvid Requate
QA Contact: Julia Bremer
Duplicates: 55487
Depends on:
Blocks:
Reported: 2023-12-14 14:15 CET by Christian Kowarzik
Modified: 2024-03-07 13:07 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.429
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2023120121000312, 2024022621000211
Bug group (optional):
Max CVSS v3 score:


Attachments
system calls which involve process management while stopping samba-ad-dc (29.44 KB, text/plain)
2023-12-14 14:17 CET, Christian Kowarzik
patch to /etc/init.d/samba-ad-dc (2.21 KB, patch)
2023-12-14 14:18 CET, Christian Kowarzik
systemd service unit file for the samba-ad-dc service (426 bytes, text/x-dbus-service)
2023-12-14 14:20 CET, Christian Kowarzik

Description Christian Kowarzik 2023-12-14 14:15:58 CET
Table of contents:

1.) System-Info
2.) Abstract
3.) Log extracts from selected incidents of error
    "Failed to listen on ::1:135" and related cause for Samba (AD DC) restarts
4.) Analysis of the stop-branch of /etc/init.d/samba-ad-dc
    or: Why processes may be left-over after stopping samba-ad-dc
5.) Patch to /etc/init.d/samba-ad-dc
    or: How to terminate and finally kill all processes when stopping samba-ad-dc


1.) System-Info

UCS: 5.0-5 errata897
Installed: dhcp-server=12.0 prometheus-node-exporter=2.0.1 samba4=4.16

samba-4.18.3-1A~5.0.0.202311301306
univention-samba4-9.0.14-5

2.) Abstract

On a client system, restarting Samba (AD DC) failed on several occasions with error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED".
These Samba (AD DC) restarts were triggered by server_password_change or postinst-scripts during package updates (univention-samba4, samba).

We know that as part of the Samba (AD DC) restart sequence, samba-ad-dc is stopped and subsequently started again.
However, it turned out that the call "/etc/init.d/samba-ad-dc stop" returns without ensuring that all processes have terminated successfully.
This way, the subsequent call of "/etc/init.d/samba-ad-dc start" may fail due to left-over processes that still occupy network sockets.

Waiting some seconds between stopping and starting samba-ad-dc does not solve the problem, as the time needed for freeing the occupied network sockets may exceed the waiting time.
Also systemd will not kill left-over processes after samba-ad-dc stop, as the unit file for the samba-ad-dc service is generated by systemd-sysv-generator with hard-coded "KillMode=process" (see "systemctl cat samba-ad-dc"). This instructs systemd to kill only the main process (property "MainPID") on service stop - but here no MainPID exists, as /usr/sbin/samba is not forked off systemd directly.

One solution would be to provide a native systemd unit file for the samba-ad-dc service with the default option "KillMode=control-group". On service stop, this instructs systemd to terminate and finally kill all remaining processes in the control group after the "ExecStop" command has returned. The "ExecStop" command should ask the service to terminate and wait for it to do so.
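
For illustration only, such a native unit could look roughly like the sketch below. This is not the attached unit file: the Description text, Type=forking, the PIDFile path (taken from the start-stop-daemon call quoted in section 4), the ExecStart line and the TimeoutStopSec value are all assumptions. The snippet writes the file into a scratch directory instead of /etc/systemd/system so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the attachment from this report.
# Writes a candidate unit file into a scratch directory for review.
unit_dir=${UNIT_DIR:-$(mktemp -d)}
unit_file="$unit_dir/samba-ad-dc.service"

cat > "$unit_file" <<'EOF'
[Unit]
Description=Samba AD DC (sketch)
After=network.target

[Service]
# Assumption: the init script daemonizes and writes this pidfile.
Type=forking
PIDFile=/run/samba/samba.pid
ExecStart=/etc/init.d/samba-ad-dc start
ExecStop=/etc/init.d/samba-ad-dc stop
# control-group is the systemd default; spelled out here for clarity.
KillMode=control-group
# Assumption: upper bound for the whole stop sequence.
TimeoutStopSec=60

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_file"
```

With KillMode=control-group, anything still left in the service's control group after ExecStop returns is sent SIGTERM and, after the stop timeout, SIGKILL.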

Here I will show what exactly is happening during the call "/etc/init.d/samba-ad-dc stop" and why it does not ensure that all processes have exited successfully before returning.

Then I will provide a possible patch to /etc/init.d/samba-ad-dc. With this patch, stopping samba-ad-dc gives all processes of the samba main process tree a specified time to exit and then kills any that remain. Only after that does it continue, as before, with terminating the additional processes outside the main samba process tree (samba-dcerpcd, samba-bgqd).

Additionally, I will provide a possible native systemd unit file (adapted from upstream), which you might want to use together with the patched /etc/init.d/samba-ad-dc. With it, systemd ensures on service stop that all remaining processes in the control group are killed after the "ExecStop" command has returned.
(Side effect: When the samba-ad-dc service uses a native systemd unit file, you might want to adjust samba.postinst, as it uses "invoke-rc.d --skip-systemd-native $service".)

Please find the patch, the patched version of samba-ad-dc, as well as the native systemd unit file for the samba-ad-dc service in the attachments.


===== TL;DR =====

3.) Log extracts from selected incidents of error "Failed to listen on ::1:135" and related cause for Samba (AD DC) restarts

2023-03-30: Samba (AD DC) restart failed after update to samba-2:4.16.8-1A~5.0.0.202303221744

	# /var/log/dpkg.log
	2023-03-30 02:04:08 status installed samba:amd64 2:4.16.8-1A~5.0.0.202303221744

	# journalctl
	Mär 30 02:03:57 srv-dc02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
	Mär 30 02:03:57 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 19025 (tfork(19031)) in control group while starting unit. Ignoring.
	Mär 30 02:03:57 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
	Mär 30 02:03:57 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 19031 (rpc(0)) in control group while starting unit. Ignoring.
	Mär 30 02:03:57 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
	Mär 30 02:03:57 srv-dc02 systemd[1]: Starting LSB: Samba daemons for the AD DC...

	# /var/log/samba/log.samba
	[2023/03/30 02:03:58.728147,  0, pid=5369] ../../source4/samba/service_stream.c:373(stream_setup_socket)
	  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
	[2023/03/30 02:03:58.728236,  0, pid=5369] ../../source4/rpc_server/dcerpc_server.c:511(add_socket_rpc_tcp_iface)
	  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
	[2023/03/30 02:03:58.728255,  0, pid=5369] ../../source4/samba/service_task.c:36(task_server_terminate)
	  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
	[2023/03/30 02:03:58.782804,  0, pid=5359] ../../source4/samba/server.c:392(samba_terminate)
	  samba_terminate: samba_terminate of samba 5359: dcerpc: Failed to initialise end points

2023-08-10: Samba (AD DC) restart failed after update to univention-samba4-9.0.13-5

	# /var/log/dpkg.log
	2023-08-10 02:03:17 status installed univention-samba4:amd64 9.0.13-5

	# journalctl
	Aug 10 02:03:14 srv-dc02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
	Aug 10 02:03:16 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 1641 (tfork(1642)) in control group while starting unit. Ignoring.
	Aug 10 02:03:16 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
	Aug 10 02:03:16 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 1642 (rpc(0)) in control group while starting unit. Ignoring.
	Aug 10 02:03:16 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
	Aug 10 02:03:16 srv-dc02 systemd[1]: Starting LSB: Samba daemons for the AD DC...

	# /var/log/samba/log.samba
	[2023/08/10 02:03:17.163674,  0, pid=7803] ../../source4/samba/service_stream.c:373(stream_setup_socket)
	  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
	[2023/08/10 02:03:17.163734,  0, pid=7803] ../../source4/rpc_server/dcerpc_server.c:513(add_socket_rpc_tcp_iface)
	  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
	[2023/08/10 02:03:17.163746,  0, pid=7803] ../../source4/samba/service_task.c:36(task_server_terminate)
	  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
	[2023/08/10 02:03:17.165269,  0, pid=7750] ../../source4/samba/server.c:392(samba_terminate)
	  samba_terminate: samba_terminate of samba 7750: dcerpc: Failed to initialise end points

2023-09-12: Samba (AD DC) restart failed after server_password_change

	# journalctl
	Sep 12 01:00:01 srv-dc02 CRON[23674]: (root) CMD (/usr/sbin/jitter 600 /usr/lib/univention-server/server_password_change)

	# journalctl
	Sep 12 01:04:09 srv-dc02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
	Sep 12 01:04:14 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 6166 (tfork(6167)) in control group while starting unit. Ignoring.
	Sep 12 01:04:14 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
	Sep 12 01:04:14 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 6167 (rpc(0)) in control group while starting unit. Ignoring.
	Sep 12 01:04:14 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
	Sep 12 01:04:14 srv-dc02 systemd[1]: Starting LSB: Samba daemons for the AD DC...

	# /var/log/samba/log.samba
	[2023/09/12 01:04:15.721242,  0, pid=24160] ../../source4/samba/service_stream.c:373(stream_setup_socket)
	  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
	[2023/09/12 01:04:15.722810,  0, pid=24160] ../../source4/rpc_server/dcerpc_server.c:513(add_socket_rpc_tcp_iface)
	  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
	[2023/09/12 01:04:15.722825,  0, pid=24160] ../../source4/samba/service_task.c:36(task_server_terminate)
	  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
	[2023/09/12 01:04:15.726274,  0, pid=24123] ../../source4/samba/server.c:392(samba_terminate)
	  samba_terminate: samba_terminate of samba 24123: dcerpc: Failed to initialise end points


4.) Analysis of the stop-branch of /etc/init.d/samba-ad-dc
    or: Why processes may be left-over after stopping samba-ad-dc.

First, let's find out which process is occupying the network socket mentioned in the error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED".

	root@srv-dc02:~# netstat --numeric --listening --program --inet6 | sed --quiet '1,2p; /::1:135/p'
	Active Internet connections (only servers)
	Proto Recv-Q Send-Q Local Address	   Foreign Address	 State       PID/Program name    
	tcp6       0      0 ::1:135		 :::*		    LISTEN      4277/samba: task[rp 
       
We see, it's the process "samba: task[rpc] pre-forked worker(0)" (/proc/pid/cmdline) aka "rpc(0)" (/proc/pid/comm), here with PID 4277.

Then let's look at the main samba process tree and note that the process with PID 4277 is a child process within it.
 
	root@srv-dc02:~# pstree --long --arguments --show-pids --show-pgids $(pgrep --oldest samba$)
	samba,4247,4247
	  |-tfork(4266),4265,4247
	  |   `-s3fs[master],4266,4247
	  |       `-tfork(4272),4268,4247
	  |	   `-smbd,4272,4272 -D --option=server role check:inhibit=yes --foreground 
	  |	       |-smbd-cleanupd,4314,4272						       
	  |	       `-smbd-notifyd,4313,4272 .						       
	  |-tfork(4269),4267,4247
	  |   `-rpc[master],4269,4247
	  |       |-tfork(4277),4275,4247
	  |       |   `-rpc(0),4277,4247     <=====<=====<=====<= PID 4277 ==============
	  |       |-tfork(4281),4279,4247
	  |       |   `-rpc(1),4281,4247
	  |       |-tfork(4285),4282,4247
	  |       |   `-rpc(2),4285,4247
	  |       `-tfork(4290),4286,4247
	  |	   `-rpc(3),4290,4247
	  |-tfork(4271),4270,4247
	  |   `-wrepl[master],4271,4247
	  |-tfork(4274),4273,4247
	  |   `-ldap[master],4274,4247
	  |       |-tfork(4316),4315,4247
	  |       |   `-ldap(0),4316,4247
	  |       |-tfork(4318),4317,4247
	  |       |   `-ldap(1),4318,4247
	  |       |-tfork(4320),4319,4247
	  |       |   `-ldap(2),4320,4247
	  |       `-tfork(4322),4321,4247
	  |	   `-ldap(3),4322,4247
	  |-tfork(4278),4276,4247
	  |   `-cldap[master],4278,4247
	  |-tfork(4283),4280,4247
	  |   `-kdc[master],4283,4247
	  |       |-tfork(4291),4287,4247
	  |       |   `-kdc(0),4291,4247
	  |       |-tfork(4297),4292,4247
	  |       |   `-kdc(1),4297,4247
	  |       |-tfork(4303),4299,4247
	  |       |   `-kdc(2),4303,4247
	  |       `-tfork(4306),4304,4247
	  |	   `-kdc(3),4306,4247
	  |-tfork(4288),4284,4247
	  |   `-drepl[master],4288,4247
	  |-tfork(4293),4289,4247
	  |   `-winbindd[master,4293,4247
	  |       `-tfork(4301),4295,4247
	  |	   `-winbindd,4301,4301 -D --option=server role check:inhibit=yes --foreground 
	  |	       `-wb[TAS-GL],4323,4301					   
	  |-tfork(4296),4294,4247
	  |   `-ntp_signd[maste,4296,4247
	  |-tfork(4300),4298,4247
	  |   `-kcc[master],4300,4247
	  `-tfork(4305),4302,4247
	      `-dnsupdate[maste,4305,4247

Now let's find out what exactly happens when samba-ad-dc is stopped.

In samba-4.18.3-1A~5.0.0.202310041246 (5.0-4 errata877) the stop-branch of /etc/init.d/samba-ad-dc looks like this:

Line number  Contents
----------------------------------------------------------------------------------------------------
 76	  (stop)
 77		  log_daemon_msg "Stopping $DESC" $NAME
 78		  ## sometimes samba takes a long time to terminate,
 79		  ## which would make starting new samba processes fail.
 80		  start-stop-daemon --stop --quiet --pidfile $PIDFILE \
 81			  --name samba --retry 'TERM/15/KILL/1' -v \
 82			  | sed -rn 's/.*, (retry #|refused to die)/\1/p' \
 83			  | while read line; do log_action_cont_msg "$line"; done
 84		  ret="$?"
 85		  ## check for smbd and winbindd as well, in case ADDS has just been configured
 86		  for service in smbd winbindd; do
 87			  pid=$(pgrep -x "$service")
 88			  if [ -n "$pid" ]; then
 89				  start-stop-daemon --stop --quiet --oknodo \
 90					  --name "$service" --retry "TERM/3/KILL/1" -v \
 91					  | sed -rn 's/.*, (retry #|refused to die)/\1/p' \
 92					  | while read line; do log_action_cont_msg "$line"; done
 93			  fi
 94		  done
 95		  ## Check again for /usr/sbin/samba
 96		  pgrep_output="$(pgrep -f /usr/sbin/samba)"
 97		  if [ -n "$pgrep_output" ]; then
 98			  {
 99			  echo -e "ERROR: Stuck process after service stop:\n$pgrep_output"
100			  echo "PIDFILE: $(<$PIDFILE)"
101			  samba-tool processes
102			  } | logger -p daemon.error -t SAMBA
103			  pkill -9 -f /usr/sbin/samba
104		  fi
105		  pkill samba-dcerpcd
106		  pkill samba-bgqd
107		  /etc/init.d/nmbd stop
108		  log_end_msg "$ret"
109		  ;;

Lines 80 and 81:

The command "start-stop-daemon --stop --quiet --pidfile $PIDFILE --name samba --retry 'TERM/15/KILL/1' -v" sends TERM (and, after 15 seconds, KILL) to a process with the command name "samba" (/proc/pid/comm) whose PID must be the one given in $PIDFILE. These conditions apply _only_ to the samba process group leader (here PID 4247), not to the child processes. So the child processes receive their TERM signal propagated from the samba process group leader and take their time to exit, but the start-stop-daemon command neither waits for them nor kills them.

To see for yourself: Stop (SIGSTOP) one child process, call "start-stop-daemon --stop --pidfile /run/samba/samba.pid --name samba --retry 'TERM/15/KILL/1' -v" and find the previously stopped process still in the control group, as it was not killed by the start-stop-daemon command.

	root@srv-dc02:~# pkill -STOP "rpc\(0\)"

	root@srv-dc02:~# start-stop-daemon --stop --pidfile /run/samba/samba.pid --name samba --retry 'TERM/15/KILL/1' -v
	Stopped samba (pid 4247).

	root@srv-dc02:~# systemd-cgls /system.slice/samba-ad-dc.service
	Control group /system.slice/samba-ad-dc.service:
	|-4275 samba: tfork waiter process(4277)
	`-4277 samba: task[rpc] pre-forked worker(0)

Conclusion: This start-stop-daemon command will not wait for or kill the process "samba: task[rpc] pre-forked worker(0)" aka "rpc(0)" (here with PID 4277) which is listening on the network socket mentioned in the error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED".
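
The same effect can be reproduced without Samba. The sketch below is only an analogy: a throwaway "leader" shell starts one "worker" (a plain sleep) and records its PID in a temporary file. Unlike samba, this leader does not propagate TERM to its worker; the point is only that signalling and waiting for the leader's PID says nothing about its children.

```shell
#!/bin/sh
pidfile=$(mktemp)

# The leader starts one worker, records the worker's PID, then waits for it.
sh -c 'sleep 30 & echo "$!" > "$0"; wait' "$pidfile" &
leader=$!
sleep 1                        # give the leader time to start its worker
worker=$(cat "$pidfile")

kill -TERM "$leader"           # only the leader's PID is signalled ...
wait "$leader" 2>/dev/null     # ... and only the leader is waited for

worker_survived=no
if kill -0 "$worker" 2>/dev/null; then
    worker_survived=yes
    echo "leader $leader is gone, worker $worker is still running"
fi

kill -KILL "$worker" 2>/dev/null   # clean up the demo worker
rm -f "$pidfile"
```

If the worker held a listening socket, a freshly started leader would now fail to bind the same address, which is the situation in the logs above.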

By the way: We have to assume that some of those child processes may still be terminating after the process group leader has already exited.

To see for yourself: Trace the system calls which involve process management while stopping samba-ad-dc.

	root@ucs-2200:~# unset pids; while read foo; do pids+="-p $foo "; done <<<$(ps -o pid= -g $(pgrep --exact samba$); pgrep --exact smbd$; pgrep  --exact winbindd$)
	root@ucs-2200:~# strace -tt -e trace=process $pids &
	root@ucs-2200:~# /etc/init.d/samba-ad-dc stop

(For complete output see attachment "strace_stopping_samba-ad-dc.log".)

Here the samba process group leader (PID 1511) exited 2 seconds before "samba: task[rpc] pre-forked worker(0)" aka "rpc(0)" (PID 1682):

	# Extract from strace run
	[pid  1511] 14:49:35.919626 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=10061, si_uid=0} ---
	[pid  1511] 14:49:36.530918 +++ exited with 127 +++
	[pid  1682] 14:49:38.532489 +++ exited with 127 +++

Lines 82 to 84 and line 108:

The assignment ret="$?" in line 84 always yields ret="0", because the exit status of the pipeline in lines 80 to 83 is the exit status of its last command, not that of "start-stop-daemon". The last command here is the "while read" loop, whose exit status is zero if its body never runs and otherwise that of the last "log_action_cont_msg" call, which will predictably be zero (compare section "Pipelines" of the "dash" manual).
As a result, the stop-branch always reports success in line 108 with log_end_msg "0".
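
A minimal sketch of this behaviour, with "false" standing in for a failing start-stop-daemon call. The second half shows one portable way to keep the first command's status by writing it to a temporary file; this is an illustration, not necessarily what the attached patch does.

```shell
#!/bin/sh
# The status of the first command is lost: $? reflects the while loop.
false | sed -n 'p' | while read -r line; do :; done
ret=$?
echo "pipeline status: $ret"            # prints "pipeline status: 0"

# Keep the first command's status by writing it to a temporary file.
status_file=$(mktemp)
{ false; echo "$?" > "$status_file"; } | sed -n 'p' | while read -r line; do :; done
first_ret=$(cat "$status_file")
rm -f "$status_file"
echo "first command status: $first_ret" # prints "first command status: 1"
```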

Lines 85 to 94:

This code block was introduced in UCS 4.0-0 (Bug 35319) (thanks to Arvid Requate for providing the information) and is no longer needed, as smbd and winbindd are now child processes in the main samba process tree and receive their TERM signal propagated from the samba process group leader.

Lines 95 to 104:

This will do absolutely nothing as the command "pgrep -f /usr/sbin/samba" in line 96 will never match, because samba-2:4.7.8-1A~4.3.0.20190402103 (UCS 4.3-4) was the last version with process names (/proc/pid/cmdline) like "/usr/sbin/samba". Since then samba process names look like "samba: task[rpc] pre-forked worker(0)" (samba-4.18.3-1A~5.0.0.202310041246) and the like.
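
The mismatch can be checked without a running Samba by applying the pattern to the process titles shown in the systemd-cgls output above. In this sketch "grep -E" stands in for "pgrep -f" (both match an extended regular expression against the text); the pattern '^samba: ' is only an illustration, not what the patch uses.

```shell
#!/bin/sh
# Process titles as reported by systemd-cgls earlier in this report.
titles='samba: tfork waiter process(4277)
samba: task[rpc] pre-forked worker(0)'

old_matches=$(printf '%s\n' "$titles" | grep -E -c '/usr/sbin/samba' || true)
new_matches=$(printf '%s\n' "$titles" | grep -E -c '^samba: ' || true)

echo "pattern /usr/sbin/samba matches $old_matches titles"   # 0
echo "pattern ^samba:  matches $new_matches titles"          # 2
```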

Lines 105 to 107:

These lines terminate samba-dcerpcd and samba-bgqd, if they are running, and stop nmbd.


Summary conclusion of this analysis:

The stop-branch of /etc/init.d/samba-ad-dc does not wait for or kill the child-process occupying the network socket mentioned by the error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED".


5.) Patch to /etc/init.d/samba-ad-dc
    or: How to terminate and finally kill all processes when stopping samba-ad-dc

With the above analysis in mind, we know that on service stop we need to make sure that all processes of the samba main process tree are terminated or, failing that, killed.

The patch achieves this as follows:
Right after the "start-stop-daemon --stop" command, the patch repeatedly scans the system process table for the process group IDs of the processes that form the samba main process tree.
If any of these processes are still present in the system process table after a specified waiting time, a KILL signal is sent to the remaining processes (whose state, pid, args, etc. are logged).
Only if a KILL signal had to be sent and the remaining processes still have not gone away within a second waiting time will /etc/init.d/samba-ad-dc return 1, indicating failure to systemd.
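
The wait-then-kill logic can be sketched as follows. This is a simplified illustration, not the attached patch: the helper names alive_pids and wait_or_kill are hypothetical, the helper takes explicit PIDs instead of scanning the process table by process group ID, and the one-second timeouts are chosen only to keep the demo short.

```shell
#!/bin/sh
# Print the subset of the given PIDs that still exist.
alive_pids() {
    for p in "$@"; do
        if kill -0 "$p" 2>/dev/null; then
            printf '%s ' "$p"
        fi
    done
}

# Hypothetical helper: give the PIDs up to $1 seconds to exit, send KILL
# to the rest, then allow $2 more seconds. Returns 1 only if something
# is still present after the KILL and the second waiting time.
wait_or_kill() {
    grace=$1
    kill_wait=$2
    shift 2
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        [ -z "$(alive_pids "$@")" ] && return 0
        sleep 1
        waited=$((waited + 1))
    done
    left=$(alive_pids "$@")
    [ -z "$left" ] && return 0
    echo "sending KILL to: $left"
    kill -KILL $left 2>/dev/null      # unquoted on purpose: one word per PID
    sleep "$kill_wait"
    [ -z "$(alive_pids "$@")" ]
}

sleep 30 &                # stands in for a worker that has not exited yet
worker=$!
wait_or_kill 1 1 "$worker"
echo "wait_or_kill returned $?"
```

In an init script the PIDs would have to be collected before the main process is signalled, since the process tree can no longer be walked from the leader once the leader has exited.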

The patch requires the main samba process tree to be present in order to achieve its goal. This is why I also provided a possible native systemd unit file for the samba-ad-dc service, so that systemd will kill remaining processes after the "ExecStop" command calling "/etc/init.d/samba-ad-dc stop" has returned.
Comment 1 Christian Kowarzik 2023-12-14 14:17:10 CET
Created attachment 11163 [details]
system calls which involve process management while stopping samba-ad-dc
Comment 2 Christian Kowarzik 2023-12-14 14:18:41 CET
Created attachment 11164 [details]
patch to /etc/init.d/samba-ad-dc
Comment 3 Christian Kowarzik 2023-12-14 14:20:11 CET
Created attachment 11165 [details]
systemd service unit file for the samba-ad-dc service
Comment 4 Arvid Requate univentionstaff 2023-12-18 19:00:18 CET
Created issue in GitLab for this bug at: https://git.knut.univention.de/univention/ucs/-/issues/2027
Comment 7 Christina Scheinig univentionstaff 2024-02-27 16:28:09 CET
Two more customers are affected. During automatic updates and server-password-change, Samba does not come back up.
I got this:
----------------------------------------------------------------------------
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31104 (tfork(31106)) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31106 (s3fs[master]) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31107 (tfork(31110)) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31110 (rpc[master]) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31113 (tfork(31115)) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31115 (wrepl[master]) in control group while starting unit. Ignoring.
----------------------------------------------------------------------------
caused by a server-password-change
and this:


[2024/02/24 02:10:20.261804,  0, pid=42235] ../../source4/samba/server.c:623(binary_smbd_main)
  samba version 4.18.3-Univention started.
  Copyright Andrew Tridgell and the Samba Team 1992-2023
[2024/02/24 02:10:20.621296,  0, pid=42236] ../../source4/samba/server.c:896(binary_smbd_main)
  binary_smbd_main: samba: using 'prefork' process model
[2024/02/24 02:10:20.780697,  0, pid=42317] ../../source4/samba/service_stream.c:373(stream_setup_socket)
  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/02/24 02:10:20.780747,  0, pid=42317] ../../source4/rpc_server/dcerpc_server.c:513(add_socket_rpc_tcp_iface)
  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/02/24 02:10:20.780755,  0, pid=42317] ../../source4/samba/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
[2024/02/24 02:10:20.781831,  0, pid=42236] ../../source4/samba/server.c:392(samba_terminate)
  samba_terminate: samba_terminate of samba 42236: dcerpc: Failed to initialise end points
----------------------------------------------------------------------------
and after an upgrade, the same picture:
Upgrade: univention-samba4:amd64 (9.0.15-1, 9.0.15-2), python3-univention-directory-manager-cli:amd64 (15.0.25-5, 15.0.25-6), univention-directory-manager-tools:amd64 (15.0.25-5, 15.0.25-6), python3-univention-directory-manager:amd64 (15.0.25-5, 15.0.25-6), python-univention-directory-manager:amd64 (15.0.25-5, 15.0.25-6), univention-errata-level:amd64 (5.0.6-959, 5.0.6-962), python3-univention-directory-manager-rest:amd64 (10.0.7-8, 10.0.7-9), univention-directory-manager-rest:amd64 (10.0.7-8, 10.0.7-9), python3-univention-directory-manager-rest-client:amd64 (10.0.7-8, 10.0.7-9), python-univention-directory-manager-cli:amd64 (15.0.25-5, 15.0.25-6), univention-samba4-sysvol-sync:amd64 (9.0.15-1, 9.0.15-2)

syslog
-----------------------------------------------------------------------------
Feb 24 02:10:18 H77-UcDC02 samba-ad-dc[41977]: Stopping Samba AD DC server: sambaStopping nmbd (via systemctl): nmbd.service.
Feb 24 02:10:18 H77-UcDC02 samba-ad-dc[41977]: .
Feb 24 02:10:18 H77-UcDC02 systemd[1]: samba-ad-dc.service: Succeeded.
Feb 24 02:10:18 H77-UcDC02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: nmbd.service: Found left-over process 1844 (nmbd) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: Starting LSB: Samba NetBIOS nameserver (nmbd)...
Feb 24 02:10:19 H77-UcDC02 nmbd[42194]: Starting NetBIOS name server: nmbd.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: Started LSB: Samba NetBIOS nameserver (nmbd).
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2106 (tfork(2110)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2110 (rpc(0)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2114 (tfork(2115)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2115 (rpc(1)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: Starting LSB: Samba daemons for the AD DC...
-------------------------------------------------------------------------------
Comment 8 Arvid Requate univentionstaff 2024-03-04 18:51:27 CET
Ok, I adjusted 15_samba4_stop.patch in ucs-patches,
following the suggestion of Comment 2 with minor adjustments.

899721da8 | Improve robustness of samba-ad-dc stop

Package rebuilt as:

Package: samba
Version: 2:4.18.3-1A~5.0.0.202403041818
Branch: ucs_5.0-0
Scope: errata5.0-6

07cbbe7a6b | Advisory

As discussed I didn't include the systemd service unit as systemd
already generates /run/systemd/generator.late/samba-ad-dc.service
and samba-ad-dc alone doesn't solve the broader Bug 44237.
Comment 9 Arvid Requate univentionstaff 2024-03-05 13:21:24 CET
QA showed that `pgrep --exact samba` cannot be used, as it also matches /etc/init.d/samba.
I replaced it with `pgrep -F "$PIDFILE"`.
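
For reference, a small sketch of how `pgrep -F` selects a process by pidfile instead of by name. A plain sleep stands in for the samba main process and the pidfile is a temporary file, so none of this touches a real Samba installation.

```shell
#!/bin/sh
pidfile=$(mktemp)

sleep 30 &                  # stands in for the samba main process
main_pid=$!
echo "$main_pid" > "$pidfile"

# Select exactly the PID recorded in the pidfile, regardless of names.
found=$(pgrep -F "$pidfile")
echo "pgrep -F found: $found"

kill "$main_pid"            # clean up the demo process
rm -f "$pidfile"
```

Because the selection is by PID, an init script whose own name happens to be "samba" can no longer be matched by accident.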

2657bb24f | fixup

Package: samba
Version: 2:4.18.3-1A~5.0.0.202403051313
Branch: ucs_5.0-0
Scope: errata5.0-6

06373a32c3 | Advisory update
Comment 10 Arvid Requate univentionstaff 2024-03-05 15:48:33 CET
*** Bug 55487 has been marked as a duplicate of this bug. ***
Comment 11 Julia Bremer univentionstaff 2024-03-06 10:27:03 CET
I was not able to reproduce the original problem, so I "faked" it by attaching gdb to one of the subprocesses of samba.

OK: Repeated restart of samba with hanging processes
OK: Hung processes are killed after a timeout
OK: The actually correct subprocesses are killed. No more, no less
OK: Repeated restart of samba in a loop while running samba tests showed no problem
OK: Server password change in a loop
OK: Package update
OK: Jenkins
Verified