Univention Bugzilla – Bug 56914
Samba (AD DC) restart may fail due to network sockets still being occupied by left-over processes - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
Last modified: 2024-03-07 13:07:31 CET
Table of contents:
1.) System-Info
2.) Abstract
3.) Log extracts from selected incidents of error "Failed to listen on ::1:135" and related cause for Samba (AD DC) restarts
4.) Analysis of the stop-branch of /etc/init.d/samba-ad-dc or: Why processes may be left-over after stopping samba-ad-dc
5.) Patch to /etc/init.d/samba-ad-dc or: How to terminate and finally kill all processes when stopping samba-ad-dc

1.) System-Info

UCS: 5.0-5 errata897
Installed: dhcp-server=12.0 prometheus-node-exporter=2.0.1 samba4=4.16
samba-4.18.3-1A~5.0.0.202311301306
univention-samba4-9.0.14-5

2.) Abstract

On a client system, restarting Samba (AD DC) failed on several occasions with error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED". These Samba (AD DC) restarts were triggered by server_password_change or postinst-scripts during package updates (univention-samba4, samba).

We know that as part of the Samba (AD DC) restart sequence, samba-ad-dc is stopped and subsequently started again. However, it turned out that the call "/etc/init.d/samba-ad-dc stop" returns without ensuring that all processes have terminated successfully. This way, the subsequent call of "/etc/init.d/samba-ad-dc start" may fail due to left-over processes that still occupy network sockets.

Waiting some seconds between stopping and starting samba-ad-dc does not solve the problem, as the time needed for freeing the occupied network sockets may exceed the waiting time. Also systemd will not kill left-over processes after samba-ad-dc stop, as the unit file for the samba-ad-dc service is generated by systemd-sysv-generator with hard-coded "KillMode=process" (see "systemctl cat samba-ad-dc"). This instructs systemd to kill only the main process (property "MainPID") on service stop - but here no MainPID exists, as /usr/sbin/samba is not forked off systemd directly.

One solution would be to provide a native systemd unit file for the samba-ad-dc service with the default Option "KillMode=control-group".
This will instruct systemd on service stop to terminate and finally kill all remaining processes in the control group after whatever "ExecStop" command - which should ask the service to terminate and wait for it to do so - has returned.

Here I will show what exactly is happening during the call "/etc/init.d/samba-ad-dc stop" and why it does not ensure that all processes have exited successfully before returning.

Then I will provide a possible patch to /etc/init.d/samba-ad-dc which ensures that when stopping samba-ad-dc all processes of the samba main process tree will get a specified time to exit before killing remaining processes and only then continuing - as before - with terminating additional processes from outside the main samba process tree (samba-dcerpcd, samba-bgqd).

Additionally I will provide a possible native systemd unit file (adapted from upstream) which you might want to use together with the patched /etc/init.d/samba-ad-dc so that systemd will ensure that on service stop all remaining processes in the control-group will be killed after the "ExecStop" command has returned. (Side effects: When samba-ad-dc service is using a native systemd unit file, you might want to adjust samba.postinst, as this uses "invoke-rc.d --skip-systemd-native $service".)

Please find the patch, the patched version of samba-ad-dc, as well as the native systemd unit file for the samba-ad-dc service in the attachments.

===== TL;DR =====

3.) Log extracts from selected incidents of error "Failed to listen on ::1:135" and related cause for Samba (AD DC) restarts

2023-03-30: Samba (AD DC) restart failed after update to samba-2:4.16.8-1A~5.0.0.202303221744

# /var/log/dpkg.log
2023-03-30 02:04:08 status installed samba:amd64 2:4.16.8-1A~5.0.0.202303221744

# journalctl
Mär 30 02:03:57 srv-dc02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
Mär 30 02:03:57 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 19025 (tfork(19031)) in control group while starting unit.
Ignoring.
Mär 30 02:03:57 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mär 30 02:03:57 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 19031 (rpc(0)) in control group while starting unit. Ignoring.
Mär 30 02:03:57 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mär 30 02:03:57 srv-dc02 systemd[1]: Starting LSB: Samba daemons for the AD DC...

# /var/log/samba/log.samba
[2023/03/30 02:03:58.728147, 0, pid=5369] ../../source4/samba/service_stream.c:373(stream_setup_socket)
  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2023/03/30 02:03:58.728236, 0, pid=5369] ../../source4/rpc_server/dcerpc_server.c:511(add_socket_rpc_tcp_iface)
  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2023/03/30 02:03:58.728255, 0, pid=5369] ../../source4/samba/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
[2023/03/30 02:03:58.782804, 0, pid=5359] ../../source4/samba/server.c:392(samba_terminate)
  samba_terminate: samba_terminate of samba 5359: dcerpc: Failed to initialise end points

2023-08-10: Samba (AD DC) restart failed after update to univention-samba4-9.0.13-5

# /var/log/dpkg.log
2023-08-10 02:03:17 status installed univention-samba4:amd64 9.0.13-5

# journalctl
Aug 10 02:03:14 srv-dc02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
Aug 10 02:03:16 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 1641 (tfork(1642)) in control group while starting unit. Ignoring.
Aug 10 02:03:16 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 10 02:03:16 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 1642 (rpc(0)) in control group while starting unit. Ignoring.
Aug 10 02:03:16 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 10 02:03:16 srv-dc02 systemd[1]: Starting LSB: Samba daemons for the AD DC...

# /var/log/samba/log.samba
[2023/08/10 02:03:17.163674, 0, pid=7803] ../../source4/samba/service_stream.c:373(stream_setup_socket)
  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2023/08/10 02:03:17.163734, 0, pid=7803] ../../source4/rpc_server/dcerpc_server.c:513(add_socket_rpc_tcp_iface)
  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2023/08/10 02:03:17.163746, 0, pid=7803] ../../source4/samba/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
[2023/08/10 02:03:17.165269, 0, pid=7750] ../../source4/samba/server.c:392(samba_terminate)
  samba_terminate: samba_terminate of samba 7750: dcerpc: Failed to initialise end points

2023-09-12: Samba (AD DC) restart failed after server_password_change

# journalctl
Sep 12 01:00:01 srv-dc02 CRON[23674]: (root) CMD (/usr/sbin/jitter 600 /usr/lib/univention-server/server_password_change)

# journalctl
Sep 12 01:04:09 srv-dc02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
Sep 12 01:04:14 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 6166 (tfork(6167)) in control group while starting unit. Ignoring.
Sep 12 01:04:14 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 12 01:04:14 srv-dc02 systemd[1]: samba-ad-dc.service: Found left-over process 6167 (rpc(0)) in control group while starting unit. Ignoring.
Sep 12 01:04:14 srv-dc02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 12 01:04:14 srv-dc02 systemd[1]: Starting LSB: Samba daemons for the AD DC...

# /var/log/samba/log.samba
[2023/09/12 01:04:15.721242, 0, pid=24160] ../../source4/samba/service_stream.c:373(stream_setup_socket)
  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2023/09/12 01:04:15.722810, 0, pid=24160] ../../source4/rpc_server/dcerpc_server.c:513(add_socket_rpc_tcp_iface)
  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2023/09/12 01:04:15.722825, 0, pid=24160] ../../source4/samba/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
[2023/09/12 01:04:15.726274, 0, pid=24123] ../../source4/samba/server.c:392(samba_terminate)
  samba_terminate: samba_terminate of samba 24123: dcerpc: Failed to initialise end points

4.) Analysis of the stop-branch of /etc/init.d/samba-ad-dc or: Why processes may be left-over after stopping samba-ad-dc.

First let's find out which process is occupying the network socket mentioned by the error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED".

root@srv-dc02:~# netstat --numeric --listening --program --inet6 | sed --quiet '1,2p; /::1:135/p'
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 ::1:135                 :::*                    LISTEN      4277/samba: task[rp

We see that it is the process "samba: task[rpc] pre-forked worker(0)" (/proc/pid/cmdline) aka "rpc(0)" (/proc/pid/comm), here with PID 4277.

Then let's see what the main samba process tree looks like and note that the process with PID 4277 is a child process within that tree.
root@srv-dc02:~# pstree --long --arguments --show-pids --show-pgids $(pgrep --oldest samba$)
samba,4247,4247
  |-tfork(4266),4265,4247
  |   `-s3fs[master],4266,4247
  |       `-tfork(4272),4268,4247
  |           `-smbd,4272,4272 -D --option=server role check:inhibit=yes --foreground
  |               |-smbd-cleanupd,4314,4272
  |               `-smbd-notifyd,4313,4272
  |-tfork(4269),4267,4247
  |   `-rpc[master],4269,4247
  |       |-tfork(4277),4275,4247
  |       |   `-rpc(0),4277,4247 <=====<=====<=====<= PID 4277 ==============
  |       |-tfork(4281),4279,4247
  |       |   `-rpc(1),4281,4247
  |       |-tfork(4285),4282,4247
  |       |   `-rpc(2),4285,4247
  |       `-tfork(4290),4286,4247
  |           `-rpc(3),4290,4247
  |-tfork(4271),4270,4247
  |   `-wrepl[master],4271,4247
  |-tfork(4274),4273,4247
  |   `-ldap[master],4274,4247
  |       |-tfork(4316),4315,4247
  |       |   `-ldap(0),4316,4247
  |       |-tfork(4318),4317,4247
  |       |   `-ldap(1),4318,4247
  |       |-tfork(4320),4319,4247
  |       |   `-ldap(2),4320,4247
  |       `-tfork(4322),4321,4247
  |           `-ldap(3),4322,4247
  |-tfork(4278),4276,4247
  |   `-cldap[master],4278,4247
  |-tfork(4283),4280,4247
  |   `-kdc[master],4283,4247
  |       |-tfork(4291),4287,4247
  |       |   `-kdc(0),4291,4247
  |       |-tfork(4297),4292,4247
  |       |   `-kdc(1),4297,4247
  |       |-tfork(4303),4299,4247
  |       |   `-kdc(2),4303,4247
  |       `-tfork(4306),4304,4247
  |           `-kdc(3),4306,4247
  |-tfork(4288),4284,4247
  |   `-drepl[master],4288,4247
  |-tfork(4293),4289,4247
  |   `-winbindd[master,4293,4247
  |       `-tfork(4301),4295,4247
  |           `-winbindd,4301,4301 -D --option=server role check:inhibit=yes --foreground
  |               `-wb[TAS-GL],4323,4301
  |-tfork(4296),4294,4247
  |   `-ntp_signd[maste,4296,4247
  |-tfork(4300),4298,4247
  |   `-kcc[master],4300,4247
  `-tfork(4305),4302,4247
      `-dnsupdate[maste,4305,4247

Now let's find out what exactly happens when samba-ad-dc is stopped.
In samba-4.18.3-1A~5.0.0.202310041246 (5.0-4 errata877) the stop-branch of /etc/init.d/samba-ad-dc looks like this:

Line number  Contents
----------------------------------------------------------------------------------------------------
 76  (stop)
 77      log_daemon_msg "Stopping $DESC" $NAME
 78      ## sometimes samba takes a long time to terminate,
 79      ## which would make starting new samba processes fail.
 80      start-stop-daemon --stop --quiet --pidfile $PIDFILE \
 81          --name samba --retry 'TERM/15/KILL/1' -v \
 82          | sed -rn 's/.*, (retry #|refused to die)/\1/p' \
 83          | while read line; do log_action_cont_msg "$line"; done
 84      ret="$?"
 85      ## check for smbd and winbindd as well, in case ADDS has just been configured
 86      for service in smbd winbindd; do
 87          pid=$(pgrep -x "$service")
 88          if [ -n "$pid" ]; then
 89              start-stop-daemon --stop --quiet --oknodo \
 90                  --name "$service" --retry "TERM/3/KILL/1" -v \
 91                  | sed -rn 's/.*, (retry #|refused to die)/\1/p' \
 92                  | while read line; do log_action_cont_msg "$line"; done
 93          fi
 94      done
 95      ## Check again for /usr/sbin/samba
 96      pgrep_output="$(pgrep -f /usr/sbin/samba)"
 97      if [ -n "$pgrep_output" ]; then
 98          {
 99              echo -e "ERROR: Stuck process after service stop:\n$pgrep_output"
100              echo "PIDFILE: $(<$PIDFILE)"
101              samba-tool processes
102          } | logger -p daemon.error -t SAMBA
103          pkill -9 -f /usr/sbin/samba
104      fi
105      pkill samba-dcerpcd
106      pkill samba-bgqd
107      /etc/init.d/nmbd stop
108      log_end_msg "$ret"
109      ;;

Lines 80 and 81: The command "start-stop-daemon --stop --quiet --pidfile $PIDFILE --name samba --retry 'TERM/15/KILL/1' -v" sends TERM (and, if the process is still there after 15 seconds, KILL) to a process with the command name "samba" (/proc/pid/comm) whose PID must be the one given in $PIDFILE. These conditions _only_ apply to the samba process group leader (here PID 4247), but not to the child processes.
So the child processes will receive their TERM signal propagated from the samba process group leader and take their time to exit, but will not be waited for or killed by the start-stop-daemon command.

To see for yourself: Stop one child process, call "start-stop-daemon --stop --pidfile /run/samba/samba.pid --name samba --retry 'TERM/15/KILL/1' -v" and find the previously stopped process still in the control group, as it was not killed by the start-stop-daemon command.

root@srv-dc02:~# pkill -STOP "rpc\(0\)"
root@srv-dc02:~# start-stop-daemon --stop --pidfile /run/samba/samba.pid --name samba --retry 'TERM/15/KILL/1' -v
Stopped samba (pid 4247).
root@srv-dc02:~# systemd-cgls /system.slice/samba-ad-dc.service
Control group /system.slice/samba-ad-dc.service:
|-4275 samba: tfork waiter process(4277)
`-4277 samba: task[rpc] pre-forked worker(0)

Conclusion: This start-stop-daemon command will not wait for or kill the process "samba: task[rpc] pre-forked worker(0)" aka "rpc(0)" (here with PID 4277) which is listening on the network socket mentioned by the error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED".

By the way: We need to assume that some of those child processes may still be in the process of terminating themselves after the process group leader has already exited.

To see for yourself: Trace the system calls which involve process management while stopping samba-ad-dc.

root@ucs-2200:~# unset pids; while read foo; do pids+="-p $foo "; done <<<$(ps -o pid= -g $(pgrep --exact samba$); pgrep --exact smbd$; pgrep --exact winbindd$)
root@ucs-2200:~# strace -tt -e trace=process $pids &
root@ucs-2200:~# /etc/init.d/samba-ad-dc stop

(For complete output see attachment "strace_stopping_samba-ad-dc.log".)
Here the samba process group leader (PID 1511) exited 2 seconds before "samba: task[rpc] pre-forked worker(0)" aka "rpc(0)" (PID 1682):

# Extract from strace run
[pid 1511] 14:49:35.919626 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=10061, si_uid=0} ---
[pid 1511] 14:49:36.530918 +++ exited with 127 +++
[pid 1682] 14:49:38.532489 +++ exited with 127 +++

Lines 82 to 84 and line 108: The variable assignment ret="$?" in line 84 will always yield ret="0", because the exit status of the pipeline in lines 80 to 83 is the exit status of the last command executed in the pipeline. The last command is the "while" loop, so the exit status of the pipeline is that of the last "log_action_cont_msg" call, or zero if the loop body never ran; the exit status of "start-stop-daemon" is discarded in either case (compare section "Pipelines" of the "dash" manual). This way in line 108 the stop-branch will always exit successfully with log_end_msg "0".

Lines 85 to 94: This code block was introduced in UCS 4.0-0 (Bug 35319) (thanks Arvid Requate for providing the information) and is not needed anymore, as smbd and winbindd are now child processes of the main samba process tree and will receive their TERM signal propagated from the samba process group leader.

Lines 95 to 104: This will do absolutely nothing, as the command "pgrep -f /usr/sbin/samba" in line 96 will never match: samba-2:4.7.8-1A~4.3.0.20190402103 (UCS 4.3-4) was the last version with process names (/proc/pid/cmdline) like "/usr/sbin/samba". Since then samba process names look like "samba: task[rpc] pre-forked worker(0)" (samba-4.18.3-1A~5.0.0.202310041246) and the like.

Lines 105 to 107: These terminate samba-dcerpcd and samba-bgqd, if they are currently running, and stop nmbd.
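The exit status behaviour described for lines 80 to 84 can be reproduced without Samba. The following is a minimal sketch, not taken from the attached patch: "stop_cmd" is a hypothetical stand-in for the start-stop-daemon call, and "status_lost" / "status_kept" are made-up names. It assumes a POSIX shell without "pipefail" and GNU sed (for "-r").

```shell
#!/bin/sh
# Hypothetical stand-in for "start-stop-daemon --stop ... -v":
# prints one status line and fails with exit status 2.
stop_cmd() {
    echo "Program samba, 1 process(es), refused to die."
    return 2
}

# Same shape as lines 80 to 84: "$?" reflects the while loop at the
# end of the pipeline, so the failure of stop_cmd is lost.
status_lost() {
    stop_cmd \
        | sed -rn 's/.*, (retry #|refused to die)/\1/p' \
        | while read -r line; do :; done
    echo "$?"
}

# One possible alternative: capture the output first, remember the
# exit status of stop_cmd, and only then run the filter pipeline.
status_kept() {
    rc=0
    out=$(stop_cmd) || rc=$?
    printf '%s\n' "$out" \
        | sed -rn 's/.*, (retry #|refused to die)/\1/p' \
        | while read -r line; do :; done
    echo "$rc"
}
```

Calling status_lost prints 0 although stop_cmd failed, while status_kept prints 2. Capturing the output in a variable is only one way to keep the status; it delays the log messages until the command has finished.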
Summary conclusion of this analysis: The stop-branch of /etc/init.d/samba-ad-dc does not wait for or kill the child process occupying the network socket mentioned by the error "Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED".

5.) Patch to /etc/init.d/samba-ad-dc or: How to terminate and finally kill all processes when stopping samba-ad-dc

With the above analysis in mind we know that on service stop we need to make sure that all processes of the samba main process tree are terminated or, failing that, killed.

The patch achieves this as follows: Right after the "start-stop-daemon --stop" command, the patch repeatedly scans the system process table for the process group IDs of the processes that form the samba main process tree. If after a specified waiting time any of these processes are still present in the system process table, a KILL signal is sent to the remaining processes (whose state, PID, args, etc. are logged). Only if a KILL signal had to be sent and the remaining processes have still not disappeared within a second waiting time will /etc/init.d/samba-ad-dc return 1, indicating failure to systemd.

The patch requires the main samba process tree to be present to be able to achieve its goal, which is the reason I also provided a possible native systemd unit file for the samba-ad-dc service, so that systemd will kill remaining processes after the "ExecStop" command calling "/etc/init.d/samba-ad-dc stop" has returned.
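The wait-then-kill logic described above could look roughly like the sketch below. This is an illustration only and not the attached patch: the function names (group_alive, wait_for_exit, stop_group), their arguments and the timeouts are hypothetical. It assumes procps "pgrep"/"pkill" with the "-g" (process group) option, and it deliberately ignores zombies, since a zombie has already closed its sockets.

```shell
#!/bin/sh
# Sketch only; names, arguments and timeouts are hypothetical.

# True while the process group still has a non-zombie member.
# Zombies are skipped: they have already closed their sockets.
group_alive() {
    pids=$(pgrep -d, -g "$1" 2>/dev/null) || return 1
    ps -o stat= -p "$pids" 2>/dev/null | grep -qv '^ *Z'
}

# wait_for_exit PGID SECONDS
# Polls once per second; returns 0 as soon as the group is gone,
# 1 if members remain after SECONDS.
wait_for_exit() {
    n=0
    while group_alive "$1"; do
        [ "$n" -lt "$2" ] || return 1
        sleep 1
        n=$((n + 1))
    done
    return 0
}

# stop_group PGID TERM_WAIT KILL_WAIT
# Meant to run after TERM was sent to the group leader.
# Returns 1 only if members survive the KILL signal.
stop_group() {
    wait_for_exit "$1" "$2" && return 0
    # Log state, PID and arguments of the stragglers, then KILL them.
    ps -o stat= -o pid= -o args= -p "$(pgrep -d, -g "$1")" 2>/dev/null \
        | logger -p daemon.error -t SAMBA || true
    pkill -KILL -g "$1" || true
    wait_for_exit "$1" "$3"
}
```

In a stop branch this might be used as `stop_group "$pgid" 15 5 || ret=1`, where "$pgid" has to be looked up before the group leader exits. Note that smbd and winbindd run in their own process groups in the tree shown earlier, so a real implementation would need to cover those groups as well.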
Created attachment 11163 [details] system calls which involve process management while stopping samba-ad-dc
Created attachment 11164 [details] patch to /etc/init.d/samba-ad-dc
Created attachment 11165 [details] systemd service unit file for the samba-ad-dc service
Created issue in GitLab for this bug at: https://git.knut.univention.de/univention/ucs/-/issues/2027
Two more customers are affected. Samba does not come back up after automatic updates and after server-password-change. I got this:

----------------------------------------------------------------------------
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31104 (tfork(31106)) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31106 (s3fs[master]) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31107 (tfork(31110)) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31110 (rpc[master]) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31113 (tfork(31115)) in control group while starting unit. Ignoring.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 01:07:58 UCS-EDU-DC systemd[1]: samba-ad-dc.service: Found left-over process 31115 (wrepl[master]) in control group while starting unit. Ignoring.
----------------------------------------------------------------------------
caused by a server-password-change and this:

[2024/02/24 02:10:20.261804, 0, pid=42235] ../../source4/samba/server.c:623(binary_smbd_main)
  samba version 4.18.3-Univention started.
  Copyright Andrew Tridgell and the Samba Team 1992-2023
[2024/02/24 02:10:20.621296, 0, pid=42236] ../../source4/samba/server.c:896(binary_smbd_main)
  binary_smbd_main: samba: using 'prefork' process model
[2024/02/24 02:10:20.780697, 0, pid=42317] ../../source4/samba/service_stream.c:373(stream_setup_socket)
  stream_setup_socket: Failed to listen on ::1:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/02/24 02:10:20.780747, 0, pid=42317] ../../source4/rpc_server/dcerpc_server.c:513(add_socket_rpc_tcp_iface)
  service_setup_stream_socket(address=::1,port=135) for epmapper mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/02/24 02:10:20.780755, 0, pid=42317] ../../source4/samba/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
[2024/02/24 02:10:20.781831, 0, pid=42236] ../../source4/samba/server.c:392(samba_terminate)
  samba_terminate: samba_terminate of samba 42236: dcerpc: Failed to initialise end points
----------------------------------------------------------------------------
and after upgrade same picture

Upgrade: univention-samba4:amd64 (9.0.15-1, 9.0.15-2), python3-univention-directory-manager-cli:amd64 (15.0.25-5, 15.0.25-6), univention-directory-manager-tools:amd64 (15.0.25-5, 15.0.25-6), python3-univention-directory-manager:amd64 (15.0.25-5, 15.0.25-6), python-univention-directory-manager:amd64 (15.0.25-5, 15.0.25-6), univention-errata-level:amd64 (5.0.6-959, 5.0.6-962), python3-univention-directory-manager-rest:amd64 (10.0.7-8, 10.0.7-9), univention-directory-manager-rest:amd64 (10.0.7-8, 10.0.7-9), python3-univention-directory-manager-rest-client:amd64 (10.0.7-8, 10.0.7-9), python-univention-directory-manager-cli:amd64
(15.0.25-5, 15.0.25-6), univention-samba4-sysvol-sync:amd64 (9.0.15-1, 9.0.15-2)

syslog
-----------------------------------------------------------------------------
Feb 24 02:10:18 H77-UcDC02 samba-ad-dc[41977]: Stopping Samba AD DC server: sambaStopping nmbd (via systemctl): nmbd.service.
Feb 24 02:10:18 H77-UcDC02 samba-ad-dc[41977]: .
Feb 24 02:10:18 H77-UcDC02 systemd[1]: samba-ad-dc.service: Succeeded.
Feb 24 02:10:18 H77-UcDC02 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: nmbd.service: Found left-over process 1844 (nmbd) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: Starting LSB: Samba NetBIOS nameserver (nmbd)...
Feb 24 02:10:19 H77-UcDC02 nmbd[42194]: Starting NetBIOS name server: nmbd.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: Started LSB: Samba NetBIOS nameserver (nmbd).
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2106 (tfork(2110)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2110 (rpc(0)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2114 (tfork(2115)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: samba-ad-dc.service: Found left-over process 2115 (rpc(1)) in control group while starting unit. Ignoring.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 24 02:10:19 H77-UcDC02 systemd[1]: Starting LSB: Samba daemons for the AD DC...
-------------------------------------------------------------------------------
Ok, I adjusted 15_samba4_stop.patch in ucs-patches, following the suggestion of Comment 2 with minor adjustments.

899721da8 | Improve robustness of samba-ad-dc stop

Package rebuilt as:
Package: samba
Version: 2:4.18.3-1A~5.0.0.202403041818
Branch: ucs_5.0-0
Scope: errata5.0-6

07cbbe7a6b | Advisory

As discussed I didn't include the systemd service unit as systemd already generates /run/systemd/generator.late/samba-ad-dc.service and samba-ad-dc alone doesn't solve the broader Bug 44237.
QA showed that `pgrep --exact samba` cannot be used as it also matches /etc/init.d/samba. I replaced that by `pgrep -F "$PIDFILE"`.

2657bb24f | fixup

Package: samba
Version: 2:4.18.3-1A~5.0.0.202403051313
Branch: ucs_5.0-0
Scope: errata5.0-6

06373a32c3 | Advisory update
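For illustration, a PID file based lookup could be wrapped as in the sketch below. This is an assumption about how such a lookup might be used, not the code from the fixup commit: the function name pgid_from_pidfile is hypothetical, and the extra step of resolving the process group with "ps -o pgid=" is added here only as an example.

```shell
#!/bin/sh
# pgid_from_pidfile PIDFILE
# Prints the process group ID of the process named in PIDFILE.
# Fails if the file is unreadable or the PID is not running.
# (Hypothetical helper, not part of the shipped init script.)
pgid_from_pidfile() {
    [ -r "$1" ] || return 1
    # pgrep -F reads the PID from the file and only prints it if
    # such a process exists, so no name matching is involved.
    pid=$(pgrep -F "$1") || return 1
    ps -o pgid= -p "$pid" | tr -d ' '
}
```

Because the lookup goes through the PID recorded by the daemon itself, a shell running an init script whose name happens to be "samba" cannot be picked up by accident.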
*** Bug 55487 has been marked as a duplicate of this bug. ***
I was not able to reproduce the original problem, so I "faked" the problem by attaching gdb to one of the subprocesses of samba.

OK: Repeated restart of samba with hanging processes
OK: Hung processes are killed after a timeout
OK: Exactly the correct subprocesses are killed. No more, no less
OK: Repeated restart of samba in a loop while running samba tests showed no problem
OK: Server password change in a loop
OK: Package update
OK: Jenkins

Verified
<https://errata.software-univention.de/#/?erratum=5.0x975>