Bug 57310 - Samba (AD DC) restart may fail due to network sockets still being occupied by left-over processes - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
Status: CLOSED FIXED
Alias: None
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-9-errata
Assignee: Arvid Requate
QA Contact: Felix Botner
URL:
Keywords:
Depends on: 56914
Blocks:
Reported: 2024-05-22 09:47 CEST by Julia Bremer
Modified: 2024-11-27 13:57 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.400
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Customer ID:
Max CVSS v3 score:
requate: Patch_Available+


Attachments
patch to /usr/lib/univention-install/96univention-samba4.inst (1.21 KB, patch)
2024-05-28 22:15 CEST, Christian Kowarzik
patch to /etc/init.d/samba-ad-dc (883 bytes, patch)
2024-05-28 22:15 CEST, Christian Kowarzik
revised patch to /etc/init.d/samba-ad-dc (927 bytes, patch)
2024-07-02 13:58 CEST, Christian Kowarzik
revised patch to /usr/lib/univention-install/96univention-samba4.inst (1.15 KB, patch)
2024-07-02 14:04 CEST, Christian Kowarzik

Description Julia Bremer univentionstaff 2024-05-22 09:47:16 CEST
+++ This bug was initially created as a clone of Bug #56914 +++

With Bug #56914 we added some logic to the samba init.d script, so that hanging processes are terminated during the stop of the server.

In our tests we saw the following:
The join of a replica server failed because the joinscript 98univention-samba4-dns.inst was waiting for the service account dns-slave to become visible in its local Samba.

This never happened, because the samba service was dead. The logs show that the service was restarted and that the code from Bug #56914 ran, but the service still could not start because of NT_STATUS_ADDRESS_ALREADY_ASSOCIATED.




journalctl.log 
May 22 02:56:51 slave samba-ad-dc[7716]: Stopping Samba AD DC server: samba
May 22 02:56:51 slave samba-ad-dc[7716]: Samba did not terminate in time. Killing remaining processes.
May 22 02:56:51 slave samba-ad-dc[7716]: S   PID  PGRP     TIME COMMAND.
May 22 02:56:51 slave samba-ad-dc[7716]: S  3376  3376 00:00:00 /usr/sbin/winbindd -D.
May 22 02:56:51 slave samba-ad-dc[7716]: S  3384  3376 00:00:00   winbindd: domain child [SLAVE].
May 22 02:56:51 slave samba-ad-dc[7716]: S  3627  3376 00:00:00   winbindd: idmap child.
May 22 02:56:51 slave SAMBA[7857]: ERROR: Stuck process after service stop:
May 22 02:56:51 slave SAMBA[7857]: S   PID  PGRP     TIME COMMAND
May 22 02:56:51 slave SAMBA[7857]: S  3376  3376 00:00:00 /usr/sbin/winbindd -D
May 22 02:56:51 slave SAMBA[7857]: S  3384  3376 00:00:00   winbindd: domain child [SLAVE]
May 22 02:56:51 slave SAMBA[7857]: S  3627  3376 00:00:00   winbindd: idmap child
May 22 02:56:51 slave SAMBA[7857]: PIDFILE: 7551
May 22 02:56:51 slave systemd[1]: Stopping LSB: Samba NetBIOS nameserver (nmbd)...
May 22 02:56:52 slave nmbd[7871]: Stopping NetBIOS name server: nmbd.
May 22 02:56:52 slave systemd[1]: nmbd.service: Succeeded.
May 22 02:56:52 slave samba-ad-dc[7716]: Stopping nmbd (via systemctl): nmbd.service.
May 22 02:56:52 slave systemd[1]: Stopped LSB: Samba NetBIOS nameserver (nmbd).
May 22 02:56:52 slave samba-ad-dc[7716]: .
May 22 02:56:52 slave systemd[1]: samba-ad-dc.service: Succeeded.
May 22 02:56:52 slave systemd[1]: Stopped LSB: Samba daemons for the AD DC.
May 22 02:56:53 slave systemd[1]: Starting LSB: Samba NetBIOS nameserver (nmbd)...
May 22 02:56:53 slave nmbd[7904]: Starting NetBIOS name server: nmbd.
May 22 02:56:53 slave systemd[1]: Started LSB: Samba NetBIOS nameserver (nmbd).
May 22 02:56:53 slave systemd[1]: Starting LSB: Samba daemons for the AD DC...
May 22 02:56:53 slave samba-ad-dc[7926]: Starting nmbd (via systemctl): nmbd.service.
May 22 02:56:54 slave samba-ad-dc[7926]: Starting Samba AD DC server: samba.
May 22 02:56:54 slave systemd[1]: Started LSB: Samba daemons for the AD DC.


log.samba:

[2024/05/22 02:56:55.227036,  0, pid=7962] ../../source4/samba/service_stream.c:373(stream_setup_socket)
  stream_setup_socket: Failed to listen on 127.0.0.1:49153 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/05/22 02:56:55.227130,  0, pid=7962] ../../source4/rpc_server/dcerpc_server.c:513(add_socket_rpc_tcp_iface)
  service_setup_stream_socket(address=127.0.0.1,port=49153) for dnsserver backupkey eventlog6 browser unixinfo dssetup drsuapi lsarpc mgmt failed - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/05/22 02:56:55.227158,  0, pid=7962] ../../source4/samba/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dcerpc: Failed to initialise end points]
Comment 2 Julia Bremer univentionstaff 2024-05-23 09:49:18 CEST
Happened again in the tests:

journald.log

May 23 00:16:14 master091 samba-ad-dc[20366]: Stopping Samba AD DC server: samba
May 23 00:16:14 master091 samba-ad-dc[20366]: Samba did not terminate in time. Killing remaining processes.
May 23 00:16:14 master091 samba-ad-dc[20366]: S   PID  PGRP     TIME COMMAND.
May 23 00:16:14 master091 samba-ad-dc[20366]: S  5699  5699 00:00:00 /usr/sbin/winbindd -D.
May 23 00:16:14 master091 samba-ad-dc[20366]: S  5700  5699 00:00:00   winbindd: domain child [MASTER091].
May 23 00:16:14 master091 samba-ad-dc[20366]: S  5941  5699 00:00:00   winbindd: idmap child.
May 23 00:16:14 master091 SAMBA[20605]: ERROR: Stuck process after service stop:
May 23 00:16:14 master091 SAMBA[20605]: S   PID  PGRP     TIME COMMAND
May 23 00:16:14 master091 SAMBA[20605]: S  5699  5699 00:00:00 /usr/sbin/winbindd -D
May 23 00:16:14 master091 SAMBA[20605]: S  5700  5699 00:00:00   winbindd: domain child [MASTER091]
May 23 00:16:14 master091 SAMBA[20605]: S  5941  5699 00:00:00   winbindd: idmap child
May 23 00:16:14 master091 SAMBA[20605]: PIDFILE: 20125
May 23 00:16:14 master091 systemd[1]: Stopping LSB: Samba NetBIOS nameserver (nmbd)...
May 23 00:16:14 master091 nmbd[20619]: Stopping NetBIOS name server: nmbd.
May 23 00:16:14 master091 systemd[1]: nmbd.service: Succeeded.
May 23 00:16:14 master091 samba-ad-dc[20366]: Stopping nmbd (via systemctl): nmbd.service.
May 23 00:16:14 master091 systemd[1]: Stopped LSB: Samba NetBIOS nameserver (nmbd).
May 23 00:16:14 master091 samba-ad-dc[20366]: .
May 23 00:16:14 master091 systemd[1]: samba-ad-dc.service: Succeeded.
May 23 00:16:14 master091 systemd[1]: Stopped LSB: Samba daemons for the AD DC.
May 23 00:16:15 master091 systemd[1]: Starting LSB: Samba NetBIOS nameserver (nmbd)...
May 23 00:16:16 master091 nmbd[20652]: Starting NetBIOS name server: nmbd.
May 23 00:16:16 master091 systemd[1]: Started LSB: Samba NetBIOS nameserver (nmbd).
May 23 00:16:16 master091 systemd[1]: Starting LSB: Samba daemons for the AD DC...
May 23 00:16:16 master091 samba-ad-dc[20676]: Starting nmbd (via systemctl): nmbd.service.
May 23 00:16:18 master091 samba-ad-dc[20676]: Starting Samba AD DC server: samba.
May 23 00:16:18 master091 systemd[1]: Started LSB: Samba daemons for the AD DC.


log.samba

[2024/05/23 00:16:18.189325,  0, pid=20695] ../../source4/samba/server.c:623(binary_smbd_main)
  samba version 4.18.3-Univention started.
  Copyright Andrew Tridgell and the Samba Team 1992-2023
[2024/05/23 00:16:18.788086,  0, pid=20696] ../../source4/samba/server.c:896(binary_smbd_main)
  binary_smbd_main: samba: using 'prefork' process model
[2024/05/23 00:16:25.532500,  0, pid=20709] ../../source4/samba/service_stream.c:373(stream_setup_socket)
  stream_setup_socket: Failed to listen on ::1:389 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/05/23 00:16:25.532604,  0, pid=20709] ../../source4/ldap_server/ldap_server.c:1186(add_socket)
  add_socket: ldapsrv failed to bind to ::1:389 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
[2024/05/23 00:16:25.532620,  0, pid=20709] ../../source4/samba/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [Failed to startup ldap server task]
[2024/05/23 00:16:25.539240,  0, pid=20696] ../../source4/samba/server.c:392(samba_terminate)
  samba_terminate: samba_terminate of samba 20696: Failed to startup ldap server task
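
Both log.samba excerpts name the address that could not be bound. A minimal sketch for pulling those addresses out of the log, assuming the message wording shown above stays stable; the function name failed_listen_addrs and the temporary sample file are made up for illustration:

```shell
# Hypothetical helper: print every address:port that Samba failed to listen on.
# Assumes the "Failed to listen on <addr> - NT_STATUS_..." wording seen above.
failed_listen_addrs() {
    sed -n 's/.*Failed to listen on \(.*\) - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED.*/\1/p' "$@"
}

# Sample input copied from the two log excerpts in this bug.
sample_log=$(mktemp)
cat >"$sample_log" <<'EOF'
  stream_setup_socket: Failed to listen on 127.0.0.1:49153 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
  add_socket: ldapsrv failed to bind to ::1:389 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
  stream_setup_socket: Failed to listen on ::1:389 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
EOF

addrs=$(failed_listen_addrs "$sample_log")
for addr in $addrs; do
    # The port is everything after the last colon (works for IPv4 and IPv6).
    echo "address=$addr port=${addr##*:}"
done
rm -f "$sample_log"
```

On an affected host, the extracted port could then be looked up with a tool such as 'ss -tlnp' to see which left-over process still holds the socket.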
Comment 3 Christian Kowarzik 2024-05-28 22:14:13 CEST
I am looking into this, and although I have not yet been able to reproduce the NT_STATUS_ADDRESS_ALREADY_ASSOCIATED error, I would like to share my preliminary findings with you.

The solution to Bug 56914 added the assumption to the stop-branch of /etc/init.d/samba-ad-dc that the processes /usr/sbin/smbd and /usr/sbin/winbindd are always part of the main samba process tree.

This assumption is the result of removing the following lines from the stop-branch of /etc/init.d/samba-ad-dc (for details see description of Bug 56914):

-		## check for smbd and winbindd as well, in case ADDS has just been configured
-		for service in smbd winbindd; do
-			pid=$(pgrep -x "$service")
-			if [ -n "$pid" ]; then
-				start-stop-daemon --stop --quiet --oknodo \
-					--name "$service" --retry "TERM/3/KILL/1" -v \
-					| sed -rn 's/.*, (retry #|refused to die)/\1/p' \
-					| while read line; do log_action_cont_msg "$line"; done
-			fi

This assumption is generally true after a successful provision of Samba (AD DC), as /usr/sbin/smbd and /usr/sbin/winbindd are then indeed forked by the main samba process tree started by /etc/init.d/samba-ad-dc.

Nevertheless, during provision of Samba (AD DC) this assumption does not hold: by the time the join scripts from package univention-samba4 run, the postinst scripts of the packages winbind and samba will already have started /usr/sbin/winbindd (through 'invoke-rc.d --skip-systemd-native winbind $_dh_action', with $_dh_action="start") and /usr/sbin/smbd (through 'invoke-rc.d --skip-systemd-native smbd $_dh_action', with $_dh_action="start"), respectively.

In my tests I could reproduce that the order of the commands in the function stop_conflicting_services() from /usr/lib/univention-install/96univention-samba4.inst _always_ causes the call '/etc/init.d/samba-ad-dc stop' to complain that the process '/usr/sbin/winbindd -D' (and its children) is still running and then to terminate it, exactly as the excerpts from journalctl.log in the description of this bug and in comment 2 show.

This behaviour of the call '/etc/init.d/samba-ad-dc stop' is a direct result of the assumption that /usr/sbin/winbindd is always part of the main samba process tree, which is not true when stop_conflicting_services() gets called, as the process '/usr/sbin/winbindd -D' was started by 'invoke-rc.d --skip-systemd-native winbind $_dh_action', with $_dh_action="start", as explained above.
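
Whether a given process really belongs to another process's tree can be checked by walking the parent PIDs. The following is only a sketch under the assumption that 'ps -o ppid= -p PID' is available; is_descendant is a hypothetical helper, and a throwaway 'sleep' stands in for winbindd:

```shell
# Hypothetical helper: succeed if PID $1 has PID $2 somewhere among its ancestors.
is_descendant() {
    pid=$1
    ancestor=$2
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # "ps -o ppid= -p PID" prints only the parent PID, without a header.
        ppid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
        [ -n "$ppid" ] || return 1
        [ "$ppid" = "$ancestor" ] && return 0
        pid=$ppid
    done
    return 1
}

# Demo with a throwaway child of this shell instead of a real winbindd.
sleep 30 &
child=$!
if is_descendant "$child" "$$"; then child_in_tree=yes; else child_in_tree=no; fi
if is_descendant "$$" "$child"; then shell_in_tree=yes; else shell_in_tree=no; fi
kill "$child"
wait "$child" 2>/dev/null || true
echo "child below shell: $child_in_tree, shell below child: $shell_in_tree"
```

Called with the PID of '/usr/sbin/winbindd -D' and the PID of the main samba process, such a check would be expected to fail in the situation described above.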

# excerpt from /usr/lib/univention-install/96univention-samba4.inst 
stop_conflicting_services() {
        ## stop samba3 services and heimdal-kdc if present
        if [ -x /etc/init.d/samba ]; then
                if [ -n "$(pgrep -f '/usr/sbin/(smbd|nmbd)')" ]; then
                        /etc/init.d/samba stop
                        ## the smbd init script might refuse to run if it detects ADDC config in smb.conf
                        start-stop-daemon --stop --quiet --retry 2 --exec /usr/sbin/smbd
                fi
        fi
        if [ -x /etc/init.d/winbind ]; then
                if [ -n "$(pgrep -xf /usr/sbin/winbindd)" ]; then
                        /etc/init.d/winbind stop
                        # Bug #35600: Really stop all winbind processes
                        start-stop-daemon --stop --quiet --retry 2 --exec /usr/sbin/winbindd
                fi
        fi

To be precise, calling stop_conflicting_services() will detect the running /usr/sbin/smbd, then call '/etc/init.d/samba stop', which calls '/etc/init.d/samba-ad-dc stop' with the results explained above.

None of this makes the provision of Samba (AD DC) fail, since the NT_STATUS_ADDRESS_ALREADY_ASSOCIATED error was not hit in these tests.

However, IMHO this situation should be avoided, and I suggest the following two patches to do so:

1. In stop_conflicting_services() change the order of calls to first stop winbind, then samba
(see attachment 'patch to /usr/lib/univention-install/96univention-samba4.inst')

2. In the stop-branch of /etc/init.d/samba-ad-dc, also send the TERM signal to /usr/sbin/smbd and /usr/sbin/winbindd.
(see attachment 'patch to /etc/init.d/samba-ad-dc')
Comment 4 Christian Kowarzik 2024-05-28 22:15:21 CEST
Created attachment 11214 [details]
patch to /usr/lib/univention-install/96univention-samba4.inst
Comment 5 Christian Kowarzik 2024-05-28 22:15:50 CEST
Created attachment 11215 [details]
patch to /etc/init.d/samba-ad-dc
Comment 6 Jan-Luca Kiok univentionstaff 2024-05-29 14:00:10 CEST
Thanks for your efforts and the time spent to troubleshoot this, that's really helpful! We will review your patches and see how to proceed.
Comment 8 Christian Kowarzik 2024-07-02 13:38:19 CEST
Comment on attachment 11214 [details]
patch to /usr/lib/univention-install/96univention-samba4.inst

--- a/usr/lib/univention-install/96univention-samba4.inst        2024-02-05 16:53:54.000000000 +0100
+++ b/usr/lib/univention-install/96univention-samba4.inst        2024-07-02 15:37:14.044000000 +0200
@@ -207,6 +207,13 @@
 
 stop_conflicting_services() {
        ## stop samba3 services and heimdal-kdc if present
+       if [ -x /etc/init.d/winbind ]; then
+               if [ -n "$(pgrep -f /usr/sbin/winbindd)" ]; then
+                       /etc/init.d/winbind stop
+                       # Bug #35600: Really stop all winbind processes
+                       start-stop-daemon --stop --quiet --retry 2 --exec /usr/sbin/winbindd
+               fi
+       fi
        if [ -x /etc/init.d/samba ]; then
                if [ -n "$(pgrep -f '/usr/sbin/(smbd|nmbd)')" ]; then
                        /etc/init.d/samba stop
@@ -214,13 +221,6 @@
                        start-stop-daemon --stop --quiet --retry 2 --exec /usr/sbin/smbd
                fi
        fi
-       if [ -x /etc/init.d/winbind ]; then
-               if [ -n "$(pgrep -xf /usr/sbin/winbindd)" ]; then
-                       /etc/init.d/winbind stop
-                       # Bug #35600: Really stop all winbind processes
-                       start-stop-daemon --stop --quiet --retry 2 --exec /usr/sbin/winbindd
-               fi
-       fi
        if [ -x /etc/init.d/heimdal-kdc ]; then
                if [ -n "$(pgrep -f '/usr/lib/heimdal-servers/(kdc|kpasswdd)')" ]; then
                        /etc/init.d/heimdal-kdc stop
Comment 9 Christian Kowarzik 2024-07-02 13:52:08 CEST
Comment on attachment 11215 [details]
patch to /etc/init.d/samba-ad-dc

--- a/etc/init.d/samba-ad-dc	2024-03-05 13:13:26.000000000 +0100
+++ b/etc/init.d/samba-ad-dc	2024-07-02 15:50:59.616000000 +0200
@@ -77,10 +77,15 @@
 		log_daemon_msg "Stopping $DESC" $NAME
 		## sometimes samba takes a long time to terminate,
 		## which would make starting new samba processes fail.
-		pids=$(pgrep -F "$PIDFILE"; pgrep --exact '(smbd|winbindd)')
+		pid_samba=$(pgrep -F "$PIDFILE")
+		pid_smbd=$(pgrep --exact 'smbd')
+		pid_winbindd=$(pgrep --exact 'winbindd')
 		start-stop-daemon --stop --quiet --pidfile $PIDFILE --name samba
 		ret="$?"
-		if [ -n "$pids" ]; then
+		[ -n "$pid_smbd" ] && start-stop-daemon --stop --quiet --pid $pid_smbd --name smbd
+		[ -n "$pid_winbindd" ] && start-stop-daemon --stop --quiet --pid $pid_winbindd --name winbindd
+		pids="$pid_samba $pid_smbd $pid_winbindd"
+		if [ -n "${pids// /}" ]; then
 			unset pgids kgids
 			for pid in $pids; do
 				pgids="$pgids -g $pid"
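
Two details of the patch may deserve care: 'pgrep --exact' can print several PIDs, one per line (the journal excerpts show a winbindd parent with two children), and '${pids// /}' is a substitution that POSIX sh does not define. A minimal sketch of one portable way to flatten the PID lists, assuming the script might be run by a plain POSIX shell; collect_pids is a hypothetical helper and not part of the patch, and the PID values are sample data from the journal excerpt in the description:

```shell
# Hypothetical helper: flatten any number of PID lists (possibly empty or
# multi-line) into one space-separated line.
collect_pids() {
    # Unquoted on purpose: word splitting drops empty entries and newlines.
    set -- $*
    echo "$*"
}

# Sample values taken from the journal excerpt in the description.
pid_samba="7551"
pid_smbd=""
pid_winbindd="3376
3384
3627"

pids=$(collect_pids "$pid_samba" "$pid_smbd" "$pid_winbindd")
pgids=""
if [ -n "$pids" ]; then
    # Same loop shape as in the init script: one "-g PID" pair per PID.
    for pid in $pids; do
        pgids="$pgids -g $pid"
    done
fi
echo "pids=$pids"
echo "pgids=$pgids"
```

With the list flattened this way, the emptiness test is a plain '[ -n "$pids" ]', and a per-PID loop avoids passing several PIDs to a single '--pid' argument.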
Comment 10 Christian Kowarzik 2024-07-02 13:58:50 CEST
Created attachment 11224 [details]
revised patch to /etc/init.d/samba-ad-dc

cosmetic changes
Comment 11 Christian Kowarzik 2024-07-02 14:04:01 CEST
Created attachment 11225 [details]
revised patch to /usr/lib/univention-install/96univention-samba4.inst

"pgrep -xf /usr/sbin/winbindd" will never match, as both winbind.service and samba-ad-dc.service start winbindd with at least the "-D" option.
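
The difference between the two pgrep calls can be reproduced without Samba. This is only an illustrative sketch: a 'sleep' process with an argument stands in for '/usr/sbin/winbindd -D', and pid_in_list is a hypothetical helper:

```shell
# Stand-in for a daemon started with an option, like "/usr/sbin/winbindd -D".
sleep_bin=$(command -v sleep)
"$sleep_bin" 60 &
demo_pid=$!
sleep 1   # give the child time to exec, so its command line is final

# -x together with -f demands that the pattern match the WHOLE command line.
exact_hits=$(pgrep -xf "$sleep_bin" || true)
# -f alone only needs the pattern to occur somewhere in the command line.
loose_hits=$(pgrep -f "$sleep_bin" || true)
kill "$demo_pid"

# Hypothetical helper: is PID $1 among the remaining arguments?
pid_in_list() {
    needle=$1
    shift
    for candidate in "$@"; do
        [ "$candidate" = "$needle" ] && return 0
    done
    return 1
}

if pid_in_list "$demo_pid" $exact_hits; then exact_match=yes; else exact_match=no; fi
if pid_in_list "$demo_pid" $loose_hits; then loose_match=yes; else loose_match=no; fi
echo "pgrep -xf: $exact_match, pgrep -f: $loose_match"
```

The process started with an argument is found by 'pgrep -f' but not by 'pgrep -xf' with the bare path, which is why the revised patch drops the '-x'.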
Comment 14 Arvid Requate univentionstaff 2024-11-22 16:33:43 CET
The suggested changes affect two parts:

1. /etc/init.d/samba-ad-dc shipped by source package samba
2. Joinscript /usr/lib/univention-install/96univention-samba4.inst

The first part has been done via ucs-patches and that package has been built via repo-ng:

1.a) For UCS 5.2-0

ucs-patches@1fbc3549 | Make samba-ad-dc stop smbd and winbindd explicitly

ucs-patches:samba/ucs_5.2-0/2:4.21.1-1/15_samba4_stop.patch

Package: samba
Version: 2:4.21.1-1A~5.2.0.202411191702
Branch: 5.2-0

1.b) Backport for 5.0-9:

ucs-patches@d9fe48bf | Make samba-ad-dc stop smbd and winbindd explicitly

ucs-patches:samba/ucs_5.0-0-errata5.0-9/2:4.18.3-1/15_samba4_stop.patch

Package: samba
Version: 2:4.18.3-1A~5.0.0.202411191740
Branch: 5.0-0
Scope: errata5.0-9


The second part has been done normally via the ucs repo:

2.a) For UCS 5.2-0

ucs@8ee0c6a8b46 | Let stop_conflicting_services stop winbindd first

Package: univention-samba4
Version: 11.0.7
Branch: 5.2-0

2.b) Backport for 5.0-9:

ucs@97b95a86470 | Let stop_conflicting_services stop winbindd first
5460dd9a93f | Advisories

Package: univention-samba4
Version: 9.0.18-3
Branch: 5.0-0
Scope: errata5.0-9
Comment 15 Felix Botner univentionstaff 2024-11-25 09:24:20 CET
OK, looks good