Bug 47634

Summary: restarting only samba-ad-dc breaks drs replication in Samba 4.6.15
Product: UCS Reporter: Arvid Requate <requate>
Component: Samba4 Assignee: Samba maintainers <samba-maintainers>
Status: RESOLVED WONTFIX QA Contact: Samba maintainers <samba-maintainers>
Severity: normal    
Priority: P5 CC: best, botner, m.bunkus
Version: UCS 4.2   
Target Milestone: ---   
Hardware: Other   
OS: Linux   
URL: https://bepasty.knut.univention.de/hiwdAtqz
See Also: https://forge.univention.org/bugzilla/show_bug.cgi?id=44237
https://forge.univention.org/bugzilla/show_bug.cgi?id=42241
What kind of report is it?: Development Internal What type of bug is this?: ---
Who will be affected by this bug?: --- How will those affected feel about the bug?: ---
User Pain: Enterprise Customer affected?:
School Customer affected?: ISV affected?:
Waiting Support: Flags outvoted (downgraded) after PO Review:
Ticket number: Bug group (optional):
Max CVSS v3 score:
Bug Depends on: 47429    
Bug Blocks: 47635, 47637, 47638    

Description Arvid Requate univentionstaff 2018-08-22 19:51:44 CEST
Restarting only samba-ad-dc breaks drs replication in Samba 4.6.15 (UCS 4.2)

For example after

  /etc/init.d/samba-ad-dc restart


the command "samba-tool drs showrepl" fails:

ERROR(runtime): DsReplicaGetInfo of type 0 failed - (-1073610699, 'The operation cannot be performed.')

The same happens for restarts with "systemctl" and "service".


After a "/etc/init.d/samba restart" it works again.
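
The check-and-recover sequence described here could be sketched as a small shell function. This is only an illustration: the function name check_drs_after_restart and the override variables SHOWREPL_CMD and FULL_RESTART_CMD are hypothetical, and it assumes that "samba-tool drs showrepl" exits non-zero when it prints the ERROR(runtime) message above. Only the two default commands come from this report.

```shell
#!/bin/sh
# Sketch: check DRS replication status and, if the check fails, fall back
# to the full "/etc/init.d/samba restart" that this report found to work.
# SHOWREPL_CMD and FULL_RESTART_CMD are hypothetical hooks for testing.
check_drs_after_restart() {
    showrepl="${SHOWREPL_CMD:-samba-tool drs showrepl}"
    full_restart="${FULL_RESTART_CMD:-/etc/init.d/samba restart}"

    # Assumption: a failing DsReplicaGetInfo yields a non-zero exit status.
    if $showrepl >/dev/null 2>&1; then
        echo "drs showrepl: ok"
        return 0
    fi

    echo "drs showrepl: failed, running full samba restart"
    $full_restart >/dev/null 2>&1 || return 2

    # Re-check once after the full restart.
    if $showrepl >/dev/null 2>&1; then
        echo "drs showrepl: ok after full restart"
        return 0
    fi
    echo "drs showrepl: still failing"
    return 1
}
```

Return codes in this sketch: 0 means replication info could be queried, 1 means it still fails after the full restart, 2 means the full restart itself failed.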
Comment 1 Arvid Requate univentionstaff 2018-08-22 19:58:31 CEST
Felix discovered a nasty variation of this in a Jenkins test setup, where some bind9-related test temporarily added a network interface (and later removed it again), which apparently caused systemd to restart dependent processes.


A pretty characteristic sequence in /var/log/syslog looks like this:

=============================================================================
Aug 22 17:49:16 slave systemd[1]: Stopping LSB: start Samba daemons for the AD DC...
Aug 22 17:49:16 slave samba-ad-dc[3724]: Stopping Samba AD DC daemon: samba.
Aug 22 17:49:16 slave systemd[1]: Starting LSB: start Samba daemons for the AD DC...
Aug 22 17:49:17 slave samba-ad-dc[3737]: Starting Samba AD DC daemon: samba.
Aug 22 17:49:17 slave systemd[1]: Started LSB: start Samba daemons for the AD DC.
=============================================================================

I'm still evaluating if we have a similar issue with Samba 4.7.8. This morning I saw something like this in an internal environment, but just now I cannot reproduce it.
Comment 2 Arvid Requate univentionstaff 2018-08-22 20:16:54 CEST
In log.samba this is already logged at level 0 as:

  IRPC callback failed for DsReplicaSync - NT_STATUS_OBJECT_NAME_NOT_FOUND
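
A quick way to look for this symptom might be to count the matching lines in the log. A minimal sketch: the helper name count_irpc_failures is hypothetical, and the default path /var/log/samba/log.samba is an assumption about where log.samba lives, so pass the real location as the first argument.

```shell
#!/bin/sh
# Sketch: count "IRPC callback failed for DsReplica..." entries in a log file.
# The default path is an assumption; pass the actual log.samba path as $1.
count_irpc_failures() {
    logfile="${1:-/var/log/samba/log.samba}"
    [ -r "$logfile" ] || { echo "cannot read $logfile" >&2; return 2; }
    # grep -c prints the number of matching lines; it exits 1 when the
    # count is 0, so "|| true" keeps that case from looking like an error.
    grep -c 'IRPC callback failed for DsReplica' "$logfile" || true
}
```

The pattern covers both DsReplicaSync and DsReplicaGetInfo, the two calls named in this report.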
Comment 3 Felix Botner univentionstaff 2018-08-23 10:45:19 CEST
-> /usr/lib/univention-server/server_password_change.d/univention-samba4 | grep samba-ad
	
test -x /etc/init.d/samba-ad-dc && invoke-rc.d samba-ad-dc restart
Comment 4 Felix Botner univentionstaff 2018-08-23 11:05:47 CEST
Just checked the test logfile again; I think systemd is not the main problem.

(1)

systemd only knows about samba-ad-dc, smbd and nmbd. /etc/init.d/samba is not linked in any SysV runlevel and is therefore not considered a service by systemd.

=> systemd uses samba-ad-dc, smbd and nmbd, and in the correct order

(2)

The only restart of samba-ad-dc I could find in the syslog was at the time the ucs-test "server_password_change" was executed, and our server password script explicitly restarts samba-ad-dc (as do many other packages and tools, like printerserver.postinst, the s4 connector joinscript...)

(3)

So I think we should avoid using /etc/init.d/samba-ad-dc in at least 4.2, as this breaks DRS replication on Samba DCs.
Comment 5 Felix Botner univentionstaff 2018-08-23 11:27:32 CEST
Also happened on billy (password change -> samba-ad-dc restart -> IRPC callback failed for DsReplicaGetInfo - NT_STATUS_CONNECTION_REFUSED) with UCS 4.3, but I cannot reproduce this behavior with 4.3. Nevertheless, we should fix /usr/lib/univention-server/server_password_change.d/univention-samba4 (samba restart instead of samba-ad-dc restart) asap.
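
Since several scripts are said to restart samba-ad-dc directly, a recursive grep could help locate them. A minimal sketch: find_ad_dc_restarts is a hypothetical helper name, not a UCS tool, and the directories to scan are left to the caller.

```shell
#!/bin/sh
# Sketch: list files under the given paths that contain a direct
# "samba-ad-dc restart" call. find_ad_dc_restarts is a hypothetical name.
find_ad_dc_restarts() {
    # -r: recurse into directories, -l: print only names of matching files.
    # "|| true" keeps "no match" from being reported as a failure.
    grep -rl -e 'samba-ad-dc restart' "$@" || true
}
```

For example, `find_ad_dc_restarts /usr/lib/univention-server/server_password_change.d` would be expected to list the univention-samba4 script quoted in Comment 3.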
Comment 6 Moritz Bunkus 2019-05-13 16:18:14 CEST
I've encountered the same issue with a customer who wants to use Pacemaker/Corosync on their DCs. As Samba listens on specific IPs (instead of e.g. *:445), adding and removing IPs requires a restart of Samba. Using `systemctl restart samba-ad-dc.service` leads to DRS breaking with the same error messages as reported before.

My debug fu led me to the following observations; maybe they're useful to someone:

1. All four services ("samba", "samba-ad-dc", "smbd", "nmbd") are, from the PoV of systemd, generated by the sysv-generator from the scripts in /etc/init.d (see "systemctl cat {samba,samba-ad-dc,smbd,nmbd}.service").

2. Apart from a lot of setup stuff, "/etc/init.d/samba-ad-dc" effectively executes "/usr/sbin/samba" with certain options.

3. "/etc/init.d/samba" on the other hand always executes the services "smbd", "nmbd" and "samba-ad-dc", each of which should effectively execute the scripts in "/etc/init.d" due to 1.

4. Both "/etc/init.d/smbd" and "/etc/init.d/nmbd" are no-ops on AD DCs.

5. This leaves us with the main difference between "/etc/init.d/samba restart" and "systemctl restart samba-ad-dc.service": the enforced "sleep 1" in "/etc/init.d/samba". All of this looks like a timing issue.
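
If the timing hypothesis in point 5 holds, a stop, pause, start sequence against the samba-ad-dc script might behave like the full restart. The following is a minimal sketch, untested against a real DC. The function name restart_with_pause and the variables INIT_SCRIPT and PAUSE_SECONDS are hypothetical, and placing the pause between stop and start is an assumption about where the "sleep 1" takes effect.

```shell
#!/bin/sh
# Sketch: restart samba-ad-dc with an explicit pause between stop and start,
# to test the timing hypothesis. INIT_SCRIPT and PAUSE_SECONDS are
# hypothetical knobs; the one-second default mirrors the "sleep 1" noted above.
restart_with_pause() {
    init_script="${INIT_SCRIPT:-/etc/init.d/samba-ad-dc}"
    pause="${PAUSE_SECONDS:-1}"

    "$init_script" stop || return 1
    # Assumption: the pause belongs between stop and start.
    sleep "$pause"
    "$init_script" start
}
```

Running "samba-tool drs showrepl" afterwards would show whether the pause alone makes a difference.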
Comment 7 Arvid Requate univentionstaff 2019-07-31 13:04:43 CEST
Reminder: Until we have fixed Bug #44237, please use only "/etc/init.d/samba restart", not "service samba restart", "service samba-ad-dc restart" or any other variant.
Comment 8 Ingo Steuwer univentionstaff 2020-07-03 20:55:11 CEST
This issue has been filed against UCS 4.2.

UCS 4.2 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.