Bug 47634 - restarting only samba-ad-dc breaks drs replication in Samba 4.6.15
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 4.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Samba maintainers
QA Contact: Samba maintainers
URL: https://bepasty.knut.univention.de/hi...
Depends on: 47429
Blocks: 47635 47637 47638
Reported: 2018-08-22 19:51 CEST by Arvid Requate
Modified: 2020-07-03 20:55 CEST
CC List: 3 users

What kind of report is it?: Development Internal


Description Arvid Requate univentionstaff 2018-08-22 19:51:44 CEST
Restarting only samba-ad-dc breaks drs replication in Samba 4.6.15 (UCS 4.2)

For example after

  /etc/init.d/samba-ad-dc restart


the command "samba-tool drs showrepl" fails:

ERROR(runtime): DsReplicaGetInfo of type 0 failed - (-1073610699, 'The operation cannot be performed.')

The same happens for restarts with "systemctl" and "service".


After a "/etc/init.d/samba restart" it works again.
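To check quickly whether a restart left DRS replication in this state, the failing command from above can be wrapped in a small shell function. This is a sketch only: the function name `check_drs` is hypothetical, and it assumes that `samba-tool drs showrepl` exits with a non-zero status when it prints the DsReplicaGetInfo error shown above.

```shell
#!/bin/sh
# Hypothetical helper: report whether DRS replication info can be read.
# Assumption: "samba-tool drs showrepl" exits non-zero on the
# "DsReplicaGetInfo of type 0 failed" error.
check_drs() {
	if samba-tool drs showrepl >/dev/null 2>&1; then
		echo "samba-tool drs showrepl succeeded"
		return 0
	fi
	echo "samba-tool drs showrepl failed; DRS replication may be broken" >&2
	return 1
}
```

For example, run `check_drs` after `/etc/init.d/samba-ad-dc restart` to see the failure, and again after `/etc/init.d/samba restart` to confirm recovery.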
Comment 1 Arvid Requate univentionstaff 2018-08-22 19:58:31 CEST
Felix discovered a nasty variation of this in a Jenkins test setup, where a bind9-related test temporarily added a network interface (and later removed it again), which apparently caused systemd to restart dependent processes.


A pretty characteristic sequence in /var/log/syslog looks like this:

=============================================================================
Aug 22 17:49:16 slave systemd[1]: Stopping LSB: start Samba daemons for the AD DC...
Aug 22 17:49:16 slave samba-ad-dc[3724]: Stopping Samba AD DC daemon: samba.
Aug 22 17:49:16 slave systemd[1]: Starting LSB: start Samba daemons for the AD DC...
Aug 22 17:49:17 slave samba-ad-dc[3737]: Starting Samba AD DC daemon: samba.
Aug 22 17:49:17 slave systemd[1]: Started LSB: start Samba daemons for the AD DC.
=============================================================================

I'm still evaluating whether we have a similar issue with Samba 4.7.8. This morning I saw something like this in an internal environment, but right now I cannot reproduce it.
Comment 2 Arvid Requate univentionstaff 2018-08-22 20:16:54 CEST
In log.samba this is already logged at level 0 as:

  IRPC callback failed for DsReplicaSync - NT_STATUS_OBJECT_NAME_NOT_FOUND
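Because the failure is logged at level 0, it can be spotted by grepping log.samba after a restart. A minimal sketch, assuming the log lives at /var/log/samba/log.samba (pass another path as the first argument otherwise); the function name `find_drs_irpc_failures` is hypothetical. The pattern matches IRPC failures for any DsReplica* call, not only DsReplicaSync.

```shell
#!/bin/sh
# Hypothetical helper: print IRPC failures for DRS calls found in log.samba.
# Assumption: default log path is /var/log/samba/log.samba.
# Exit status follows grep: 0 = match found, 1 = no match, 2 = error.
find_drs_irpc_failures() {
	logfile=${1:-/var/log/samba/log.samba}
	grep -E 'IRPC callback failed for DsReplica[A-Za-z]*' "$logfile"
}
```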
Comment 3 Felix Botner univentionstaff 2018-08-23 10:45:19 CEST
-> /usr/lib/univention-server/server_password_change.d/univention-samba4 | grep samba-ad
	
test -x /etc/init.d/samba-ad-dc && invoke-rc.d samba-ad-dc restart
Comment 4 Felix Botner univentionstaff 2018-08-23 11:05:47 CEST
I just checked the test logfile again; I think systemd is not the main problem.

(1)

systemd only knows about samba-ad-dc, smbd and nmbd; /etc/init.d/samba is not linked into any SysV runlevel and is therefore not considered a service by systemd.

=> systemd handles samba-ad-dc, smbd and nmbd, and in the correct order

(2)

The only restart of samba-ad-dc I could find in the syslog was at the time the ucs-test "server_password_change" was executed, and our server password change script explicitly restarts samba-ad-dc (as do many other packages and tools, like printerserver.postinst, the S4 Connector join script, ...).

(3)

So I think we should avoid using /etc/init.d/samba-ad-dc, at least in UCS 4.2, as this breaks DRS replication on Samba DCs.
Comment 5 Felix Botner univentionstaff 2018-08-23 11:27:32 CEST
This also happened on billy with UCS 4.3 (password change -> samba-ad-dc restart -> IRPC callback failed for DsReplicaGetInfo - NT_STATUS_CONNECTION_REFUSED), but I cannot reproduce this behavior with 4.3. Nevertheless, we should fix /usr/lib/univention-server/server_password_change.d/univention-samba4 (restart samba instead of samba-ad-dc) asap.
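The proposed fix amounts to replacing the samba-ad-dc restart line in the hook with a restart of the whole stack through the wrapper script, keeping the same `test -x` guard as the original line. This is a hedged sketch: the rest of the hook script is not shown in this report, and the function name `restart_samba_stack` is hypothetical (the line is wrapped in a function only so the snippet can be sourced without side effects). It calls the init script directly, in line with the later reminder not to use "service samba restart".

```shell
#!/bin/sh
# Hypothetical replacement for
#   test -x /etc/init.d/samba-ad-dc && invoke-rc.d samba-ad-dc restart
# Restarts smbd, nmbd and samba-ad-dc together via the wrapper script.
restart_samba_stack() {
	test -x /etc/init.d/samba && /etc/init.d/samba restart
}
```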
Comment 6 Moritz Bunkus 2019-05-13 16:18:14 CEST
I've encountered the same issue with a customer who wants to use Pacemaker/Corosync on their DCs. As Samba listens on specific IPs (instead of e.g. *:445), adding and removing IPs requires a restart of Samba. Using `systemctl restart samba-ad-dc.service` leads to DRS breaking with the same error messages as reported before.

My debug-fu led me to the following observations; maybe they're useful to someone:

1. All four services ("samba", "samba-ad-dc", "smbd", "nmbd") are, from the PoV of systemd, generated by the sysv-generator from the scripts in /etc/init.d (see "systemctl cat {samba,samba-ad-dc,smbd,nmbd}.service").

2. Apart from a lot of setup stuff, "/etc/init.d/samba-ad-dc" effectively executes "/usr/sbin/samba" with certain options.

3. "/etc/init.d/samba", on the other hand, always executes the services "smbd", "nmbd" and "samba-ad-dc", each of which should effectively execute the scripts in "/etc/init.d" due to 1.

4. Both "/etc/init.d/smbd" and "/etc/init.d/nmbd" are noops on AD DCs.

5. This leaves the enforced "sleep 1" in "/etc/init.d/samba" as the main difference between "/etc/init.d/samba restart" and "systemctl restart samba-ad-dc.service". All of this looks like a timing issue.
Comment 7 Arvid Requate univentionstaff 2019-07-31 13:04:43 CEST
Reminder: Please only use "/etc/init.d/samba restart", not "service samba restart", "service samba-ad-dc restart" or similar, until we have fixed Bug #44237.
Comment 8 Ingo Steuwer univentionstaff 2020-07-03 20:55:11 CEST
This issue has been filed against UCS 4.2.

UCS 4.2 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.