Univention Bugzilla – Bug 41168
UCS@school Samba/AD Slaves left over in Master DNS
Last modified: 2016-09-14 15:38:57 CEST
Created attachment 7629
In the situation described in Bug #41167, the UCS@school DC Slaves were also still present in the DNS of the UCS@school Samba/AD Master.
Support may find the attached script useful for cleaning this up.
In general, the S4 Connector on the Master should take care of this, controlled by the connector/s4/mapping/dns/* UCR variables.
+++ This bug was initially created as a clone of Bug #41167 +++
Ticket#2016042721000409 reported samba (dreplsrv) consuming 100% CPU because of Slave accounts left over in the Samba/AD account database on the UCS@school Master.
@Arvid: What impact does it have if the school slaves are present in the DNS of the UCS@school Samba/AD Master?
If they are advertised in SRV records such as _kerberos._tcp, clients will contact them. Since the UCS@school Slave (branch site) DCs do not hold the full user database, this can result in, for example, intermittent authentication errors.
See also Bug #41167.
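To make the SRV point concrete, here is a minimal sketch of the generic RFC 2782 selection model: a client uses only the records with the lowest priority value and picks among them in proportion to their weight. The hostnames master300 and slave300-s1 come from the test environment further down; backup300, the priorities and the weights are assumed example values, not taken from a real zone. Real clients (Kerberos libraries, the Windows DC locator) add their own logic such as site awareness, so treat this as an illustration only.

```python
from collections import namedtuple
from fractions import Fraction

# One SRV resource record as returned by a DNS lookup (RFC 2782 fields).
SrvRecord = namedtuple("SrvRecord", ["priority", "weight", "port", "target"])


def selection_shares(records):
    """Return the long-run share of lookups each target receives.

    Only records with the lowest priority value are used; among them a
    target is chosen in proportion to its weight. Approximation: weight-0
    records get share 0 here, whereas RFC 2782 gives them a very small
    chance when other weights in the group are non-zero.
    """
    if not records:
        return {}
    best = min(r.priority for r in records)
    group = [r for r in records if r.priority == best]
    total = sum(r.weight for r in group)
    shares = {}
    for r in group:
        share = Fraction(1, len(group)) if total == 0 else Fraction(r.weight, total)
        shares[r.target] = shares.get(r.target, Fraction(0)) + share
    return shares


# Assumed _kerberos._tcp answer: equal priority and weight for every DC.
records = [
    SrvRecord(0, 100, 88, "master300.autotest300.local"),
    SrvRecord(0, 100, 88, "backup300.autotest300.local"),    # hypothetical Backup name
    SrvRecord(0, 100, 88, "slave300-s1.autotest300.local"),  # leftover school slave
]

for target, share in sorted(selection_shares(records).items()):
    print(f"{target}: {share}")
```

With equal priorities and weights, the leftover school slave receives the same share of Kerberos lookups as each central DC, which is why stale records translate into intermittent rather than constant failures.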
The case of Ticket#2016071121000755 again showed the intricate replication issues this causes:
In that case, after performing the steps to recover from Bug #41167, the two DCs in the central school department had replication issues, apparently because they tried, and failed, to get proper information from the School Slave. We had to:
1. run a modified version of the cleanup script attached above
2. run service samba stop, then service samba start, on the DC Master
   A samba restart was not enough in this case: it apparently left a stuck
   kccsrv process running, causing this showrepl error:
   ERROR(runtime): DsReplicaGetInfo of type 0 failed - (-1073610699, 'The operation cannot be performed.')
   After step 1 or 2, DRS replication traffic ramped up to 100% on the DC Master;
   possibly some changes had not yet been replicated to "the other DC".
3. stop and start samba on "the other DC" to get rid of "WERR_BAD_NETPATH" in the
4. wait for DRS replication to stabilize
All in all, neither a pleasant nor a straightforward experience.
It looks like I can reproduce it. For Bug #41167 I implemented "_remove_slavepdc_account_from_master_s4" in the Slave PDC join script 96univention-samba4slavepdc.inst.
After a school slave has been removed, I see on the DC Backup:
Default-First-Site-Name\MASTER300 via RPC
DSA object GUID: 8380008f-135a-4868-bb07-95a32cd687ec
Last attempt @ Mon Aug 29 03:45:07 2016 EDT failed, result 58 (WERR_BAD_NET_RESP)
17 consecutive failure(s).
Last success @ Mon Aug 29 03:17:09 2016 EDT
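When checking many DCs (or waiting for replication to stabilize), it can help to extract failing neighbors from `samba-tool drs showrepl` output automatically. The following is a minimal sketch whose line patterns are derived only from the excerpt above; other Samba versions may format these lines differently, and the function name is hypothetical.

```python
import re

# Line patterns for one inbound neighbor block, based on the excerpt above.
NEIGHBOR = re.compile(r"^\s*(?P<dsa>\S+) via RPC\s*$")
FAILED_ATTEMPT = re.compile(
    r"Last attempt @ .+? failed, result (?P<code>\d+) \((?P<name>\w+)\)"
)
FAILURES = re.compile(r"(?P<count>\d+) consecutive failure\(s\)")


def failing_neighbors(showrepl_text):
    """Return (neighbor, result code, error name, consecutive failures) tuples."""
    found = []
    neighbor = None
    last_error = None
    for line in showrepl_text.splitlines():
        m = NEIGHBOR.match(line)
        if m:
            # A new neighbor block starts; forget any error of the previous one.
            neighbor, last_error = m.group("dsa"), None
            continue
        m = FAILED_ATTEMPT.search(line)
        if m:
            last_error = (int(m.group("code")), m.group("name"))
            continue
        m = FAILURES.search(line)
        if m and neighbor and last_error:
            found.append((neighbor, *last_error, int(m.group("count"))))
            last_error = None
    return found


SAMPLE = """\
Default-First-Site-Name\\MASTER300 via RPC
    DSA object GUID: 8380008f-135a-4868-bb07-95a32cd687ec
    Last attempt @ Mon Aug 29 03:45:07 2016 EDT failed, result 58 (WERR_BAD_NET_RESP)
    17 consecutive failure(s).
    Last success @ Mon Aug 29 03:17:09 2016 EDT
"""

for dsa, code, name, count in failing_neighbors(SAMPLE):
    print(f"{dsa}: {name} ({code}), {count} consecutive failures")
```

On a live DC the input could come from `subprocess.run(["samba-tool", "drs", "showrepl"], capture_output=True, text=True).stdout`; neighbors whose last attempt succeeded are skipped because no "failed" line precedes their failure counter.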
And from the log.samba file:
[2016/08/29 03:45:07.706391, 0, pid=3676] ../source4/dsdb/repl/replicated_objects.c:783(dsdb_replicated_objects_commit)
Failed to apply records: ../ldb_tdb/ldb_index.c:1216: Failed to re-index objectGUID in CN=slave300-s1\0ACNF:65eb34c9-4121-4583-8785-b0ad90b555aa,CN=dc,CN=server,CN=computers,OU=School1,DC=autotest300,DC=local - ../ldb_tdb/ldb_index.c:1148: unique index violation on objectGUID in CN=slave300-s1\0ACNF:65eb34c9-4121-4583-8785-b0ad90b555aa,CN=dc,CN=server,CN=computers,OU=School1,DC=autotest300,DC=local: Entry already exists
[2016/08/29 03:45:07.706556, 0, pid=3676] ../source4/dsdb/repl/drepl_out_helpers.c:773(dreplsrv_op_pull_source_apply_changes_trigger)
Failed to commit objects: WERR_GENERAL_FAILURE/NT_STATUS_INVALID_NETWORK_RESPONSE
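The DN in this log carries a "\0ACNF:<objectGUID>" suffix. That is the AD (and Samba) convention for an object renamed after a naming conflict during replication: the RDN value gets a newline plus "CNF:" and the object's GUID appended, and in a string DN the newline is escaped as "\0A". A minimal sketch for spotting such conflict objects when scanning a list of DNs, for example from ldbsearch output; the helper name is hypothetical:

```python
import re

# Conflict-renamed RDN: "<attr>=<original value>\0ACNF:<36-character GUID>,..."
CONFLICT_RDN = re.compile(
    r"^(?P<attr>[A-Za-z]+)=(?P<name>.*?)\\0ACNF:(?P<guid>[0-9A-Fa-f-]{36}),"
)


def parse_conflict_dn(dn):
    """Return (original RDN value, GUID) if dn is conflict-renamed, else None."""
    m = CONFLICT_RDN.match(dn)
    if m is None:
        return None
    return m.group("name"), m.group("guid").lower()


dn = ("CN=slave300-s1\\0ACNF:65eb34c9-4121-4583-8785-b0ad90b555aa,"
      "CN=dc,CN=server,CN=computers,OU=School1,DC=autotest300,DC=local")
print(parse_conflict_dn(dn))
```

A match only tells you that a conflict rename happened for this name; deciding which of the objects is the stale one still requires comparing them, as done manually in this bug.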
This Jenkins job can be used to reproduce it:
http://jenkins.knut.univention.de:8080/job/UCSschool 4.1/job/UCSschool 4.1 (R2) Large Environment
*** Bug 41167 has been marked as a duplicate of this bug. ***
The Slave account is now "demoted" manually; see 96univention-samba4slavepdc.inst.
The tests were successful:
I run 'samba-tool drs kcc' on only one central Samba server; the others will run it within five minutes.
OK - update, slave demoted on master, back (userAccountControl: 4096)
OK - installation, userAccountControl: 4096 for the slave in the master's Samba
OK - yaml
OK - merged to 4.2-0
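For reading the test results: userAccountControl 4096 is the WORKSTATION_TRUST_ACCOUNT bit (0x1000) alone, i.e. an ordinary computer account, whereas a writable DC account carries SERVER_TRUST_ACCOUNT (0x2000), usually together with TRUSTED_FOR_DELEGATION (0x80000), which gives 532480. The sketch below only decodes such values with a few well-known flag bits; the helper names are hypothetical and it does not reproduce what 96univention-samba4slavepdc.inst does.

```python
# Selected userAccountControl flag bits (values from Microsoft's documentation).
UAC_FLAGS = {
    0x00000002: "ACCOUNTDISABLE",
    0x00000020: "PASSWD_NOTREQD",
    0x00000200: "NORMAL_ACCOUNT",
    0x00001000: "WORKSTATION_TRUST_ACCOUNT",
    0x00002000: "SERVER_TRUST_ACCOUNT",
    0x00010000: "DONT_EXPIRE_PASSWORD",
    0x00080000: "TRUSTED_FOR_DELEGATION",
}

SERVER_TRUST_ACCOUNT = 0x00002000


def decode_uac(value):
    """List the names of the known flags set in a userAccountControl value."""
    return [name for bit, name in sorted(UAC_FLAGS.items()) if value & bit]


def is_dc_account(value):
    """Writable DC machine accounts carry the SERVER_TRUST_ACCOUNT bit."""
    return bool(value & SERVER_TRUST_ACCOUNT)


for uac in (532480, 4096):
    print(uac, decode_uac(uac), "DC" if is_dc_account(uac) else "not a DC")
```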