Bug 41168 - UCS@school Samba/AD Slaves left over in Master DNS
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 4.1
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.1-3-errata
Assigned To: Stefan Gohmann
QA Contact: Felix Botner
Duplicates: 41167
Reported: 2016-04-27 19:01 CEST by Arvid Requate
Modified: 2016-09-14 15:38 CEST (History)

Bug group (optional): Troubleshooting

remove_ucsschool_samba4_slaves_from_dns.sh (1.85 KB, text/plain)
2016-04-27 19:01 CEST, Arvid Requate

Description Arvid Requate univentionstaff 2016-04-27 19:01:13 CEST
Created attachment 7629

In the situation described in Bug #41167, the UCS@school DC Slaves were also still present in the DNS of the UCS@school Samba/AD Master.

The attached script may be useful for support to clean this up.

Normally, the S4-Connector on the Master should take care of this via the connector/s4/mapping/dns/* UCS variables.
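
The attached script is not reproduced here. As a rough illustration of the selection step such a cleanup needs, the following Python sketch picks out the DNS records that still belong to, or point at, removed School Slaves. The DnsRecord model and the function names are hypothetical, and reading the records from (or deleting them in) the Samba/AD DNS, e.g. with samba-tool dns, is deliberately left out.

```python
from typing import NamedTuple


class DnsRecord(NamedTuple):
    """One DNS record: owner name, record type and record data (hypothetical model)."""
    owner: str
    rtype: str
    data: str


def _norm(name: str) -> str:
    # DNS names compare case-insensitively; ignore a trailing root dot.
    return name.rstrip(".").lower()


def stale_records(records, removed_hosts, domain):
    """Return the records that belong to, or point at, removed School Slaves.

    removed_hosts contains short hostnames (e.g. "slave300-s1"); they are
    expanded to FQDNs below `domain` before comparing.
    """
    fqdns = {_norm(f"{host}.{domain}") for host in removed_hosts}
    stale = []
    for rec in records:
        if _norm(rec.owner) in fqdns:
            # Host records (A, AAAA, ...) of the removed Slave itself.
            stale.append(rec)
        elif rec.rtype in ("SRV", "NS", "CNAME"):
            # The target host is the last field: "prio weight port target"
            # for SRV, just "target" for NS and CNAME.
            fields = rec.data.split()
            if fields and _norm(fields[-1]) in fqdns:
                stale.append(rec)
    return stale
```

Only the selection is shown; the sketch errs on the side of listing candidates for review rather than deleting anything itself.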

+++ This bug was initially created as a clone of Bug #41167 +++

Ticket#2016042721000409 reported samba (dreplsrv) consuming 100% CPU due to Slave accounts left over in the Samba/AD account database on the UCS@school Master.
Comment 1 Sönke Schwardt-Krummrich univentionstaff 2016-07-07 23:27:01 CEST
@Arvid: what kind of impact does it have if the school slaves are present in the DNS of the UCS@school Samba/AD Master?
Comment 2 Arvid Requate univentionstaff 2016-07-11 13:51:17 CEST
Well, if they are advertised in the SRV records (e.g. _kerberos._tcp), clients will contact them. Since the UCS@school Slave (branch site) DCs don't have the full user database, this may result in, e.g., intermittent authentication errors.
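
To see whether a School Slave is still advertised to clients, one can look at the SRV records they would resolve. A minimal Python sketch, assuming the output format of the `host -t SRV <name>` command ("<name> has SRV record <prio> <weight> <port> <target>") and a hand-maintained set of expected central DC hostnames; both the helper names and that set are hypothetical, not part of UCS.

```python
import re

# Matches lines printed by `host -t SRV <name>`, e.g.
# "_kerberos._tcp.autotest300.local has SRV record 0 100 88 master300.autotest300.local."
SRV_LINE = re.compile(
    r"^(?P<name>\S+) has SRV record (?P<prio>\d+) (?P<weight>\d+) "
    r"(?P<port>\d+) (?P<target>\S+)$"
)


def srv_targets(host_output: str) -> list[str]:
    """Extract the advertised target hosts (lowercased, without trailing dot)."""
    targets = []
    for line in host_output.splitlines():
        m = SRV_LINE.match(line.strip())
        if m:
            targets.append(m.group("target").rstrip(".").lower())
    return targets


def unexpected_targets(host_output: str, central_dcs: set[str]) -> list[str]:
    """Targets that are not expected central DCs, e.g. left-over School Slaves."""
    allowed = {h.rstrip(".").lower() for h in central_dcs}
    return [t for t in srv_targets(host_output) if t not in allowed]
```

Any host reported by unexpected_targets is one that clients may try for Kerberos even though it lacks the full user database.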
Comment 3 Stefan Gohmann univentionstaff 2016-07-11 16:47:30 CEST
See also Bug #41167.
Comment 4 Arvid Requate univentionstaff 2016-07-12 15:47:59 CEST
The case of Ticket#2016071121000755 again showed the intricate replication issues this causes:

After we performed the steps to recover from Bug #41167 in that case, the two DCs in the central school department had replication issues, apparently because they tried, and failed, to get proper information from the School Slave. We had to:

1. run a modified version of the cleanup-script above

2. run "service samba stop" and then "service samba start" on the DC Master;
   a plain samba restart was not enough in this case. Apparently it left
   a stuck kccsrv process running, which caused this showrepl error:

ERROR(runtime): DsReplicaGetInfo of type 0 failed - (-1073610699, 'The operation cannot be performed.')

   After step 1 or 2, DRS replication traffic ramped up to 100% on the DC Master,
   possibly because some changes had not yet been replicated to "the other DC".

3. stop and start samba on "the other DC" to get rid of "WERR_BAD_NETPATH" in the
   showrepl output

4. Wait for DRS-replication to stabilize
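
Step 4 amounts to re-reading the showrepl output until no neighbour reports failures. A minimal Python sketch of that check, assuming the text layout of `samba-tool drs showrepl` seen later in this bug ("... via RPC", "failed, result N (WERR_...)", "N consecutive failure(s)."); the helper names are hypothetical and the polling loop itself is left out.

```python
import re

NEIGHBOR = re.compile(r"^\s*(?P<dsa>\S+\\\S+) via RPC\s*$")
FAILED = re.compile(r"failed, result (?P<code>\d+) \((?P<werr>[A-Z_]+)\)")
FAILURES = re.compile(r"(?P<count>\d+) consecutive failure\(s\)")


def failing_neighbors(showrepl: str) -> list[tuple[str, int, str]]:
    """Return (neighbour, consecutive failures, last WERR name) per failing entry."""
    result = []
    current, error = None, ""
    for line in showrepl.splitlines():
        if m := NEIGHBOR.match(line):
            # A new neighbour block starts; forget the previous error.
            current, error = m.group("dsa"), ""
        elif current and (m := FAILED.search(line)):
            error = m.group("werr")
        elif current and (m := FAILURES.search(line)):
            count = int(m.group("count"))
            if count > 0:
                result.append((current, count, error))
    return result


def replication_stable(showrepl: str) -> bool:
    """True if no neighbour in the showrepl output reports failures."""
    return not failing_neighbors(showrepl)
```

The same neighbour can appear once per naming context, so the result is a list rather than a dict.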

All in all, neither a pleasant nor a straightforward experience.
Comment 5 Stefan Gohmann univentionstaff 2016-08-29 09:50:31 CEST
It looks like I'm able to reproduce it. For Bug #41167 I've implemented a "_remove_slavepdc_account_from_master_s4" step in the Slave PDC join script 96univention-samba4slavepdc.inst.

After a school slave has been removed, I see on the DC Backup:

        Default-First-Site-Name\MASTER300 via RPC
                DSA object GUID: 8380008f-135a-4868-bb07-95a32cd687ec
                Last attempt @ Mon Aug 29 03:45:07 2016 EDT failed, result 58 (WERR_BAD_NET_RESP)
                17 consecutive failure(s).
                Last success @ Mon Aug 29 03:17:09 2016 EDT

And from the log.samba file:

 [2016/08/29 03:45:07.706391,  0, pid=3676] ../source4/dsdb/repl/replicated_objects.c:783(dsdb_replicated_objects_commit)
  Failed to apply records: ../ldb_tdb/ldb_index.c:1216: Failed to re-index objectGUID in CN=slave300-s1\0ACNF:65eb34c9-4121-4583-8785-b0ad90b555aa,CN=dc,CN=server,CN=computers,OU=School1,DC=autotest300,DC=local - ../ldb_tdb/ldb_index.c:1148: unique index violation on objectGUID in CN=slave300-s1\0ACNF:65eb34c9-4121-4583-8785-b0ad90b555aa,CN=dc,CN=server,CN=computers,OU=School1,DC=autotest300,DC=local: Entry already exists
[2016/08/29 03:45:07.706556,  0, pid=3676] ../source4/dsdb/repl/drepl_out_helpers.c:773(dreplsrv_op_pull_source_apply_changes_trigger)
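
The DN in this log is a conflict-mangled name: when AD (and Samba) resolves a naming conflict during replication, the losing object's RDN gets "\0ACNF:<objectGUID>" appended, where "\0A" is the escaped newline. A minimal Python sketch that lists such objects from log.samba text, e.g. to decide what to clean up; the helper name is hypothetical and it only reads the log, it does not touch the directory.

```python
import re

# A CNF-mangled RDN looks like "CN=<name>\0ACNF:<objectGUID>" in escaped DNs.
CNF_RDN = re.compile(
    r"CN=(?P<name>[^,\\\n]+)\\0ACNF:(?P<guid>[0-9a-fA-F-]{36})"
)


def conflict_objects(log_text: str) -> dict[str, set[str]]:
    """Map each conflicting RDN value to the objectGUIDs seen in CNF names."""
    found: dict[str, set[str]] = {}
    for m in CNF_RDN.finditer(log_text):
        found.setdefault(m.group("name"), set()).add(m.group("guid").lower())
    return found
```

Using a set per name collapses the repeated mentions of the same DN that a single log line like the one above contains.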

This Jenkins job can be used to reproduce it:
http://jenkins.knut.univention.de:8080/job/UCSschool 4.1/job/UCSschool 4.1 (R2) Large Environment
Comment 6 Stefan Gohmann univentionstaff 2016-09-01 06:23:33 CEST
*** Bug 41167 has been marked as a duplicate of this bug. ***
Comment 7 Stefan Gohmann univentionstaff 2016-09-01 06:27:43 CEST
The Slave account is now "demoted" manually, see 96univention-samba4slavepdc.inst.

The tests were successful:

I run 'samba-tool drs kcc' on only one central Samba server; the others run it on their own within five minutes.
Comment 8 Felix Botner univentionstaff 2016-09-13 10:24:50 CEST
OK - update, slave demoted on master and backup (userAccountControl: 4096)
OK - installation, userAccountControl: 4096 for slave in master's Samba
OK - yaml
OK - merged to 4.2-0
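
The checks above use userAccountControl to confirm the demotion: 4096 (0x1000) is UF_WORKSTATION_TRUST_ACCOUNT, whereas a DC account carries UF_SERVER_TRUST_ACCOUNT (0x2000), often combined with UF_TRUSTED_FOR_DELEGATION (0x80000), i.e. 532480. A small Python sketch for decoding such values; the flag values are the standard Active Directory ones, while the helper names are hypothetical.

```python
# A few userAccountControl flags as defined by Active Directory.
UF_ACCOUNTDISABLE = 0x0002
UF_NORMAL_ACCOUNT = 0x0200
UF_WORKSTATION_TRUST_ACCOUNT = 0x1000  # 4096: member computer account
UF_SERVER_TRUST_ACCOUNT = 0x2000  # 8192: domain controller account
UF_TRUSTED_FOR_DELEGATION = 0x80000

# Kept in ascending bit order so decode_uac returns a stable order.
FLAG_NAMES = {
    UF_ACCOUNTDISABLE: "UF_ACCOUNTDISABLE",
    UF_NORMAL_ACCOUNT: "UF_NORMAL_ACCOUNT",
    UF_WORKSTATION_TRUST_ACCOUNT: "UF_WORKSTATION_TRUST_ACCOUNT",
    UF_SERVER_TRUST_ACCOUNT: "UF_SERVER_TRUST_ACCOUNT",
    UF_TRUSTED_FOR_DELEGATION: "UF_TRUSTED_FOR_DELEGATION",
}


def decode_uac(value: int) -> list[str]:
    """Names of the known flags set in a userAccountControl value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


def is_demoted(value: int) -> bool:
    """True for a plain computer account, i.e. no longer a DC account."""
    return bool(value & UF_WORKSTATION_TRUST_ACCOUNT) and not value & UF_SERVER_TRUST_ACCOUNT
```

For example, decode_uac(4096) yields ["UF_WORKSTATION_TRUST_ACCOUNT"], which matches the demoted state verified above.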
Comment 9 Janek Walkenhorst univentionstaff 2016-09-14 15:38:57 CEST