Univention Bugzilla – Bug 30836
98univention-samba4-dns stopped because no RID Set was replicated in 180 seconds
Last modified: 2016-10-20 12:39:52 CEST
During the initial join of a UCS 3.1-1 Samba4 DC Backup, the join script 98univention-samba4-dns stopped because no RID Set was replicated to the DC Backup within 180 seconds. As a result, the "dns-$hostname" service account was not created.

join.log:
==================================================================================
Configure 98univention-samba4-dns.inst Wed Feb 13 04:15:43 CET 2013
Waiting for RID Pool replication: ...........................................................................................................................
........................................................
Error no rIDSetReferences replicated for backup11
Wed Feb 13 04:19:31 CET 2013: finish /usr/sbin/univention-join
==================================================================================

Another call to univention-run-join-scripts fixed the issue. Maybe the timeout needs to be increased a bit.
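The wait-with-timeout behaviour described in this report can be sketched as a small polling loop. This is only an illustration in Python, not the join script's actual code (the join scripts are shell scripts): `wait_until` and `FakeClock` are hypothetical names, and the 200-second replication delay is an invented value chosen to show the effect of the 180-second limit mentioned in the report.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock so the example runs instantly (illustration only)."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def rid_set_arrives_after(delay: float, timeout: float) -> bool:
    """Pretend the RID Set shows up `delay` seconds after the wait starts."""
    fake = FakeClock()
    return wait_until(lambda: fake.now >= delay, timeout,
                      clock=fake.time, sleep=fake.sleep)


# Hypothetical delay of 200 s: a 180 s limit gives up, a longer one succeeds.
print(rid_set_arrives_after(200, timeout=180))  # False
print(rid_set_arrives_after(200, timeout=300))  # True
```

A longer timeout only helps when the object eventually arrives; later comments in this thread report cases where waiting alone was not enough.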
*** Bug 32993 has been marked as a duplicate of this bug. ***
I guess a better solution would be to address this via Bug 30115.
Raising the priority: the problem occurred in two larger environments (both UCS 3.1, the newer one with Errata 190) and could not be solved by waiting.

References: 2013111221001099 and 2013041821001047

Workaround in both cases:
- copy the attached DRSUAPI_EXOP_FSMO_RID_ALLOC.sh and getncchanges to the DC Slave and make them executable
- run ./DRSUAPI_EXOP_FSMO_RID_ALLOC.sh and enter the "Administrator" password
- the script waits for 10 seconds, which might be too short (change this in the code if necessary)

The RID Pool object should then be created and replicated.
Created attachment 5626: DRSUAPI_EXOP_FSMO_RID_ALLOC.sh
Created attachment 5627: getncchanges
Occurred again in ticket 2013121021002509
Again 2014060421004323
I just faced this again on a DC Slave which picked the DC Backup as the system to join against. My impression is that this causes the problem: I guess it takes too long for the joining Slave until

1. The new DC Slave account is replicated to the DC Master (PDC emulator)
2. The PDC Emulator has created a "CN=RID Set" for the DC Slave
3. The "CN=RID Set" object is replicated to the DC Slave

Usually the "CN=RID Set" should be present at the time the Samba join has completed. This is the join.log:
======================================================================
Finding a writeable DC for domain 'ar40i1.qa'
Found DC backup51.ar40i1.qa
workgroup is AR40I1
realm is ar40i1.qa
[...]
Configure 98univention-samba4-dns.inst Wed Nov 26 13:19:36 CET 2014
2014-11-26 13:19:36.874980262+01:00 (in joinscript_init)
Waiting for RID Pool replication: ...................................................................................................................................................................................
Error no rIDSetReferences replicated for slave52
======================================================================

In case this happens again before we have a go at fixing it, please attach the relevant join.log info, especially the "Found DC " line. It would also be relevant to know if there are "DC Master only" cases where this happens, which would help falsify my theory.
Btw. in my case the situation fixed itself, I just had to run univention-run-join-scripts again. Also, no CNF objects appeared, i.e. Bug 33388 did not rear its head in this case. I would propose the same fix though: make Samba join against the system which holds the "PDC Emulator" FSMO role (with a reasonable alternative for Slave PDCs like in UCS@school).
Again via 2015032721000261
*** Bug 38228 has been marked as a duplicate of this bug. ***
*** Bug 38229 has been marked as a duplicate of this bug. ***
This happens again in a fresh UCS 4.1 test installation with 3 Samba 4 DCs (Master, Backup and Slave): Waiting for RID Pool replication: ................................................................................................................................................................................... Error no rIDSetReferences replicated for slave413 After rebooting and running univention-run-join-scripts it worked directly.
Ok, I could reproduce it: it looks like this happens when the slave joins against the DC Backup (i.e. not the RID Master). So we have some options:

a) make Samba always join against the master (S4-Connector or PDC emulator)
b) make 96univention-samba4.inst trigger "RID Set" generation explicitly
c) make 98univention-samba4-dns.inst trigger "RID Set" generation
Bug 33388 could be an argument for option a)
Actually in my test domain the CN=RID Set eventually got created on the DC Master, but the timestamps show that this happened more than 10 minutes after the account object was created on the DC Backup:

=============================================================================
dn: CN=SLAVE12,OU=Domain Controllers,DC=ar41i1,DC=qa
replPropertyMetaData: NDR: struct replPropertyMetaDataBlob
    version                  : 0x00000001 (1)
    reserved                 : 0x00000000 (0)
    ctr                      : union replPropertyMetaDataCtr(case 1)
    ctr1: struct replPropertyMetaDataCtr1
        count                    : 0x0000001a (26)
        reserved                 : 0x00000000 (0)
        array: ARRAY(26)
            array: struct replPropertyMetaData1
                attid                    : DRSUAPI_ATTID_objectClass (0x0)
                version                  : 0x00000001 (1)
                originating_change_time  : Mon Nov 23 18:37:56 2015 CET
                originating_invocation_id: <ID of DC Backup>
=============================================================================

=============================================================================
dn: CN=RID Set,CN=SLAVE12,OU=Domain Controllers,DC=ar41i1,DC=qa
replPropertyMetaData: NDR: struct replPropertyMetaDataBlob
    version                  : 0x00000001 (1)
    reserved                 : 0x00000000 (0)
    ctr                      : union replPropertyMetaDataCtr(case 1)
    ctr1: struct replPropertyMetaDataCtr1
        count                    : 0x0000000a (10)
        reserved                 : 0x00000000 (0)
        array: ARRAY(10)
            array: struct replPropertyMetaData1
                attid                    : DRSUAPI_ATTID_objectClass (0x0)
                version                  : 0x00000001 (1)
                originating_change_time  : Mon Nov 23 18:48:45 2015 CET
                originating_invocation_id: <ID of Master>
=============================================================================
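For reference, the gap between the two originating_change_time values quoted in this comment can be computed directly. The snippet below is a small sketch using only the two timestamps from the dumps; the helper name `change_time` is made up for this example, and the parsing assumes English day and month names (C/English locale).

```python
from datetime import datetime

# Format of the originating_change_time values without the zone suffix.
# %a and %b are locale-dependent; English names are assumed here.
FMT = "%a %b %d %H:%M:%S %Y"


def change_time(value: str) -> datetime:
    """Parse a timestamp like 'Mon Nov 23 18:37:56 2015 CET'.

    The trailing zone abbreviation is dropped; both values use the same one,
    so the difference between them is unaffected.
    """
    stamp, _zone = value.rsplit(" ", 1)
    return datetime.strptime(stamp, FMT)


account_created = change_time("Mon Nov 23 18:37:56 2015 CET")  # on DC Backup
rid_set_created = change_time("Mon Nov 23 18:48:45 2015 CET")  # on Master

delay = rid_set_created - account_created
print(delay)                  # 0:10:49
print(delay.total_seconds())  # 649.0
```

At 649 seconds, the delay is several times longer than the 180-second wait mentioned in the original report.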
(In reply to Arvid Requate from comment #15)
> Bug 33388 could be an argument for option a)

Yes, I vote for a).
Happened again in Ticket#2016060821000576 (the slave joined against the backup instead of the master)
Created attachment 7728: join_against_s4c_dc.patch

Via Bug 32257 we introduced a function get_available_s4connector_dc in the univention-samba4 join script. I guess we could use that, see attachment (untested).
It looks like it happened again: Ticket #2016101121000687.
* It is possible that Samba 4 joins against another DC and not against the master. This could lead to various problems. The join script now tries to join against the S4 Connector system first (Bug #30836).

UCS 4.1-3: r73194
UCS 4.2: r73195
YAML: r73196
Created attachment 8127: check_domain_info_for_bug30836.diff

Ok, works.

Corner case: If I stop samba4 on the S4-Connector host (master in my case), then the first join attempt fails (python traceback) and the script continues as before by letting Samba choose any DC in the domain. So the script then falls back to the old behavior. That's ok.

Maybe we should also do the "samba-tool domain info" check introduced for Bug 34422 comment 2 to avoid a broken sam.ldb in case of replication issues? See attached patch.

On the other hand, we may want to avoid adding yet another layer of logic and instead choose for UCS 5.0 to simplify the join script to *always* join against the S4-Connector host and just immediately abort the join if that fails, instead of desperately attempting to "somehow" get the join done and possibly ending up in an undefined state in the end.
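The two behaviours weighed in this comment (fall back to any DC versus abort immediately) can be contrasted with a short sketch. It is written in Python purely for illustration and is not taken from the join script or the attached patches: `join_with_fallback`, `join_strict`, `flaky_join` and the host names are hypothetical, and the `try_join` callable stands in for whatever actually performs the Samba join.

```python
from typing import Callable, Optional

# A target of None means "let Samba choose any DC in the domain".
JoinFn = Callable[[Optional[str]], None]


def join_with_fallback(s4connector_dc: str, try_join: JoinFn) -> Optional[str]:
    """Prefer the S4-Connector host; on failure fall back to any DC.

    Returns the target that was used (None for the automatic choice).
    """
    try:
        try_join(s4connector_dc)
        return s4connector_dc
    except RuntimeError as exc:
        print(f"join against {s4connector_dc} failed ({exc}), falling back")
    try_join(None)
    return None


def join_strict(s4connector_dc: str, try_join: JoinFn) -> str:
    """Only ever join against the S4-Connector host; errors propagate."""
    try_join(s4connector_dc)
    return s4connector_dc


def flaky_join(target: Optional[str]) -> None:
    """Stand-in join that fails for one made-up host, as if samba4 were stopped."""
    if target == "master.example.test":
        raise RuntimeError("samba4 not running")


print(join_with_fallback("master.example.test", flaky_join))  # None
print(join_with_fallback("backup.example.test", flaky_join))  # backup.example.test
```

With `join_strict`, the first call would raise instead of silently joining against some other DC, which is the trade-off raised for UCS 5.0.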
Thanks, the patch makes sense. Applied: r73304 + r73305 + r73306
OK, works, and the code is merged to UCS 4.2. The advisory is up to date too.
<http://errata.software-univention.de/ucs/4.1/309.html>