Univention Bugzilla – Bug 35560
Samba DRS replication hangs after DC Re-Join
Last modified: 2020-07-03 20:56:47 CEST
After re-joining a UCS Samba4 DC in a domain with more than one Samba4 DC, samba-tool drs showrepl on the other Samba4 DCs shows connection problems to the re-joined DC. It seems that Samba4 on the "other" DCs still tries to connect using Kerberos tickets that became invalid with the re-join. More details about this scenario are still required. Currently the only known workaround is to restart samba4 on the "other" Samba4 DCs in the domain.

In the case I just encountered, samba-tool drs showrepl on the master reported "WERR_GENERAL_FAILURE" for both the INBOUND and OUTBOUND connections to the re-joined slave. log.samba on the slave showed bursts of 5 messages repeated every 5 seconds, probably one burst per connection attempt of the master's Samba4 drepl server:

    [2014/08/04 17:52:00.943781, 1, pid=25917] ../source4/auth/gensec/gensec_gssapi.c:648(gensec_gssapi_update)
      GSS server Update(krb5)(1) Update failed: Miscellaneous failure (see text): Decrypt integrity check failed for checksum type hmac-sha1-96-aes256, key type aes256-cts-hmac-sha1-96

log.samba on the master shows corresponding messages like this:

    [2014/08/04 17:52:01.685103, 0, pid=18449] ../source4/librpc/rpc/dcerpc_util.c:681(dcerpc_pipe_auth_recv)
      Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for e3514235-4b06-11d1-ab04-00c04fc2dcd2@ncacn_ip_tcp:32af0d98-9d10-4805-bf97-99bebff7e62f._msdcs.w2k12.test[1024,seal,krb5] NT_STATUS_UNSUCCESSFUL

(32af0d98-9d10-4805-bf97-99bebff7e62f._msdcs points to the IP of the slave.)
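To check whether a DC is affected, you can count the GSSAPI decrypt failures quoted above per process and key type. The following Python sketch is a hypothetical helper (count_decrypt_failures is not part of Samba or UCS). It assumes the log.samba header format shown above, "[timestamp, level, pid=N]", with the message either on the same line or on the following one. The exact message text may differ between Samba versions.

```python
import re
from collections import Counter

# Header of a log.samba entry, e.g. "[2014/08/04 17:52:00.943781, 1, pid=25917]".
HEADER = re.compile(
    r"\[(?P<ts>\d{4}/\d{2}/\d{2} [\d:.]+), (?P<level>\d+), pid=(?P<pid>\d+)\]"
)
# GSSAPI failure text as quoted in this report (wording may vary by version).
DECRYPT_FAILURE = re.compile(
    r"Decrypt integrity check failed for checksum type (?P<cksum>[\w-]+), "
    r"key type (?P<enctype>[\w-]+)"
)


def count_decrypt_failures(lines):
    """Count Kerberos decrypt failures per (pid, key type) in log.samba lines."""
    counts = Counter()
    current_pid = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            # Remember the pid of the most recent entry header.
            current_pid = int(header.group("pid"))
        failure = DECRYPT_FAILURE.search(line)
        if failure:
            counts[(current_pid, failure.group("enctype"))] += 1
    return counts
```

A steadily growing count on the re-joined DC, in bursts that line up with the partner's connection attempts, matches the symptom described above.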
Quite possibly this is a duplicate of Bug #37358.
*** This bug has been marked as a duplicate of bug 37358 ***
I'm not sure this is Bug #37358, because nothing changed on the master here; only a non-master UCS system was re-joined. I have the same issue with an s4 master + s4 backup + s4 slave: after the second re-join of the backup, DRS replication from the master and from the slave to the backup is broken.

samba-tool drs showrepl:

    CN=Schema,CN=Configuration,DC=four,DC=two
        Default-First-Site-Name\BACKUP via RPC
            DSA object GUID: 012a971c-bac3-4dc9-a036-5e4538c94a81
            Last attempt @ Thu Jul 6 00:17:12 2017 CEST failed, result 31 (WERR_GEN_FAILURE)
            6 consecutive failure(s).
            Last success @ NTTIME(0)

master log.samba:

    [2017/07/06 00:17:37.955740, 0, pid=4283] ../source4/librpc/rpc/dcerpc_util.c:737(dcerpc_pipe_auth_recv)
      Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for ncacn_ip_tcp:10.200.7.52[1024,seal,krb5,target_hostname=012a971c-bac3-4dc9-a036-5e4538c94a81._msdcs.four.two,target_principal=GC/backup.four.two/four.two,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=10.200.7.50] NT_STATUS_UNSUCCESSFUL

backup log.samba:

      GSS server Update(krb5)(1) Update failed: Miscellaneous failure (see text): Decrypt integrity check failed for checksum type hmac-sha1-96-aes256, key type aes256-cts-hmac-sha1-96
    [2017/07/06 00:17:58.776201, 1, pid=2203] ../source4/auth/gensec/gensec_gssapi.c:622(gensec_gssapi_update)
      GSS server Update(krb5)(1) Update failed: Miscellaneous failure (see text): Decrypt integrity check failed for checksum type hmac-sha1-96-aes256, key type aes256-cts-hmac-sha1-96

Re-joining a UCS system has to work!
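With several partitions and partners, failing links are easy to miss in the showrepl output. The following Python sketch extracts them. It assumes the plain-text layout shown above: a naming context at column 0, then indented "<site>\<DC> via RPC", "DSA object GUID:", "Last attempt @ ... failed, result ..." and "N consecutive failure(s)." lines. failing_links is a hypothetical helper, and the output format may change between Samba versions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailingLink:
    naming_context: Optional[str]
    partner: Optional[str]
    dsa_guid: Optional[str]
    result: Optional[str]
    consecutive_failures: int


def failing_links(showrepl_text: str) -> List[FailingLink]:
    """Return neighbours with a non-zero failure count from showrepl text."""
    links = []
    nc = partner = guid = result = None
    for raw in showrepl_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace() and line.upper().startswith(("DC=", "CN=")):
            nc = line  # a new naming context (partition) starts at column 0
        elif line.endswith(" via RPC"):
            partner = line[: -len(" via RPC")]
            guid = result = None  # reset per-partner state
        elif line.startswith("DSA object GUID:"):
            guid = line.split(":", 1)[1].strip()
        elif line.startswith("Last attempt @") and "failed, result" in line:
            result = line.split("failed, result", 1)[1].strip()
        else:
            match = re.match(r"(\d+) consecutive failure\(s\)\.", line)
            if match and int(match.group(1)) > 0:
                links.append(
                    FailingLink(nc, partner, guid, result, int(match.group(1)))
                )
    return links
```

On the master, the input could come from subprocess.run(["samba-tool", "drs", "showrepl"], capture_output=True, text=True).stdout. The DSA object GUID of a failing link is the same GUID that appears in the <GUID>._msdcs target hostname in the master's log.samba.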
IIRC, in that case DRS replication between the re-joined backup and the master worked, but DRS replication between the backup and the other slave DCs didn't. That situation showed another peculiarity: even restarting samba on the slaves didn't get replication going again. The reason was that the Samba/AD data on the slaves still contained an old "CN=NTDS Settings" object for the backup DC. That object is stored in the CN=Configuration partition and can be listed with:

    univention-s4search --cross-ncs "CN=NTDS Settings" objectGUID

The objectGUID of these objects matters because replication uses it to look up a DNS alias (<objectGUID>._msdcs.<DNS domain>). In the given case, the slave DCs kept looking up the DNS alias for the old objectGUID, and, worse, they seem to fetch a Kerberos ticket for the FQDN. As a result we saw Kerberos authentication errors in log.samba on the backup DC.
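The mechanism described above can be sketched in Python. The alias is derived from the NTDS Settings objectGUID, so a slave holding a stale object keeps looking up the alias built from the old objectGUID. Both helpers are hypothetical. They assume that the (dn, objectGUID) pairs have already been extracted from the univention-s4search output on a slave, and that the current objectGUID has been read on the re-joined DC itself.

```python
import uuid


def msdcs_alias(ntds_object_guid: str, dns_domain: str) -> str:
    """Return the <objectGUID>._msdcs.<dns_domain> alias for an NTDS Settings object."""
    # uuid.UUID normalises case and format, e.g. upper-case or braced GUIDs.
    return f"{uuid.UUID(ntds_object_guid)}._msdcs.{dns_domain}"


def stale_ntds_settings(entries, dc_name: str, current_guid: str) -> list:
    """Return DNs of NTDS Settings objects for dc_name whose GUID is not current_guid.

    entries: iterable of (dn, objectGUID) string pairs, e.g. taken from the
    univention-s4search output on a slave (parsing is not shown here).
    current_guid: the objectGUID as seen on the re-joined DC itself.
    """
    current = uuid.UUID(current_guid)
    prefix = f"cn=ntds settings,cn={dc_name},".lower()
    return [
        dn
        for dn, guid in entries
        if dn.lower().startswith(prefix) and uuid.UUID(guid) != current
    ]
```

For the backup in the previous comment, msdcs_alias("012a971c-bac3-4dc9-a036-5e4538c94a81", "four.two") yields the target_hostname seen in the master's log.samba.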
Replication works again after merging the entries for the old password kvno from the old (pre-join) /etc/krb5.keytab into the new one:

    @slave:  cp /etc/krb5.keytab /etc/krb5.keytab.OLD
    @slave:  re-join
    @master: samba-tool drs showrepl
             DC=DomainDnsZones,DC=four,DC=three
                 Default-First-Site-Name\SLAVE via RPC
                     DSA object GUID: 0bd5a0f8-9a3d-41de-865a-940a59e47cc7
                     Last attempt @ Wed Nov 28 16:45:10 2018 CET failed, result 31 (WERR_GEN_FAILURE)
    @slave:  ktutil copy /etc/krb5.keytab.OLD /etc/krb5.keytab
    @master: samba-tool drs showrepl
             OK
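A simplified Python model can illustrate why merging the old entries helps. It follows the hypothesis in the bug description that the other DCs still use tickets from before the re-join. The model is an assumption, not Heimdal's or Samba's implementation. A keytab is a mapping from (principal, kvno, enctype) to a key. The merge adds old entries without overwriting the new ones. A ticket is accepted only if the key it was encrypted with is still present. This report does not cover whether the acceptor selects keys by kvno or tries all of them, so the model simply tries every key for the principal and enctype.

```python
from typing import Dict, Tuple

# Simplified model (assumption): (principal, kvno, enctype) -> key bytes.
KeytabEntry = Tuple[str, int, str]
Keytab = Dict[KeytabEntry, bytes]


def merge_keytabs(old: Keytab, new: Keytab) -> Keytab:
    """Return new plus every entry of old that new does not already contain.

    Existing entries in new are never overwritten, so the keys from the
    re-join stay in place; old kvno entries are only added alongside them.
    """
    merged = dict(new)
    for entry, key in old.items():
        merged.setdefault(entry, key)
    return merged


def can_decrypt(keytab: Keytab, principal: str, enctype: str, ticket_key: bytes) -> bool:
    """True if any key for principal/enctype equals the key the ticket was encrypted with."""
    return any(
        key == ticket_key
        for (p, _kvno, e), key in keytab.items()
        if p == principal and e == enctype
    )
```

In this model, a ticket encrypted with the pre-join key fails against the new keytab alone (the "Decrypt integrity check failed" case) and succeeds after the merge, while tickets for the new key keep working.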
Just as an addition: following the last comment (comment #5 from Felix), I was able to fix this issue at a customer site.
This issue has been filed against UCS 4.2. UCS 4.2 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed. If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.