Bug 37358 - Broken DRS replication while the connection is broken and the master password is changed twice
Status: NEW
Product: UCS
Classification: Unclassified
Component: Samba4
UCS 4.4
Other Linux
: P2 normal (vote)
: ---
Assigned To: Samba maintainers
:
: 40260 (view as bug list)
Depends on:
Blocks:
Reported: 2014-12-16 11:04 CET by Tim Petersen
Modified: 2022-04-21 15:49 CEST

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 1: Cosmetic issue or missing function but workaround exists
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.011
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2014121521000134, 2019090221000405, 2020113021000561
Bug group (optional):
Max CVSS v3 score:


Attachments

Description Tim Petersen univentionstaff 2014-12-16 11:04:45 CET
2014121521000134

At least under the following constructed circumstances, the DRS replication between master and slave breaks permanently:

1. DRS Connection between Master and Slave unreachable (S4 stopped for example)
2. Server-Password-Change at Master twice

We hold two versions of the key (kvno) at a time. Let's say the master started with kvno 1.
After two password changes it has been raised to 3. The master now holds kvno 3 and 2 in its keytab.

The slave still asks for kvno 1 once Samba/DRS is responsive again...


The kvno is held in msDS-KeyVersionNumber on CN=MASTER,OU=Domain Controllers,DC=domain,DC=test - a search only returns it if the attribute is requested explicitly.
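
To compare the kvno stored on the master with the one stored on the slave, the attribute can be requested explicitly on each DC. The following is a minimal sketch, not a tested procedure: the function names extract_kvno and query_kvno are made up for illustration, and the ldapi URL is the one used by the commands in this report.

```shell
#!/bin/sh
# Hypothetical helper: pick the msDS-KeyVersionNumber value out of
# ldbsearch LDIF output read from stdin.
extract_kvno() {
    sed -n 's/^msDS-KeyVersionNumber: *//p' | head -n 1
}

# Hypothetical helper: query the kvno of a machine account in the local
# Samba 4 directory. The attribute is named explicitly on the command line.
# $1: sAMAccountName without the trailing '$' (e.g. "master")
query_kvno() {
    ldbsearch -H ldapi:///var/lib/samba/private/ldap_priv/ldapi \
        "sAMAccountName=$1\$" msDS-KeyVersionNumber | extract_kvno
}
```

Running "query_kvno master" on both the master and the slave and comparing the two numbers should show whether the slave has fallen behind.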

The attribute itself is not editable because it is constructed:

ldbedit -H /var/lib/samba/private/sam.ldb.d/DC=DOMAIN,DC=TIM.ldb -b "cn=master,OU=Domain Controllers,DC=domain,DC=tim" msDS-KeyVersionNumber
failed to modify CN=MASTER,OU=Domain Controllers,DC=domain,DC=tim - objectclass_attrs: attribute 'msDS-KeyVersionNumber' on entry 'CN=MASTER,OU=Domain Controllers,DC=domain,DC=tim' is constructed!

ldbedit -H ldapi:///var/lib/samba/private/ldap_priv/ldapi sAMAccountName='Administrator' supplementalCredentials msds-keyversionnumber --controls=local_oid:1.3.6.1.4.1.7165.4.3.12:0
failed to modify CN=MASTER,OU=Domain Controllers,DC=domain,DC=tim - LDAP error 19 LDAP_CONSTRAINT_VIOLATION -  <0000202F: objectclass_attrs: attribute 'msDS-KeyVersionNumber' on entry 'CN=MASTER,OU=Domain Controllers,DC=domain,DC=tim' is constructed!> <>
Comment 1 Tim Petersen univentionstaff 2014-12-16 11:09:21 CET
ldbsearch -H ldapi:///var/lib/samba/private/ldap_priv/ldapi sAMAccountName=master\$ replPropertyMetaData > master.replPropertyMetaData.ldif

ldbedit -H ldapi:///var/lib/samba/private/ldap_priv/ldapi samaccountname=master\$ replPropertyMetaData --controls=local_oid:1.3.6.1.4.1.7165.4.3.12:0

rebuilds the msDS-KeyVersionNumber
Comment 2 Tim Petersen univentionstaff 2014-12-16 11:43:29 CET
...but it only fixes the kvno mismatch (Failed to find MASTER$@DOMAIN.TIM(kvno xy) in keytab FILE:/etc/krb5.keytab)

the DRS replication remains broken (after s4 restart).
Comment 3 Tim Petersen univentionstaff 2014-12-16 11:51:43 CET
(In reply to Tim Petersen from comment #1)

At the master:
> ldbsearch -H ldapi:///var/lib/samba/private/ldap_priv/ldapi
> sAMAccountName=master\$ replPropertyMetaData >
> master.replPropertyMetaData.ldif


Store the output and write it to the slave:
> ldbedit -H ldapi:///var/lib/samba/private/ldap_priv/ldapi
> samaccountname=master\$ replPropertyMetaData
> --controls=local_oid:1.3.6.1.4.1.7165.4.3.12:0
> 
> rebuilds the msDS-KeyVersionNumber
Comment 4 Tim Petersen univentionstaff 2014-12-16 12:06:37 CET
At slave:
ucr set kerberos/kdc=ip_master
invoke-rc.d samba-ad-dc restart

works!
Comment 5 Arvid Requate univentionstaff 2014-12-16 13:22:06 CET
The issue was that the slave uses 127.0.0.1 as kerberos/kdc, so it always gets the old keys, which are rejected by the master. This was introduced via Bug 29291.

Proposal:
========
Either we revert that change, or (better) we configure a special krb5.conf to be used by the Samba processes, e.g. by turning /var/lib/samba/private/krb5.conf (which is used by samba_dnsupdate) into a UCR template where we don't set 127.0.0.1 as kdc, and by setting KRB5_CONFIG to this file in /etc/init.d/samba-ad-dc. Note that this proposal contradicts Bug 34908.
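
The init script part of this proposal could look roughly like the fragment below. This is only a sketch of the idea: it assumes the UCR template for /var/lib/samba/private/krb5.conf already exists and no longer lists 127.0.0.1 as kdc, and the exact place in /etc/init.d/samba-ad-dc where it would go is not decided here.

```shell
# Hypothetical fragment for /etc/init.d/samba-ad-dc (placement is an assumption).
# Point the Samba processes at a dedicated Kerberos configuration so that
# they do not pick up the system-wide setting kdc = 127.0.0.1.
KRB5_CONFIG=/var/lib/samba/private/krb5.conf
export KRB5_CONFIG
```

Other local clients would keep using the system-wide /etc/krb5.conf, because the variable is only set in the environment of the processes started by this script.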



Detailed explanation:
====================
When a DC (call it "A" here) replaces its keys (server-password-change), the "other" DCs still hold on to their tickets, which still refer to the previous Kerberos key version number (kvno). To cope with this situation, Samba4 keeps the keys with the previous kvno in /etc/krb5.keytab on DC "A". This way it still accepts the "previous" Kerberos service tickets presented by the "other" DCs and replication continues to work. In particular, the updated Kerberos keys and unicodePwd (+ attribute version == kvno) are distributed to all "other" DCs.

What could possibly go wrong?

When DC "B" is offline for longer than two password changes of DC "A", service tickets derived from its own local Kerberos-Samba4 database are no longer accepted by DC "A", because "A" doesn't find the ticket's kvno in its local /etc/krb5.keytab. DC "B" can no longer fetch any changes from DC "A". A special variation of this is documented in Bug 35560. The key issue here is that DC "B" asks its own local KDC for tickets, which pulls the keys from the local Samba4 backend (Bug 29291).
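
The acceptance window described above can be written down as a small model. It is a simplification under one assumption taken from this report: the keytab on DC "A" holds exactly the current and the previous kvno. The function name keytab_accepts is made up for illustration.

```shell
#!/bin/sh
# Simplified model of the keytab on DC "A" (assumption: it holds exactly
# the current and the previous kvno, as described in this report).
# $1: current kvno of DC "A"
# $2: kvno of the service ticket presented by DC "B"
# Returns 0 if the ticket's kvno would be found in the keytab, 1 otherwise.
keytab_accepts() {
    current=$1
    presented=$2
    previous=$((current - 1))
    [ "$presented" -eq "$current" ] || [ "$presented" -eq "$previous" ]
}
```

With the numbers from the description, "keytab_accepts 3 2" succeeds (one password change missed), while "keytab_accepts 3 1" fails (two password changes missed), which is the point where replication stays broken.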
Comment 6 Tim Petersen univentionstaff 2015-06-04 15:36:48 CEST
Reported again at 2015060321000363
Comment 7 Arvid Requate univentionstaff 2015-06-04 16:19:20 CEST
As discussed, we should make samba use the DNS SRV records. That way we get the "self healing" effect from the round robin mechanism.


Simply reverting the change of Bug 29291 is certainly one option, but possibly not the best one:

(1) If the SRV records contain a Samba4 DC which doesn't exist, this might slow down Kerberos authentication (or result in temporary failures?) for local clients (users and processes) on other DCs.

(2) Likewise, if the SRV records contain a Samba4 DC which has a large clock skew, local clients (users and processes) would experience occasional authentication errors.


The ideal solution would be if the Samba "drepl" process could be configured to use the SRV records, while all other parts use 127.0.0.1. But I don't see any standard way to achieve this in Samba. I think the second best option is the proposal of Comment 5.
Comment 8 Arvid Requate univentionstaff 2015-12-16 19:09:32 CET
*** Bug 40260 has been marked as a duplicate of this bug. ***
Comment 9 Arvid Requate univentionstaff 2017-04-24 13:07:08 CEST
*** Bug 35560 has been marked as a duplicate of this bug. ***
Comment 10 Arvid Requate univentionstaff 2020-12-03 19:22:45 CET
For Ticket #2020113021000561 I had this idea, which worked quite nicely:

On the Master the password had been rotated and it had KVNO 126,
but the Backup still had KVNO 122 (for the master account).

Running the following command on the DC Backup solved the issue:

samba-tool drs replicate --local "dummy" "$IP_of_the_Master" "$(ucr get samba4/ldap/base)"

It seemed to be important to use the IP and not the FQDN, to avoid Kerberos.

This gave me the idea that we could automate this by means of a listener module.
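
The check such a listener module would have to make can be sketched in shell. This is only an illustration of the logic, not a listener module: the function name resync_if_stale and the RUN switch are made up, and how the module learns the master's current kvno is left open (it is simply passed in as an argument).

```shell
#!/bin/sh
# Hypothetical helper: print (or, with RUN=1, execute) the forced replication
# from this comment when the locally stored kvno lags behind the master's.
# $1: kvno of the master account as stored on this DC
# $2: kvno of the master account on the master itself
# $3: IP address of the master (an IP, not the FQDN, to avoid Kerberos)
# $4: Samba LDAP base, e.g. the output of "ucr get samba4/ldap/base"
resync_if_stale() {
    if [ "$1" -ge "$2" ]; then
        echo "kvno up to date, nothing to do"
        return 0
    fi
    if [ "${RUN:-0}" = "1" ]; then
        samba-tool drs replicate --local "dummy" "$3" "$4"
    else
        # Dry run: only show the command that would be executed.
        echo "samba-tool drs replicate --local dummy $3 $4"
    fi
}
```

Called as resync_if_stale 122 126 "$IP_of_the_Master" "$(ucr get samba4/ldap/base)", it prints the replicate command for the case from this ticket. With RUN=1 it runs the command instead of printing it.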
Comment 11 Felix Botner univentionstaff 2020-12-03 20:22:39 CET
(In reply to Arvid Requate from comment #10)
> For Ticket #2020113021000561 I had this idea, which worked quite nicely:
> 
> On the Master the password haad been rotated and it had KVNO 126,
> but the Backup still had KVNO 122 (for the master account).
> 
> Running the following command on the DC Backup solved the issue:
> 
> samba-tool drs replicate --local "dummy" "$IP_of_the_Master" "$(ucr get
> samba4/ldap/base)"
> 
> It seemed to be important to use the IP and not the FQDN, to avoid Kerberos.
> 
> This gave me the idea that we could automate this by means of a listener
> module.

Cool, so if the listener module detects a password change for the master (or any other DC?), we could simply call samba-tool drs replicate ... ?