Univention Bugzilla – Bug 40260
Potential Samba/AD DRS replication deadlock
Last modified: 2015-12-16 19:09:32 CET
Since fixing bug 29291, Samba/AD DCs always use the local KDC. In some support sessions it was observed that this may cause Samba/AD DRS replication to fail permanently after one of the Samba/AD DCs rotated its password twice, either
a) during the lifetime of the service tickets used by other Samba/AD DCs, or
b) while at least one of the other Samba/AD DCs was not connected (powered off or otherwise unreachable).

This is how things are supposed to work: When DC A changes its password, its Kerberos keys change too. After that, DC B still continues to connect with the Kerberos service ticket it obtained before, which contains data (the session key) encrypted by the KDC with the old Kerberos key hashes of DC A. To make key transitions like these work seamlessly for Kerberos clients, Kerberos uses the key version number and retains the last set of old Kerberos keys in the keytab of the service (in this case /etc/krb5.keytab). So DC A can still identify and use the previous Kerberos keys to decrypt the service ticket, authentication succeeds and replication continues to work. Everybody is happy.

The important point here is that the local server only keeps the last set of old Kerberos keys (i.e. the previous one), not an indefinite history of outdated Kerberos keys (that's for security reasons, obviously).

Now, in case DC A changes its password *twice* during the lifetime of a service ticket, DC B gets an authentication error, because DC A can no longer decrypt the relevant part of the service ticket. In that case, DC B can no longer replicate. Fixing bug 29291 made this worse: now DC B cannot replicate and never asks any KDC other than its own, so it has no chance to learn the new Kerberos keys, cannot get a valid service ticket for DC A, and can never replicate again. The same applies to every other DC in that state, so none of them receives changes from DC A any longer.
Before fixing bug 29291, DC B at least had a statistical chance to contact a different KDC found in the DNS SRV record, possibly an up-to-date one, which would get it a fresh set of keys so that replication from DC A could recover. The implications of this situation are bad enough IMHO to open this bug. The motivation for fixing bug 29291 was "ok", but maybe taking out the statistical element was not a good idea. I guess we need a new idea here. If nothing better comes up, we may have to revert that change.
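
To illustrate the key version logic described above, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions: the class ServiceKeytab, the function try_replicate and the string "keys" are hypothetical names made up for this illustration, not Samba or Kerberos APIs, and a ticket is reduced to the key version number it was encrypted with. DC A's keytab keeps the current key set plus the previous one; DC B's local KDC only learns newer keys through a successful replication from DC A.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceKeytab:
    """Toy keytab of DC A: keeps the current key set plus one previous set."""
    kvno: int = 1
    retained: int = 2  # assumption for this model: current + previous
    keys: dict = field(default_factory=dict)

    def __post_init__(self):
        self.keys[self.kvno] = f"key-v{self.kvno}"

    def rotate_password(self):
        """New password -> new key version; drop everything older than previous."""
        self.kvno += 1
        self.keys[self.kvno] = f"key-v{self.kvno}"
        for old in [k for k in self.keys if k <= self.kvno - self.retained]:
            del self.keys[old]

    def can_decrypt(self, ticket_kvno):
        """A ticket is usable only if its key version is still in the keytab."""
        return ticket_kvno in self.keys


def try_replicate(dc_a, local_kdc_kvno):
    """DC B asks only its local KDC, which encrypts with the kvno it knows.

    Returns the kvno known to DC B's local KDC after the attempt.
    """
    ticket_kvno = local_kdc_kvno
    if dc_a.can_decrypt(ticket_kvno):
        return dc_a.kvno      # replication works, local KDC learns current keys
    return local_kdc_kvno     # authentication fails, nothing is learned


dc_a = ServiceKeytab()
ticket = dc_a.kvno            # DC B holds a ticket encrypted with kvno 1

dc_a.rotate_password()
ok_after_one = dc_a.can_decrypt(ticket)   # previous key is still retained

dc_a.rotate_password()
ok_after_two = dc_a.can_decrypt(ticket)   # kvno 1 has been dropped

# DC B's local KDC still only knows kvno 1; retrying never changes that.
stuck_view = 1
for _ in range(5):
    stuck_view = try_replicate(dc_a, stuck_view)
```

In this model ok_after_one is True and ok_after_two is False, and stuck_view stays at 1 no matter how often DC B retries. The model has no second KDC on purpose: that is the situation after bug 29291, where the only way for DC B to learn the new keys is the replication that the stale keys prevent.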
*** This bug has been marked as a duplicate of bug 37358 ***