Bug 38234 - Takeover does not work when msDS-ReplicationEpoch is set
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: AD Takeover
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Samba maintainers
URL: https://bugzilla.samba.org/show_bug.c...
Depends on:
Blocks:
Reported: 2015-04-13 09:25 CEST by Janis Meybohm
Modified: 2021-06-15 19:20 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 4: Minor Usability: Impairs usability in secondary scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.069
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2015041021000201, 2021060821000388
Bug group (optional): Workaround is available
Max CVSS v3 score:


Description Janis Meybohm univentionstaff 2015-04-13 09:25:52 CEST
Ticket: 2015041021000201

Takeover of a 2003 AD Domain failed with the following traceback:


2015-04-10 16:04:03,404 Pre-loading the Samba 4 and AD schema
2015-04-10 16:04:03,541 A Kerberos configuration suitable for Samba 4 has been generated at /var/lib/samba/private/krb5.conf
2015-04-10 16:04:03,792 ERROR(runtime): uncaught exception - (8593, 'WERR_DS_DIFFERENT_REPL_EPOCHS')
2015-04-10 16:04:03,792   File "/usr/lib/python2.7/dist-packages/samba/netcmd/__init__.py", line 175, in _run
2015-04-10 16:04:03,792     return self.run(*args, **kwargs)
2015-04-10 16:04:03,792   File "/usr/lib/python2.7/dist-packages/samba/netcmd/domain.py", line 620, in run
2015-04-10 16:04:03,793     keep_existing=keep_existing)
2015-04-10 16:04:03,793   File "/usr/lib/python2.7/dist-packages/samba/join.py", line 1190, in join_DC
2015-04-10 16:04:03,793     ctx.do_join()
2015-04-10 16:04:03,793   File "/usr/lib/python2.7/dist-packages/samba/join.py", line 1095, in do_join
2015-04-10 16:04:03,793     ctx.join_replicate()
2015-04-10 16:04:03,793   File "/usr/lib/python2.7/dist-packages/samba/join.py", line 818, in join_replicate
2015-04-10 16:04:03,793     replica_flags=ctx.replica_flags)
2015-04-10 16:04:03,794   File "/usr/lib/python2.7/dist-packages/samba/drs_utils.py", line 252, in replicate
2015-04-10 16:04:03,810     (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, req_level, req)
2015-04-10 16:04:03,814 checking sAMAccountName


The NTDS-Settings object of one of the AD DCs had msDS-ReplicationEpoch set to 1. I assume this was set due to a domain name change in the past.
The affected DC was demoted to a member server (dcpromo), but that only led to msDS-ReplicationEpoch=1 being set on another DC's NTDS-Settings object.

We then decided to remove the attribute from AD (although this is rated "catastrophic") and the takeover traceback was gone. I don't have details about the overall state of the domain now, so I won't recommend this as a workaround for now.
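
To see which DCs carry the attribute before a takeover, the NTDS-Settings objects can be dumped as LDIF (for example with an LDAP search for (objectClass=nTDSDSA) below the Configuration partition) and scanned for msDS-ReplicationEpoch. The following is only a minimal sketch: the function name, the sample DNs and the example.com domain are hypothetical, and the parser assumes unfolded, non-base64 LDIF lines.

```python
def find_replication_epochs(ldif_text):
    """Return {dn: epoch} for every LDIF record that carries msDS-ReplicationEpoch.

    Simplification: folded lines and base64 values ("dn:: ...") are not handled.
    """
    epochs = {}
    current_dn = None
    for raw_line in ldif_text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line ends the current LDIF record.
            current_dn = None
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        value = value.strip()
        if key == "dn":
            current_dn = value
        elif key == "msds-replicationepoch" and current_dn is not None:
            epochs[current_dn] = int(value)
    return epochs


# Hypothetical sample data; the DNs are made up for illustration.
SAMPLE_LDIF = """\
dn: CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com
objectClass: nTDSDSA
msDS-ReplicationEpoch: 1

dn: CN=NTDS Settings,CN=DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com
objectClass: nTDSDSA
"""

if __name__ == "__main__":
    for dn, epoch in find_replication_epochs(SAMPLE_LDIF).items():
        print("epoch %d set on %s" % (epoch, dn))
```

DCs missing from the result have no msDS-ReplicationEpoch attribute at all, which is the state the takeover succeeded with in this ticket.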

Please see the following links for details:

https://technet.microsoft.com/de-de/library/aa996670%28v=exchg.80%29.aspx?f=255&MSPPError=-2147217396
https://bugzilla.samba.org/show_bug.cgi?id=9500
Comment 1 Arvid Requate univentionstaff 2015-04-13 13:20:29 CEST
According to the Microsoft document "How Domain Rename Works", a non-zero value indicates that the domain has been renamed at some point. In that case the incremented replication epoch serves to avoid replication with not-yet-renamed DCs. Quoting:

=========================================================================
[...] If two DCs have different msDS-ReplicationEpoch values, no directory replication RPC interaction is allowed between them. In addition to replication, nested group membership evaluation and global catalog lookups are also discontinued. [...]. The goal of the msDS-ReplicationEpoch attribute is to minimize potentially complex interactions, including replication, between DCs that have completed the domain rename and those DCs that have not yet completed the domain rename.
=========================================================================

So I guess your workaround is fine.
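
The rule quoted above can be sketched as a small model. This is not Samba or Windows code: the function and exception names are hypothetical, and treating a missing attribute as epoch 0 is an assumption made for the illustration. Only the error number and name are taken from the traceback in the description.

```python
# Error code and name as they appear in the traceback in the description.
WERR_DS_DIFFERENT_REPL_EPOCHS = 8593


class ReplicationEpochMismatch(RuntimeError):
    """Hypothetical exception used only in this sketch."""


def effective_epoch(value):
    # Assumption: a DC without msDS-ReplicationEpoch behaves like epoch 0.
    return 0 if value is None else int(value)


def check_replication_allowed(local_epoch, remote_epoch):
    """Return True if both epochs match, otherwise raise ReplicationEpochMismatch."""
    local = effective_epoch(local_epoch)
    remote = effective_epoch(remote_epoch)
    if local != remote:
        raise ReplicationEpochMismatch(
            WERR_DS_DIFFERENT_REPL_EPOCHS, "WERR_DS_DIFFERENT_REPL_EPOCHS"
        )
    return True


if __name__ == "__main__":
    # Joining DC without an epoch against a source DC with epoch 1.
    try:
        check_replication_allowed(None, 1)
    except ReplicationEpochMismatch as exc:
        print("replication refused:", exc.args)
```

Under this model a joining DC that presents no epoch is refused by a DC with epoch 1, which is consistent with the traceback, although the report does not confirm that this is the exact mechanism.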
Comment 2 Janis Meybohm univentionstaff 2015-04-13 15:34:32 CEST
(In reply to Arvid Requate from comment #1)
That does not exactly match what I have seen in the customer's environment. In fact he had 4 AD DCs, of which one had msDS-ReplicationEpoch=1, but the rename happened years ago and replication between all DCs seemed normal.
Comment 3 Nico Stöckigt univentionstaff 2016-12-01 16:33:26 CET
I found this behavior in an environment with the following specifications:
- Windows 2k8 AD-Master
- UCS 4.1-4 Membermode (syncmode: read)
  with installed Samba4 (role: DC) [probably installed afterwards]

Replication from MS-AD  >  OpenLDAP  >  Samba-AD works just fine, but:
- univention-connector-list-rejected shows >500 'AD rejected' (in sync read?!)
- connector.log shows, for nearly each of these rejects, "RuntimeError: (8593, 'WERR_DS_DIFFERENT_REPL_EPOCHS')"

for more info see ticket#...274
Comment 4 Nico Stöckigt univentionstaff 2016-12-01 17:00:25 CET
Comment 3 has been split off as Bug 43093, because that's about Member-Mode.
Comment 5 Stefan Gohmann univentionstaff 2019-01-03 07:16:28 CET
This issue has been filed against UCS 4.0. The maintenance with bug and security fixes for UCS 4.0 has ended on 31st of May 2016.

Customers still on UCS 4.0 are encouraged to update to UCS 4.3. Please contact
your partner or Univention for any questions.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or simply reopen the issue. In this case please provide detailed information on how this issue is affecting you.
Comment 6 Dirk Schnick univentionstaff 2021-06-15 16:04:14 CEST
I reopened the bug as discussed in dev consultation meeting, as this problem is a showstopper for our partner, see attached ticket.
I also removed the bug group "Workaround is available", as the existing workaround is only feasible if the Windows domain is going to be taken over. In this case, however, the domain is meant to stay permanently connected to the UCS domain via the connector.
Comment 7 Dirk Schnick univentionstaff 2021-06-15 17:11:26 CEST
Set back old status and cloned the bug as the topic and the component did not match here.