Bug 43786 - Clients lose their trust relationship to the domain
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 4.1
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Samba maintainers
QA Contact: Samba maintainers
URL: https://blogs.technet.microsoft.com/a...
Reported: 2017-03-10 14:59 CET by Christina Scheinig
Modified: 2019-01-03 07:22 CET
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.171
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2017030321000301, 2017030121000332, 2018050821000341, 2018060121000601
Bug group (optional):
Max CVSS v3 score:


Attachments
collect_windowsclient_info.sh (4.96 KB, application/x-shellscript)
2017-06-06 13:54 CEST, Arvid Requate
log_client_communication.sh (3.45 KB, application/x-shellscript)
2017-06-06 13:58 CEST, Arvid Requate
Logfile of DataCollector for Client bv-pgf-02 (19.06 KB, application/pgp-encrypted)
2018-06-01 13:26 CEST, edv
Logfile of TCPDump for Client bv-pgf-02 (6.08 KB, application/pgp-encrypted)
2018-06-01 13:27 CEST, edv

Description Christina Scheinig univentionstaff 2017-03-10 14:59:31 CET
Problem:
With UCS version 4.1-4 errata 360 (between 355 and 366), clients lose their trust relationship to the domain every 60 days.
The clients were rolled out via Acronis TrueImage. Acronis assigns the same SIDs to the machines.
--------------------------------------------------------------------------------
 	
This update fixes the following issues:
* Overflow in Samba NDR parsing function ndr_pull_dnsp_name causes
  vulnerability to remote code execution (CVE-2016-2123).
* Unconditional privilege delegation to Kerberos servers in trusted realms
  (CVE-2016-2125).
* Flaws in Kerberos PAC validation can trigger privilege elevation
  (CVE-2016-2126).
* Samba has been updated to version 4.5.3. The Debian package version
  doesn't reflect this and stays at 2:4.5.1-1.849.
* Rejoining a DC Backup or DC Slave failed in UCS 4.1-4 because samba-tool
  domain join didn't support the option --keep-existing any longer.

Bug#43132
Bug#43144
Bug#43176
--------------------------------------------------------------------------------

When clients are rolled out with ACMP (Aagon), this does not happen.
Comment 1 Michel Smidt 2017-04-25 21:52:30 CEST
Seems like we had the same issue today.
Workaround for now:
1. Deactivate "Domain member: Maximum machine account password age" GPO for now.
2. Rejoin affected systems.
Comment 2 Stefan Gohmann univentionstaff 2017-04-26 06:03:49 CEST
(In reply to Michel Smidt from comment #1)
> Seems like we had the same issue today.
> Workaround for now:
> 1. Deactivate "Domain member: Maximum machine account password age" GPO for
> now.
> 2. Rejoin affected systems.

How was the windows system installed and which UCS version is used?
Comment 3 Michel Smidt 2017-04-26 09:30:31 CEST
(In reply to Stefan Gohmann from comment #2)
> (In reply to Michel Smidt from comment #1)
> > Seems like we had the same issue today.
> > Workaround for now:
> > 1. Deactivate "Domain member: Maximum machine account password age" GPO for
> > now.
> > 2. Rejoin affected systems.
> 
> How was the windows system installed and which UCS version is used?

The systems were Windows 7 systems which were distributed via Acronis.
Affected systems were "a few" in a computer room and "all" in a science class.
The terminal server wasn't affected.
Furthermore, a virtualized client which was installed by hand (from ISO) was affected as well.
The problem was noticed after the Easter holiday.
Password rotation had previously been set to 30 days.
The resulting time frame (30 days + 2 weeks) fell into the rollout phase of the school.

UCS versions:
Master - 4.1-4 errata 408
School-Slave - 4.1-4 errata 408
Comment 4 Arvid Requate univentionstaff 2017-05-08 22:58:22 CEST
I couldn't find any "hard facts" about our cases yet, such as

* Exact Windows client error messages and event log entries.

* univention-s4search objectsid=$SID_OF_MYCLIENT

* Complete machine account objects (OpenLDAP and Samba/AD), as obtainable via

  ldbsearch -H ldapi:///var/lib/samba/private/ldap_priv/ldapi \
  objectsid=$SID_OF_MYCLIENT \
  '*' supplementalcredentials unicodepwd replPropertyMetaData \
  ntsecuritydescriptor msds-keyversionnumber \
  --show-binary

In case anything like this happens again, we should definitely collect this data before working around the issue by setting DisablePasswordChange.
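
When reviewing the collected machine account object, it may help to know how old the machine password is. The following is only a minimal sketch, assuming the account carries the usual AD attribute pwdLastSet (a Windows FILETIME value, i.e. 100 ns ticks since 1601-01-01 UTC), which should be part of the '*' output of the ldbsearch call above. The function names are hypothetical helpers, not part of any UCS tool.

```shell
#!/bin/sh
# Convert a Windows FILETIME value (100 ns ticks since 1601-01-01 UTC),
# as stored in the AD attribute pwdLastSet, to Unix epoch seconds.
# 11644473600 is the number of seconds between 1601-01-01 and 1970-01-01.
filetime_to_epoch() {
    echo $(( $1 / 10000000 - 11644473600 ))
}

# Age in whole days of a password that was set at FILETIME $1,
# as seen from the Unix epoch time $2.
password_age_days() {
    echo $(( ($2 - $(filetime_to_epoch "$1")) / 86400 ))
}
```

With a pwdLastSet value copied from the ldbsearch output, the age could then be checked with something like: password_age_days "$PWDLASTSET" "$(date +%s)". An age well beyond the configured rotation interval would suggest that the client stopped changing (or failed to change) its password.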


So we can only assess rather vague evidence currently:

* There are reports in MS forums of failed "unattended" Windows installations where the client was renamed during the process, which could break things. But this does not explain why it worked until the machine password was rotated, so I would not bet on this.

* On the other hand, the Samba advisory for CVE-2016-2126 mentioned above explicitly talks about a winbindd security issue when changing its own machine password. Note that they are not talking about client password changes here, but it still has a certain smell (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1403115 ). Then Bug 43850 Comment 1 comes to mind, where we found that a Kerberos ticket issued by Samba 4.6.1 was rejected by bind9 during a DDNS update. My gut feeling is that the security fix for CVE-2016-2126 for Samba 4.5.1 and that Kerberos issue in 4.6.1 are connected and concern the des-cbc-crc key type: Bug 43850 said "Checksum type 1 not keyed" and the advisory for CVE-2016-2126 talks about unkeyed checksums. If this very vague line of reasoning has a grain of truth, then I could imagine that Windows clients could also experience strange issues with their Kerberos keys. If we get another case of this bug, it may be worth trying the re-ordered key priorities from a UCS 4.2 krb5.conf to check whether the client starts to work again. If not, we should experiment with "netdom resetpwd /server:DC_NAME /userd:USERNAME /passwordd:PASSWORD". The only question would be: why would this be connected to a client password change?
Comment 5 Arvid Requate univentionstaff 2017-05-08 23:05:11 CEST
Another interesting thing here is the time interval of 60 days mentioned in the original bug report above: "clients lose their trust to the domain every 60 days."

According to https://blogs.technet.microsoft.com/askds/2009/02/15/machine-account-password-process-2/ Microsoft clients change their machine password every 30 days. So the issue could be triggered by the *second* password change. Incidentally, Samba/AD stores the current and also the previous Kerberos hashes. After two rotations, the original set of Kerberos hashes, as obtained during the initial domain join, would be dropped.
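
To illustrate the reasoning about the second password change, here is a toy model of a key history that keeps only the current and the previous key set. It is just a sketch of the hypothesis, not how Samba actually implements this; the key names "join", "day30" and "day60" are made up for illustration.

```shell
#!/bin/sh
# Toy model: the server keeps only the current and the previous key set.
current="join"   # key set obtained during the initial domain join
previous=""      # nothing older exists right after the join

rotate() {
    # The old current key set becomes the previous one;
    # whatever was stored as previous before is dropped.
    previous="$current"
    current="$1"
}

knows_key() {
    # Succeeds if the given key set is still stored on the server side.
    [ "$1" = "$current" ] || [ "$1" = "$previous" ]
}

rotate "day30"
knows_key "join" && echo "after 1st rotation: join key still accepted"
rotate "day60"
knows_key "join" || echo "after 2nd rotation: join key dropped"
```

In this model the key set from the initial join survives the first rotation (day 30) as the "previous" entry and only disappears with the second rotation (day 60), which would match the 60-day interval from the report.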
Comment 6 Arvid Requate univentionstaff 2017-05-08 23:06:02 CEST
This seems to be the authoritative MS advice about this topic: https://social.technet.microsoft.com/wiki/contents/articles/9157.troubleshooting-ad-trust-relationship-between-workstation-and-primary-domain-failed.aspx

It names three common causes:

1. SID has been assigned to multiple computers.

2. "If there are problems with system time, DNS configuration or other settings, secure channel’s password between Workstation and DCs may not [work]."

3. No SPN or DNSHost Name mentioned in the computer account attributes.
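
For cause 1, a quick plausibility check could be to look for SIDs that appear more than once in an inventory of the cloned clients. The sketch below assumes a hypothetical input of "hostname SID" lines (one client per line), e.g. exported from an inventory tool; neither the input format nor the function name comes from UCS or Samba.

```shell
#!/bin/sh
# Reads "hostname SID" lines on stdin and prints every SID that occurs
# more than once, followed by the hosts that share it.
find_duplicate_sids() {
    awk '{ hosts[$2] = hosts[$2] " " $1; count[$2]++ }
         END { for (sid in count) if (count[sid] > 1) print sid ":" hosts[sid] }'
}
```

Usage would be along the lines of: find_duplicate_sids < inventory.txt, where inventory.txt is the assumed list. An empty result would argue against duplicate SIDs as the cause.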



I'll go through the tickets again to check whether I can find any of the error messages documented as symptomatic for case 2 here:

  https://blogs.technet.microsoft.com/asiasupp/2007/01/17/typical-symptoms-when-secure-channel-is-broken/
Comment 7 Arvid Requate univentionstaff 2017-06-06 13:54:52 CEST
Created attachment 8900 [details]
collect_windowsclient_info.sh

Since we have not been able to reproduce the issue so far (last attempt: Windows 7 clients rolled out with OPSI as in Ticket 2017030121000332, joined to a UCS@school Slave), we'll have to collect more information if this happens again.

The attached script should help collect server-side information about the affected Windows client:

./collect_windowsclient_info.sh

In the end it encrypts the collected log file with the GPG support key.
Comment 8 Arvid Requate univentionstaff 2017-06-06 13:58:01 CEST
Created attachment 8901 [details]
log_client_communication.sh

This second script should help capture network traffic between the Samba DC and the affected Windows client:

./log_client_communication.sh <short-client-hostname>

It needs to be run on the logon server of the client.

In the end it encrypts the archive file with the GPG support key.
Comment 9 Nico Stöckigt univentionstaff 2018-05-08 14:04:41 CEST
Another customer has reported issues related to this bug.
UCS 4.1-5 errata 502, approx. 750 Windows 7 clients

He also mentioned 
 http://implbits.com/active-directory/2012/04/13/dont-rejoin-to-fix.html
as a partly successful workaround.
Comment 10 edv 2018-06-01 13:26:37 CEST
Created attachment 9545 [details]
Logfile of DataCollector for Client bv-pgf-02
Comment 11 edv 2018-06-01 13:27:10 CEST
Created attachment 9546 [details]
Logfile of TCPDump for Client bv-pgf-02
Comment 12 edv 2018-06-01 13:36:01 CEST
I ran into the same problem while upgrading the UCS system; at the moment I'm at 4.1-3.
We have only one UCS system, which acts as PDC with Samba 4.
Currently around 5 clients end up in this state every day (the attached log file is from one of them). The tcpdump is from a Windows 7 client in the pre-login state: I tried to log in, hit the trust error, and then ended the capture.

As only a few people are reporting this, I believe the problem may resolve itself when I upgrade to 4.2/4.3.
One thing that makes me unsure about this is that univention-s4search (as also seen in the tcpdump) throws an error.
I can run telnet ucs 636 and telnet ucs 7636 and get a connection in both cases. I'm quite unsure whether this is related and what to do next (update to 4.2/4.3 vs. fixing).
Comment 13 Stefan Gohmann univentionstaff 2019-01-03 07:22:32 CET
This issue has been filed against UCS 4.1. Maintenance with bug and security fixes for UCS 4.1 ended on 5 April 2018.

Customers still on UCS 4.1 are encouraged to update to UCS 4.3. Please contact
your partner or Univention for any questions.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or simply reopen the issue. In this case please provide detailed information on how this issue is affecting you.