Bug 40434 - UCS@school Samba/AD DC Slave join fails at 99ucs-school-umc-printermoderation
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 4.1
Hardware: Other Linux
Importance: P5 critical
Target Milestone: UCS 4.1-0-errata
Assigned To: Arvid Requate
QA Contact: Stefan Gohmann
Reported: 2016-01-14 19:46 CET by Arvid Requate
Modified: 2016-02-04 13:59 CET

Attachments:
join.log (184.38 KB, text/x-log), 2016-01-14 19:46 CET, Arvid Requate

Description Arvid Requate univentionstaff 2016-01-14 19:46:14 CET
Created attachment 7408
join.log

See attached join.log. The problem is that the slave cannot authenticate with its host account against Samba/AD LDAP.

Really mysterious at this point. Some things work, others don't:

root@slave12:~# smbclient //$(hostname -f)/sysvol \
                -Uslave12$%"$(cat /etc/machine.secret)" \
                -c showconnect && echo OK
Domain=[AR41I1] OS=[Windows 6.1] Server=[Samba 4.3.3-Debian]
//slave12.ar41i1.qa/sysvol
OK


root@slave12:~# kinit --password-file=/etc/machine.secret 'slave12$' && echo OK
OK
root@slave12:~# smbclient -k //$(hostname -f)/sysvol 
session setup failed: NT_STATUS_LOGON_FAILURE



root@slave12:~# ldbsearch -H "ldaps://$(hostname -f)" \
                         -Uslave12$%"$(cat /etc/machine.secret)"
Failed to bind - LDAP error 49 LDAP_INVALID_CREDENTIALS -  <SASL:[GSS-SPNEGO]: NT_STATUS_LOGON_FAILURE> <>


log.samba shows:
=====================================================================
[2015/11/23 18:49:38.478847,  3, pid=13006] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ SLAVE12$@AR41I1.QA from ipv4:127.0.0.1:45662 for ldap/slave12.ar41i1.qa@AR41I1.QA [canonicalize]
[2015/11/23 18:49:38.484000,  3, pid=13006] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2015-11-23T18:49:38 starttime: 2015-11-23T18:49:38 endtime: 2015-11-24T04:49:38 renew till: unset
[2015/11/23 18:49:38.485106,  3, pid=13006] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ SLAVE12$@AR41I1.QA from ipv4:127.0.0.1:46119 for krbtgt/AR41I1.QA@AR41I1.QA [forwarded, forwardable]
[2015/11/23 18:49:38.487436,  3, pid=13006] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2015-11-23T18:49:38 starttime: 2015-11-23T18:49:38 endtime: 2015-11-24T04:49:38 renew till: unset
[2015/11/23 18:49:38.488728,  1, pid=13004] ../source4/auth/gensec/gensec_gssapi.c:619(gensec_gssapi_update)
  GSS server Update(krb5)(1) Update failed:  Miscellaneous failure (see text): Decrypt integrity check failed for checksum type hmac-sha1-96-aes256, key type aes256-cts-hmac-sha1-96
[2015/11/23 18:49:38.488754,  1, pid=13004] ../auth/gensec/spnego.c:523(gensec_spnego_parse_negTokenInit)
  SPNEGO(gssapi_krb5) NEG_TOKEN_INIT failed: NT_STATUS_LOGON_FAILURE
=====================================================================
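Reading the excerpt: the KDC answers the TGS-REQ for ldap/slave12.ar41i1.qa, but the GSS acceptor on the same host then reports "Decrypt integrity check failed" for the aes256 key, which suggests the service ticket was encrypted with a key the acceptor does not hold. To pull such failures out of a long log.samba, a small awk filter can print each matching message together with the timestamp of its header line. This is only a sketch: `samba_log_grep` is a hypothetical helper, and it assumes the log.samba layout seen above (a `[date time, level, pid=N] source` header followed by an indented message line, without the line wrapping of this web page).

```sh
# samba_log_grep FILE PATTERN  (hypothetical helper)
# Prints "<date> <time> <message>" for every log.samba message line that
# matches the extended regular expression PATTERN.
samba_log_grep() {
    awk -v pat="$2" '
        # Header line, e.g. "[2015/11/23 18:49:38.488728,  1, pid=13004] ...":
        # remember its date and time and drop the trailing comma.
        /^\[/ { ts = substr($1, 2) " " $2; sub(/,$/, "", ts); next }
        # Indented message line: strip leading blanks, print it if it matches.
        $0 ~ pat { sub(/^[ \t]+/, ""); print ts " " $0 }
    ' "$1"
}
```

For example, `samba_log_grep /var/log/samba/log.samba 'Decrypt integrity check failed|NT_STATUS_LOGON_FAILURE'`, where the log path is an assumption about the default location.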
Comment 1 Arvid Requate univentionstaff 2016-01-14 19:48:06 CET
root@slave12:~# kinit -t /etc/krb5.keytab 'SLAVE12$' && echo OK
OK

So secrets.ldb seems to be fine (i.e. in sync with machine.secret and sam.ldb) as well.
Comment 2 Arvid Requate univentionstaff 2016-01-14 19:49:16 CET
root@slave12:~# ucr search --brief version/version version/errata
version/erratalevel: 55
version/version: 4.1
Comment 3 Arvid Requate univentionstaff 2016-01-18 16:14:55 CET
This one works too:

root@slave12:~# ldbsearch -H "ldaps://$(hostname -f)" \
                          --simple-bind-dn="slave12\$@$(hostname -d)" \
                          --password="$(< /etc/machine.secret)"

and this one too:

root@slave12:~# ldbsearch --kerberos=no -H "ldaps://$(hostname -f)" \
                          -Uslave12$%"$(< /etc/machine.secret)"
Comment 4 Arvid Requate univentionstaff 2016-01-18 21:33:52 CET
Ok, it's that docker interface. This seems to fix everything:

root@slave12:~# ucr set samba/interfaces=eth0 samba/interfaces/bindonly=yes
root@slave12:~# service samba restart

No clue yet why this only happens on a UCS@school Slave but not on the Master.
Comment 5 Arvid Requate univentionstaff 2016-01-25 20:36:07 CET
Ok, a bit more about this:

1. This workaround works partially:

root@slave12:~# ucr set samba/interfaces=eth0

but the drawback is that Samba no longer listens on localhost, and things like kinit break too. So this is not an option.

This looks a bit like a change in behaviour between Samba 4.2.3 (UCS 4.0-4) and Samba 4.3.1 (UCS 4.1-0).
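One way to confirm the localhost drawback after setting samba/interfaces=eth0 is to check whether anything still listens on 127.0.0.1, port 88 (Kerberos). A minimal sketch, assuming the socket list comes from `ss -ltn`, whose fourth field is the local `address:port`; `listens_on` is a hypothetical helper and only understands IPv4 and wildcard listeners.

```sh
# listens_on ADDR PORT  (hypothetical helper)
# Reads `ss -ltn` output on stdin and succeeds if a socket listens on
# ADDR:PORT, or on an IPv4/any-address wildcard that also covers ADDR.
listens_on() {
    awk -v a="$1" -v p="$2" '
        ($4 == (a ":" p)) || ($4 == ("0.0.0.0:" p)) || ($4 == ("*:" p)) { found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

Usage: `ss -ltn | listens_on 127.0.0.1 88 || echo "nothing listens on 127.0.0.1:88"`.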



2. This workaround doesn't work:

root@slave12:~# ucr set samba/interfaces='eth0 127.0.0.1'

In that case, univention-s4search against the FQDN still fails with <SASL:[GSS-SPNEGO]: NT_STATUS_LOGON_FAILURE>, while a search against localhost works.


BUT: All of this trouble only happens on a UCS@school Slave PDC! Before installing UCS@school on the Samba/AD DC Slave, all the univention-s4search variations work fine on UCS 4.1! So my impression is that we are still barking up the wrong tree here.
Comment 6 Arvid Requate univentionstaff 2016-01-25 21:10:46 CET
The issue still occurs when I effectively remove the docker0 interface on Master and Slave before installing UCS@school on the Samba/AD DC Slave:

service docker stop
ip link set docker0 down; brctl delbr docker0
service samba restart
Comment 7 Stefan Gohmann univentionstaff 2016-01-26 12:56:17 CET
I'm able to reproduce it in Jenkins. The Jenkins setup had S4 installed on the slave beforehand.

Removing the Samba private directory and re-joining the slave did not help.
Comment 8 Stefan Gohmann univentionstaff 2016-01-26 20:38:11 CET
The following works for me:

root@slave2032:~# ls -la /etc/krb5.keytab*
-rw------- 1 root nogroup 8021 Jan 26 05:53 /etc/krb5.keytab
-rw------- 1 root root    8021 Jan 26 05:53 /etc/krb5.keytab.SAVE
root@slave2032:~# rm /etc/krb5.keytab
root@slave2032:~# /usr/share/univention-samba4/scripts/create-keytab.sh 
Modified 1 records successfully
Modified 1 records successfully
root@slave2032:~# ls -la /etc/krb5.keytab*
-rw------- 1 root root 2222 Jan 26 14:34 /etc/krb5.keytab
-rw------- 1 root root 8021 Jan 26 05:53 /etc/krb5.keytab.SAVE
root@slave2032:~# ldbsearch -H "ldaps://$(hostname -f)" \
                            -U"$(hostname)$%$(cat /etc/machine.secret)" -s base | grep ^dn
dn: DC=autotest203,DC=local
root@slave2032:~#

It looks like the old keytab contains several keys (8021 bytes, versus 2222 bytes after regeneration). Should the keytab be removed when the server is rejoined?
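A quick way to spot such duplicates is to count keytab entries that share the same KVNO, encryption type and principal. This sketch assumes a listing whose first three columns are Vno, Type and Principal, as printed for example by Heimdal's `ktutil -k /etc/krb5.keytab list`; `dup_keytab_entries` is a hypothetical helper, and the field numbers may need adjusting for other tools.

```sh
# dup_keytab_entries  (hypothetical helper)
# Reads a keytab listing on stdin (columns: Vno Type Principal ...) and prints
# "<count>x <kvno> <enctype> <principal>" for every combination that occurs
# more than once. Header and path lines are skipped because their first field
# is not a number.
dup_keytab_entries() {
    awk '$1 ~ /^[0-9]+$/ { n[$1 " " $2 " " $3]++ }
         END { for (k in n) if (n[k] > 1) print n[k] "x " k }'
}
```

Two entries with identical KVNO, type and principal can only differ in the key itself, so any line printed here points at the kind of stale duplicate discussed in this comment.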
Comment 9 Stefan Gohmann univentionstaff 2016-01-26 21:04:05 CET
(In reply to Arvid Requate from comment #0)
> root@slave12:~# kinit --password-file=/etc/machine.secret 'slave12$' && echo OK
> OK

[...]

> root@slave12:~# ldbsearch -H "ldaps://$(hostname -f)" \
>                          -Uslave12$%"$(cat /etc/machine.secret)"
> Failed to bind - LDAP error 49 LDAP_INVALID_CREDENTIALS -  <SASL:[GSS-SPNEGO]: NT_STATUS_LOGON_FAILURE> <>

I've added both of these commands to ucs-test:
 00_checks/21_kinit_hostaccount
 00_checks/22_ldbsearch_hostaccount
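The actual ucs-test scripts are not reproduced in this report. As a rough, hypothetical sketch of what such a check could look like, the function below chains the two quoted commands and takes the host name, FQDN and secret file as parameters; `check_hostaccount` and its parameters are assumptions, and the `-s base` scope is borrowed from the ldbsearch call in Comment 8.

```sh
# check_hostaccount HOST FQDN SECRET_FILE  (hypothetical helper, not ucs-test)
# 1. Obtain a TGT for the host account with the machine password (Heimdal kinit).
# 2. Bind to the Samba/AD LDAP server on FQDN as the host account.
check_hostaccount() {
    host=$1 fqdn=$2 secret_file=$3
    if ! kinit --password-file="$secret_file" "$host\$"; then
        echo "FAIL: kinit"; return 1
    fi
    if ! ldbsearch -H "ldaps://$fqdn" -U"$host\$%$(cat "$secret_file")" -s base >/dev/null; then
        echo "FAIL: ldbsearch"; return 1
    fi
    echo "OK"
}
```

Usage: `check_hostaccount "$(hostname)" "$(hostname -f)" /etc/machine.secret`. On the affected slave, this sketch would be expected to stop at the ldbsearch step, matching the LDAP error 49 quoted above.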
Comment 10 Arvid Requate univentionstaff 2016-01-27 19:39:24 CET
I adjusted create-keytab.sh to work around a behaviour change in Samba 4.3.x (possibly in its internal Heimdal) that causes duplicate key hashes in the keytab when the password is changed in secrets.ldb but the same KVNO is used.

This seems to have fixed the issue.
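The actual change to create-keytab.sh is not shown in this report. A minimal sketch of the manual workaround from Comment 8, assuming that regenerating the keytab from an empty file is enough to avoid the duplicate keys: back up the old keytab, remove it, then run the generator. `rebuild_keytab` is a hypothetical helper, not the shipped fix.

```sh
# rebuild_keytab KEYTAB GENERATOR  (hypothetical helper, not the shipped fix)
# Backs up KEYTAB to KEYTAB.bak and removes it, so no stale keys with the
# same KVNO survive, then runs GENERATOR to write a fresh keytab.
rebuild_keytab() {
    keytab=$1 generator=$2
    if [ -e "$keytab" ]; then
        cp -p "$keytab" "$keytab.bak" || return 1
        rm -f "$keytab" || return 1
    fi
    "$generator"
}
```

Usage: `rebuild_keytab /etc/krb5.keytab /usr/share/univention-samba4/scripts/create-keytab.sh`.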

Additionally, I merged the patches from Bug 39601. These may help prevent IP addresses from the Docker 172.17.0.0/16 range from being registered in DNS automatically during the join (this does not cover samba_dnsupdate, which is a separate issue).
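The patches from Bug 39601 are not reproduced here. To illustrate the idea only, a POSIX shell filter can drop addresses in the Docker 172.17.0.0/16 range from a list of candidate addresses before registration. `ip2int`, `in_cidr` and `filter_docker_addrs` are hypothetical helpers, and the arithmetic assumes a shell with 64-bit integers, such as bash or dash on a 64-bit system.

```sh
# ip2int A.B.C.D  -> prints the IPv4 address as a 32-bit integer
ip2int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NET/PREFIX  -> succeeds if IP lies inside NET/PREFIX (IPv4 only)
in_cidr() {
    net=${2%/*} bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip2int "$1") & $mask )) -eq $(( $(ip2int "$net") & $mask )) ]
}

# filter_docker_addrs  -> copies addresses from stdin, dropping 172.17.0.0/16
filter_docker_addrs() {
    while read -r ip; do
        [ -n "$ip" ] || continue
        in_cidr "$ip" 172.17.0.0/16 || printf '%s\n' "$ip"
    done
}
```

For example, `hostname -I | tr ' ' '\n' | filter_docker_addrs` would list the host's IPv4 addresses without the docker0 one.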

Advisory: univention-samba4.yaml
Comment 11 Arvid Requate univentionstaff 2016-01-27 20:51:50 CET
Note: the patches from Bug 39601 might help address Bug 40374, but there is no hard proof of that connection yet.
Comment 12 Stefan Gohmann univentionstaff 2016-02-01 08:09:31 CET
I'll move the bug to UCS.
Comment 13 Stefan Gohmann univentionstaff 2016-02-01 22:17:20 CET
Code review: OK

Tests: OK 

YAML: OK
Comment 14 Janek Walkenhorst univentionstaff 2016-02-04 13:59:11 CET
<http://errata.software-univention.de/ucs/4.1/95.html>