Bug 40259 - Local LDAP bind stops working during UCS@school rejoin
Local LDAP bind stops working during UCS@school rejoin
Status: CLOSED WORKSFORME
Product: UCS@school
Classification: Unclassified
Component: General
UCS@school 4.1
Other Linux
: P5 normal with 2 votes (vote)
: UCS@school 4.1 R2 v10
Assigned To: Daniel Tröder
Sönke Schwardt-Krummrich
:
: 42814 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-12-15 19:07 CET by Michael Grandjean
Modified: 2017-03-24 16:17 CET (History)
5 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 4: Minor Usability: Impairs usability in secondary scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.137
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
three retries of remote univention-server-join (1.55 KB, patch)
2016-11-17 15:16 CET, Daniel Tröder
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Grandjean univentionstaff 2015-12-15 19:07:36 CET
In an UCS@school workshop, 2 out of 5 UCS@school Slave installations failed during the re-join of the UCS@school configuration wizard. 
The problem was that the authentication against the local slapd stopped working. The process was still running, but a simple univention-ldapsearch was not possible anymore (ldap_bind: Invalid credentials (49)).

Workaround: After manually restarting the slapd service, authentication against the LDAP was possible again and the pending join scripts could be executed successfully.

I will collect the join.log of at least one of the systems tomorrow.
Comment 1 Florian Best univentionstaff 2015-12-16 08:32:10 CET
I had the same problem twice. I opened Bug #40247, Bug #40248, Bug #40249.
Comment 2 Sönke Schwardt-Krummrich univentionstaff 2016-07-07 22:08:20 CEST
Any hints on how to reproduce this issue? Minimum number of UCS@school slaves? Or does it also happen with the first school slave?
Comment 3 Florian Best univentionstaff 2016-07-08 10:26:59 CEST
(In reply to Sönke Schwardt-Krummrich from comment #2)
> Any hints on how to reproduce this issue? Minimum number of UCS@school
> slaves? Or does it also happen with the first school slave?
Yes, with the first slave.
Comment 5 Florian Best univentionstaff 2016-07-19 14:18:59 CEST
03univention-directory-listener.inst contains the following errors:

 113 18.07.16 19:45:44.526  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=Guest,cn=users,dc=autotest201,dc=local": object class violation while adding
 114 18.07.16 19:45:44.526  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 115 18.07.16 19:45:44.544  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=krbtgt,cn=users,dc=autotest201,dc=local": object class violation while adding
 116 18.07.16 19:45:44.544  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 117 18.07.16 19:45:44.653  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=join-slave,cn=users,dc=autotest201,dc=local": object class violation while adding
 118 18.07.16 19:45:44.653  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 119 18.07.16 19:45:44.670  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=join-backup,cn=users,dc=autotest201,dc=local": object class violation while adding
 120 18.07.16 19:45:44.670  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 121 18.07.16 19:45:44.694  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=sys-idp-user,cn=users,dc=autotest201,dc=local": object class violation while adding
 122 18.07.16 19:45:44.694  LISTENER    ( ERROR   ) : »  additional info: object class 'simpleSecurityObject' requires attribute 'userPassword'
 123 18.07.16 19:45:44.715  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=Administrator,cn=users,dc=autotest201,dc=local": object class violation while adding
 124 18.07.16 19:45:44.715  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 125 18.07.16 19:45:44.719  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=dns-master201,cn=users,dc=autotest201,dc=local": object class violation while adding
 126 18.07.16 19:45:44.719  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 127 18.07.16 19:45:44.922  LISTENER    ( ERROR   ) : replication: Object class violation; dn="cn=master201,cn=dc,cn=computers,dc=autotest201,dc=local": object class violation while adding
 128 18.07.16 19:45:44.922  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 129 18.07.16 19:45:44.949  LISTENER    ( ERROR   ) : replication: Object class violation; dn="uid=http-proxy-master201,cn=users,dc=autotest201,dc=local": object class violation while adding
 130 18.07.16 19:45:44.949  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 131 18.07.16 19:45:45.363  LISTENER    ( ERROR   ) : replication: Object class violation; dn="cn=slave202,cn=dc,cn=server,cn=computers,ou=School1,dc=autotest201,dc=local": object class violation while adding
 132 18.07.16 19:45:45.364  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 133 18.07.16 19:45:45.445  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=default@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 134 18.07.16 19:45:45.445  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 135 18.07.16 19:45:45.514  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=kadmin/admin@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 136 18.07.16 19:45:45.514  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 137 18.07.16 19:45:45.516  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=kadmin/hprop@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 138 18.07.16 19:45:45.516  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 139 18.07.16 19:45:45.550  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=kadmin/changepw@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 140 18.07.16 19:45:45.550  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 141 18.07.16 19:45:45.558  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=changepw/kerberos@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 142 18.07.16 19:45:45.558  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 143 18.07.16 19:45:45.586  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=WELLKNOWN/ANONYMOUS@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 144 18.07.16 19:45:45.586  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 145 18.07.16 19:45:45.611  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=krbtgt/AUTOTEST201.LOCAL@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 146 18.07.16 19:45:45.611  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 147 18.07.16 19:45:45.632  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=WELLKNOWN/org.h5l.fast-cookie@WELLKNOWN:ORG.H5L,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 148 18.07.16 19:45:45.632  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
 149 18.07.16 19:45:45.644  LISTENER    ( ERROR   ) : replication: Object class violation; dn="krb5PrincipalName=ldap/master201.autotest201.local@AUTOTEST201.LOCAL,cn=kerberos,dc=autotest201,dc=local": object class violation while adding
 150 18.07.16 19:45:45.644  LISTENER    ( ERROR   ) : »  additional info: object class 'krb5KDCEntry' requires attribute 'krb5KeyVersionNumber'
Comment 6 Florian Best univentionstaff 2016-07-20 12:57:16 CEST
I had a further look into the logfiles of ucs-test this night and made the following observations:

The join.log contains the following error:
> Configure 01univention-ldap-server-init.inst Tue Jul 19 18:34:24 EDT 2016
> 
> Starting ldap server(s): slapd ...failed.
> 578eaaf1 /etc/ldap/slapd.conf: line 110: unknown attr "@univentionVirtualMachine" in to clause 578eaaf1 … slapsshema: bad configuration file!.
> invoke-rc.d: initscript slapd, action "start" failed.
→ This is because the schema is not known yet on the DC Slave. This should be okay. Bug #41782.

The listener.log contains the following error:
19.07.16 18:43:50.897  LISTENER    ( ERROR   ) : ldap_extension: slapd.conf validation failed:
578ead26 /etc/ldap/slapd.conf: line 250: unknown attr "@msPrintConnectionPolicy" in to clause

This might not be okay? There is nothing done at 18:43 on the DC Slave system.

In the join.log we can see the 2 joins:

Tue Jul 19 18:38:05 EDT 2016: finish /usr/share/univention-join/univention-join
Tue Jul 19 18:53:48 EDT 2016: starting /usr/sbin/univention-join -dcaccount Administrator -dcpwd /tmp/tmpGpy01n

→ The first is the regular join, the second is the rejoin by the UCS@school installer. Between those 2 joins there is a reboot.

The second join/school-installer failed immediately:
19.07.16 18:54:23.507  MODULE      ( PROCESS ) : **************************************************************************
19.07.16 18:54:23.507  MODULE      ( PROCESS ) : * Join failed!                                                           *
19.07.16 18:54:23.507  MODULE      ( PROCESS ) : * Contact your system administrator                                      *
19.07.16 18:54:23.507  MODULE      ( PROCESS ) : **************************************************************************
19.07.16 18:54:23.507  MODULE      ( PROCESS ) : * Message:  FAILED: 00ucs-school-slave-check-ou.inst
19.07.16 18:54:23.507  MODULE      ( WARN    ) : Die Softwarepakete wurden erfolgreich installiert, jedoch konnte der Domänenbeitritt nicht abgeschlossen werden: FAILED: 00ucs-school-slave-check-ou.inst. Mehr Hinweise können in der Log-Datei /var/log/univention/join.log gefunden werden. Nach Beheben der entsprechenden Konflikte kann der Domänenbeitritt über das UMC-Modul "Domänenbeitritt" abgeschlossen werden.
19.07.16 18:54:23.507  MODULE      ( PROCESS ) : **************************************************************************

At the same time of the error shown in the listener.log at 18:43 there runs the school-installer on the DC Master.
Is that error the cause? Every schema/acl extension added by ucsRegisterLDAPSchema is checked for validity and reverted if adding it fails.
Comment 7 Florian Best univentionstaff 2016-07-21 12:53:35 CEST
All current UCS@school jenkins tests have been fixed by svn r71119 which checks if the schema for specific attributes/objectclasses exists when writing the ACL's. Maybe this Bug can then be resolved as duplicate of Bug #41725 (not sure though).
Comment 8 Sönke Schwardt-Krummrich univentionstaff 2016-09-21 15:00:56 CEST
Does this problem still happen in our jenkins environment?
Comment 9 Florian Best univentionstaff 2016-09-26 16:01:33 CEST
(In reply to Sönke Schwardt-Krummrich from comment #8)
> Does this problem still happen in our jenkins environment?
No.

From the logfile in Bug #40249:
14.12.15 13:53:28.350  LISTENER    ( WARN    ) : ucsschool-user-logonscripts: Invalid credentials: failed to connect to LDAP server
14.12.15 13:53:28.350  LISTENER    ( ERROR   ) : ucsschool-user-logonscripts: Invalid credentials: giving up creating a new LDAP connection

A different reason is in Bug #42505.

(In reply to Michael Grandjean from comment #0)
> I will collect the join.log of at least one of the systems tomorrow.
@Michael: when is tomorrow? :D
Comment 10 Daniel Tröder univentionstaff 2016-09-29 10:54:34 CEST
It still fails regularly on jenkins 4.1r2 multiserver.

The slaves (10.210.201.10) 2nd join:

---------------------------- join.log ------------------------------------
Wed Sep 28 19:46:34 EDT 2016: starting /usr/sbin/univention-join -dcaccount Administrator -dcpwd /tmp/tmp7qoZYW
[..]
E: failed to modify DC Slave cn=slave2033,cn=dc,cn=server,cn=computers,ou=School2,dc=autotest203,dc=local [LDAP Error: Can't contact LDAP server]
Wed Sep 28 19:46:51 EDT 2016: finish /usr/sbin/univention-join
----------------------------------------------------------------------

At the same time on the master:

----------- management-console-web-server.log -------------------------------
28.09.16 19:46:23.528  MAIN        ( INFO    ) : CPAuth/auth: got new auth request (10.210.201.10:41293 <=> )

----------- management-console-server.log -------------------------------
28.09.16 19:46:26.528  MODULE      ( INFO    ) : Executing ['schoolwizards/schools/move_dc']

----------- auth.log -----------------------------
Sep 28 19:46:35 master203 sshd[18358]: Accepted keyboard-interactive/pam for Administrator from 10.210.201.10 port 50563 ssh2
Sep 28 19:46:35 master203 sshd[18358]: pam_unix(sshd:session): session opened for user Administrator by (uid=0)
Sep 28 19:46:35 master203 sshd[18368]: Received disconnect from 10.210.201.10: 11: disconnected by user
Sep 28 19:46:35 master203 sshd[18358]: pam_unix(sshd:session): session closed for user Administrator
Sep 28 19:46:35 master203 sshd[18370]: Accepted keyboard-interactive/pam for Administrator from 10.210.201.10 port 50582 ssh2
Sep 28 19:46:35 master203 sshd[18370]: pam_unix(sshd:session): session opened for user Administrator by (uid=0)
Sep 28 19:46:36 master203 sshd[18378]: Received disconnect from 10.210.201.10: 11: disconnected by user
Sep 28 19:46:36 master203 sshd[18370]: pam_unix(sshd:session): session closed for user Administrator
Sep 28 19:46:45 master203 sshd[18387]: Accepted keyboard-interactive/pam for Administrator from 10.210.201.10 port 50773 ssh2
Sep 28 19:46:45 master203 sshd[18387]: pam_unix(sshd:session): session opened for user Administrator by (uid=0)
Sep 28 19:46:45 master203 sshd[18394]: Received disconnect from 10.210.201.10: 11: disconnected by user
Sep 28 19:46:45 master203 sshd[18387]: pam_unix(sshd:session): session closed for user Administrator
Sep 28 19:46:48 master203 sshd[18396]: Accepted keyboard-interactive/pam for Administrator from 10.210.201.10 port 50840 ssh2
Sep 28 19:46:48 master203 sshd[18396]: pam_unix(sshd:session): session opened for user Administrator by (uid=0)
Sep 28 19:46:49 master203 sshd[18403]: Received disconnect from 10.210.201.10: 11: disconnected by user
Sep 28 19:46:49 master203 sshd[18396]: pam_unix(sshd:session): session closed for user Administrator
Sep 28 19:46:49 master203 sshd[18410]: Accepted keyboard-interactive/pam for Administrator from 10.210.201.10 port 50864 ssh2
Sep 28 19:46:49 master203 sshd[18410]: pam_unix(sshd:session): session opened for user Administrator by (uid=0)
Sep 28 19:46:51 master203 slapd[18118]: DIGEST-MD5 common mech free
Sep 28 19:46:51 master203 sshd[18421]: Received disconnect from 10.210.201.10: 11: disconnected by user
Sep 28 19:46:51 master203 sshd[18410]: pam_unix(sshd:session): session closed for user Administrator
Sep 28 19:46:51 master203 slapd[18491]: Loaded metadata from "/usr/share/univention-management-console/saml/idp/ucs-sso.autotest203.local.xml"

----------- syslog -----------------------------
Sep 28 19:46:51 master203 logger: /etc/init.d/slapd graceful-restart (pid: 18458, ppid:18352 univention-dire)
Sep 28 19:46:51 master203 logger: /etc/init.d/slapd graceful-stop (pid: 18466, ppid:18458 slapd)
Sep 28 19:46:51 master203 logger: /etc/init.d/slapd start (pid: 18479, ppid:18458 slapd)
Sep 28 19:46:51 master203 slapd[18491]: @(#) $OpenLDAP: slapd  (Nov  6 2015 08:03:15) $#012#011root@ladda:/var/build/temp/tmp.bfW1EHuZcn/pbuilder/openldap-2.4.42+dfsg/debian/build/servers/
----------------------------------------------------------------------

So the slave fails joining, because the masters slapd does a restart.
I couldn't figure out why it does that. Is it possible, that on the master installations/tests were running while the slaves were being setup?
Comment 11 Daniel Tröder univentionstaff 2016-11-02 10:53:30 CET
*** Bug 42814 has been marked as a duplicate of this bug. ***
Comment 12 Daniel Tröder univentionstaff 2016-11-17 15:16:04 CET
Created attachment 8233 [details]
three retries of remote univention-server-join

I was not able to reproduce this and I couldn't find recent cases in jenkins.

The attached patch tries three times to run univention-server-join on the master, in case it fails. This might fix a previously observed case. With the lack of logfiles, I don't know what other things to make more robust in univention-join...
Comment 13 Daniel Tröder univentionstaff 2016-11-22 16:20:21 CET
Problem cannot be reproduced. The patch is not applied.

If the problem reoccurs, please reopen.
Comment 14 Sönke Schwardt-Krummrich univentionstaff 2017-02-12 23:20:29 CET
We've not seen this problem again during the last 12 months.
Please reopen this bug with further information, if the problem happens again.
Comment 15 Sönke Schwardt-Krummrich univentionstaff 2017-02-12 23:20:43 CET
(In reply to Sönke Schwardt-Krummrich from comment #14)
> Please reopen this bug with further information, if the problem happens
> again.

s/reopen/clone/