Bug 40460 - UCS@school 4.1 Slave: FAILED 03univention-directory-listener.inst
UCS@school 4.1 Slave: FAILED 03univention-directory-listener.inst
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
UCS 4.1
Other Linux
: P3 critical (vote)
: UCS 4.1-0-errata
Assigned To: Philipp Hahn
Arvid Requate
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-19 18:40 CET by Arvid Requate
Modified: 2016-02-11 13:59 CET (History)
5 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:
hahn: Patch_Available+


Attachments
join.log (263.77 KB, text/x-log)
2016-01-19 18:40 CET, Arvid Requate
Details
management-console-module-schoolinstaller.log (213.46 KB, text/x-log)
2016-01-19 18:41 CET, Arvid Requate
Details
Untested patch - compiles at least (8.77 KB, patch)
2016-02-05 10:20 CET, Philipp Hahn
Details | Diff
1.c (281 bytes, text/x-csrc)
2016-02-10 18:26 CET, Arvid Requate
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2016-01-19 18:40:59 CET
Created attachment 7418 [details]
join.log

The installation of UCS@school on a Samba/AD Slave DC via UMC installer aborted during the joinscript  03univention-directory-listener.inst with the message:

==========================================================================
Die Softwarepakete wurden erfolgreich installiert, jedoch konnte der Domänenbeitritt nicht abgeschlossen werden: FAILED: 03univention-directory-listener.inst. Mehr Hinweise können in der Log-Datei /var/log/univention/join.log gefunden werden. Nach Beheben der entsprechenden Konflikte kann der Domänenbeitritt über das UMC-Modul "Domänenbeitritt" abgeschlossen werden.
==========================================================================

This happened with UCS 4.1 errata 1 as well as with UCS@4.1 errata56.
In both cases the Master was already set up as UCS@school Samba/AD DC (Multiserver).

To track this down I had added a "set -x" to 01univention-ldap-server-init.inst and 03univention-directory-listener.inst at the time where the attached join.log was created. But the interesting part seems to happen with the listener starts up:

=========================================================================
+ /usr/sbin/univention-directory-listener -o -x -ZZ -d 2 -i -h master10.ar41i1.qa -b dc=ar41i1,d
c=qa -m /usr/lib/univention-directory-listener/system -c /var/lib/univention-directory-listener 
-D cn=slave12,cn=dc,cn=server,cn=computers,ou=school2,dc=ar41i1,dc=qa -y /etc/machine.secret
23.11.15 18:27:37.021  DEBUG_INIT
UNIVENTION_DEBUG_BEGIN  : uldap.__open host=slave12.ar41i1.qa port=7389 base=dc=ar41i1,dc=qa
UNIVENTION_DEBUG_END    : uldap.__open host=slave12.ar41i1.qa port=7389 base=dc=ar41i1,dc=qa
23.11.15 18:27:39.413  LISTENER    ( ERROR   ) : import of filename=/usr/lib/univention-director
y-listener/system/ucsschool-s4-branch-site.py failed
Traceback (most recent call last):
  File "/usr/lib/univention-directory-listener/system/ucsschool-s4-branch-site.py", line 134, in
 <module>
    raise ImportError("Error accessing LDAP via machine account: %s" % (ex,))
ImportError: Error accessing LDAP via machine account: {'desc': 'Invalid credentials'}
23.11.15 18:27:39.413  LISTENER    ( ERROR   ) : import of filename=/usr/lib/univention-director
y-listener/system/ucsschool-s4-branch-site.py failed in module_import()
23.11.15 18:27:39.671  LISTENER    ( WARN    ) : replication: ldap server changed to master10.ar
41i1.qa
UNIVENTION_DEBUG_BEGIN  : uldap.__open host=master10.ar41i1.qa port=7389 base=dc=ar41i1,dc=qa
UNIVENTION_DEBUG_END    : uldap.__open host=master10.ar41i1.qa port=7389 base=dc=ar41i1,dc=qa
23.11.15 18:27:39.754  LISTENER    ( WARN    ) : handler: replication (not ready) (ignore)
Restarting ldap server(s).
Stopping ldap server(s): slapd ...done.
Starting ldap server(s): slapd ...done.
23.11.15 18:27:42.183  LISTENER    ( WARN    ) : handler: faillog (not ready) (ignore)
23.11.15 18:27:42.185  LISTENER    ( WARN    ) : handler: s4-connector (not ready) (ignore)
23.11.15 18:27:42.187  LISTENER    ( WARN    ) : Set Schema ID to 7
23.11.15 18:27:42.187  LISTENER    ( WARN    ) : initializing module replication
File: /var/lib/univention-ldap/ldap/DB_CONFIG
slapd: Kein Prozess gefunden
File: /var/lib/univention-ldap/ldap/DB_CONFIG
Starting ldap server(s): slapd ...done.
Restarting ldap server(s).
Stopping ldap server(s): slapd ...retry #1....done.
Starting ldap server(s): slapd ...done.
23.11.15 18:27:57.037  LISTENER    ( ERROR   ) : could not get DNs when initializing replication: Can't contact LDAP server
23.11.15 18:27:57.037  LISTENER    ( WARN    ) : initializing module nfs-homes
23.11.15 18:27:57.040  LISTENER    ( ERROR   ) : could not get DNs when initializing nfs-homes: Can't contact LDAP server
23.11.15 18:27:57.040  LISTENER    ( WARN    ) : initializing module cups-pdf
23.11.15 18:27:57.043  LISTENER    ( ERROR   ) : could not get DNs when initializing cups-pdf: Can't contact LDAP server
23.11.15 18:27:57.044  LISTENER    ( WARN    ) : initializing module keytab-member
23.11.15 18:27:57.047  LISTENER    ( ERROR   ) : could not get DNs when initializing keytab-member: Can't contact LDAP server
23.11.15 18:27:57.047  LISTENER    ( WARN    ) : initializing module cups-printers
23.11.15 18:27:57.050  LISTENER    ( ERROR   ) : could not get DNs when initializing cups-printers: Can't contact LDAP server
23.11.15 18:27:57.050  LISTENER    ( WARN    ) : initializing module gencertificate
23.11.15 18:27:57.053  LISTENER    ( ERROR   ) : could not get DNs when initializing gencertificate: Can't contact LDAP server
[...] ## repeats for each listener script.
23.11.15 18:27:57.517  LISTENER    ( WARN    ) : initializing module pkgdb-watch
23.11.15 18:27:57.517  LISTENER    ( ERROR   ) : could not get DNs when initializing pkgdb-watch: Can't contact LDAP server
Multifile: /etc/samba/smb.conf
+ exit_status=0
+ univention-config-registry set ldap/database/ldbm/dbsync=
Setting ldap/database/ldbm/dbsync
Multifile: /etc/ldap/slapd.conf
+ sleep 15
+ [ -f /etc/init.d/slapd ]
+ /etc/init.d/slapd restart
Restarting ldap server(s).
Stopping ldap server(s): slapd ...retry #1....done.
Starting ldap server(s): slapd ...done.
+ [ 0 != 0 ]
+ test -e /var/lib/univention-directory-listener/handlers/replication
+ cat /var/lib/univention-directory-listener/handlers/replication
+ res=0
+ [ 0 != 1 -a 0 != 3 ]
+ exit 1
Mon Nov 23 18:28:24 CET 2015: finish /usr/sbin/univention-join
=========================================================================

After that the local ldap server seems to be empty:
======================================================
root@slave12:~# univention-ldapsearch -x
ldap_bind: Invalid credentials (49)

root@slave12:~# ldapwhoami -ZZ -D uid=Administrator,cn=users,dc=ar41i1,dc=qa -h slave12 -p 7389 -w univention -x
ldap_bind: Invalid credentials (49)
======================================================

But the master has the current machine.secret:
======================================================
root@slave12:~# ldapwhoami -ZZ -D cn=slave12,cn=dc,cn=server,cn=computers,ou=school2,dc=ar41i1,dc=qa -h master10 -p 7389 -y /etc/machine.secret -x
dn:cn=slave12,cn=dc,cn=server,cn=computers,ou=school2,dc=ar41i1,dc=qa
======================================================
Comment 1 Arvid Requate univentionstaff 2016-01-19 18:41:52 CET
Created attachment 7419 [details]
management-console-module-schoolinstaller.log
Comment 2 Arvid Requate univentionstaff 2016-01-25 19:39:16 CET
Looks like downgrading univention-directory-listner to 10.0.0-2.282.201509231400 (i.e. reverting to version/errata=0) before starting the UCS@school installer on the DC Slave fixes this issue.

So it seems like the adjustment for Bug 39957 breaks something.
Comment 3 Stefan Gohmann univentionstaff 2016-01-27 06:36:41 CET
I'm currently unable to reproduce it. Also the test Jenkins tests run without problems at least without this problem.

Are you able to reproduce it?
Comment 5 Stefan Gohmann univentionstaff 2016-02-05 06:42:55 CET
Happens again in Jenkins:
http://jenkins.knut.univention.de:8080/job/UCSschool%204.1/job/UCSschool%204.1%20Multiserver/SambaVersion=s4/ws/join.log

Configure 03univention-directory-listener.inst Thu Feb  4 19:15:53 EST 2016
2016-02-04 19:15:53.428791817-05:00 (in joinscript_init)
Setting ldap/database/ldbm/dbsync
Multifile: /etc/ldap/slapd.conf
04.02.16 19:15:54.740  DEBUG_INIT
04.02.16 19:15:54.740  LISTENER    ( INFO    ) : purging cache
04.02.16 19:15:54.740  LDAP        ( INFO    ) : connecting to ldap://master203.autotest203.local:7389/
04.02.16 19:15:54.749  LDAP        ( INFO    ) : simple_bind as cn=slave2032,cn=dc,cn=server,cn=computers,ou=School2,dc=autotest203,dc=local
04.02.16 19:15:54.754  LISTENER    ( INFO    ) : connecting to notifier master203.autotest203.local:6669
04.02.16 19:15:54.755  LISTENER    ( INFO    ) : established connection to 10.210.151.160 port 6669
04.02.16 19:15:54.756  LISTENER    ( INFO    ) : connection okay to host master203.autotest203.local:7389
04.02.16 19:15:54.873  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/replication.py
04.02.16 19:15:55.228  LISTENER    ( INFO    ) : replication flatmode enabled by UCR: no
04.02.16 19:15:55.228  LISTENER    ( INFO    ) : replication flatmode activated: False
04.02.16 19:15:55.228  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/cups-printers.py
04.02.16 19:15:55.292  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/samba-shares.py
04.02.16 19:15:55.339  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/pkgdb-watch.py
04.02.16 19:15:55.502  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/nfs-shares.py
04.02.16 19:15:55.505  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/remove-old-homedirs.py
04.02.16 19:15:55.506  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/keytab.py
04.02.16 19:15:55.506  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/faillog.py
04.02.16 19:15:55.507  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/bind.py
04.02.16 19:15:55.509  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/umc-service-providers.py
04.02.16 19:15:55.509  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/ucsschool-s4-branch-site.py
UNIVENTION_DEBUG_BEGIN  : uldap.__open host=slave2032.autotest203.local port=7389 base=dc=autotest203,dc=local
04.02.16 19:15:57.032  LDAP        ( INFO    ) : establishing new connection with retry_max=11
04.02.16 19:15:57.047  LDAP        ( INFO    ) : bind binddn=cn=slave2032,cn=dc,cn=server,cn=computers,ou=School2,dc=autotest203,dc=local
UNIVENTION_DEBUG_END    : uldap.__open host=slave2032.autotest203.local port=7389 base=dc=autotest203,dc=local
04.02.16 19:15:57.048  LISTENER    ( ERROR   ) : import of filename=/usr/lib/univention-directory-listener/system/ucsschool-s4-branch-site.py failed
Traceback (most recent call last):
  File "/usr/lib/univention-directory-listener/system/ucsschool-s4-branch-site.py", line 134, in <module>
    raise ImportError("Error accessing LDAP via machine account: %s" % (ex,))
ImportError: Error accessing LDAP via machine account: {'desc': 'Invalid credentials'}
04.02.16 19:15:57.048  LISTENER    ( ERROR   ) : import of filename=/usr/lib/univention-directory-listener/system/ucsschool-s4-branch-site.py failed in module_import()
04.02.16 19:15:57.048  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/quota.py
04.02.16 19:15:57.049  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/nagios-client.py
04.02.16 19:15:57.051  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/nscd.py
04.02.16 19:15:57.051  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/nss.py
04.02.16 19:15:57.051  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/ldap_server.py
04.02.16 19:15:57.065  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/samba4-idmap.py
04.02.16 19:15:57.080  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/cups-pdf.py
04.02.16 19:15:57.080  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/univention-saml-servers.py
04.02.16 19:15:57.080  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/license_uuid.py
04.02.16 19:15:57.081  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/ucs-school-user-logonscript.py
04.02.16 19:15:57.101  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/pupilgroups.py
04.02.16 19:15:57.109  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/keytab-member.py
04.02.16 19:15:57.110  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/s4-connector.py
04.02.16 19:15:57.111  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/gencertificate.py
04.02.16 19:15:57.112  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/ldap_extension.py
04.02.16 19:15:57.113  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/nfs-homes.py
04.02.16 19:15:57.114  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/remove-old-sharedirs.py
04.02.16 19:15:57.115  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/udm_extension.py
04.02.16 19:15:57.127  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/well-known-sid-name-mapping.py
04.02.16 19:15:57.132  LISTENER    ( INFO    ) : importing handler /usr/lib/univention-directory-listener/system/hosteddomains.py
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : setting data for all handlers: key=basedn  value=dc=autotest203,dc=local
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : replication: listener passed key="basedn" value="dc=autotest203,dc=local"
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : pkgdb-watch: listener passed key="basedn" value="dc=autotest203,dc=local"
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : setting data for all handlers: key=binddn  value=cn=slave2032,cn=dc,cn=server,cn=computers,ou=School2,dc=autotest203,dc=local
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : replication: listener passed key="binddn" value="cn=slave2032,cn=dc,cn=server,cn=computers,ou=School2,dc=autotest203,dc=local"
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : pkgdb-watch: listener passed key="binddn" value="cn=slave2032,cn=dc,cn=server,cn=computers,ou=School2,dc=autotest203,dc=local"
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : setting data for all handlers: key=bindpw  value=<HIDDEN>
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : replication: listener passed key="bindpw" value="<HIDDEN>"
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : pkgdb-watch: listener passed key="bindpw" value="<HIDDEN>"
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : setting data for all handlers: key=ldapserver  value=master203.autotest203.local
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : replication: listener passed key="ldapserver" value="master203.autotest203.local"
04.02.16 19:15:57.133  LISTENER    ( WARN    ) : replication: ldap server changed to master203.autotest203.local
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : pkgdb-watch: listener passed key="ldapserver" value="master203.autotest203.local"
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : pkgdb-watch: ldap server changed to master203.autotest203.local
04.02.16 19:15:57.133  LISTENER    ( INFO    ) : pkgdb-watch: ldap reconnect triggered
UNIVENTION_DEBUG_BEGIN  : uldap.__open host=master203.autotest203.local port=7389 base=dc=autotest203,dc=local
04.02.16 19:15:57.165  LDAP        ( INFO    ) : establishing new connection with retry_max=11
04.02.16 19:15:57.179  LDAP        ( INFO    ) : bind binddn=cn=slave2032,cn=dc,cn=server,cn=computers,ou=School2,dc=autotest203,dc=local
UNIVENTION_DEBUG_END    : uldap.__open host=master203.autotest203.local port=7389 base=dc=autotest203,dc=local
04.02.16 19:15:57.192  LISTENER    ( INFO    ) : running handlers for cn=Subschema
04.02.16 19:15:57.192  LISTENER    ( WARN    ) : handler: replication (not ready) (ignore)
04.02.16 19:15:57.192  LISTENER    ( INFO    ) : replication: Running handler for: cn=Subschema
Restarting ldap server(s).
Stopping ldap server(s): slapd ...done.
Starting ldap server(s): slapd ...done.
04.02.16 19:15:58.895  LISTENER    ( INFO    ) : handler: replication (successful)
04.02.16 19:15:58.896  LISTENER    ( WARN    ) : handler: faillog (not ready) (ignore)
04.02.16 19:15:58.896  LISTENER    ( INFO    ) : handler: faillog (successful)
04.02.16 19:15:58.896  LISTENER    ( WARN    ) : handler: s4-connector (not ready) (ignore)
04.02.16 19:15:58.905  LISTENER    ( INFO    ) : handler: s4-connector (successful)
04.02.16 19:15:58.906  LISTENER    ( WARN    ) : Set Schema ID to 7
04.02.16 19:15:58.906  LISTENER    ( WARN    ) : initializing module replication
04.02.16 19:15:58.906  LISTENER    ( INFO    ) : call handler_clean for module replication
04.02.16 19:15:58.906  LISTENER    ( INFO    ) : removing replica's cache
File: /var/lib/univention-ldap/ldap/DB_CONFIG
04.02.16 19:16:00.042  LISTENER    ( INFO    ) : call handler_initialize for module replication
04.02.16 19:16:00.042  LISTENER    ( INFO    ) : REPLICATION:  initialize
04.02.16 19:16:00.042  LISTENER    ( INFO    ) : removing replica's cache
slapd: Kein Prozess gefunden
File: /var/lib/univention-ldap/ldap/DB_CONFIG
04.02.16 19:16:01.205  LISTENER    ( INFO    ) : initializing replica's cache
Starting ldap server(s): slapd ...done.
04.02.16 19:16:02.389  LISTENER    ( INFO    ) : remove old entries for module replication
04.02.16 19:16:02.391  LISTENER    ( INFO    ) : call cache_free_cursor for module replication
04.02.16 19:16:02.391  LISTENER    ( INFO    ) : initialize schema for module replication
04.02.16 19:16:02.392  LISTENER    ( INFO    ) : running handlers [replication] for cn=Subschema
04.02.16 19:16:02.393  LISTENER    ( INFO    ) : replication: Running handler for: cn=Subschema
Restarting ldap server(s).
Stopping ldap server(s): slapd ...retry #1....done.
Starting ldap server(s): slapd ...done.
04.02.16 19:16:14.117  LISTENER    ( INFO    ) : handler: replication (successful)
04.02.16 19:16:14.118  LISTENER    ( INFO    ) : module replication for relating objects
04.02.16 19:16:14.118  LISTENER    ( ERROR   ) : could not get DNs when initializing replication: Can't contact LDAP server
04.02.16 19:16:14.118  LISTENER    ( WARN    ) : initializing module cups-printers
04.02.16 19:16:14.118  LISTENER    ( INFO    ) : call handler_clean for module cups-printers
04.02.16 19:16:14.118  LISTENER    ( INFO    ) : call handler_initialize for module cups-printers
04.02.16 19:16:14.119  LISTENER    ( INFO    ) : remove old entries for module cups-printers
04.02.16 19:16:14.120  LISTENER    ( INFO    ) : call cache_free_cursor for module cups-printers
04.02.16 19:16:14.120  LISTENER    ( INFO    ) : initialize schema for module cups-printers
04.02.16 19:16:14.121  LISTENER    ( INFO    ) : running handlers [cups-printers] for cn=Subschema
04.02.16 19:16:14.121  LISTENER    ( INFO    ) : module cups-printers for relating objects
04.02.16 19:16:14.122  LISTENER    ( ERROR   ) : could not get DNs when initializing cups-printers: Can't contact LDAP server
Comment 6 Stefan Gohmann univentionstaff 2016-02-05 07:19:31 CET
While joining the school slave, the master restarts the slapd:

Master (syslog):
--------------------------------------------------------------------------------
Feb  4 19:15:35 slave2032 logger: /etc/init.d/slapd restart (pid: 12789, ppid:12464 univention-join)
Feb  4 19:15:35 slave2032 logger: /etc/init.d/slapd stop (pid: 12797, ppid:12789 slapd)
Feb  4 19:15:35 slave2032 logger: /etc/init.d/slapd start (pid: 12810, ppid:12789 slapd)
Feb  4 19:15:35 slave2032 slapd[12821]: @(#) $OpenLDAP: slapd  (Nov  6 2015 08:03:15) $#012#011root@ladda:/var/build/temp/tmp.bfW1EHuZcn/pbuilder/openldap-2.4.42+dfsg/debian/build/servers/slapd
Feb  4 19:15:53 slave2032 logger: /etc/init.d/slapd start (pid: 13155, ppid:13138 invoke-rc.d)
Feb  4 19:15:53 slave2032 slapd[13166]: @(#) $OpenLDAP: slapd  (Nov  6 2015 08:03:15) $#012#011root@ladda:/var/build/temp/tmp.bfW1EHuZcn/pbuilder/openldap-2.4.42+dfsg/debian/build/servers/slapd
Feb  4 19:15:57 slave2032 logger: /etc/init.d/slapd restart (pid: 13228, ppid:13223 univention-dire)
Feb  4 19:15:57 slave2032 logger: /etc/init.d/slapd stop (pid: 13236, ppid:13228 slapd)
Feb  4 19:15:57 slave2032 logger: /etc/init.d/slapd start (pid: 13249, ppid:13228 slapd)
Feb  4 19:15:57 slave2032 slapd[13260]: @(#) $OpenLDAP: slapd  (Nov  6 2015 08:03:15) $#012#011root@ladda:/var/build/temp/tmp.bfW1EHuZcn/pbuilder/openldap-2.4.42+dfsg/debian/build/servers/slapd
Feb  4 19:16:01 slave2032 logger: /etc/init.d/slapd start (pid: 13270, ppid:13223 univention-dire)
Feb  4 19:16:01 slave2032 slapd[13281]: @(#) $OpenLDAP: slapd  (Nov  6 2015 08:03:15) $#012#011root@ladda:/var/build/temp/tmp.bfW1EHuZcn/pbuilder/openldap-2.4.42+dfsg/debian/build/servers/slapd
Feb  4 19:16:01 slave2032 /usr/sbin/cron[4209]: (*system*univention-directory-policy) RELOAD (/etc/cron.d/univention-directory-policy)
--------------------------------------------------------------------------------

Master (system-stats.log)
--------------------------------------------------------------------------------
listener 13716  0.1  1.0 517992 38824 ?        S    19:15   0:01  |   \_ /usr/sbin/univention-directory-listener -F -b dc=autotest203,dc=local -m /usr/lib/univention-directory-listener/system -c /var/lib/univention-directory-listener -d 3 -x -ZZ -D cn=admin,dc=autotest203,dc=local -y /etc/ldap.secret
--------------------------------------------------------------------------------

Master (listener.log)
--------------------------------------------------------------------------------
04.02.16 19:16:00.957  LISTENER    ( INFO    ) : postrun handler: umc-service-providers (prepared=-1)
13258
04.02.16 19:16:00.963  LISTENER    ( INFO    ) : umc-service-providers: Reloading LDAP server.
Initiating graceful reload of ldap server(s).
Sending HUP to ldap server(s): slapd ...done.
Starting ldap server(s): slapd ...done.
Checking Schema ID: ...done.
--------------------------------------------------------------------------------

Slave (listener.log):
--------------------------------------------------------------------------------
04.02.16 19:16:01.205  LISTENER    ( INFO    ) : initializing replica's cache
Starting ldap server(s): slapd ...done.
04.02.16 19:16:02.389  LISTENER    ( INFO    ) : remove old entries for module replication
04.02.16 19:16:02.391  LISTENER    ( INFO    ) : call cache_free_cursor for module replication
04.02.16 19:16:02.391  LISTENER    ( INFO    ) : initialize schema for module replication
04.02.16 19:16:02.392  LISTENER    ( INFO    ) : running handlers [replication] for cn=Subschema
04.02.16 19:16:02.393  LISTENER    ( INFO    ) : replication: Running handler for: cn=Subschema
Restarting ldap server(s).
Stopping ldap server(s): slapd ...retry #1....done.
Starting ldap server(s): slapd ...done.
04.02.16 19:16:14.117  LISTENER    ( INFO    ) : handler: replication (successful)
04.02.16 19:16:14.118  LISTENER    ( INFO    ) : module replication for relating objects
04.02.16 19:16:14.118  LISTENER    ( ERROR   ) : could not get DNs when initializing replication: Can't contact LDAP server
--------------------------------------------------------------------------------

We should simply reconnect if the LDAP connection has been closed.

I'll move it to UCS.
Comment 7 Philipp Hahn univentionstaff 2016-02-05 10:20:22 CET
Created attachment 7452 [details]
Untested patch - compiles at least

    Initialization might take some time. If the connection to the remote (or
    local) LDAP servers is lost during that time, the initialization of the
    module is aborted.
    
    Add macro LDAP_RETRY to re-try failed commands if LDAP_SERVER_DOWN is
    returned.
    
    Wrap all LDAP search commands with said macro.
    
    Add UCR 'listener/ldap/retries' to make number of retires configurable.
Comment 8 Arvid Requate univentionstaff 2016-02-08 14:01:39 CET
> While joining the school slave, the master restarts the slapd

As far as I read this, the umc-service-providers listener on the master restarts the slapd. This must be triggered by some joinscript action from the slave. So the slave should probably wait at that point until the slapd is ok again? Otherwise there might be other timing dependent effects in other setups?
Comment 9 Philipp Hahn univentionstaff 2016-02-10 08:27:40 CET
(In reply to Philipp Hahn from comment #7)
> Created attachment 7452 [details]
> Untested patch - compiles at least
but had some serious flaws.

r67292 | Bug #40460 UDL: Retry LDAP queries on LDAP_SERVER_DOWN
 fixed patch applied
r67291 | Bug #40460 UDL: Document listener signals
 document SIGUSR1/USR2/HUP/TERM, which is useful while debugging

Package: univention-directory-listener
Version: 10.0.0-6.298.201602100812
Branch: ucs_4.1-0
Scope: errata4.1-0

r67293 | Bug #40460 UDL: Retry LDAP queries on LDAP_SERVER_DOWN YAML
 univention-directory-listener.yaml

QA: Testes 32k environemtn with slapd restart during initial join okay
Comment 10 Philipp Hahn univentionstaff 2016-02-10 17:29:42 CET
r67317 | Bug #40460 UDL: Retry LDAP queries on LDAP_SERVER_DOWN
 Fix spelling in UCRV.
 Limit max sleep to 31s

Package: univention-directory-listener
Version: 10.0.0-7.299.201602101726
Branch: ucs_4.1-0
Scope: errata4.1-0

r67318 | Bug #40460 UDL: Retry LDAP queries on LDAP_SERVER_DOWN YAML
 univention-directory-listener.yaml
Comment 11 Arvid Requate univentionstaff 2016-02-10 18:26:06 CET
Created attachment 7469 [details]
1.c

Maybe the   sleep((1 << _retry++) & 0x1f) from commit r67317 is not what you intended? See the attached example for an alternative:

Current: 
 1 2 4 8 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 8 16 0 0 0
Proposal: 
 1 2 4 8 16 32 1 2 4 8 16 32 1 2 4 8 16 32 1 2 4 8 16 32 1 2 4 8 16 32 1 2 4 8 16 32 1 2 4 8
Comment 12 Arvid Requate univentionstaff 2016-02-10 20:27:28 CET
Ok, apart from the question in the last comment the patch works. Unfortunately the amd64 version is still not built (openjdk-7 built is on its way..).

Good: I stopped the Master slapd during initialization of replication.py and I see the binary exponential backoff in the join.log:
==============================================================================
23.11.15 20:44:39.432  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 20:44:40.433  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 20:44:42.434  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 20:44:46.435  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
==============================================================================

Then I continue the slapd and the module finishes initialization as intended.
This will be enough for the situation here.

Bad: In case the slapd doesn't come back during the waiting time (current default: about 31 seconds), then the join will continue as before without proper listener module initialization and the "joined" system will still be broken. Maybe we should rather abort the join in that case? I know, addressing this is too much to ask for this bug, but maybe we should fork another bug to fix that point too?
Comment 13 Philipp Hahn univentionstaff 2016-02-10 20:58:02 CET
(In reply to Arvid Requate from comment #11)
> Created attachment 7469 [details]
> 1.c
> 
> Maybe the   sleep((1 << _retry++) & 0x1f) from commit r67317 is not what you
> intended? See the attached example for an alternative:

Thanks, fixed:
 r67329 | Bug #40460 UDL: Retry LDAP queries on LDAP_SERVER_DOWN
 r67330 | Bug #40460 UDL: Retry LDAP queries on LDAP_SERVER_DOWN

Package: univention-directory-listener
Version: 10.0.0-9.301.201602102049
Branch: ucs_4.1-0
Scope: errata4.1-0

r67332 | Bug #40460 UDL: Retry LDAP queries on LDAP_SERVER_DOWN YAML
 univention-directory-listener.yaml

(In reply to Arvid Requate from comment #12)
> Unfortunately the amd64 version is still not built (openjdk-7 built is on
> its way..).

Forced build for amd64


> Bad: In case the slapd doesn't come back during the waiting time (current
> default: about 31 seconds), then the join will continue as before without
> proper listener module initialization and the "joined" system will still be
> broken. Maybe we should rather abort the join in that case? I know,
> addressing this is too much to ask for this bug, but maybe we should fork
> another bug to fix that point too?

03univention-directory-listener.inst has this code, which breaks the join to complete:
>if test -e "/var/lib/univention-directory-listener/handlers/replication"; then
>        res=`cat /var/lib/univention-directory-listener/handlers/replication`
>        if [ $res != "1" -a "$res" != "3" ]; then
>                exit 1
Comment 14 Arvid Requate univentionstaff 2016-02-10 21:40:24 CET
Ok, looks good, with listener/ldap/retries=8 I got for each module:
=============================================================================
23.11.15 22:00:23.337  LISTENER    ( WARN    ) : initializing module nfs-homes
23.11.15 22:00:23.348  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:00:24.349  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:00:26.351  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:00:30.353  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:00:38.354  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:00:54.355  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:01:26.356  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:01:27.357  LDAP        ( ERROR   ) : start_tls: Can't contact LDAP server
23.11.15 22:01:29.357  LISTENER    ( ERROR   ) : could not get DNs when initializing nfs-homes: Can't contact LDAP server
=============================================================================

To test the extreme case of total failure I set retries to 0 and tried again and you are right, 03univention-directory-listener.inst fails in that case.

Advisory: Ok
Comment 15 Stefan Gohmann univentionstaff 2016-02-10 21:42:36 CET
(In reply to Arvid Requate from comment #12)
> Bad: In case the slapd doesn't come back during the waiting time (current
> default: about 31 seconds), then the join will continue as before without
> proper listener module initialization and the "joined" system will still be
> broken. Maybe we should rather abort the join in that case? I know,
> addressing this is too much to ask for this bug, but maybe we should fork
> another bug to fix that point too?

No, please not. In my last support situation, the first four modules succeeded including the replication module. After that, I could simply restart the listener and the missing modules finished like expected. That saved me a few hours.
Comment 16 Arvid Requate univentionstaff 2016-02-11 10:37:13 CET
> No, please not.

I was concerned about the inverse case: replication.py fails but then - due to this brilliant new retry feature - the Master LDAP server has enough time to recover to make the initialization of the following modules succeed. But I tested it and Philipp is right, in that case the join script aborts later, because other things fail.
Comment 17 Janek Walkenhorst univentionstaff 2016-02-11 13:59:08 CET
<http://errata.software-univention.de/ucs/4.1/105.html>