Univention Bugzilla – Bug 43450
slapd entered failed state in test case 25reconnect_uldap
Last modified: 2017-09-26 10:47:19 CEST
From syslog: Jan 30 20:20:52 master090 systemd[1]: Stopping LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol)... Jan 30 20:20:52 master090 systemd[1]: Stopped LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol). Jan 30 20:20:52 master090 systemd[1]: Starting LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol)... Jan 30 20:20:52 master090 slapd[23035]: LDAP server already running. Jan 30 20:20:52 master090 systemd[1]: slapd.service: control process exited, code=exited status=2 Jan 30 20:20:52 master090 systemd[1]: Failed to start LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol). Jan 30 20:20:52 master090 systemd[1]: Unit slapd.service entered failed state. ucs-test output: *** BEGIN *** ['/usr/bin/python2.7', '25reconnect_uldap'] *** *** 10_ldap/25reconnect_uldap *** Test reconnect mechanism of uldap *** *** START TIME: 2017-01-30 20:19:57 *** Setting ldap/client/retry/count ************************************************************************ Test: connect, search, stop, start, search ************************************************************************ Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. ************************************************************************ Test: stop, connect - start after 9 seconds, search ************************************************************************ Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. ************************************************************************ Test: connect, stop, - start after 9 seconds, search ************************************************************************ Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. ************************************************************************ Test: stop, connect - start after 11 seconds (which is too late), search ************************************************************************ Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. ************************************************************************ Test: connect, stop - start after 11 seconds (which is too late), search ************************************************************************ Job for slapd.service canceled. Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. invoke-rc.d: initscript slapd, action "start" failed. ### FAIL ### Search was successful ### ### CLEANUP Setting ldap/client/retry/count Warning: Unit file of slapd.service changed on disk, 'systemctl daemon-reload' recommended. Job for slapd.service failed. See 'systemctl status slapd.service' and 'journalctl -xn' for details. invoke-rc.d: initscript slapd, action "start" failed. *** END TIME: 2017-01-30 20:20:53 *** *** TEST DURATION (H:MM:SS.ms): 0:00:56.360147 *** *** END *** 1 ***
For this test case we could wait for a little bit more time but I think we need to fix it more generally. If the slapd failed to start we need to ensure that the next restart will be possible, e.g.: systemctl start slapd → failed systemctl restart slapd → must be possible Otherwise some basics will fail for example the LDAP extension registration. @Janek, Philipp: is it possible to configure this in systemd?
(In reply to Stefan Gohmann from comment #1) > For this test case we could wait for a little bit more time but I think we > need to fix it more generally. > > If the slapd failed to start we need to ensure that the next restart will be > possible, e.g.: > systemctl start slapd → failed > systemctl restart slapd → must be possible This should be possible. In this case the setup was: systemctl start slapd …check database systemctl stop slapd …no process → OK …start process → OK but killed by stop thus FAILED And then trying: systemctl stop slapd → no need to execute, already stopped Or trying: systemctl start slapd …slapd is running → FAILED‽ So the "start" case was "lying" in that it says something went wrong when the daemon is running just fine. Also the stop case seems to abort the start case but not cleanly, leading to a possible race. My suggested fix is log_action_msg "LDAP server already running" exit 0 . This should not be a problem with errors in starting, because if the start script fails it can be retried always. Only script that say FAILED when it went well or say OK when it went wrong could lead to problems. My guess is that the race is due to systemd killing the start script (TERM) but the script not killing its subprocesses. This may be because of the /lib/lsb/init-functions.d/40-systemd hook.
I would also remove the sleep 5 It is there since CVS times and seems to not have any purpose given that the loop after it has error handling and retry. This improves the performance if the init script and also reduces the chance of hitting the race in ucs-test/10_ldap/26reconnect_univention_ldapsearch
OK: ucs-4.2-0@76262 <https://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/iniscrptact.html> FIXED: ucs-4.2-0@76263 → r76482 OK: systemctl status slapd.service OK: systemctl stop slapd.service OK: systemctl start slapd.service OK: systemctl restart slapd.service ~FAIL: killall slapd ; systemctl start slapd.service # NO slapd running This is a generic systemd SysV-init compatibility bug, which violates the "behave sensibly" requirement from <https://www.debian.org/doc/debian-policy/ch-opersys.html#s-writing-init>. (It is not OpenLDAP specific, as I can repeat it with clamd, too.) Calling old SysV-init scripts would start the service again as expected, but due to the /lib/lsb/init-functions.d/40-systemd shim systemd can't distinguish "type=oneshot" services from "type=simple/forking" services and has to treat the exit code 0 from the init script as "everything okay", even when the service crashes later and is no longer running. killall slapd ; _SYSTEMCTL_SKIP_REDIRECT=1 /etc/init.d/slapd start does show the difference as there slapd is restarted. We may either switch slapd completely to systemd, or at least supplement the generated slapd.service: /etc/systemd/system/slapd.service.d/custom.conf: [Service] Restart=on-failure # or "always", as slapd exit(0) when SIGTERM killed? PIDFile=/var/run/slapd/slapd.pid # init.d tries to parse that dynamically RemainAfterExit=no # systemd-generator uses wrong "yes" here, causing this issue
UCS 4.2 has been released: https://docs.software-univention.de/release-notes-4.2-0-en.html https://docs.software-univention.de/release-notes-4.2-0-de.html If this error occurs again, please use "Clone This Bug".