Univention Bugzilla – Bug 45173
systemd fails to start runsv-dir too slowly - other services fail to start afterwards
Last modified: 2020-07-03 20:55:54 CEST
After boot several services fail to start - restarting them manually fixes the problem: # systemctl status univention-directory-listener.service ● univention-directory-listener.service - LSB: Univention Directory Listener Daemon Loaded: loaded (/etc/init.d/univention-directory-listener) Active: failed (Result: exit-code) since Mi 2017-08-09 13:41:42 CEST; 12s ago Process: 5501 ExecStop=/etc/init.d/univention-directory-listener stop (code=exited, status=0/SUCCESS) Process: 6297 ExecStart=/etc/init.d/univention-directory-listener start (code=exited, status=1/FAILURE) Aug 09 13:41:41 master42 systemd[1]: Starting LSB: Univention Directory Listener Daemon... Aug 09 13:41:42 master42 univention-directory-listener[6297]: Starting Univention Directory Listener Daemon: univention-directory-listener failed! Aug 09 13:41:42 master42 systemd[1]: univention-directory-listener.service: control process exited, code=exited status=1 I can reproduce this by adding a "sleep 10" to "/usr/sbin/runsvdir-start" and doing systemctl start univention-runit.service ; systemctl status univention-runit.service ; systemctl start univention-directory-listener.service ; systemctl status univention-directory-listener.service after stopping all runsv related services for testing. Analysis: - systemd runs "/etc/init.d/univention-runit start", which uses "start-stop-daemon "--background --make-pidfile --pidfile" to start "/usr/sbin/runsvdir", which is a shell script and which finally execs "runsvdir". - "univention-runit.service" is "Type=forking" and has "PIDFile=/var/run/univention-runit.pid", which is filled in by "start-stop-daemon" when the first intermediate shell is forked. - So for systemd the service is ready as soon as the shell is forked and the init script has exited. - But runsv is not yet ready, as first several intermediate processes have finished re-exec-uting subsequent prograns. - UDL is started as soon as systemd considers RIT ready, which uses "sv" from its init script to talk to the not-yet-ready runsv process. Resolutions: - This bug will be fixed when Bug #43686, Bug #43686, Bug #43688, Bug #3689 are fixed. - runsvdir could use <https://www.freedesktop.org/software/systemd/man/sd_notify.html> - printf '[Service]\nExecStartPost=/bin/sleep 5\n' >/lib/systemd/system/univention-runit.service.d/bug.conf
By updating from UCS 4.1 to 4.2 for a school enterprise customer, this happened again in the customer testing environment and also in my development environment. The workaround (sleep 5) works for me.
The workaround is not always successful
(In reply to Andreas Peichert from comment #1) > By updating from UCS 4.1 to 4.2 for a school enterprise customer, this > happened again in the customer testing environment and also in my > development environment. In this case, it looks like the system is too slow to start all the services parallel. Thus, "sv start" runs into a timeout which is by default seven seconds. The result is that the services are running but systemd recognized the init script return code 1 and shows the services as failure. It works for me after changing the init scripts to run svn -w 60 start ... BTW, 20 seconds weren't enough on this system. Anyway, I can also reproduce the original issue. Since, there is an UCS@school init script on this system (univention-reload-service) which uses "sv up" (it is like "sv -w 0 start"). This script is configured as rc2.d/S02... which is directly after runit. I needed to move it to rc2.d/S03... I'm not complete sure but my guess is that the clock is changed during the boot on the test system.
Replace runsv scripts with systemd units?
(In reply to Daniel Tröder from comment #4) > Replace runsv scripts with systemd units? Yeah, that would be the best. The problem is, I need a solution now. If I add the patch from Bug #46290 for the univention-reload-service and increase the sleep workaround for univention-runit to 30 seconds, the services are all started: mkdir /lib/systemd/system/univention-runit.service.d/ printf '[Service]\nExecStartPost=/bin/sleep 30\n' >/lib/systemd/system/univention-runit.service.d/bug45173.conf If I use only 10 seconds for the ExecStartPost sleep for univention-runit, some services are shown as failed in systemd. And that is a problem because the first restart of the service doesn't work.
(In reply to Stefan Gohmann from comment #5) > (In reply to Daniel Tröder from comment #4) > > Replace runsv scripts with systemd units? > > Yeah, that would be the best. The problem is, I need a solution now. I've moved a card into the basis board ideas to get priority on it after the release of UCS 4.3.
FYI: After an upgrade from UCS-4.2-3 to UCS-4.3-0~ bind9 no longer was running (making the repository server unresolvable): $ journalctl --this-boot Feb 21 11:38:30 master42 systemd[1]: Started LSB: Univention iptables --wait configuration. Feb 21 11:38:31 master42 root[32245]: Unknown bind9 status: down: univention-bind: 642s Feb 21 11:38:31 master42 root[32247]: Unknown bind9 status: down: univention-bind-proxy: 642s Feb 21 11:38:31 master42 systemd[1]: Stopping LSB: Univention Directory Listener Daemon... Feb 21 11:38:36 master42 systemd[1]: Stopped LSB: Univention Directory Listener Daemon. Feb 21 11:38:36 master42 systemd[1]: Starting LSB: Univention Directory Listener Daemon... Feb 21 11:38:36 master42 univention-directory-listener[32273]: Stopping Univention Directory Listener Daemon: univention-directory-listener. Feb 21 11:38:36 master42 univention-directory-listener[32282]: Starting Univention Directory Listener Daemon: univention-directory-listener. Feb 21 11:38:36 master42 systemd[1]: Started LSB: Univention Directory Listener Daemon. Feb 21 11:38:41 master42 systemd[1]: Reloading. # systemctl status bind9 ● bind9.service - LSB: bind9 Domain Name Server (DNS) Loaded: loaded (/etc/init.d/bind9; generated; vendor preset: enabled) Active: active (exited) since Mon 2018-02-19 21:22:29 CET; 1 day 14h ago Docs: man:systemd-sysv-generator(8) Tasks: 0 (limit: 4915) Memory: 0B CPU: 0 CGroup: /system.slice/bind9.service Feb 19 21:22:26 master42 systemd[1]: Starting LSB: bind9 Domain Name Server (DNS)... Feb 19 21:22:29 master42 bind9[1001]: Starting bind9 Domain Name Server (DNS): ldap proxy. Feb 19 21:22:29 master42 systemd[1]: Started LSB: bind9 Domain Name Server (DNS). # systemctl restart bind9 # systemctl status bind9 ● bind9.service - LSB: bind9 Domain Name Server (DNS) Loaded: loaded (/etc/init.d/bind9; generated; vendor preset: enabled) Active: active (exited) since Wed 2018-02-21 11:52:23 CET; 1s ago Docs: man:systemd-sysv-generator(8) Process: 6388 ExecStop=/etc/init.d/bind9 stop (code=exited, status=0/SUCCESS) Process: 6400 ExecStart=/etc/init.d/bind9 start (code=exited, status=0/SUCCESS) CPU: 318ms Feb 21 11:52:22 master42 systemd[1]: Stopped LSB: bind9 Domain Name Server (DNS). Feb 21 11:52:22 master42 systemd[1]: Starting LSB: bind9 Domain Name Server (DNS)... Feb 21 11:52:23 master42 systemd[1]: Started LSB: bind9 Domain Name Server (DNS). Feb 21 11:52:23 master42 bind9[6400]: Starting bind9 Domain Name Server (DNS): ldap proxy. # tail /var/log/univention/updater.log Starting dist-update at Wed Feb 21 11:14:45 2018...
This issue has been filed against UCS 4.2. UCS 4.2 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed. If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.