Bug 45173 - systemd fails to start runsv-dir too slowly - other services fail to start afterwards
systemd fails to start runsv-dir too slowly - other services fail to start af...
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: General
UCS 4.2
Other Linux
: P5 normal (vote)
: ---
Assigned To: UCS maintainers
UCS maintainers
: systemd
Depends on:
Blocks: 3689 43686 43688
  Show dependency treegraph
 
Reported: 2017-08-09 14:17 CEST by Philipp Hahn
Modified: 2020-07-03 20:55 CEST (History)
4 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.171
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2017072421000194
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Hahn univentionstaff 2017-08-09 14:17:51 CEST
After boot several services fail to start - restarting them manually fixes the problem:
# systemctl status univention-directory-listener.service 
● univention-directory-listener.service - LSB: Univention Directory Listener Daemon
   Loaded: loaded (/etc/init.d/univention-directory-listener)
   Active: failed (Result: exit-code) since Mi 2017-08-09 13:41:42 CEST; 12s ago
  Process: 5501 ExecStop=/etc/init.d/univention-directory-listener stop (code=exited, status=0/SUCCESS)
  Process: 6297 ExecStart=/etc/init.d/univention-directory-listener start (code=exited, status=1/FAILURE)

Aug 09 13:41:41 master42 systemd[1]: Starting LSB: Univention Directory Listener Daemon...
Aug 09 13:41:42 master42 univention-directory-listener[6297]: Starting Univention Directory Listener Daemon: univention-directory-listener failed!
Aug 09 13:41:42 master42 systemd[1]: univention-directory-listener.service: control process exited, code=exited status=1


I can reproduce this by adding a "sleep 10" to "/usr/sbin/runsvdir-start" and doing
 systemctl start univention-runit.service ; systemctl status univention-runit.service ; systemctl start univention-directory-listener.service  ; systemctl status univention-directory-listener.service
after stopping all runsv related services for testing.

Analysis:
- systemd runs "/etc/init.d/univention-runit start", which uses "start-stop-daemon "--background --make-pidfile --pidfile" to start "/usr/sbin/runsvdir", which is a shell script and which finally execs "runsvdir".
- "univention-runit.service" is "Type=forking" and has "PIDFile=/var/run/univention-runit.pid", which is filled in by "start-stop-daemon" when the first intermediate shell is forked.
- So for systemd the service is ready as soon as the shell is forked and the init script has exited.
- But runsv is not yet ready, as first several intermediate processes have finished re-exec-uting subsequent prograns.
- UDL is started as soon as systemd considers RIT ready, which uses "sv" from its init script to talk to the not-yet-ready runsv process.

Resolutions:
- This bug will be fixed when Bug #43686, Bug #43686, Bug #43688, Bug #3689 are fixed.
- runsvdir could use <https://www.freedesktop.org/software/systemd/man/sd_notify.html>
- printf '[Service]\nExecStartPost=/bin/sleep 5\n' >/lib/systemd/system/univention-runit.service.d/bug.conf
Comment 1 Andreas Peichert univentionstaff 2018-01-25 16:12:54 CET
By updating from UCS 4.1 to 4.2 for a school enterprise customer, this happened again in the customer testing environment and also in my development environment. 

The workaround (sleep 5) works for me.
Comment 2 Andreas Peichert univentionstaff 2018-02-13 13:48:18 CET
The workaround is not always successful
Comment 3 Stefan Gohmann univentionstaff 2018-02-14 07:17:30 CET
(In reply to Andreas Peichert from comment #1)
> By updating from UCS 4.1 to 4.2 for a school enterprise customer, this
> happened again in the customer testing environment and also in my
> development environment. 

In this case, it looks like the system is too slow to start all the services parallel. Thus, "sv start" runs into a timeout which is by default seven seconds. The result is that the services are running but systemd recognized the init script return code 1 and shows the services as failure.

It works for me after changing the init scripts to run
 svn -w 60 start ...

BTW, 20 seconds weren't enough on this system.

Anyway, I can also reproduce the original issue. Since, there is an UCS@school init script on this system (univention-reload-service) which uses "sv up" (it is like "sv -w 0 start"). This script is configured as rc2.d/S02... which is directly after runit. I needed to move it to rc2.d/S03...

I'm not complete sure but my guess is that the clock is changed during the boot on the test system.
Comment 4 Daniel Tröder univentionstaff 2018-02-14 09:56:50 CET
Replace runsv scripts with systemd units?
Comment 5 Stefan Gohmann univentionstaff 2018-02-15 08:23:57 CET
(In reply to Daniel Tröder from comment #4)
> Replace runsv scripts with systemd units?

Yeah, that would be the best. The problem is, I need a solution now.

If I add the patch from Bug #46290 for the univention-reload-service and increase the sleep workaround for univention-runit to 30 seconds, the services are all started:

mkdir /lib/systemd/system/univention-runit.service.d/
printf '[Service]\nExecStartPost=/bin/sleep 30\n' >/lib/systemd/system/univention-runit.service.d/bug45173.conf

If I use only 10 seconds for the ExecStartPost sleep for univention-runit, some services are shown as failed in systemd. And that is a problem because the first restart of the service doesn't work.
Comment 6 Stefan Gohmann univentionstaff 2018-02-15 08:26:33 CET
(In reply to Stefan Gohmann from comment #5)
> (In reply to Daniel Tröder from comment #4)
> > Replace runsv scripts with systemd units?
> 
> Yeah, that would be the best. The problem is, I need a solution now.

I've moved a card into the basis board ideas to get priority on it after the release of UCS 4.3.
Comment 7 Philipp Hahn univentionstaff 2018-02-21 12:08:02 CET
FYI: After an upgrade from UCS-4.2-3 to UCS-4.3-0~ bind9 no longer was running (making the repository server unresolvable):

$ journalctl --this-boot
Feb 21 11:38:30 master42 systemd[1]: Started LSB: Univention iptables --wait configuration.
Feb 21 11:38:31 master42 root[32245]: Unknown bind9 status: down: univention-bind: 642s
Feb 21 11:38:31 master42 root[32247]: Unknown bind9 status: down: univention-bind-proxy: 642s
Feb 21 11:38:31 master42 systemd[1]: Stopping LSB: Univention Directory Listener Daemon...
Feb 21 11:38:36 master42 systemd[1]: Stopped LSB: Univention Directory Listener Daemon.
Feb 21 11:38:36 master42 systemd[1]: Starting LSB: Univention Directory Listener Daemon...
Feb 21 11:38:36 master42 univention-directory-listener[32273]: Stopping Univention Directory Listener Daemon: univention-directory-listener.
Feb 21 11:38:36 master42 univention-directory-listener[32282]: Starting Univention Directory Listener Daemon: univention-directory-listener.
Feb 21 11:38:36 master42 systemd[1]: Started LSB: Univention Directory Listener Daemon.
Feb 21 11:38:41 master42 systemd[1]: Reloading.

# systemctl status bind9
● bind9.service - LSB: bind9 Domain Name Server (DNS)
   Loaded: loaded (/etc/init.d/bind9; generated; vendor preset: enabled)
   Active: active (exited) since Mon 2018-02-19 21:22:29 CET; 1 day 14h ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0 (limit: 4915)
   Memory: 0B
      CPU: 0
   CGroup: /system.slice/bind9.service
Feb 19 21:22:26 master42 systemd[1]: Starting LSB: bind9 Domain Name Server (DNS)...
Feb 19 21:22:29 master42 bind9[1001]: Starting bind9 Domain Name Server (DNS): ldap proxy.
Feb 19 21:22:29 master42 systemd[1]: Started LSB: bind9 Domain Name Server (DNS).

# systemctl restart bind9

# systemctl status bind9
● bind9.service - LSB: bind9 Domain Name Server (DNS)
   Loaded: loaded (/etc/init.d/bind9; generated; vendor preset: enabled)
   Active: active (exited) since Wed 2018-02-21 11:52:23 CET; 1s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 6388 ExecStop=/etc/init.d/bind9 stop (code=exited, status=0/SUCCESS)
  Process: 6400 ExecStart=/etc/init.d/bind9 start (code=exited, status=0/SUCCESS)
      CPU: 318ms 
Feb 21 11:52:22 master42 systemd[1]: Stopped LSB: bind9 Domain Name Server (DNS).
Feb 21 11:52:22 master42 systemd[1]: Starting LSB: bind9 Domain Name Server (DNS)...
Feb 21 11:52:23 master42 systemd[1]: Started LSB: bind9 Domain Name Server (DNS).
Feb 21 11:52:23 master42 bind9[6400]: Starting bind9 Domain Name Server (DNS): ldap proxy.

# tail /var/log/univention/updater.log
Starting dist-update at Wed Feb 21 11:14:45 2018...
Comment 8 Ingo Steuwer univentionstaff 2020-07-03 20:55:54 CEST
This issue has been filed against UCS 4.2.

UCS 4.2 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.