Bug 43874 - Failed services are not detected as stopped and prevented from being re-started by systemd
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: General
Version: UCS 4.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.2
Assigned To: Philipp Hahn
QA Contact: Janek Walkenhorst
Keywords: interim-3, systemd
Depends on: 43313 43450 43760
Blocks: 43330
Reported: 2017-03-15 13:51 CET by Philipp Hahn
Modified: 2017-09-26 10:47 CEST

What kind of report is it?: Development Internal
hahn: Patch_Available+


Attachments
systemd: track pidfile (12.27 KB, patch)
2017-03-15 13:51 CET, Philipp Hahn

Description Philipp Hahn univentionstaff 2017-03-15 13:51:06 CET
Created attachment 8542
systemd: track pidfile

The conversion to LSB init scripts in Bug #38438 was incomplete: similar to Bug #43313, many other services still have legacy LSB init scripts which don't use the 'pidfile' extension. For those, /lib/systemd/system-generators/systemd-sysv-generator generates service files with "RemainAfterExit=yes". As a result, services are still considered running even though their process failed to start or terminated early.

This work-around is required until all legacy System V init scripts have been converted to systemd.

Services run through runit are ignored, as the System V init scripts just tell runsvdir to start/stop the services and thus are not further monitored by systemd.
Comment 1 Philipp Hahn univentionstaff 2017-03-16 12:16:11 CET
Some more background: There are two kinds of init scripts:
1. oneshot services like mounting file systems and starting the network. They change some state, but usually do not start any daemons explicitly. Many such init scripts don't have a stop action.
2. init scripts starting daemons, which run until they're explicitly stopped. Usually the PID of such a daemon is written to some file, so the stop action can explicitly kill that process. (Killing by name is not recommended, as it can kill unrelated processes started by users.)
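To illustrate the pidfile convention from item 2, here is a minimal Python sketch of how a "status" check can verify that the process recorded in a pidfile still exists. The helper names `pid_from_file` and `is_running` are made up for this example and are not taken from any UCS package or from systemd.

```python
import os


def pid_from_file(path):
    """Return the PID stored in `path`, or None if the file is missing or malformed."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pidfile):
    """Report whether the process named in `pidfile` still exists."""
    pid = pid_from_file(pidfile)
    if pid is None or pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; the kernel only checks that the PID exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

Note that such a check only proves that some process with that PID exists. A stale pidfile whose PID has been reused by an unrelated process would still be reported as running.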
With systemd it is no longer recommended that daemons fork themselves into the background (this is the same with runit), as a daemon staying in the foreground allows systemd to get its PID more easily. For daemons that still fork, systemd uses the Linux cgroups mechanism to capture forked child processes, which are then associated with the corresponding service unit. But even when using that mechanism, systemd does not know which process must be running to consider the service as active.
As such there is the systemd option "RemainAfterExit" to control the systemd behavior: For oneshot services it should be "yes", for daemon services "no".
The difference can be seen when the daemon dies unexpectedly or is killed by something other than its init script: with "RemainAfterExit=yes" systemd still considers the service as running, so "start" does not start the daemon again and "status" still returns 0 because the service is still considered active. This breaks all kinds of scripts which use "status" to check the state of services.
For legacy init scripts the "systemd-sysv-generator" generates systemd service units in /run/systemd/generator.late/. The logic for "RemainAfterExit" is very simple, namely it is set to "no" if a line with "# pidfile: /..." is found and to "yes" otherwise.
Therefore add that line to all legacy init scripts to get a better systemd service unit.
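The generator's decision can be approximated with the following Python sketch. It is a simplification for illustration, not the generator's actual C code: the function names and the sample scripts are invented, and the header parsing is reduced to a single regular expression. It assumes that a detected pidfile results in "RemainAfterExit=no" plus a "PIDFile=" entry, and that a script without one gets "RemainAfterExit=yes".

```python
import re

# Matches a header comment such as "# pidfile: /var/run/example.pid".
# Only absolute paths are accepted (simplified parsing, an assumption of this sketch).
PIDFILE_RE = re.compile(r"^#\s*pidfile:\s*(/\S+)\s*$", re.IGNORECASE)


def find_pidfile(script_text):
    """Return the path from the first '# pidfile:' line, or None if there is none."""
    for line in script_text.splitlines():
        match = PIDFILE_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


def service_settings(script_text):
    """Derive the [Service] settings relevant here from an init script's text."""
    pidfile = find_pidfile(script_text)
    settings = {
        "Type": "forking",
        # With a pidfile the main process can be tracked, so the unit
        # becomes inactive when that process exits.
        "RemainAfterExit": "no" if pidfile else "yes",
    }
    if pidfile:
        settings["PIDFile"] = pidfile
    return settings


# Hypothetical init script contents, used only to exercise the functions.
tracked = "#!/bin/sh\n# pidfile: /var/run/example.pid\nexit 0\n"
untracked = "#!/bin/sh\nexit 0\n"
print(service_settings(tracked))
print(service_settings(untracked))
```

The first call yields a unit that systemd can mark as failed when the daemon dies, while the second keeps reporting the service as active after the init script has exited.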

Another problem is that some services were converted to systemd already, namely:
 /lib/systemd/system/cron.service
 /lib/systemd/system/openbsd-initd.service
They shadow our modified System V init scripts, which are therefore no longer used. Instead of adding that line, the scripts were dropped, as their only modification was the added UCRV */autostart support, which was moved to a systemd-specific mechanism by Bug #43470.
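A rough way to find further init scripts that are shadowed by a native unit of the same name is sketched below in Python. The function name `shadowed_init_scripts` is invented for this example, and the name matching is simplified: it only compares `<name>` against `<name>.service` and ignores aliases and any other name mapping systemd may apply.

```python
import os


def shadowed_init_scripts(initd_dir, unit_dirs):
    """List init scripts for which a native '<name>.service' unit exists."""
    shadowed = []
    for name in sorted(os.listdir(initd_dir)):
        for unit_dir in unit_dirs:
            if os.path.exists(os.path.join(unit_dir, name + ".service")):
                shadowed.append(name)
                break  # one matching unit is enough
    return shadowed
```

On a real system it could be called as `shadowed_init_scripts("/etc/init.d", ["/etc/systemd/system", "/lib/systemd/system"])`. For every name it returns, the native unit takes precedence and the init script is not used.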

r77807 | Bug #43874: Work-around diverted univention-service needed by systemd
r77796 | Bug #43874 pxe: track pidfile
r77795 | Bug #43874 runit: track pidfile
r77805 | Bug #43874 base: track pidfile
r77804 | Bug #43874 base: Handle UCRV */autostart through systemd
...

Package: univention-runit
Version: 8.0.0-7A~4.2.0.201703151914

Package: univention-net-installer
Version: 10.0.0-3A~4.2.0.201703151954

Package: univention-samba
Version: 11.0.1-5A~4.2.0.201703152005

Package: univention-management-console
Version: 9.0.64-1A~4.2.0.201703151936

Package: univention-postgrey
Version: 5.0.0-2A~4.2.0.201703151927

Package: univention-mail-dovecot
Version: 3.0.0-3A~4.2.0.201703151924

Package: univention-mail-postfix
Version: 11.0.0-4A~4.2.0.201703151926

Package: univention-s4-connector
Version: 11.0.6-11A~4.2.0.201703152003

Package: univention-ad-connector
Version: 11.0.6-7A~4.2.0.201703151942

Package: univention-docker
Version: 2.0.0-5A~4.2.0.201703151915

Package: univention-nagios
Version: 10.0.1-2A~4.2.0.201703151941

Package: univention-novnc
Version: 1.0.0-5A~4.2.0.201703152012

Package: univention-base-files
Version: 6.0.0-11A~4.2.0.201703160819

Package: univention-heimdal
Version: 10.0.0-7A~4.2.0.201703161137

Package: univention-samba4
Version: 6.0.9-9A~4.2.0.201703161139
Comment 2 Janek Walkenhorst univentionstaff 2017-03-19 20:11:34 CET
(In reply to Philipp Hahn from comment #1)
> r77807 | Bug #43874: Work-around diverted univention-service needed by
> systemd
OK
> r77796 | Bug #43874 pxe: track pidfile
OK
> r77795 | Bug #43874 runit: track pidfile
OK
> r77805 | Bug #43874 base: track pidfile
OK
> r77804 | Bug #43874 base: Handle UCRV */autostart through systemd
OK
Comment 3 Stefan Gohmann univentionstaff 2017-04-04 18:29:37 CEST
UCS 4.2 has been released:
 https://docs.software-univention.de/release-notes-4.2-0-en.html
 https://docs.software-univention.de/release-notes-4.2-0-de.html

If this error occurs again, please use "Clone This Bug".