Univention Bugzilla – Bug 48232
Journald should be restarted when watchdog steps in
Last modified: 2020-07-03 20:52:39 CEST
journald writes its data to disk every five minutes. Default value in /etc/systemd/journald.conf:
SyncIntervalSec=5m
If the system is under high load (or at least /var is, e.g. from mail) AND lots of log messages arrive, journald might need more than a minute to write its data to disk. While it is writing, it does not send watchdog pings, so the systemd watchdog assumes the journald service is dead and kills it:
Nov 18 13:56:36 mailsrv systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
I have not confirmed this yet, but it looks like journald gets killed but not restarted, which is bad. At the very least, systemd should restart journald.
Possible workarounds so far:
1. Optimize disk speed on /var/journal (e.g. SSD or physically separate storage)
2. Change the sync interval, e.g. SyncIntervalSec=1m, but there is currently no UCR variable for this
3. Optimize the filesystem via mount options (e.g. data=writeback, not recommended)
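As a rough sketch of workaround 2, assuming the setting is edited by hand in /etc/systemd/journald.conf (the file named above), since no UCR variable exists for it. The value 1m is the one suggested in the list and has not been tested here:

```
# /etc/systemd/journald.conf
[Journal]
# Value taken from workaround 2; the default mentioned above is 5m.
SyncIntervalSec=1m
```

journald has to be restarted (systemctl restart systemd-journald) before the changed value takes effect.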
Possible solutions:
A. Increase the timeout value from 1min to e.g. 3min. Is it hard-coded? I did not find any way to set this value.
B. At least start the process again after it got killed.
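Both solutions could possibly be expressed as a systemd drop-in for the journald unit. This is only a sketch under assumptions: the function name write_journald_dropin and the file name watchdog.conf are made up for illustration, 3min is simply the value from solution A, and whether the shipped unit already sets Restart= has not been checked. WatchdogSec= and Restart= are regular [Service] options of systemd.

```shell
# Hypothetical helper: writes a drop-in that overrides the watchdog
# timeout and restart policy of systemd-journald.service.
# Usage: write_journald_dropin [target-directory]
write_journald_dropin() {
    # Default is the standard drop-in directory for this unit.
    dir="${1:-/etc/systemd/system/systemd-journald.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/watchdog.conf" <<'EOF'
[Service]
# Solution A (assumed value): allow up to 3 minutes between watchdog pings.
WatchdogSec=3min
# Solution B: have systemd start journald again after it was killed.
Restart=always
EOF
}

# To apply on a real system (as root):
#   write_journald_dropin
#   systemctl daemon-reload
#   systemctl restart systemd-journald
```

A drop-in keeps the packaged unit file under /lib/systemd/system untouched, so a package update does not silently revert the change.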
Which UCS release / systemd package version?
* https://github.com/systemd/systemd/issues/1804
* https://github.com/systemd/systemd/issues/6283
root@member55:~# grep WatchdogSec /lib/systemd/system/systemd-journald.service
WatchdogSec=3min
root@member55:~# lsb_release -r
Release: 4.3-2 errata344
version/erratalevel: 425
version/patchlevel: 4
version/releasename: Lesum
version/version: 4.2
OK, UCS 4.2-x has systemd version 215-17. There is a Debian bug tracker entry discussing this issue, where people report that it has been resolved (or significantly improved) in 227-3:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=805042
So we may want to either backport a newer systemd version or (more likely) apply the upstream patch
https://github.com/systemd/systemd/commit/4de2402b603ea2f518f451d06f09e15aeae54fab
which seems to have fixed https://github.com/systemd/systemd/issues/1804
This issue has been filed against UCS 4.2. UCS 4.2 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed. If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.