Bug 48232 – Journald should be restarted when watchdog steps in


Summary:	Journald should be restarted when watchdog steps in

Status:	RESOLVED WONTFIX

Product:	UCS
Classification:	Unclassified
Component:	Upstream packages
Version:	UCS 4.2
Hardware:	Other Linux

Importance:	P5 normal
Target Milestone:	---
Assigned To:	UCS maintainers
QA Contact:	UCS maintainers

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-11-28 17:06 CET by Christian Völker
Modified:	2020-07-03 20:52 CEST
CC List:	2 users

See Also:
What kind of report is it?:	Bug Report
What type of bug is this?:	5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?:	3: Will affect average number of installed domains
How will those affected feel about the bug?:	3: A User would likely not purchase the product
User Pain:	0.257
Enterprise Customer affected?:	Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:	2018112821000588
Bug group (optional):
Max CVSS v3 score:


Description Christian Völker

2018-11-28 17:06:54 CET

By default, journald syncs its journal files to disk every five minutes. The default value in /etc/systemd/journald.conf is:

SyncIntervalSec=5m

If the system is under high I/O load (even if only on /var, e.g. from mail) AND many log messages arrive, journald may need more than a minute to write its data to disk.

However, journald sends no watchdog pings while it is writing, so systemd concludes that the service has hung and kills it:

Nov 18 13:56:36 mailsrv systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!

I have not confirmed this yet, but it looks like journald gets killed and is not restarted, which is bad.

At the very least, systemd should restart journald.

Comment 1 Christian Völker

2018-11-28 17:10:02 CET

Possible workarounds so far:

1. Speed up the disk holding /var/log/journal (e.g. an SSD or physically separate storage)

2. Shorten the sync interval, e.g. SyncIntervalSec=1m, so that each sync has less data to write; however, there is currently no UCR variable for this

3. Tune the filesystem mount options (e.g. data=writeback; not recommended)
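Since there is no UCR variable for workaround 2, the value has to be set in /etc/systemd/journald.conf directly. The following is a minimal sketch of that edit, assuming the file is not regenerated from a UCR template and that [Journal] is its only section (so appending at the end lands in the right place). The helper name set_journald_sync_interval is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper for workaround 2: set SyncIntervalSec in journald.conf.
# Assumptions: the file is not overwritten by a UCR template, and [Journal]
# is its only/last section, so an appended line belongs to that section.
set_journald_sync_interval() {
    value="$1"
    conf="${2:-/etc/systemd/journald.conf}"
    if [ -z "$value" ]; then
        echo "usage: set_journald_sync_interval VALUE [FILE]" >&2
        return 2
    fi
    if grep -Eq '^#?SyncIntervalSec=' "$conf"; then
        # Replace the (possibly commented-out) default line in place.
        sed -E -i "s/^#?SyncIntervalSec=.*/SyncIntervalSec=$value/" "$conf"
    else
        # No such line yet: append it at the end of the file.
        printf 'SyncIntervalSec=%s\n' "$value" >> "$conf"
    fi
}
```

As root, something like `set_journald_sync_interval 1m && systemctl restart systemd-journald` would apply the shorter interval.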

Comment 2 Christian Völker

2018-11-28 17:11:18 CET

Possible solutions:

A. Increase the watchdog timeout from 1min to e.g. 3min. Is it hard-coded? I did not find any way to set this value.

B. At the very least, restart the process after it has been killed.
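Both solutions can probably be tried without patching the packaged unit file, by using a systemd drop-in for systemd-journald.service that sets WatchdogSec= (solution A, the same directive the UCS 4.3 unit in Comment 3 sets to 3min) and Restart=always (solution B). The sketch below assumes the standard drop-in directory under /etc/systemd/system; the file name 50-watchdog.conf, the helper name write_journald_dropin and the values are illustrative assumptions, not vendor recommendations.

```sh
#!/bin/sh
# Hypothetical helper: write a drop-in for systemd-journald.service that
# (A) raises the watchdog timeout and (B) restarts journald if it is killed.
# File name and default values are illustrative assumptions.
write_journald_dropin() {
    dir="${1:-/etc/systemd/system/systemd-journald.service.d}"
    timeout="${2:-3min}"
    mkdir -p "$dir" || return 1
    cat > "$dir/50-watchdog.conf" <<EOF
[Service]
# Solution A: give journald more time between watchdog pings
WatchdogSec=$timeout
# Solution B: restart journald after the watchdog has killed it
Restart=always
EOF
}
```

As root, `write_journald_dropin && systemctl daemon-reload && systemctl restart systemd-journald` would load the drop-in; `systemctl show -p WatchdogUSec -p Restart systemd-journald` can then be used to check the effective values.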

Comment 3 Arvid Requate

2018-11-28 18:29:59 CET

Which UCS release / systemd package version?

* https://github.com/systemd/systemd/issues/1804
* https://github.com/systemd/systemd/issues/6283

root@member55:~# grep WatchdogSec /lib/systemd/system/systemd-journald.service 
WatchdogSec=3min
root@member55:~# lsb_release -r
Release:        4.3-2 errata344
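The details asked for here (UCS release, systemd package version, journald's watchdog settings) can be collected in one go. This is a rough sketch using ucr get with the version/* keys quoted in Comment 4, dpkg-query, and systemctl show; the helper names probe and report_journald_env are hypothetical, and each probe falls back to "unknown" so the script also runs on hosts lacking one of these tools.

```sh
#!/bin/sh
# Hypothetical helper: run a command, print its output, or "unknown" if it
# fails or prints nothing (e.g. ucr/dpkg-query missing, systemd not running).
probe() {
    out="$("$@" 2>/dev/null)" && [ -n "$out" ] && printf '%s\n' "$out" || echo unknown
}

# Hypothetical helper: collect the information requested in Comment 3.
report_journald_env() {
    echo "UCS version:    $(probe ucr get version/version)-$(probe ucr get version/patchlevel)"
    echo "UCS errata:     $(probe ucr get version/erratalevel)"
    echo "systemd:        $(probe dpkg-query -W -f='${Version}' systemd)"
    echo "journald unit:"
    # Effective values, including any drop-ins.
    probe systemctl show -p WatchdogUSec -p Restart systemd-journald
}
```

Running `report_journald_env` on the affected host and pasting its output into the bug would answer the question above.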

Comment 4 Christian Völker

2018-11-29 07:50:50 CET

version/erratalevel: 425
version/patchlevel: 4
version/releasename: Lesum
version/version: 4.2

Comment 5 Arvid Requate

2018-11-29 13:08:39 CET

OK, UCS 4.2-x ships systemd version 215-17. There is a Debian bug tracker entry discussing this issue, where people report that it has been resolved (or significantly improved) in 227-3:

 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=805042

So we may want to either backport a newer systemd version or (more likely) apply the upstream patch

 https://github.com/systemd/systemd/commit/4de2402b603ea2f518f451d06f09e15aeae54fab

which seems to have fixed

 https://github.com/systemd/systemd/issues/1804

Comment 6 Ingo Steuwer

2020-07-03 20:52:39 CEST

This issue has been filed against UCS 4.2.

UCS 4.2 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.