Bug 52749 - NMI received for unknown reason 20 on CPU 0.
Status: NEW
Product: UCS
Classification: Unclassified
Component: Kernel
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on:
Blocks:
Reported: 2021-02-05 17:07 CET by Sönke Schwardt-Krummrich
Modified: 2021-04-29 11:27 CEST

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.057
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Sönke Schwardt-Krummrich univentionstaff 2021-02-05 17:07:14 CET
My UCS 5 primary is printing these messages every 30 seconds:

Message from syslogd@primary150 at Feb  5 17:03:37 ...
 kernel:[ 1069.210358] Uhhuh. NMI received for unknown reason 20 on CPU 0.

Message from syslogd@primary150 at Feb  5 17:03:37 ...
 kernel:[ 1069.210360] Do you have a strange power saving mode enabled?

Message from syslogd@primary150 at Feb  5 17:03:37 ...
 kernel:[ 1069.210360] Dazed and confused, but trying to continue

Message from syslogd@primary150 at Feb  5 17:04:07 ...
 kernel:[ 1099.210713] Uhhuh. NMI received for unknown reason 20 on CPU 0.

Message from syslogd@primary150 at Feb  5 17:04:07 ...
 kernel:[ 1099.210715] Do you have a strange power saving mode enabled?

Message from syslogd@primary150 at Feb  5 17:04:07 ...
 kernel:[ 1099.210715] Dazed and confused, but trying to continue

Message from syslogd@primary150 at Feb  5 17:04:37 ...
 kernel:[ 1129.211108] Uhhuh. NMI received for unknown reason 00 on CPU 0.

Message from syslogd@primary150 at Feb  5 17:04:37 ...
 kernel:[ 1129.211110] Do you have a strange power saving mode enabled?

Message from syslogd@primary150 at Feb  5 17:04:37 ...
 kernel:[ 1129.211111] Dazed and confused, but trying to continue
Comment 1 Philipp Hahn univentionstaff 2021-02-05 17:30:38 CET
This might be a bug in our 5.0-0+2020-12-15_generic-unsafe_amd64 template, which uses `q35` but is missing /domain/features/apic/@eoi="on".

https://unix.stackexchange.com/questions/216925/nmi-received-for-unknown-reason-20-do-you-have-a-strange-power-saving-mode-ena
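To check whether a given VM definition carries that attribute, something like the following could be used. This is only a sketch: the helper name `apic_eoi` is made up for illustration, and it matches the `<apic>` element as plain text instead of parsing the XML properly, which should be good enough for libvirt's own output but is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the value of /domain/features/apic/@eoi from
# libvirt domain XML read on stdin, or "unset" if the attribute is missing.
# Assumption: plain-text matching is sufficient; a real XML parser is more robust.
apic_eoi() {
    eoi=$(tr '\n' ' ' \
        | grep -o '<apic[^>]*>' \
        | sed -n "s/.*eoi=[\"']\([^\"']*\)[\"'].*/\1/p" \
        | head -n 1)
    echo "${eoi:-unset}"
}
```

Usage would be along the lines of `virsh dumpxml DOMAIN | apic_eoi`, where `DOMAIN` stands for the libvirt domain name of the affected VM. An output of `unset` means the template did not set `eoi` at all.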
Comment 2 Philipp Hahn univentionstaff 2021-04-29 11:27:01 CEST
(In reply to Philipp Hahn from comment #1)
> This might be a bug in our 5.0-0+2020-12-15_generic-unsafe_amd64 template,
> which uses `q35` but is missing /domain/features/apic/@eoi="on".

This did not fix it for me permanently, only temporarily.
Further searching revealed <https://qemu-devel.nongnu.narkive.com/TCZgR67N/patch-hw-acpi-tco-c-fix-tco-timer-stop>, which describes a bug in the QEMU Q35 model: it does not reset the "iTCO_wdt" (Intel Total Cost of Ownership WatchDog Timer).

Fix
===
Probably fixed upstream in QEMU by v2.9.0-rc0~56^2~1 and v2.9.0-rc4~11^2, while Debian 9 Stretch (the base of UCS 4.4) only has QEMU 2.8.

Workaround: blacklist the `iTCO_wdt` kernel module in the affected VM:

```
ucr set kernel/blacklist='nouveau;iTCO_wdt'
```
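To verify that the module is really gone afterwards, a small check against `/proc/modules` might help. This is a sketch only; the function name `module_loaded` is made up, and the optional second argument exists just so the check can be tried against a saved copy of the module list.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the named kernel module is listed.
#   $1: module name, e.g. iTCO_wdt
#   $2: module list to read (defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example use on the affected system:
#   ucr get kernel/blacklist     # should print the value set above
#   module_loaded iTCO_wdt && echo "iTCO_wdt is still loaded"
```

If the module is still loaded, the blacklist presumably only takes effect on the next boot; unloading it by hand with `modprobe -r iTCO_wdt` may or may not work, depending on whether the watchdog is in use.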

Links
=====
* <https://unix.stackexchange.com/questions/327192/unknown-nmi-reason-20-and-30-on-a-vm/647254#647254>
* <http://xen1.knut.univention.de:8000/packages/source/qemu/>