Bug 50414 - UVMM CPU model "Skylake-Client" does not work - missing CPU features "hle", "rtm"
Status: CLOSED WORKSFORME
Product: UCS
Classification: Unclassified
Component: Virtualization - KVM
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: Erik Damrose
Depends on: 49695
Blocks:
Reported: 2019-10-25 17:07 CEST by Philipp Hahn
Modified: 2023-06-28 10:46 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.143
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Philipp Hahn univentionstaff 2019-10-25 17:07:51 CEST
The PCs used for our technical training have a "Skylake" CPU. If a VM is created and "Skylake-Client [IBRS]" is configured, the VM fails to start with Qemu complaining about the missing features supported since generation "Haswell":
  hle: Hardware Lock Elision
  rtm: Restricted Transactional Memory
The VM only starts with "SandyBridge" or older.

See <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773189> for the original bug report and <https://www.redhat.com/archives/libvir-list/2014-December/msg00950.html> for the reason why it was not fixed.

For UCS we should simply remove those non-working CPU models and add an option to explicitly configure the CPU based on features.

Technical Training 2019-10-24/25
Comment 1 Sönke Schwardt-Krummrich univentionstaff 2020-08-18 15:25:40 CEST
Summary:
In QEMU, certain CPU features are permanently assigned to the CPU types. For example, the CPU type "Skylake" provides the two CPU features "hle" and "rtm".

Now Intel has released a firmware update (new microcode for the CPUs) and disabled these two CPU features (hle and rtm), which becomes active after a reboot after the firmware update. 

This leads to the following behavior:
1) New instances where "Skylake-Client [IBRS]" is selected as CPU will not start and give an error message (other CPU types ("SandyBridge" or older) are still bootable).
2) Shut down instances (with and without saved state), where "Skylake-Client [IBRS]" was selected as CPU, cannot be started again either.

For new instances or instances without saved state (normal VM shutdown), the XML definition of the instance can be adjusted accordingly and these two features can be explicitly deactivated. The VM then starts again.
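
A minimal Python sketch of that adjustment, assuming the domain XML was exported with "virsh dumpxml" and is loaded again afterwards (for example with "virsh define"). The helper name disable_tsx is hypothetical and not part of UVMM or libvirt. It only sets the policy of "hle" and "rtm" to "disable" inside the existing <cpu> element and leaves the rest of the definition alone.

```python
import xml.etree.ElementTree as ET

# The two TSX features named in this report.
TSX_FEATURES = ("hle", "rtm")


def disable_tsx(domain_xml: str) -> str:
    """Return the domain XML with hle/rtm explicitly disabled in <cpu>."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        # No explicit CPU configuration, so there is nothing to adjust.
        return domain_xml
    for name in TSX_FEATURES:
        feature = cpu.find(f"feature[@name='{name}']")
        if feature is None:
            feature = ET.SubElement(cpu, "feature", {"name": name})
        # Overrides an existing policy such as 'require'.
        feature.set("policy", "disable")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = (
        "<domain type='kvm'><name>demo</name>"
        "<cpu mode='custom' match='exact'>"
        "<model fallback='allow'>Skylake-Client-IBRS</model>"
        "</cpu></domain>"
    )
    print(disable_tsx(example))
```

The sketch does not touch saved states or snapshots; those keep their own copy of the CPU definition.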

For instances whose state was also saved, the XML configuration cannot easily be adjusted with the built-in tools, because the QEMU version in UCS is currently too old. Philipp mentioned more complex tricks that can still be used if necessary.

The current solution is to discard the saved state (CPU+RAM) for these instances (i.e. *no* clean shutdown of the VM), and then adjust the XML configuration. This may result in data loss if, for example, applications with unsaved data were still active when the VM was suspended.

As a result, existing instances cannot be restarted after a Suspend-To-Disk or complete shutdown.
If instances are switched off, the XML definition of the instance can be easily adapted. With suspended instances this becomes more complicated and may result in data loss (from the saved state).
Comment 3 Sönke Schwardt-Krummrich univentionstaff 2020-08-19 10:59:30 CEST
A short test showed that the following CPU models are NOT affected:

Sandy Bridge
Sandy Bridge [IBRS]
Ivy Bridge
Ivy Bridge [IBRS]
Haswell [noTSX]
Haswell [noTSX,IBRS]
Broadwell [noTSX]
Broadwell [noTSX,IBRS]

The following CPU models seem to be affected:

Haswell
Haswell [IBRS]
Broadwell
Broadwell [IBRS]
Skylake Client
Skylake Client [IBRS]

> The VM only starts with "SandyBridge" or older.

Correction: "Ivy Bridge" and older seem not to be affected.
Comment 4 Erik Damrose univentionstaff 2020-08-19 11:29:42 CEST
I will set "Who will be affected by this bug?" to 1, since so far the issue has only been reported for our internal environment.
Comment 5 Philipp Hahn univentionstaff 2020-08-24 18:39:27 CEST
(In reply to Sönke Schwardt-Krummrich from comment #1)
> Summary:
> In QEMU, certain CPU features are permanently assigned to the CPU types. For
> example, the CPU type "Skylake" provides the two CPU features "hle" and
> "rtm".
> 
> Now Intel has released …

Yes and no: Haswell, Broadwell and Skylake were released around 2014 and were buggy from the start, so Intel disabled "hle" and "rtm" to prevent "data corruption". No µCode was ever released to fix that feature.
libvirt defined the models for those CPUs based on the _original_ features when "hle" and "rtm" were still enabled. Choosing those original models does not work as those features are now disabled because of the µCode update.

Later generations then contained a _fixed_ implementation and the feature was re-enabled. But in 2019 the TAA vulnerability was discovered, in which those re-enabled instructions can be used for a side-channel attack to get access to data of other users. One mitigation is to disable "rtm" and "hle" again, but this time not because of "data corruption" but because of "security issues". If you don't care about the security implications you can re-enable it with "tsx=on" on the Linux kernel command line.

Our version of libvirt is too old to know the new CPUs, so it models them as an old CPU with all known extra features on top. This is used both to guarantee compatibility for live migration between different hosts and to allow restoring the saved state from "managedsave" and "snapshot": libvirt/Qemu does not know whether the guest used "hle" or "rtm", so it requires those features to be present when the saved VM state is restored. If the µCode update disabled those features in between, libvirt/Qemu cannot guarantee that the VM would still run okay. As an admin you then get this ugly error message and can decide to either cold-start the VM or re-enable TSX manually. That is all you can do for already created VM states.

For snapshots or managed-save states created after the µCode update this again is not a problem as
1. the VM would not cold-boot at all if you selected one of the models like "Haswell", "Broadwell" or "Skylake", as they require "hle" and "rtm".
2. If you use mode="host-model" libvirt will automatically select the noTSX variants as the features are no longer supported by the host CPU.


Ubuntu added its own set of CPU definitions to at least make the model configurable, e.g. provide a model where "hle" and "rtm" are always disabled: <https://git.qemu.org/?p=qemu.git;a=commitdiff;h=2061735ff09f9d5e67c501a96227b470e7de69b1;hp=996970236c00f244ed9518238fef480725a40ff2>

Using this allows live-migration again between hosts with different µCode state.

https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/TAA_MCEPSC_i915
https://errata.software-univention.de/#/?erratum=4.4x344
https://errata.software-univention.de/#/?erratum=4.4x401


What we can do:
- Remove those 6 models from UVMM which will never work anyway
- and/or add code to only show the CPU models usable on the host (virsh domcapabilities)
- provide a better error message and link to an SDB article describing the issue and listing the options
  - virsh managedsave-remove
  - virsh snapshot-revert --force $VM $SNAP ¹
  - virsh edit $VM
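
As a rough illustration of the "only show usable models" idea, here is a Python sketch that filters the output of "virsh domcapabilities". It assumes the installed libvirt reports a usable attribute on the <model> entries below <mode name='custom'>; the function name usable_cpu_models is hypothetical and not existing UVMM code.

```python
import xml.etree.ElementTree as ET
from typing import List


def usable_cpu_models(domcaps_xml: str) -> List[str]:
    """Return the custom CPU models that libvirt marks as usable on this host."""
    root = ET.fromstring(domcaps_xml)
    # Only the 'custom' mode lists named models with a usable flag.
    models = root.findall("./cpu/mode[@name='custom']/model")
    return sorted(
        m.text.strip()
        for m in models
        if m.get("usable") == "yes" and m.text
    )
```

A model selector built on this list would no longer offer models that are reported with usable='no' on the host.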

¹: Our version of libvirt has a 2nd bug where the snapshot meta data only stores the XML definition of the "running" VM, not the "offline" VM. With <cpu mode="host-model"/> this leads to the problem that even after a "snapshot-revert" the *expanded* CPU configuration from the "running" configuration is copied back to the "offline" definition, which then still contains the <feature policy='require' name='rtm'/>… statements that prevent the snapshot from being restored. The whole <cpu…/> block must be replaced manually with <cpu mode="host-model"/> so that libvirt models the host CPU again, now with "rtm" and "hle" absent.
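
The manual replacement from footnote ¹ could be scripted roughly as follows. This is only a sketch working on an exported domain XML; the helper name reset_cpu_to_host_model is hypothetical, and a definition without a <cpu> block is returned unchanged.

```python
import xml.etree.ElementTree as ET


def reset_cpu_to_host_model(domain_xml: str) -> str:
    """Replace an expanded <cpu> block with a bare <cpu mode='host-model'/>."""
    root = ET.fromstring(domain_xml)
    old = root.find("cpu")
    if old is None:
        # No CPU block present, so there is nothing to replace.
        return domain_xml
    replacement = ET.Element("cpu", {"mode": "host-model"})
    replacement.tail = old.tail  # keep the surrounding whitespace
    index = list(root).index(old)
    root.remove(old)
    root.insert(index, replacement)  # same position as the old block
    return ET.tostring(root, encoding="unicode")
```

The result still has to be reviewed and loaded by hand, e.g. through "virsh edit $VM".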
Comment 7 Lukas Zumvorde univentionstaff 2020-08-25 11:37:26 CEST
With what package and version is the Microcode Upgrade introduced? We have a customer that wants to know in order to prevent an upgrade of those packages if possible. 

Another question is how one can find out if one is affected if one has used the "default" CPU setting for the VM.
Comment 9 Philipp Hahn univentionstaff 2020-08-25 13:44:24 CEST
(In reply to Lukas Zumvorde from comment #7)
> With what package and version is the Microcode Upgrade introduced? We have a
> customer that wants to know in order to prevent an upgrade of those packages
> if possible. 

See <https://errata.software-univention.de/#/?version=4.4-x&package=intel-microcode&package=linux> - it depends on both the µCode update and the Linux kernel.
If you need more details you have to check yourself which exact CPU family/model/stepping you have and whether it is affected by any of the µCode updates. Due to missing test HW I cannot tell more.

A quick check is
  grep -e hle -e rtm /proc/cpuinfo
but other features may lead to similar issues. If it finds those two features you MIGHT be affected, but nothing is confirmed until you install the update and see whether the flags changed.
Even better is to compare the output of `virsh domcapabilities`: If you see any change expect problems.
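
The grep above also matches substrings of other flag names. A slightly stricter variant as a Python sketch, which only looks at whole words on the "flags" lines; the function name tsx_flags is hypothetical and the input is assumed to be the text of /proc/cpuinfo.

```python
from typing import Set

# The two TSX flags this report is about.
TSX_FLAGS = frozenset({"hle", "rtm"})


def tsx_flags(cpuinfo_text: str) -> Set[str]:
    """Return which of hle/rtm appear as whole flags in /proc/cpuinfo text."""
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the 'flags' lines list CPU features; 'bugs' etc. are skipped.
        if sep and key.strip() == "flags":
            found.update(TSX_FLAGS.intersection(value.split()))
    return found
```

Running it before and after the update on the same host and comparing the two results shows whether the flags disappeared.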

Again: this is a combination of Linux Kernel AND µCode - give `tsx=on` a try. I have no HW to test that myself.

> Another question is how one can find out if one is affected if one has used
> the "default" CPU setting for the VM.

virsh [-c "$HOST"] dumpxml "$VM" | xmllint --xpath /domain/cpu -

- "no entry" → not affected
- <cpu mode="host-passthrough"/> → not affected
- Nehalem* | Westmere* | SandyBridge* | IvyBridge* → not affected (too old)
- Haswell-noTSX | Broadwell-noTSX → not affected (TSX already disabled)
- Haswell | Broadwell | Skylake-Client → MAYBE affected (may use TSX)
- <cpu mode="host-model"/> → MAYBE affected (depends on host CPU)
- …

The same must be checked for all snapshots (virsh snapshot-dumpxml $VM $SNAP) and all saved states (virsh managedsave-dumpxml $VM) (our libvirt is too old for that).
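
The checklist above can be sketched in Python as follows. This is an approximation, not a complete detector: the function name classify_cpu and its return labels are hypothetical, model names are matched by prefix, and the named models follow the test results from comment 3 (Sandy Bridge and Ivy Bridge not affected; Haswell, Broadwell and Skylake Client affected unless a noTSX variant is used). Explicit <feature> entries for "hle"/"rtm" are evaluated first, since a "require" policy is what blocks the start.

```python
import xml.etree.ElementTree as ET

TSX_FEATURES = ("hle", "rtm")
# Assumption: libvirt model names start with these generation names.
MAYBE_TSX_MODELS = ("Haswell", "Broadwell", "Skylake-Client")
OLD_MODELS = ("Nehalem", "Westmere", "SandyBridge", "IvyBridge")


def classify_cpu(domain_xml: str) -> str:
    """Classify a domain, snapshot or saved-state XML by its <cpu> block."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    if cpu is None or cpu.get("mode") == "host-passthrough":
        return "not affected"
    policies = {f.get("name"): f.get("policy") for f in cpu.findall("feature")}
    if any(policies.get(name) == "require" for name in TSX_FEATURES):
        return "maybe affected"  # TSX explicitly required
    if all(policies.get(name) == "disable" for name in TSX_FEATURES):
        return "not affected"  # TSX explicitly disabled
    if cpu.get("mode") == "host-model":
        return "maybe affected"  # depends on the host CPU
    model = (cpu.findtext("model") or "").strip()
    if "noTSX" in model:
        return "not affected"
    if model.startswith(MAYBE_TSX_MODELS):
        return "maybe affected"
    if model.startswith(OLD_MODELS):
        return "not affected"
    return "unknown"
```

Anything reported as "maybe affected" or "unknown" still needs a manual look at the host.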
Comment 10 Philipp Hahn univentionstaff 2020-08-28 10:11:56 CEST
The Linux Kernel has a mitigation against TAA since 2019-11, Bug #50486 <https://errata.software-univention.de/#/?erratum=4.4x342>.

I successfully confirmed that using "grub/append=tsx=on" re-enables TSX and TSX is usable again by libvirt/qemu (virsh domcapabilities).

So the problem is not new and is not triggered by the µCode update. On the contrary, previous or future µCode updates add new controls to the CPU which allow the Host/Guest-OS and the libvirt/Qemu hypervisor to use alternative mitigations that let TSX remain enabled. The Linux kernel already has different alternatives, see <https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html>.
Comment 11 Ingo Steuwer univentionstaff 2020-09-21 13:27:25 CEST
we've documented the potential problems and ways to solve them here:
https://help.univention.com/t/error-starting-domain-guest-cpu-doesnt-match-specification/16041

As of now I don't see a generic way to detect whether a machine or one of its snapshots is affected on one of the potential hypervisors. So I suggest closing this bug as "worksforme".

@Erik - may I ask you to do QA or identify someone who will QA & close the bug report?