Bug 21386 - No migration possible between different CPU generations
Product: UCS
Classification: Unclassified
Component: Virtualization - UVMM
Version: UCS 4.3
Hardware: Other Linux
Importance: P3 normal
Target Milestone: UCS 4.3-2-errata
Assigned To: Philipp Hahn
QA Contact: Jürn Brodersen
Depends on: 47857
Blocks: 47923 48535 48536 49425
Reported: 2011-01-31 18:00 CET by Philipp Hahn
Modified: 2019-05-07 16:26 CEST
CC List: 4 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.240
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support: Yes
Ticket number: 2017072121000252
Bug group (optional): Large environments, Usability
Max CVSS v3 score:

Parse Xen hw_caps CPUID (8.18 KB, text/plain)
2011-02-01 12:02 CET, Philipp Hahn

Description Philipp Hahn univentionstaff 2011-01-31 18:00:57 CET
Ticket #2011012510001527 revealed problems with migrating between systems that have different CPUs:

# ssh xenXXXXXXXX0[89] xm info | sed -ne 's/hw_caps.*: //p'
          ^                     ^    ^^

libvirt <http://libvirt.org/formatdomain.html#elementsCPU> already
supports restricting the CPUID
<http://www.sandpile.org/ia32/cpuid.htm>, and this works with KVM. With
Xen it cannot be used through libvirt, but at least the classic
Xen xm format allows specifying it:


#   Configure guest CPUID responses:
#cpuid=[ '1:ecx=xxxxxxxxxxx00xxxxxxxxxxxxxxxxxxx,
#           eax=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]
# Each successive character represent a lesser-significant bit:
#  '1' -> force the corresponding bit to 1
#  '0' -> force to 0
#  'x' -> Get a safe value (pass through and mask with the default policy)
#  'k' -> pass through the host bit value
#  's' -> as 'k' but preserve across save/restore and migration
#   Configure host CPUID consistency checks, which must be satisfied for this
#   VM to be allowed to run on this host's processor type:
#cpuid_check=[ '1:ecx=xxxxxxxxxxxxxxxxxxxxxxxxxx1xxxxx' ]
# - Host must have VMX feature flag set
# The format is similar to the above for 'cpuid':
#  '1' -> the bit must be '1'
#  '0' -> the bit must be '0'
#  'x' -> we don't care (do not check)
#  's' -> the bit must be the same as on the host that started this VM
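The `cpuid_check` rules above are precise enough to express as a small function. The following is a minimal Python sketch, assuming one 32-character mask per register with the most significant bit first, as the comment above describes; the name `check_register` is hypothetical and not part of Xen. It deliberately does not model the `cpuid` response masks, because the meaning of 'x' there depends on Xen's internal default policy.

```python
from typing import Optional


def check_register(mask: str, host_value: int, origin_value: Optional[int] = None) -> bool:
    """Check a 32-bit register value against an xm-style cpuid_check mask.

    The first mask character describes bit 31, the last one bit 0:
      '1' -> bit must be 1      '0' -> bit must be 0
      'x' -> not checked        's' -> bit must equal origin_value's bit
    """
    if len(mask) != 32:
        raise ValueError("mask must have exactly 32 characters")
    for index, char in enumerate(mask):
        bit = 31 - index
        host_bit = (host_value >> bit) & 1
        if char == "x":
            continue
        if char == "1":
            expected = 1
        elif char == "0":
            expected = 0
        elif char == "s":
            if origin_value is None:
                raise ValueError("'s' needs the register value of the host that started the VM")
            expected = (origin_value >> bit) & 1
        else:
            raise ValueError(f"invalid mask character: {char!r}")
        if host_bit != expected:
            return False
    return True


# Leaf 1, ECX: bit 5 (VMX) must be set, equivalent to the example above.
VMX_CHECK = "x" * 26 + "1" + "x" * 5
print(check_register(VMX_CHECK, 1 << 5))  # True
print(check_register(VMX_CHECK, 0))       # False
```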

It would be desirable for UVMM to read the CPUID of all hosts in a group, list them with explanations, and, similar to libvirt's "cpu-baseline" (cf. <http://www.libvirt.org/html/libvirt-libvirt.html#virConnectBaselineCPU> and <http://www.libvirt.org/html/libvirt-libvirt.html#virDomainXMLFlags> VIR_DOMAIN_XML_UPDATE_CPU), offer a way to define the default value for new domains.
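At its core, the "baseline" idea is a set intersection: a feature is safe for new domains only if every host in the group offers it, and the remaining features are the ones that need an explanation. A minimal Python sketch, assuming each host's CPU features are already available as a set of flag names; the helper names and the sample host data are hypothetical. libvirt's own `virConnectBaselineCPU()` additionally maps the result onto a named CPU model, which this sketch does not attempt.

```python
from __future__ import annotations


def baseline(host_features: dict[str, set[str]]) -> set[str]:
    """Return the CPU features offered by every host (set intersection)."""
    if not host_features:
        return set()
    return set.intersection(*host_features.values())


def explain(host_features: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each feature outside the baseline to the sorted list of hosts lacking it."""
    common = baseline(host_features)
    every = set().union(*host_features.values())
    return {
        feature: sorted(h for h, feats in host_features.items() if feature not in feats)
        for feature in sorted(every - common)
    }


# Hypothetical example data for two hosts in one group
hosts = {
    "host-a": {"sse2", "aes", "avx", "avx2"},
    "host-b": {"sse2", "aes", "avx"},
}
print(sorted(baseline(hosts)))  # ['aes', 'avx', 'sse2']
print(explain(hosts))           # {'avx2': ['host-b']}
```

The intersection is what new domains could default to; the `explain()` output is the kind of per-feature listing the request above asks UVMM to show.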
Comment 1 Philipp Hahn univentionstaff 2011-02-01 12:02:10 CET
Created attachment 3000 [details]
Parse Xen hw_caps CPUID

To parse the hw_caps information, run:
  xm info | sed -ne 's/^hw_caps *: //p' | ./2011012510001527.py
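The attached script itself is not reproduced in this bug. As a rough, hypothetical stand-in, the following Python sketch assumes that `hw_caps` is a colon-separated list of 32-bit hexadecimal words, as printed by `xm info`, and shows two ways to compare hosts: a caret line like the one under the `ssh ... xm info` command in the description, and a list of (word index, bit) pairs that differ. Which feature each bit stands for depends on the Xen version, so the sketch does not try to name them; the sample values are made up.

```python
from __future__ import annotations


def parse_hw_caps(line: str) -> list[int]:
    """Split a colon-separated hw_caps string into 32-bit integers."""
    return [int(word, 16) for word in line.strip().split(":")]


def diff_markers(a: str, b: str) -> str:
    """Return a line with '^' under every character position where a and b differ."""
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return "".join("^" if x != y else " " for x, y in zip(a, b)).rstrip()


def differing_bits(a: str, b: str) -> list[tuple[int, int]]:
    """Return (word index, bit number) pairs that differ between two hw_caps strings."""
    words_a, words_b = parse_hw_caps(a), parse_hw_caps(b)
    if len(words_a) != len(words_b):
        raise ValueError("hw_caps strings have a different number of words")
    result = []
    for index, (x, y) in enumerate(zip(words_a, words_b)):
        delta = x ^ y  # set bits mark the differences
        for bit in range(32):
            if (delta >> bit) & 1:
                result.append((index, bit))
    return result


# Made-up sample values for two hosts
host_a = "bfebfbff:20100800:00000000:00000940"
host_b = "bfebfbff:28100800:00000000:00000940"
print(host_a)
print(host_b)
print(diff_markers(host_a, host_b))
print(differing_bits(host_a, host_b))  # [(1, 27)]
```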
Comment 2 Philipp Hahn univentionstaff 2014-02-12 09:06:52 CET
1. UVMM should warn before migrating a VM between incompatible CPUs.

2. By default, UVMM should use a restricted CPU feature set that guarantees migration between all hosts of the domain. How to change this default for performance tuning should be documented in the extended documentation.
(automatically determining the best CPU set is considered too complex for a normal administrator and too error prone.)
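Item 1, together with the later request in comment 3 to block rather than only warn, amounts to a subset test: every CPU feature the VM was started with must also be present on the target host. A minimal Python sketch, assuming both feature lists are already known as sets of flag names; `missing_features`, `assert_migratable` and `IncompatibleCPUError` are hypothetical names, not UVMM API.

```python
from __future__ import annotations


class IncompatibleCPUError(Exception):
    """Raised when the target host lacks CPU features the VM relies on."""


def missing_features(vm_features: set[str], target_features: set[str]) -> list[str]:
    """Return the VM's features that the target host does not provide (sorted)."""
    return sorted(vm_features - target_features)


def assert_migratable(vm_features: set[str], target_features: set[str]) -> None:
    """Block the migration by raising if the target host is missing any feature."""
    missing = missing_features(vm_features, target_features)
    if missing:
        raise IncompatibleCPUError(
            "target host CPU does not provide required features: " + ", ".join(missing)
        )


vm = {"sse2", "aes", "avx2"}
target = {"sse2", "aes"}
print(missing_features(vm, target))  # ['avx2']
try:
    assert_migratable(vm, target)
except IncompatibleCPUError as exc:
    print("migration blocked:", exc)
```

For the "warn" variant, a caller could show the list returned by `missing_features()` and let the administrator decide; `assert_migratable()` implements the stricter blocking behavior.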

PS: As of 2014-02-12 libvirt-xen still doesn't seem to support VIR_CPU_MODE_*

The problem appeared again at a different customer.
Comment 3 Philipp Hahn univentionstaff 2014-12-02 11:54:53 CET
Asked for again: UVMM should block migration if the CPUs are incompatible.
Comment 4 Stefan Gohmann univentionstaff 2016-04-25 07:52:09 CEST
This issue has been filed against UCS 2.4.

UCS 2.4 is out of maintenance and many UCS components have vastly changed in
later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug".
In this case please provide detailed information on how this issue is affecting you.
Comment 5 Philipp Hahn univentionstaff 2017-07-21 17:11:28 CEST
Still a problem with KVM.
OpenStack fixed it here: <https://bugs.launchpad.net/nova/+bug/1082414>.
Comment 6 Philipp Hahn univentionstaff 2018-06-22 16:25:58 CEST
The easiest thing is to set 
  <cpu mode="host-model"/>

for which libvirt will insert its view of the host CPU into the XML while the VM is running. If such a VM is migrated to an incompatible host, migrate() will show an error:

virsh # uri 
virsh # migrate --domain phahn_cpu_migration --desturi qemu+tls://utby.knut.univention.de/system?pkipath=/home/phahn/.pki/libvirt --live --persistent --undefinesource --verbose
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: pclmuldq, smx, fma, pcid, x2apic, movbe, tsc-deadline, aes, xsave, osxsave, avx, f16c, rdrand, arat, fsgsbase, tsc_adjust, bmi1, avx2, smep, bmi2, erms, invpcid, xsaveopt, pdpe1gb, abm

mode="host-model" has one big disadvantage: there are known cases where libvirt creates a virtual CPU that does not exist in reality and that makes the guest OS crash. In short, the set of usable CPU features depends on the host CPU *AND* the Qemu version *AND* the Linux kernel. Only from Qemu-2.9 and libvirt-3.2 on do the two query each other, which gets rid of that problem.

So we need to make it configurable which CPU to use:
- "host-passthrough" for maximum performance
- "host-model" for safe migration
- custom cpu model from `virsh cpu-models x86_64`
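The options above, plus the "default" choice added in comment 7, map directly onto libvirt's `<cpu>` element in the domain XML. A minimal Python sketch using only the standard library's `xml.etree.ElementTree`; the function name `cpu_element` is hypothetical, the model name "Westmere" is just an example of what `virsh cpu-models x86_64` may list, and the `match="exact"` attribute for custom models is one possible choice, not something decided in this bug.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def cpu_element(mode: str, model: str | None = None) -> ET.Element | None:
    """Build the libvirt <cpu> element for one UI choice (None = leave it unset)."""
    if mode == "default":
        return None  # no <cpu> element: keep the current, unspecified behavior
    if mode in ("host-passthrough", "host-model"):
        return ET.Element("cpu", mode=mode)
    if mode == "custom":
        if not model:
            raise ValueError("custom mode needs a model name, e.g. from 'virsh cpu-models x86_64'")
        cpu = ET.Element("cpu", mode="custom", match="exact")
        ET.SubElement(cpu, "model").text = model
        return cpu
    raise ValueError(f"unknown CPU mode: {mode!r}")


for choice, name in [("host-model", None), ("custom", "Westmere"), ("default", None)]:
    element = cpu_element(choice, name)
    xml = ET.tostring(element, encoding="unicode") if element is not None else "(no <cpu> element)"
    print(choice, "->", xml)
```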

Links to read:
* <https://wiki.libvirt.org/page/TodoPreMigrationChecks>
* <https://bugzilla.redhat.com/show_bug.cgi?id=1055002>
* <https://bugzilla.redhat.com/show_bug.cgi?id=824989>
* <https://libvirt.org/formatdomain.html#elementsCPU>
Comment 7 Philipp Hahn univentionstaff 2018-07-04 12:54:39 CEST
Links to read:
* <https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/>
* <https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08422.html>

- Add a UCRV to disable migration by default
- Add UI in UMC-UVMM to select:
   "host-passthrough" for maximum performance in single-host environments
   "host-model" for 99% of multi-host environments requiring migration
   "custom": very long list of models and flags depending on Qemu/libvirt/µCode/HW version
   "default": unspecified as currently

- As libvirt might create an invalid "host-model", we may need a mechanism to provide an override?
- Or update libvirt and qemu to a later version, which works "as expected by the customer".
Comment 8 Philipp Hahn univentionstaff 2018-08-22 07:36:24 CEST
I talked to the customer: UCS-4.3-x is okay for them. They are currently updating their environment to UCS-4.3 and, having already waited a long time for this feature, are willing to wait a little longer to get the improved version of UVMMd.

The idea is as follows:
- uvmmd will fetch both inactive (and active) domain XML.
- the active domain XML is only needed to get the VNC port if the VM is running
- if the inactive domain XML is missing <cpu mode='host-model'/>, uvmmd will automatically add it. It will take effect on the next (shutdown+)start.
- this will be the default behavior, but a UCRV will allow that to be turned off or even to remove that line if it exists. This is for those situations where host-model breaks or is not required (single-host).
- care must be taken not to mix the active and inactive XML: at run time the active domain XML contains the concrete model, so using the active domain XML to define the inactive domain XML would not reset the CPU model to "host-model", which is to be filled in the next time the domain is started.
- We will ignore the new features IBRS,IBPB,STIBP,SSB provided by newer micro-code updates for now. libvirt neither lists them in "capabilities" nor "domcapabilities", so there is no remote mechanism using only libvirtd to detect those and enable them when available.
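The plan above can be sketched with the standard library's `xml.etree.ElementTree`, operating on XML strings that would in practice come from libvirt-python (for example `dom.XMLDesc(libvirt.VIR_DOMAIN_XML_INACTIVE)` for the inactive definition, written back with `conn.defineXML()`). This is a minimal sketch, not the uvmmd implementation: the UCR variable's name is not given here, so the policy is a plain argument, and the meaning of its values ("always", "missing", "remove", unset) is inferred from the test notes in comment 14. `vnc_port` and `apply_cpu_policy` are hypothetical names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def vnc_port(active_xml: str) -> int | None:
    """Return the VNC port from the *active* domain XML, or None if not assigned."""
    root = ET.fromstring(active_xml)
    graphics = root.find("./devices/graphics[@type='vnc']")
    if graphics is None or graphics.get("port") is None:
        return None
    port = int(graphics.get("port"))
    return port if port >= 0 else None  # inactive XML typically has port='-1'


def apply_cpu_policy(inactive_xml: str, policy: str | None) -> str:
    """Add, replace or remove <cpu mode='host-model'/> in the *inactive* domain XML.

    policy (hypothetical values, inferred from the test notes):
      None      -> leave the XML unchanged
      "missing" -> add host-model only if the domain has no <cpu> element
      "always"  -> replace any existing <cpu> element with host-model
      "remove"  -> drop the <cpu> element if it is host-model
    """
    if policy is None:
        return inactive_xml
    root = ET.fromstring(inactive_xml)
    cpu = root.find("cpu")
    if policy == "missing":
        if cpu is None:
            ET.SubElement(root, "cpu", mode="host-model")
    elif policy == "always":
        if cpu is not None:
            root.remove(cpu)
        ET.SubElement(root, "cpu", mode="host-model")
    elif policy == "remove":
        if cpu is not None and cpu.get("mode") == "host-model":
            root.remove(cpu)
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return ET.tostring(root, encoding="unicode")


inactive = ("<domain type='kvm'><name>vm1</name><devices>"
            "<graphics type='vnc' port='-1' autoport='yes'/></devices></domain>")
print(apply_cpu_policy(inactive, "missing"))
print(vnc_port(inactive))  # None: no port assigned while the VM is shut off
```

Keeping the two functions apart mirrors the point about not mixing the two documents: the VNC port is only read from the active XML, while the CPU policy is only ever applied to the inactive XML.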
Comment 9 Philipp Hahn univentionstaff 2018-10-02 14:46:17 CEST
[4.3-2] 7b0a03f869 Bug #21386: Merge branch 'phahn/21386-uvmm-cpu-migrate' into 4.3-2
[4.3-2] 7d448075e1 Bug #45721 UVMM: Handle backup exception
[4.3-2] dafe54d7de Bug #45721 UVMM: Handle broken UVMM connection
[4.3-2] 925f492833 Bug #21386 UVMM: Handle connection close exception
[4.3-2] 803a932581 Bug #21386 UVMM: Close files through context
[4.3-2] e54a02d530 Bug #21386 UVMM: Add more debug
[4.3-2] 7ea8b9ffda Bug #21386 UVMM: Switch to absolute imports
[4.3-2] daf82260bf Bug #21386 UVMM: Switch to EnvironmentError
[4.3-2] b736476893 Bug #21386 UVMM: Use native logger string substitution
[4.3-2] d6adbd49e3 Bug #21386 UVMM: Fix exception printing
[4.3-2] d17927eb93 Bug #21386 UVMM: Convert legacy exception arguments
[4.3-2] c924d39cb2 Bug #21386 UVMM: Exception renaming
[4.3-2] e19284251d Bug #21386 UVMM: Code cleanup
[4.3-2] bba38d99df Bug #21386 UVMM: Fix storage exception
[4.3-2] 55d2b3ceed Bug #21386 UVMM: Assert compatible CPU during live migration
[4.3-2] c50d53dbd4 Bug #21386 UVMM: Use listAll() methods
[4.3-2] 806b90619f Bug #21386 UVMM: Handle transitioned domains
[4.3-2] 8d90c7278b Bug #21386 UVMM: Un-private _update_xml
[4.3-2] 157279c3f2 Bug #21386 UVMM: Split update_expensive into parts
[4.3-2] 81c80e1ca1 Bug #21386 UVMM: Split xml2obj into parts
[4.3-2] 81ce1a12e6 Bug #21386 UVMM: Switch to new event model
[4.3-2] 43fb33e9d2 Bug #21386 UVMM: Unify handler deregistration
[4.3-2] 63725b2dc0 Bug #21386 UVMM: Remove leftover supports_suspend|snapshot flags
[4.3-2] 2e67ce188e Bug #21386 UVMM: Remove import fallback
[4.3-2] d3421471b0 Bug #21386 UVMM: Remove unused Data_StoragePool
[4.3-2] ff69c39f63 Bug #21386 UVMM: Simplify media change detection
[4.3-2] 72f06ed1c0 Bug #21386 UVMM: Document UCRV uvmm/umc/autoupdate/interval

Package: univention-virtual-machine-manager-daemon
Version: 7.0.0-11A~
Branch: ucs_4.3-0
Scope: errata4.3-2

TODO: Write more documentation
Comment 10 Philipp Hahn univentionstaff 2018-10-02 16:08:48 CEST
[4.3-2] 17506eb5e4 Bug #21386: univention-virtual-machine-manager-daemon 7.0.0-11A~
 .../univention-virtual-machine-manager-daemon.yaml       | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
Comment 11 Philipp Hahn univentionstaff 2018-10-05 17:02:11 CEST
[4.3-2] 8c3869ed43 Bug #21386 UVMM: Fix spelling mistakes in UCR variable descriptions.
 .../univention-virtual-machine-manager-daemon/debian/changelog    | 6 ++++++
 ...al-machine-manager-daemon.univention-config-registry-variables | 8 ++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

Package: univention-virtual-machine-manager-daemon
Version: 7.0.0-12A~
Branch: ucs_4.3-0
Scope: errata4.3-2

[4.3-2] 7783f9143a Bug #21386: univention-virtual-machine-manager-daemon 7.0.0-12A~
 doc/errata/staging/univention-virtual-machine-manager-daemon.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 12 Philipp Hahn univentionstaff 2018-10-05 17:32:11 CEST
[4.3-2] 272abf1a0c Bug #21386: univention-virtual-machine-manager-daemon 7.0.0-12A~
 .../univention-virtual-machine-manager-daemon.yaml |  6 +--
 doc/manual/idm-cloud-en.xml                        |  2 +-
 doc/manual/uvmm-en.xml                             | 55 ++++++++++++++++++++++
 3 files changed, 59 insertions(+), 4 deletions(-)
Comment 13 Jürn Brodersen univentionstaff 2018-10-08 10:20:06 CEST
(In reply to Philipp Hahn from comment #12)
> [4.3-2] 272abf1a0c Bug #21386: univention-virtual-machine-manager-daemon 7.0.0-12A~
>  .../univention-virtual-machine-manager-daemon.yaml |  6 +--
>  doc/manual/idm-cloud-en.xml                        |  2 +-
>  doc/manual/uvmm-en.xml                             | 55 ++++++++++++++++++++++
>  3 files changed, 59 insertions(+), 4 deletions(-)

Documentation moved to bug 47923
Comment 14 Jürn Brodersen univentionstaff 2018-10-09 10:41:08 CEST
What I tested:
Migration between different CPUs
  No error if cpu model not in dom description -> OK
  Error if incompatible cpu model in dom description -> OK
  No error if compatible cpu model in dom description -> OK

  Not set -> No changes -> OK
  always -> overrides changes -> OK
  missing -> doesn't override changes -> OK
  remove -> removes host-model -> OK
  qemu process restarted in case the host-model was set -> OK

Overall UVMM functionality
  snapshots -> OK
  vnc -> OK
  network -> OK
  No regressions noticed -> OK

-> Verified
Comment 15 Arvid Requate univentionstaff 2018-10-10 12:31:34 CEST