Univention Bugzilla – Bug 50124
WinPE does not work if HyperV Enlightenment is activated
Last modified: 2023-06-28 10:46:28 CEST
If Hyper-V Enlightenment is activated, Windows PE fails to boot: the PE simply stops. If the server boots into the recovery system due to an error (which is only a "pimed" PE), the PE crashes and hangs in a reboot loop, because the system tries to load the recovery PE again and again. With a Server 2019, the same problem (the system simply hangs during boot) apparently also occurs during normal startup of the VM, not only with the PE. Here, too, removing the Hyper-V Enlightenment helped; after that the server started again. The Hyper-V Enlightenment feature is absolutely necessary for live migration in the customer environment.
Hyper-V can be disabled via the UVMM UMC module. "Windows Server 2019" was never tested by me or, as far as I know, by anyone else here at Univention. Please try different CPU models, as crashing VMs are most often caused by Qemu providing an artificial CPU with a combination of features / CPU model / CPU level that does not match any real CPU from Intel / AMD - OSs then often make the wrong choice for the low-level HW modules. For example: the default Qemu CPU identifies itself as a "Core 2 Duo" CPU with lots of extra CPU features from more modern CPUs. If "Hyper-V Enlightenment" is enabled, this adds even more features. Windows, for example, then assumes that, as Hyper-V is available, the CPU generation must be at least from "201x", all of which have feature X - but if X is not provided by Qemu, the guest OS will crash as soon as it uses feature X. Someone should first check whether it is really "Hyper-V" or some other missing CPU feature required by Windows (PE).
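To make the "is it really Hyper-V or the CPU model?" check easier, here is a minimal Python sketch that summarizes the relevant parts of a libvirt domain definition. It assumes the XML was obtained with something like `virsh dumpxml <domain>`. The sample XML, the domain name `example-vm` and the helper name `summarize_domain` are made up for illustration and are not taken from the affected system.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML, shaped like the output of `virsh dumpxml`.
SAMPLE_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <features>
    <acpi/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>kvm64</model>
  </cpu>
</domain>
"""


def summarize_domain(xml_text):
    """Return the CPU mode/model and the enabled Hyper-V enlightenments."""
    root = ET.fromstring(xml_text)

    mode = None
    model = None
    cpu = root.find("cpu")
    if cpu is not None:
        mode = cpu.get("mode")
        model_el = cpu.find("model")
        if model_el is not None and model_el.text:
            model = model_el.text.strip()

    enabled = []
    hyperv = root.find("features/hyperv")
    if hyperv is not None:
        # Only child elements with state='on' count as enabled.
        enabled = sorted(el.tag for el in hyperv if el.get("state") == "on")

    return {"cpu_mode": mode, "cpu_model": model, "hyperv": enabled}


if __name__ == "__main__":
    print(summarize_domain(SAMPLE_XML))
```

Comparing this summary for a working and a failing VM definition shows whether only the `<hyperv>` block differs or whether the CPU model changed as well.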
Sounds to me like this should be analyzed more deeply in the environment where the problem occurs. Or is it easily reproducible in our test environments?
(qemu) info status
VM status: paused (shutdown)
(qemu) x/10i 0xfffff8025992b4d0
0xfffff8025992b4d0: mov %ecx,0x8(%rsp)
0xfffff8025992b4d4: push %rbx
0xfffff8025992b4d5: sub $0x50,%rsp
0xfffff8025992b4d9: mov %ecx,%ebx
0xfffff8025992b4db: mov %ebx,%ecx
0xfffff8025992b4dd: callq 0xfffff80259852b90
(qemu) x/1i 0xfffff80259852b90
0xfffff80259852b90: int3
(qemu) info cpus
* CPU #0: pc=0xfffff80259852b90 thread_id=317

This may be caused by Bug #21860, which added the patch patches/libvirt/4.4-0-0-ucs/3.0.0-4+deb9u3-errata4.4-0/0022-Bug-21860-Default-to-kvm32.quilt: it changes the default CPU model from "qemu{32,64}" to "kvm{32,64}" to get PSE-36 working again for PAE kernels.

If "Hyper-V Enlightenment" is _not_ enabled, `libvirt` passes no explicit `-cpu XXX` to `qemu` and thus `qemu` defaults to `qemu64`.
With "Hyper-V Enlightenment" enabled, `libvirt` must pick a CPU to add the Hyper-V features on top of - it picks 'kvm64' as the base due to the above-mentioned patch.

'kvm64' is a "Family 15 Model 5" based CPU, i.e. a pre-"Core" CPU!
'qemu64' is a "Family 6 Model 6" based CPU, i.e. a post-"Core" CPU!

<https://www.gigxp.com/windows-server-2019-system-requirements/> lists the following minimum
> Processor requirements:
> A minimum of 1.4 GHz 64-bit EMT64 or AMD64 processor. Quad Core Recommended for production systems.
> Support for security features like NX Bit and DEP (Data Execution Prevention)
> The processor should support CMPXCHG16b, LAHF/SAHF, and PrefetchW
> Needs to Support EPT or NPT (Second Level Address Translation)

So 'kvm64' is too old.

Strangely, trying to start a VM with 'qemu64' fails with:
> virsh # start phahn_qa36-ucs44-32b
> error: Failed to start domain phahn_qa36-ucs44-32b
> error: the CPU is incompatible with host CPU: Host CPU does not provide required features: svm

Please note that 'svm' is the AMD feature, the Intel one is called 'vmx' - and this is an Intel host!
Manually starting the VM with `qemu-system-x86_64 -cpu qemu64 ...`, on the other hand, works. See <https://www.redhat.com/archives/libvir-list/2016-May/msg01940.html> for a similar report. So both "qemu64" and "kvm64" are bad choices as a default and should be changed! This might not be the customer's original problem, but at least something is fishy here, and it complicates reproducing the problem locally here at Univention.
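The feature comparison above can be sketched in Python as follows. This is only an illustration under some assumptions: the mapping from the quoted requirement names to Linux `/proc/cpuinfo` flag names (`nx`, `cx16`, `lahf_lm`, `3dnowprefetch`) is mine, the sample flags line is made up, and the helper names are hypothetical. To see what a virtual CPU model actually exposes, one would have to feed it the `/proc/cpuinfo` of a Linux guest booted with the same VM definition.

```python
# Assumed mapping: requirement name from the quoted list -> Linux cpuinfo flag.
REQUIRED = {
    "NX": "nx",
    "CMPXCHG16b": "cx16",
    "LAHF/SAHF": "lahf_lm",
    "PrefetchW": "3dnowprefetch",
}

# Made-up excerpt in /proc/cpuinfo format, for demonstration only.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu nx cx16 lahf_lm vmx ept\n"


def parse_flags(cpuinfo_text):
    """Return the flag set of the first CPU listed in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def virt_extension(flags):
    """Return 'vmx' (Intel), 'svm' (AMD) or None if neither flag is present."""
    if "vmx" in flags:
        return "vmx"
    if "svm" in flags:
        return "svm"
    return None


def missing_features(flags):
    """Return the names of required features whose flag is absent."""
    return sorted(name for name, flag in REQUIRED.items() if flag not in flags)


if __name__ == "__main__":
    flags = parse_flags(SAMPLE_CPUINFO)
    print("virtualization extension:", virt_extension(flags))
    print("missing features:", missing_features(flags))
```

On an Intel host `virt_extension` should report `vmx`, which is why a complaint about a missing `svm` points at the CPU model definition rather than at the hardware.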
I am removing the "waiting support" flag because the customer did not respond to our inquiry. The customer might not be using our virtual environment anymore.
This issue has been filed against UCS 4.3. UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed. If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.