Bug 48024 - SMP Windows VM crash (after live migration) - STOP 0x00000101
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Virtualization - KVM
Version: UCS 4.3
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.3-3-errata
Assigned To: Philipp Hahn
QA Contact: Jürn Brodersen
URL: https://fosdem.org/2019/schedule/even...
Keywords:
Depends on:
Blocks: 47617 50536
Reported: 2018-10-19 14:43 CEST by Philipp Hahn
Modified: 2019-11-25 17:15 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.400
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2018102521000291
Bug group (optional):
Max CVSS v3 score:


Description Philipp Hahn univentionstaff 2018-10-19 14:43:16 CEST
Windows VMs (with at least 2 vCPUs) may crash during normal operation or after live migration when the host system is too busy to serve interrupts in time. Windows will then crash with a blue screen and error 
  Stop 0x00000101
  "A clock interrupt was not received on a secondary processor within the allocated time interval"

This will probably happen much more often with post-copy enabled (Bug #47617): the VM then already starts running on the target host, and fetching dirty pages from the source host takes a very long time while the migration is still ongoing, because the migration reduces the available network bandwidth and increases network latency by orders of magnitude.

According to <http://blog.wikichoon.com/2014/07/enabling-hyper-v-enlightenments-with-kvm.html>, this happens because Windows is not "enlightened", i.e. not aware that it is running as a VM. For example, Windows then runs a periodic timer to check the CPUs for liveness (Linux does the same with its soft lockup detection). With the "Hyper-V Enlightenments" enabled, Windows disables that check (or changes the timeouts) and uses hypervisor-specific functions to get the current time, similar to what Linux does when running as a VM. The same idea applies to VirtIO.

These "Hyper-V Enlightenment" features need to be enabled in the libvirt XML:
<domain ...>
...
 <features>  
  <hyperv>  
   <relaxed state='on'/>  
   <vapic state='on'/>  
   <spinlocks state='on' retries='8191'/>  
  </hyperv>  
 </features>
...  
 <clock ...>  
  <timer name='hypervclock' present='yes'/>  
 </clock>  
...
</domain>

I'm not aware that those additional features will break any Linux VM, so we can enable them for all (new) VMs or extend the UVMM profiles for Windows to only enable those features explicitly if requested.
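A minimal sketch of how the fragment above could be merged into an existing domain XML, using only the Python standard library. The function name `enable_hyperv` and the helper `_child` are hypothetical and not part of UVMM or libvirt; the element names and values are the ones listed in this report. It assumes the domain already has a `<clock>` element, and only creates a bare one as a fallback.

```python
import xml.etree.ElementTree as ET

# Elements and attributes exactly as listed in the description above.
HYPERV_FLAGS = {
    "relaxed": {"state": "on"},
    "vapic": {"state": "on"},
    "spinlocks": {"state": "on", "retries": "8191"},
}


def _child(parent, tag):
    """Return the first child with the given tag, creating it if missing."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    return node


def enable_hyperv(domain_xml):
    """Return the domain XML string with the Hyper-V enlightenments enabled.

    Hypothetical helper for illustration only.
    """
    domain = ET.fromstring(domain_xml)

    # <features><hyperv>...</hyperv></features>
    hyperv = _child(_child(domain, "features"), "hyperv")
    for tag, attrs in HYPERV_FLAGS.items():
        _child(hyperv, tag).attrib.update(attrs)

    # <clock ...><timer name='hypervclock' present='yes'/></clock>
    # Assumption: a real domain already defines <clock offset=...>;
    # the bare element created here is only a fallback for this sketch.
    clock = _child(domain, "clock")
    timer = clock.find("timer[@name='hypervclock']")
    if timer is None:
        timer = ET.SubElement(clock, "timer", {"name": "hypervclock"})
    timer.set("present", "yes")

    return ET.tostring(domain, encoding="unicode")
```

The function is idempotent, so it could be applied either to all new VMs or only to those created from a Windows profile, matching the two options discussed above.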
Comment 2 Philipp Hahn univentionstaff 2018-10-25 12:11:42 CEST
Customer reported that with cpu=kvm64 Windows crashed during reboot and kept rebooting in an endless loop.
Changing to cpu=host-model fixed it for me.
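A rough sketch of that change at the libvirt XML level. This report does not show how "cpu=" is stored, so the mapping is an assumption: a fixed model is taken to be `<cpu mode='custom'><model>kvm64</model></cpu>` and the fix to be `<cpu mode='host-model'>`. The function name `use_host_model` is hypothetical.

```python
import xml.etree.ElementTree as ET


def use_host_model(domain_xml):
    """Return the domain XML string with the CPU switched to host-model.

    Hypothetical helper; assumes the fixed model is expressed as
    <cpu mode='custom'><model>kvm64</model></cpu>.
    """
    domain = ET.fromstring(domain_xml)
    cpu = domain.find("cpu")
    if cpu is None:
        cpu = ET.SubElement(domain, "cpu")

    # Drop the explicit model (e.g. kvm64) and its match policy,
    # but keep other children such as <topology>.
    for model in cpu.findall("model"):
        cpu.remove(model)
    cpu.attrib.pop("match", None)
    cpu.set("mode", "host-model")

    return ET.tostring(domain, encoding="unicode")
```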
Comment 4 Jürn Brodersen univentionstaff 2018-12-11 10:44:35 CET
What I tested:
Check whether machines start with and without the Hyper-V option:
  Win 7 with: OK
  Win 7 without: OK
  Win 8 with: OK
  Win 8 without: OK
  Win 10 with: OK
  Win 10 without: OK
  Ubuntu 1810 with: OK
  Ubuntu 1810 without: OK
  UCS 4.3 with: OK
  UCS 4.3 without: OK

YAML: missing?

-> Reopen for yaml and merge (Everything else is good)
Comment 5 Philipp Hahn univentionstaff 2018-12-11 11:39:52 CET
(In reply to Jürn Brodersen from comment #4)
> YAML: missing?

Lost in rebasing

> -> Reopen for yaml and merge (Everything else is good)

[juern/uvmm] 29c90a7c3a Bug #48024 uvmm: Hyper-V Enlightment YAML
 doc/errata/staging/univention-virtual-machine-manager-daemon.yaml | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
Comment 6 Philipp Hahn univentionstaff 2018-12-11 14:17:14 CET
[4.3-3] f99717e953 Bug #47617, Bug #47741, Bug #36661, Bug #48199, Bug #48024, Bug #45498, Bug #35196

Package: univention-virtual-machine-manager-daemon
Version: 7.0.0-17A~4.3.0.201812111413
Branch: ucs_4.3-0
Scope: errata4.3-3

[4.3-3] 582fb65dce Bug #47617: univention-virtual-machine-manager-daemon 7.0.0-17A~4.3.0.201812111413
 doc/errata/staging/univention-virtual-machine-manager-daemon.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 7 Arvid Requate univentionstaff 2018-12-12 13:45:40 CET
<http://errata.software-univention.de/ucs/4.3/382.html>