Bug 44084 - Fix QEMU suspend2disk / live-migration / running-snapshot issues
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Virtualization - KVM
Version: UCS 4.2
Hardware: All Linux
Importance: P3 normal
Target Milestone: UCS 4.2-3-errata
Assigned To: Philipp Hahn
QA Contact: Erik Damrose
URL: http://sdb.univention.de/solution_id_...
Keywords:
Depends on: 38877
Blocks: 46217
Reported: 2017-03-27 14:23 CEST by Philipp Hahn
Modified: 2018-02-05 13:22 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.257
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2017102021000319
Bug group (optional):
Max CVSS v3 score:


Attachments
Check BIOS (1.13 KB, text/plain)
2017-08-24 16:14 CEST, Philipp Hahn

Description Philipp Hahn univentionstaff 2017-03-27 14:23:14 CEST
+++ This bug was initially created as a clone of Bug #44083 +++
+++ This bug was initially created as a clone of Bug #38877 +++
With QEMU-2.8 as used in UCS-4.2-0, the following do not work reliably:
- suspend2disk (managedsave) state
- live-migration from previous UCS releases to UCS-4.2
- restoration of snapshots of running VMs
Either the state is not restored completely and the guest OS is stuck in an endless loop, or the guest OS crashes immediately and the VM is rebooted.

After the UCS-4.2 release this issue should be investigated and fixed.
Comment 1 Philipp Hahn univentionstaff 2017-04-12 17:19:50 CEST
Package: qemu
Version: 1:2.8.1+dfsg-0~bpo8+0A~4.2.0.201704121713
Branch: ucs_4.2-0
Scope: errata4.2-0
Comment 2 Philipp Hahn univentionstaff 2017-04-21 09:10:39 CEST
There is a new "QEMU 2.8.1.1 CVE update" fixing
- virtfs/virtio-9p shared directories (CVE-2017-7471)
Comment 3 Philipp Hahn univentionstaff 2017-08-22 17:01:25 CEST
I was able to reproduce the behavior from Bug #38877 comment 12 again.

seabios c68aff5 and b837e6 look like good candidates.
seabios_1.10.2-1 was copied from Debian-Stretch to ucs_4.1-0-errata4.1-4 and ucs_4.2-0-errata4.2-1

r82417 | Bug #44084: seabios.yaml (4.2-1)
r82418 | Bug #44084: seabios.yaml (4.1-4)
Comment 4 Philipp Hahn univentionstaff 2017-08-24 15:41:05 CEST
r82465 | Bug #44084 up: Require seabios_1.10.2-1 before updating

cd /var/univention/buildsystem2/test_mirror/ftp/4.2/maintained/4.2-0/all/
cp -f ~/GIT2/base/univention-updater/script/preup.sh .
repo-ng-sign-release-file -i preup.sh -o preup.sh.gpg -k 6B8BFD3C -p /etc/archive-keys/ucs4.0.txt
cd /var/univention/buildsystem2/mirror/testing/4.2/maintained/4.2-0/all
cp -f ~/GIT2/base/univention-updater/script/preup.sh .
repo-ng-sign-release-file -i preup.sh -o preup.sh.gpg -k 6B8BFD3C -p /etc/archive-keys/ucs4.0.txt

TODO-after-QA: /var/univention/buildsystem2/mirror/ftp/4.2/maintained/4.2-0/all
Comment 5 Philipp Hahn univentionstaff 2017-08-24 16:14:13 CEST
Created attachment 9142 [details]
Check BIOS

The attached Python script iterates over all running VMs, extracts the last 64 KiB of the BIOS before the 1 MiB and 4 GiB memory areas, and prints the last 32 Bytes including the Python hash():
- the 4 GiB hash should be constant even for reboots
- the 1 MiB hash is different due to the corruption

After more testing (and QA) we can add this to <http://sdb.univention.de/solution_id_1384.html>.
We will not add any check to UVMM for now.
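
The attachment itself is not reproduced in this report. As a rough sketch of the idea only: the two 64 KiB windows end at the 1 MiB and 4 GiB boundaries, and a fingerprint consists of the last 32 bytes plus a hash of the window. The names `bios_windows` and `fingerprint` are hypothetical, and `hashlib.sha256` is swapped in for the built-in `hash()` used by the attachment, because `hash()` of byte strings is randomized per process in Python 3. How the memory is read from a running VM is deliberately left out.

```python
import hashlib

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
WINDOW = 64 * KIB  # size of the BIOS area inspected below each boundary


def bios_windows():
    """Return {label: (start_address, size)} for the two 64 KiB BIOS windows."""
    return {
        "1MiB": (1 * MIB - WINDOW, WINDOW),
        "4GiB": (4 * GIB - WINDOW, WINDOW),
    }


def fingerprint(region):
    """Return (hex of the last 32 bytes, shortened SHA-256 digest) of a dump.

    Assumption: `region` is exactly one 64 KiB window of guest physical
    memory, obtained by some other means.
    """
    if len(region) != WINDOW:
        raise ValueError("expected %d bytes, got %d" % (WINDOW, len(region)))
    tail = region[-32:].hex()
    digest = hashlib.sha256(region).hexdigest()[:16]
    return tail, digest
```

Comparing the two fingerprints of the same VM before and after a reboot would then mirror the check described above; the digests of this sketch are not comparable with the values printed by the attached script.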
Comment 6 Erik Damrose univentionstaff 2017-09-04 12:41:37 CEST
r82604 + 82605 move to 4.2-2-errata
Comment 7 Erik Damrose univentionstaff 2017-09-13 11:34:58 CEST
./check_bios hashes for seabios version
1.7.0-1 (e414): 684f2ef8f1380328
1.9.3-2~bpo8+1 (4.2-0): ba70fceadfb0adaa
1.10.2-1 (e421): 5812fbb675d5325a
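
For scripted checks, the three values above could be kept as a small lookup table. This is only an illustrative sketch: the dictionary and the function name `identify_seabios` are hypothetical, and the hash strings are copied verbatim from this comment.

```python
# Hash values as printed by ./check_bios in this comment, mapped to the
# SeaBIOS package version they were observed with.
KNOWN_SEABIOS_HASHES = {
    "684f2ef8f1380328": "1.7.0-1",
    "ba70fceadfb0adaa": "1.9.3-2~bpo8+1",
    "5812fbb675d5325a": "1.10.2-1",
}


def identify_seabios(hash_hex):
    """Return the SeaBIOS version for a known hash, or None if unknown."""
    return KNOWN_SEABIOS_HASHES.get(hash_hex.strip().lower())
```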

Tests TODO:
    Migration to/from seabios version 1.9.3-2~bpo8+1 (UCS 4.2-0) to 1.10.2-1 (unreleased 4.2-1 errata)

OK:
    From 4.1 to 4.2
    * migrate running ucs 4.2 instance
    * migrate running win10 instance

Fail: 
    * Win7 instance (tested with 2 different ones) migration failed
    ** tested with states running, paused, suspended => after migration, the instance uses 100% cpu and is frozen (no gui and no network response)
    ** tested with ucs 4.2 qemu 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703271321 (4.2-0) and 1:2.8.1+dfsg-0~bpo8+0A~4.2.0.201704121713 (unreleased 4.2-0-errata)
Comment 8 Philipp Hahn univentionstaff 2017-10-26 12:54:25 CEST
We need to release the SeaBIOS update ASAP, as this blocks 00026: after migrating VMs in state "shutdown" and starting them on a new UCS-4.2-2 host, the broken 1.9 version will be used, which hangs on the next reboot and then requires user intervention! (This happens even after updating /domain/os/type/@machine='pc-i440fx-2.8'.)
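
For reference, the `/domain/os/type/@machine` update mentioned above corresponds to an edit like the following on a libvirt domain XML. This is a minimal sketch using only the Python standard library; the function name `set_machine` is hypothetical, and as stated in this comment the change alone does not avoid the broken SeaBIOS 1.9.

```python
import xml.etree.ElementTree as ET


def set_machine(domain_xml, machine="pc-i440fx-2.8"):
    """Return the domain XML with /domain/os/type/@machine set to `machine`."""
    root = ET.fromstring(domain_xml)
    os_type = root.find("./os/type")
    if root.tag != "domain" or os_type is None:
        raise ValueError("no /domain/os/type element found")
    os_type.set("machine", machine)
    return ET.tostring(root, encoding="unicode")
```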

I successfully installed Win7, Win8, Win10, and UCS-4.2 using the new SeaBIOS-1.10 and live-migrated them between UCS-4.2-2 hosts running the new SeaBIOS-1.10.

(In reply to Philipp Hahn from comment #0)
> With QEMU-2.8 as used in UCS-4.2-0 the
> - suspend2disk (managedsave) state
> - live-migration from previous UCS releases to UCS-4.2
> - restoration of snapshots of running VMs
> does not work reliably: Either the state is not restored completely and the
> guest-OS is stuck in some endless loop, or the guest OS crashes immediately
> and the VM is rebootet.

For 00026 it is sufficient to update SeaBIOS with UCS-4.2 and do the "turned-off" migration, so live migration is not required for them.
The issues for VMState remain: if a customer complains, we can clone this bug once more.
Comment 9 Philipp Hahn univentionstaff 2017-10-26 12:55:21 CEST
PS: The warning remains for now:
ssh omar sed -ne '/^check_qemu/,/^check_qemu/p' /var/univention/buildsystem2/mirror/ftp/4.2/maintained/4.2-?/all/preup.sh
Comment 10 Philipp Hahn univentionstaff 2017-10-26 15:30:44 CEST
$ qemu-system-x86_64 -M \?
Supported machines are:
pc                   Standard PC (i440FX + PIIX, 1996) (alias of pc-i440fx-2.8)
pc-i440fx-2.8        Standard PC (i440FX + PIIX, 1996) (default)
pc-i440fx-2.7        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.6        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.5        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.4        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.3        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.2        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.1        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.0        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.7        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.6        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.5        Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.4        Standard PC (i440FX + PIIX, 1996)
pc-1.3               Standard PC (i440FX + PIIX, 1996)
pc-1.2               Standard PC (i440FX + PIIX, 1996)
pc-1.1               Standard PC (i440FX + PIIX, 1996)
pc-1.0               Standard PC (i440FX + PIIX, 1996)
pc-0.15              Standard PC (i440FX + PIIX, 1996)
pc-0.14              Standard PC (i440FX + PIIX, 1996)
pc-0.13              Standard PC (i440FX + PIIX, 1996)
pc-0.12              Standard PC (i440FX + PIIX, 1996)
pc-0.11              Standard PC (i440FX + PIIX, 1996)
pc-0.10              Standard PC (i440FX + PIIX, 1996)
q35                  Standard PC (Q35 + ICH9, 2009) (alias of pc-q35-2.8)
pc-q35-2.8           Standard PC (Q35 + ICH9, 2009)
pc-q35-2.7           Standard PC (Q35 + ICH9, 2009)
pc-q35-2.6           Standard PC (Q35 + ICH9, 2009)
pc-q35-2.5           Standard PC (Q35 + ICH9, 2009)
pc-q35-2.4           Standard PC (Q35 + ICH9, 2009)
isapc                ISA-only PC
none                 empty machine
xenfv                Xen Fully-virtualized PC
xenpv                Xen Para-virtualized PC

For 1.x (pc-1.1 .. pc-1.3, pc-i440fx-1.4 .. pc-i440fx-1.7):
strace -e t=open ./qemu-system-x86_64 -nographic -m 1G -M pc-i440fx-1.7,accel=kvm -kernel /boot/vmlinuz-`uname -r` -initrd /boot/initrd.img-`uname -r` -append 'console=ttyS0 quiet break=top'
...
open("/usr/share/seabios/bios.bin", O_RDONLY) = 10

For 2.x (pc-i440fx-2.0 .. pc-i440fx-2.8):
strace -e t=open ./qemu-system-x86_64 -nographic -m 1G -M pc-i440fx-2.0,accel=kvm -kernel /boot/vmlinuz-`uname -r` -initrd /boot/initrd.img-`uname -r` -append 'console=ttyS0 quiet break=top'
open("/usr/share/seabios/bios-256k.bin", O_RDONLY) = 13


hw/i386/pc_piix.c:
static void pc_i440fx_machine_options(MachineClass *m)
...
    m->default_machine_opts = "firmware=bios-256k.bin";
...
static void pc_i440fx_1_7_machine_options(MachineClass *m)
...
    m->default_machine_opts = NULL;
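
The observations in this comment can be summarized as a small lookup. The sketch below is an assumption-laden illustration, not part of QEMU or UVMM: `expected_bios` is a hypothetical helper, it covers only the machine-type ranges named above, and it returns None for everything else (for example pc-0.x, pc-1.0 or q35), since those were not examined here.

```python
BIOS_1X = "/usr/share/seabios/bios.bin"       # opened for pc-i440fx-1.7
BIOS_2X = "/usr/share/seabios/bios-256k.bin"  # opened for pc-i440fx-2.0


def _names(prefix, major, minors):
    """Build machine type names such as 'pc-i440fx-2.0'."""
    return {"%s%d.%d" % (prefix, major, minor) for minor in minors}


# Ranges exactly as listed in this comment.
MACHINES_1X = _names("pc-", 1, range(1, 4)) | _names("pc-i440fx-", 1, range(4, 8))
MACHINES_2X = _names("pc-i440fx-", 2, range(0, 9))


def expected_bios(machine):
    """Return the BIOS image path expected for `machine`, or None if unknown."""
    if machine == "pc":
        machine = "pc-i440fx-2.8"  # alias according to the -M listing above
    if machine in MACHINES_2X:
        return BIOS_2X
    if machine in MACHINES_1X:
        return BIOS_1X
    return None
```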
Comment 11 Erik Damrose univentionstaff 2017-12-06 13:27:43 CET
OK: seabios 1.10.2-1 in 4.1-5, 4.2-3
tested VM OSes: ucs 4.2, win7, win10, win2k8, win2k12

OK: Live migration 4.1->4.1 and 4.2->4.2 with new seabios version; snapshots, reboots
OK: Migration of shutdown VMs 4.1->4.2
OK: changed 4.2-0 preup script.

I adapted the support article about the QEMU update issues. I removed the part about changing the virtual machine hardware configuration, as that is not required anymore with the new seabios version.

https://help.univention.com/t/qemu-suspend-to-disk-and-live-migration-issues-with-ucs-4-2/6498
Comment 12 Arvid Requate univentionstaff 2017-12-06 15:40:15 CET
<http://errata.software-univention.de/ucs/4.2/233.html>
Comment 13 Arvid Requate univentionstaff 2017-12-06 16:59:11 CET
<http://errata.software-univention.de/ucs/4.1/486.html>