Univention Bugzilla – Bug 38877
Update libvirt and qemu-kvm
Last modified: 2017-04-04 18:29:43 CEST
The libvirt and qemu-kvm packages should be updated to more up-to-date upstream versions. It must be possible to revert to old snapshots created with UCS 4.0. See also Bug #24702 and Bug #35768.
This feature has been dropped from the UCS 4.1 roadmap.
We need to address the upgrade problem from Bug #24702 and Bug #35768 again since UCS 4.2 will be shipped with the Jessie packages.
virsh # start ucs32-64
error: Failed to start domain ucs32-64
error: internal error: early end of file from monitor: possible problem:
2016-12-22T15:27:54.084591Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000
2016-12-22T15:27:54.084663Z qemu-system-x86_64: Ack, bad migration stream!
2016-12-22T15:27:54.084682Z qemu-system-x86_64: Illegal RAM offset 770632e62696000
qemu: warning: error while loading state for instance 0x0 of device 'ram'
2016-12-22T15:27:54.084724Z qemu-system-x86_64: load of migration failed: Invalid argument
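To make the "Length mismatch" line easier to read, here is a minimal Python sketch that extracts the two sizes. It assumes both numbers are printed in hexadecimal (which matches the 65,536-byte virtio-net ROM found in the saved VMs); the function name `parse_length_mismatch` is made up for this illustration.

```python
import re

# Pattern for the "Length mismatch" line as it appears in the log above.
MISMATCH = re.compile(
    r"Length mismatch: (?P<block>\S+): "
    r"(?P<saved>[0-9a-f]+) in != (?P<current>[0-9a-f]+)"
)


def parse_length_mismatch(line):
    """Return (block, saved_bytes, current_bytes), or None if no match."""
    m = MISMATCH.search(line)
    if m is None:
        return None
    # Assumption: both sizes are hexadecimal without a "0x" prefix.
    return m.group("block"), int(m.group("saved"), 16), int(m.group("current"), 16)


line = ("2016-12-22T15:27:54.084591Z qemu-system-x86_64: Length mismatch: "
        "0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000")
block, saved, current = parse_length_mismatch(line)
print(f"{block}: saved {saved // 1024} KiB, current {current // 1024} KiB")
```

Read this way, the saved state carries a 64 KiB ROM block while the running QEMU has allocated 128 KiB for the same block.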
The good news: The "Qemu VM Save Stream" contains the content of the previous ROM files, so it should™ be enough to create empty files of the size used in UCS<4.2. Qemu uses these paths to find its ROM (and other) files:

# strings /usr/bin/qemu-system-x86_64 | grep -e /usr/share/ -e /usr/lib/
/usr/share/qemu:/usr/share/seabios:/usr/lib/ipxe/qemu

The path can be changed with

$ qemu-system-x86_64 --help | grep -e -L
-L path         set the directory for the BIOS, VGA BIOS and keymap

The path can also be changed through the property "romfile":

$ qemu-system-x86_64 --help | grep -e property
-global driver.property=value
-global driver=driver,property=property,value=value

The bad news is:
- libvirtd has no way to specify "-L" or "-global" (except the <qemu:commandline>)
- there is no easy way to get the original ROM size from the QEVM, as the format is very qemu-version dependent (and only loosely structured)
- there is no version info in the XML data to determine the Qemu version (except the domain/os/type/@machine='pc-i440fx-2.1' attribute, which can be considered only as a hint)
- the ROM files were all renamed:
  mv virtio-net.rom pxe-virtio.rom
  mv rtl8139.rom pxe-rtl8139.rom
  mv pcnet32.rom pxe-pcnet.rom
  mv eepro100.rom pxe-eepro100.rom
  mv e1000_82540.rom pxe-e1000.rom
  mv ns8390.rom pxe-ne2k_pci.rom

My idea is to (more or less) use the machine-info to switch between the historic ROMs and the current ROMs when loading a VM. libvirt must be extended to do this magic, which is WIP.

For now I imported the Debian-Stretch versions, as they will get the longest maintenance from Debian (and Upstream):

Package: qemu
Version: 1:2.8+dfsg-2~bpo8+1A~4.2.0.201702161429
Branch: ucs_4.2-0

Package: libvirt
Version: 3.0.0-2A~4.2.0.201702172052
Branch: ucs_4.2-0
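The "empty files of the old size" idea could be sketched roughly as follows in Python. This is only an illustration: the helper `create_placeholder_roms` is hypothetical, the rename table is copied from the list above, and the 0x10000-byte size for the virtio-net ROM is taken from the "Length mismatch" message, not from any packaging script.

```python
import os
import tempfile

# Old -> new ROM file names, copied from the rename list above.
ROM_RENAMES = {
    "virtio-net.rom": "pxe-virtio.rom",
    "rtl8139.rom": "pxe-rtl8139.rom",
    "pcnet32.rom": "pxe-pcnet.rom",
    "eepro100.rom": "pxe-eepro100.rom",
    "e1000_82540.rom": "pxe-e1000.rom",
    "ns8390.rom": "pxe-ne2k_pci.rom",
}


def create_placeholder_roms(directory, sizes):
    """Create zero-filled ROM files; `sizes` maps OLD names to byte counts."""
    created = {}
    for old_name, size in sizes.items():
        new_name = ROM_RENAMES[old_name]
        path = os.path.join(directory, new_name)
        with open(path, "wb") as rom:
            rom.truncate(size)  # grow the empty file to `size` zero bytes
        created[new_name] = os.path.getsize(path)
    return created


# A temporary directory stands in for a hypothetical compat ROM directory.
with tempfile.TemporaryDirectory() as compat_dir:
    result = create_placeholder_roms(compat_dir, {"virtio-net.rom": 0x10000})
    print(result)
```

Such a directory would still have to be passed to QEMU via "-L" or a "romfile" property, which is exactly what libvirtd cannot do without <qemu:commandline>.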
r17202 | Bug #38877: libvirt systemd
Use older systemd from Debian-Jessie
- re-introduces <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774237>

Package: libvirt
Version: 3.0.0-2A~4.2.0.201702200932
Branch: ucs_4.2-0
While QEMU used a default of VGA_RAM_SIZE=8M until it was dropped by <https://www.redhat.com/archives/libvir-list/2014-August/msg00649.html> and/or got replaced by …, KVM used 9K which gets rounded up to 16M. There was a short period in 0.11-rc0 (between <https://anonscm.debian.org/cgit/collab-maint/qemu-kvm.git/commit/?id=fbe1b5953d061c77c07b91e4eb555c92195308d0> and <https://anonscm.debian.org/cgit/collab-maint/qemu-kvm.git/commit/?id=b0136de5e33a64123392a1e3ffac611e6140b39a>), where that was broken and only 8M were used.

The value of /domain/devices/video/model/@vram is ignored for type='cirrus' until <https://bugzilla.redhat.com/show_bug.cgi?id=1076098>.

I analysed all our saved VMs and only found vram_size=16 MiB and VirtIO-ROM=64 KiB:

ldapsearch -LLLo ldif-wrap=no '(univentionService=KVM Testenv Host)' cn|
 sed -ne 's/^cn: //p'|
 xargs -rn1 -I HOST ssh -l root HOST '
  for s in /var/lib/libvirt/qemu/save/*.save
  do
   ~phahn/src/VIRT/qemu-analyse-savevm "$s" 2>/dev/null|
    head -n50
  done'|
 sed -rne "s/.*idstr=.+(vga\.vram|virtio-net).*/\1/;T;N;s/[0-9a-f]{16}:[ +-]*//g;s/\n/\t/;p"|
 sort -u
#################
virtio-net	length=0000000000010000 (65,536)
vga.vram	length=0000000001000000 (16,777,216)

With a newer machine="pc-i440fx-2.1" the EFI images are used, which are 256 KiB!

Now I'm stuck again with Bug #29355 comment 6:
> 25064@1489677264.463601:qemu_loadvm_state_section 48
> Unknown savevm section type 48

FYI: The following QEMU versions were used in the following UCS releases:
0.11.1: ucs2.4-0+virtualization
0.12.4: sec2.4-1 …
0.14.0: ucs2.4-2 …
0.14.1: ucs2.4-3 … ucs3.0-X
1.1.2: ucs3.1-0 … ucs4.1-X
2.8.0: ucs4.2.0 …

Package: univention-kvm-compat
Version: 1.0.0-1A~4.2.0.201703081659
Branch: ucs_4.2-0
Oh "joy": "piix4_pm" serialized by kvm-1.1.2 can not be loaded by qemu-2.8

>qemu_loadvm_state_section 4@0x00364c3b
that is the correct start of 0000:00:01.3/piix4_pm
>qemu_loadvm_state_section 48@0x00364da2
that is 12 bytes into 0000:00:01.2/uhci
                            ^ ord('0')=48=0x30

<https://www.linux-kvm.org/images/6/6e/Kvm-forum-2013-migration-checker.pdf>

It's commit b0b873a07872f7ab7f66f259c73fb9dd42aa66a9, which is incompatible with qemu-kvm-1.1

r17415 | Bug #38877 qemu: Apply patches
r17417 | Bug #38877 qemu: Refresh patches
r17418 | Bug #38877 qemu: Split patch
r17420 | Bug #38877 qemu: Split patch

repo_admin.py -U -p qemu -d jessie-backports -r 4.2
# <http://metadata.ftp-master.debian.org/changelogs/main/q/qemu/qemu_2.8+dfsg-3~bpo8+1_changelog>

Package: qemu
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703201330
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703201346
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703201359
Branch: ucs_4.2-0

r77991 | Bug #38877 qemu/libvirt changelog
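Why "48" is a tell-tale value can be shown with a small Python sketch. The table of section type bytes is recalled from QEMU's migration code of that era and should be treated as an assumption; `describe_section_type` is a hypothetical helper, not part of any QEMU tool.

```python
# Section type bytes of the savevm stream (assumption: values as recalled
# from QEMU's migration/savevm.c; only the well-known ones are listed).
SECTION_TYPES = {
    0x00: "EOF",
    0x01: "SECTION_START",
    0x02: "SECTION_PART",
    0x03: "SECTION_END",
    0x04: "SECTION_FULL",
    0x05: "SUBSECTION",
}


def describe_section_type(value):
    """Name a section type byte, or show what ASCII character it would be."""
    name = SECTION_TYPES.get(value)
    if name is not None:
        return name
    hint = repr(chr(value)) if 0x20 <= value < 0x7F else "non-printable"
    return f"unknown section type {value} (0x{value:02x}, ASCII {hint})"


print(describe_section_type(4))
print(describe_section_type(48))
```

A type of 4 is a plausible section start, while 48 is the character '0', which suggests the loader is reading the middle of an id string such as "0000:00:01.2/uhci" instead of a section header.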
(In reply to Philipp Hahn from comment #7)
> Oh "joy": "piix4_pm" serialized by kvm-1.1.2 can not be loaded by qemu-2.8
>
> >qemu_loadvm_state_section 4@0x00364c3b
> that is the correct start of 0000:00:01.3/piix4_pm
> >qemu_loadvm_state_section 48@0x00364da2
> that is 12 bytes into 0000:00:01.2/uhci
>                             ^ ord('0')=48=0x30
>
> <https://www.linux-kvm.org/images/6/6e/Kvm-forum-2013-migration-checker.pdf>
>
> It's commit b0b873a07872f7ab7f66f259c73fb9dd42aa66a9, which is incompatible
> with qemu-kvm-1.1

The incompatibility was introduced by 23910d3f669d46073b403876e30a7314599633af in qemu, which changed "gpe" to be an array[4] instead of a single "struct gpe_regs" without changing the version number.

I see no way to detect that incompatibility automatically, so hard-code the compatibility to qemu-kvm-1.1 as used previously in UCS<=4.1
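The effect of changing a serialized layout without bumping the version can be reproduced with a toy stream in Python. Everything here is a simplified assumption: the section header layout (type byte, 32-bit id, length-prefixed id string, instance and version) follows my recollection of QEMU's format, the payload sizes and version numbers are invented, and only the 12-byte offset is taken from the trace above.

```python
import struct

SECTION_FULL = 0x04  # assumed type byte of a complete device section


def section(section_id, idstr, version_id, payload):
    """Serialize one toy section: type, id, idstr, instance id, version."""
    name = idstr.encode("ascii")
    header = struct.pack(">BIB", SECTION_FULL, section_id, len(name)) + name
    return header + struct.pack(">II", 0, version_id) + payload


# Hypothetical sizes: the old writer emits 12 bytes fewer than the new
# reader expects, although both claim the same version_id.
OLD_PAYLOAD = 20
NEW_PAYLOAD = OLD_PAYLOAD + 12

stream = (section(1, "0000:00:01.3/piix4_pm", 2, bytes(OLD_PAYLOAD))
          + section(2, "0000:00:01.2/uhci", 1, bytes(8)))


def next_type_after_first_section(data, payload_size):
    """Skip the first section assuming `payload_size`; return the next byte."""
    name_len = data[5]
    payload_start = 1 + 4 + 1 + name_len + 4 + 4
    return data[payload_start + payload_size]


print(next_type_after_first_section(stream, OLD_PAYLOAD))
print(next_type_after_first_section(stream, NEW_PAYLOAD))
```

With the old payload size the reader lands on the next section header (type 4); with the larger size it lands 12 bytes into the "uhci" section, on a '0' of the id string, and reports type 48. Since nothing in the stream distinguishes the two layouts, a reader cannot pick the right one by itself.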
r78098 | Bug #38877 qemu: Force recommended package "univention-kvm-compat" to be maintained
r78101 | Bug #38877 dvd: Add "univention-kvm-compat"

Package: univention-dvd
Version: 2.0.0-9A~4.2.0.201703221015
Branch: ucs_4.2-0
r78108 | Bug #38877 virtio: Update VirtIO windows driver

Package: univention-kvm-virtio
Version: 6.0.0-2A~4.2.0.201703221225
Branch: ucs_4.2-0
r78114 | Bug #38877: VirtIO Changelog
Erik observed a case, where a UCS-4.1-4 VM did not reboot after installation from DVD:

[ 1691.343037] reboot: Restarting system

virsh # qemu-monitor-command --hmp --domain ucs41-64-test info registers
EAX=9e000000 EBX=80008086 ECX=00000030 EDX=00000cfc
ESI=00000cf8 EDI=00000000 EBP=00000cfc ESP=00006fb8
EIP=000ef78f EFL=00000097 [--S-APC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f7490 00000037
IDT=     000f74ce 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

virsh # qemu-monitor-command --hmp --domain ucs41-64-test info mtree
00000000000ec000-00000000000effff (prio 1, RW): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
00000000000ec000-00000000000effff (prio 1, RW): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
00000000000ec000-00000000000effff (prio 1, R-): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
00000000000ec000-00000000000effff (prio 1, RW): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]

virsh # qemu-monitor-command --hmp --domain ucs41-64-test xp /3i $eip
0x00000000000ef78f:  mov    $0xcf8,%esi
0x00000000000ef794:  mov    $0x9e000000,%eax
0x00000000000ef799:  jle    0xef78f
0x00000000000ef79b:

This clearly is an endless loop as nothing inside the loop modifies the flags.

virsh # qemu-monitor-command --hmp --domain ucs41-64-test xp /32xb 0xef780
00000000000ef780: 0x89 0xc3 0x8d 0x43 0xff 0x66 0x83 0xf8
00000000000ef788: 0xfd 0x0f 0x87 0xdc 0x00 0x00 0x00 0xbe
00000000000ef790: 0xf8 0x0c 0x00 0x00 0xb8 0x00 0x00 0x00
00000000000ef798: 0x9e 0x7e 0xf4 0xef 0xb2 0xfe 0x66 0xed
                  ^^^^^^^^^^^^^^

# xxd -s 0xf780 -l 32 -g 1 -c 8 /usr/share/seabios/bios.bin
000f780: 89 c3 8d 43 ff 66 83 f8  ...C.f..
000f788: fd 0f 87 dc 00 00 00 be  ........
000f790: f8 0c 00 00 b8 00 00 00  ........
000f798: 80 89 f2 ef b2 fe 66 ed  ......f.
         ^^^^^^^^

So something seems to have changed those 3 bytes in the shadow RAM copy of the BIOS ROM.

# objdump -D -b binary -mi386 -Maddr32,data32 --start-address=0xf780 --stop-address=0xf7a0 /usr/share/seabios/bios.bin
0000f780 <.data+0xf780>:
    f780: 89 c3                 mov    %eax,%ebx
    f782: 8d 43 ff              lea    -0x1(%ebx),%eax
    f785: 66 83 f8 fd           cmp    $0xfffd,%ax
    f789: 0f 87 dc 00 00 00     ja     0xf86b
    f78f: be f8 0c 00 00        mov    $0xcf8,%esi
    f794: b8 00 00 00 80        mov    $0x80000000,%eax
    f799: 89 f2                 mov    %esi,%edx
    f79b: ef                    out    %eax,(%dx)
    f79c: b2 fe                 mov    $0xfe,%dl
    f79e: 66 ed                 in     (%dx),%ax

This disassembly looks much more sane.
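The comparison of the two hex dumps can be repeated mechanically. The following Python sketch uses only the 32 bytes shown in the dumps above; the helper name `differing_bytes` is made up for this illustration.

```python
SHADOW_BASE = 0xEF780  # guest-physical address of the first dumped byte

# Bytes from "xp /32xb 0xef780" (shadow RAM copy in the running VM).
shadow_ram = bytes.fromhex(
    "89c38d43ff6683f8" "fd0f87dc000000be"
    "f80c0000b8000000" "9e7ef4efb2fe66ed"
)
# Bytes from "xxd -s 0xf780 -l 32" (bios.bin on disk).
bios_rom = bytes.fromhex(
    "89c38d43ff6683f8" "fd0f87dc000000be"
    "f80c0000b8000000" "8089f2efb2fe66ed"
)


def differing_bytes(a, b, base):
    """Return (address, byte_a, byte_b) for every position where a != b."""
    return [(base + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for address, ram, rom in differing_bytes(shadow_ram, bios_rom, SHADOW_BASE):
    print(f"0x{address:05x}: RAM {ram:02x} != ROM {rom:02x}")
```

Exactly three bytes differ, at 0xef798 to 0xef79a, which covers the top byte of the "mov $0x80000000,%eax" immediate and the following "mov %esi,%edx".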
Many tests already work, but reopen due to the issue in the comment above. The failing reboot happened at 3 different VMs, on 2 different hosts. It does not happen on every reboot. We are now trying to see if it does occur if we use the bios.bin from ucs 4.1...
UCS 4.1 VM: Restoring a snapshot taken on UCS 4.1: The VM reboots
Win 7 VM (with virtio drivers): Restoring a snapshot taken on UCS 4.1: The VM hangs (no mouse movement), but the qemu process uses 100% cpu
Using gdb with QEMU this is my last finding:

# kvm -name 4.1-sysetup-cloned-on-4.1 \
    -machine pc-1.1,accel=kvm \
    -m 1024 \
    -smp 1,sockets=1,cores=1,threads=1 \
    -uuid 72ab1a74-b5e5-4846-b59a-c3faf7a86daf \
    -no-user-config \
    -nodefaults \
    -chardev stdio,id=charmonitor \
    -mon chardev=charmonitor,id=monitor,mode=readline \
    -rtc base=utc \
    -no-shutdown \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
    -drive file=/var/lib/libvirt/images/4.1-sysetup-cloned-on-4.1_vda.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 \
    -netdev bridge,id=hostnet0,br=br0 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:fe:0a:a0,bus=pci.0,addr=0x3 \
    -chardev file,id=charserial0,path=/var/log/libvirt/qemu/cw4.1-sysetup-cloned-on-4.1.console \
    -device isa-serial,chardev=charserial0,id=serial0 \
    -device usb-tablet,id=input0 \
    -sdl \
    -k en-us \
    -vga cirrus \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
    -loadvm running-4.1 \
    -S
(qemu) gdbserver

# univention-install linux-image-4.1.0-ucs207-amd64-dbg
# gdb /usr/lib/debug/lib/modules/4.1.0-ucs207-amd64/vmlinux -ex 'set architecture i386:x86-64:intel' -ex 'target remote :1234'
(gdb) thread apply all bt
#0  0xffffffff8105f1c2 in native_safe_halt ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/include/asm/irqflags.h:49
#1  0xffffffff81020bac in arch_safe_halt ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/include/asm/paravirt.h:111
#2  default_idle ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/kernel/process.c:341
#3  0xffffffff810b7fb4 in cpuidle_idle_call ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/kernel/sched/idle.c:195
#4  cpu_idle_loop ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/kernel/sched/idle.c:249
#5  cpu_startup_entry (state=<optimized out>)
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/kernel/sched/idle.c:297
#6  0xffffffff81590fba in rest_init ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/init/main.c:409
#7  0xffffffff8191c093 in start_kernel ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/init/main.c:678
#8  0xffffffff8191b5d6 in x86_64_start_reservations (real_mode_data=<optimized out>)
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/kernel/head64.c:195
#9  0xffffffff8191b720 in x86_64_start_kernel (real_mode_data=<optimized out>)
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/kernel/head64.c:184
#10 0x0000000000000000 in ?? ()

(qemu) info tlb
...
ffffffff81000000: 0000000001000000 -GPDA----
ffffffff81200000: 0000000001200000 -GPDA----
ffffffff81400000: 0000000001400000 -G-DA----

<https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt>
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0

So $pc=ffffffff8105f1c2 translates to

$ printf "0x%'016x\n" $((0xffffffff8105f1c2 - 0xffffffff80000000))
0x00000001.05f.1c2

(qemu) info mtree
0000000000000000-7ffffffffffffffe (prio 0, RW): system
  0000000000000000-000000003fffffff (prio 0, RW): alias ram-below-4g @pc.ram 0000000000000000-000000003fffffff
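The address translation can be double-checked with a few lines of Python. The constants are taken from the mm.txt excerpt and the "info mtree" output above; the function name `kernel_text_to_phys` is only illustrative.

```python
# Kernel text mapping according to mm.txt: maps to physical address 0.
KERNEL_TEXT_BASE = 0xFFFFFFFF80000000
KERNEL_TEXT_END = 0xFFFFFFFF9FFFFFFF
# End of the "ram-below-4g" alias from the "info mtree" output.
RAM_BELOW_4G_END = 0x3FFFFFFF


def kernel_text_to_phys(virt):
    """Translate a kernel text virtual address to its physical address."""
    if not KERNEL_TEXT_BASE <= virt <= KERNEL_TEXT_END:
        raise ValueError(f"{virt:#x} is outside the kernel text mapping")
    return virt - KERNEL_TEXT_BASE


phys = kernel_text_to_phys(0xFFFFFFFF8105F1C2)
print(hex(phys), phys <= RAM_BELOW_4G_END)
```

The result 0x105f1c2 agrees with the TLB entry ffffffff81000000 -> 0000000001000000 plus the offset 0x5f1c2, and it lies inside the "ram-below-4g" region.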
Feedback from QEMU developer "davidgiluk": Might be an SMM-injected triple fault (3fault) that is not handled by the old SeaBIOS and is still pending; maybe fixed by <git:fc3a1fd7>
Created attachment 8656
tripel-fault caused by SMM

<http://www.linux-kvm.org/page/Perf_events#Tracing_events>

David confirmed that it looks like the bug he fixed.
Debug build confirms it is fixed!

r17430 | Bug #38877 qemu: Fix 3fault by SMM

Package: qemu
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703271321
Branch: ucs_4.2-0
(In reply to Erik Damrose from comment #13)
> Many tests already work, but reopen due to the issue in the comment above.
>
> The failing reboot happened at 3 different VMs, on 2 different hosts. It
> does not happen on every reboot. We are now trying to see if it does occur
> if we use the bios.bin from ucs 4.1...

...

(In reply to Erik Damrose from comment #14)
> UCS 4.1 VM: Restoring a snapshot taken on UCS 4.1: The VM reboots
> Win 7 VM (with virtio drivers): Restoring a snapshot taken on UCS 4.1: The
> VM hangs (no mouse movement), but the qemu process uses 100% cpu

The issue about "vmstate not migrateable" is split into Bug #44083 for now and Bug #44804 for later
OK: All agreed features work, updates with adaptations according to bug #44086 work, new domains installed with 4.2 work.
Verified

(In reply to Philipp Hahn from comment #18)
> The issue about "vmstate not migrateable" is split into Bug #44083 for now
> and Bug #44804 for later
UCS 4.2 has been released:
https://docs.software-univention.de/release-notes-4.2-0-en.html
https://docs.software-univention.de/release-notes-4.2-0-de.html

If this error occurs again, please use "Clone This Bug".