Bug 38877 - Update libvirt and qemu-kvm
Update libvirt and qemu-kvm
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Virtualization - KVM
Version: UCS 4.1
Hardware: Other Linux
Importance: P5 enhancement
Target Milestone: UCS 4.2
Assigned To: Philipp Hahn
QA Contact: Erik Damrose
: interim-3
Depends on: 43875
Blocks: 44084
 
Reported: 2015-07-10 07:58 CEST by Stefan Gohmann
Modified: 2017-04-04 18:29 CEST
2 users

See Also:
What kind of report is it?: Feature Request
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): Release Goal
Max CVSS v3 score:


Attachments
triple fault caused by SMM (5.95 KB, text/plain)
2017-03-27 14:12 CEST, Philipp Hahn

Description Stefan Gohmann univentionstaff 2015-07-10 07:58:46 CEST
The libvirt and qemu-kvm packages should be updated to more up-to-date upstream versions. 

It must be possible to revert to old snapshots created with UCS 4.0. See also Bug #24702 and Bug #35768.
Comment 1 Stefan Gohmann univentionstaff 2015-09-23 12:10:54 CEST
This feature has been dropped from the UCS 4.1 roadmap.
Comment 2 Stefan Gohmann univentionstaff 2016-08-19 07:40:57 CEST
We need to address the upgrade problem from Bug #24702 and Bug #35768 again since UCS 4.2 will be skipped with the Jessie packages.
Comment 3 Philipp Hahn univentionstaff 2016-12-22 16:29:55 CET
virsh # start ucs32-64
error: Failed to start domain ucs32-64
error: internal error: early end of file from monitor: possible problem:
2016-12-22T15:27:54.084591Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000
2016-12-22T15:27:54.084663Z qemu-system-x86_64: Ack, bad migration stream!
2016-12-22T15:27:54.084682Z qemu-system-x86_64: Illegal RAM offset 770632e62696000
qemu: warning: error while loading state for instance 0x0 of device 'ram'
2016-12-22T15:27:54.084724Z qemu-system-x86_64: load of migration failed: Invalid argument
Comment 4 Philipp Hahn univentionstaff 2017-02-18 09:58:00 CET
The good news: the "Qemu VM Save Stream" contains the content of the previous ROM files, so it should™ be enough to create empty files of the size used in UCS<4.2.
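A minimal sketch of the "empty file of the old size" idea, assuming the 64 KiB (0x10000) VirtIO-net ROM length that comment 6 later finds in all saved VMs. The scratch directory from `mktemp -d` only stands in for a compat ROM directory, the file is named after the new `pxe-virtio.rom` from the rename list, and the helper `make_placeholder_rom` is hypothetical:

```shell
#!/bin/sh
# Sketch: create a zero-filled placeholder ROM of a given size (in KiB).
# Per the idea above, the file should only matter for sizing the ROM;
# the saved stream is expected to carry the actual content.
make_placeholder_rom() {
    # $1 = output file, $2 = size in KiB
    dd if=/dev/zero of="$1" bs=1024 count="$2" 2>/dev/null
}

# Scratch directory standing in for a (hypothetical) compat ROM directory.
romdir=$(mktemp -d)
# 64 KiB = 0x10000, the VirtIO-net ROM length found in the old save files.
make_placeholder_rom "$romdir/pxe-virtio.rom" 64
ls -l "$romdir/pxe-virtio.rom"
```

`dd` is used rather than `truncate` only to stay within POSIX; either way the file consists of zeros.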

QEMU uses these paths to find its ROM (and other) files:
# strings /usr/bin/qemu-system-x86_64 | grep -e /usr/share/ -e /usr/lib/
/usr/share/qemu:/usr/share/seabios:/usr/lib/ipxe/qemu
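A sketch of how such a colon-separated list can be resolved, assuming the directories are tried in order and the first match wins; `find_rom` is a hypothetical helper, not part of QEMU:

```shell
#!/bin/sh
# Sketch: find a ROM file in a colon-separated directory list, assuming the
# directories are tried in order and the first match wins.
find_rom() {
    # $1 = file name, $2 = colon-separated search path
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -f "$dir/$1" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage with the search path found in the Debian binary above:
find_rom bios.bin /usr/share/qemu:/usr/share/seabios:/usr/lib/ipxe/qemu ||
    echo "bios.bin not found" >&2
```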

The path can be changed with
$ qemu-system-x86_64 --help | grep -e -L
-L path         set the directory for the BIOS, VGA BIOS and keymap

The ROM file can also be changed through the per-device property "romfile", which -global sets for all devices of a driver:
$ qemu-system-x86_64 --help | grep -e property
-global driver.property=value
-global driver=driver,property=property,value=value
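For illustration, this is what the two `-global` spellings look like when filled in for the `romfile` property of `virtio-net-pci`. The helper names and the compat ROM path are hypothetical, and with libvirt such arguments could only be passed via <qemu:commandline>:

```shell
#!/bin/sh
# Sketch: the two -global spellings from "qemu-system-x86_64 --help",
# filled in for the romfile property of virtio-net-pci.
global_short() { printf '%s\n' "-global $1.$2=$3"; }
global_long()  { printf '%s\n' "-global driver=$1,property=$2,value=$3"; }

# Hypothetical compat ROM path, for illustration only.
rom=/usr/share/hypothetical-kvm-compat/pxe-virtio.rom
global_short virtio-net-pci romfile "$rom"
global_long  virtio-net-pci romfile "$rom"
```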

The bad news is:
- libvirtd has no way to specify "-L" or "-global" (except via <qemu:commandline>)
- there is no easy way to get the original ROM size from the QEVM, as the format is very QEMU-version dependent (and only loosely structured)
- there is no version info in the XML data to determine the QEMU version (except the domain/os/type/@machine='pc-i440fx-2.1' attribute, which can only be considered a hint)
- the ROM files were all renamed:
mv virtio-net.rom pxe-virtio.rom
mv rtl8139.rom pxe-rtl8139.rom
mv pcnet32.rom pxe-pcnet.rom
mv eepro100.rom pxe-eepro100.rom
mv e1000_82540.rom pxe-e1000.rom
mv ns8390.rom pxe-ne2k_pci.rom
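For a compat package, the rename list above can be expressed as a lookup, e.g. to create symlinks under the old names. A sketch that covers exactly these six renames and nothing more; `new_rom_name` is a hypothetical helper:

```shell
#!/bin/sh
# Map a pre-UCS-4.2 (qemu-kvm) ROM file name to the name used by QEMU 2.8,
# following the rename list above; fails for any other name.
new_rom_name() {
    case $1 in
        virtio-net.rom)  echo pxe-virtio.rom ;;
        rtl8139.rom)     echo pxe-rtl8139.rom ;;
        pcnet32.rom)     echo pxe-pcnet.rom ;;
        eepro100.rom)    echo pxe-eepro100.rom ;;
        e1000_82540.rom) echo pxe-e1000.rom ;;
        ns8390.rom)      echo pxe-ne2k_pci.rom ;;
        *)               return 1 ;;  # not one of the renamed ROMs
    esac
}

for old in virtio-net.rom e1000_82540.rom; do
    printf '%s -> %s\n' "$old" "$(new_rom_name "$old")"
done
```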


My idea is to (more or less) use the machine-info to switch between the historic ROMs and the current ROMs when loading a VM.
libvirt must be extended to do this magic, which is WIP.
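A sketch of that switch, keyed on the domain/os/type/@machine value. Since the attribute is only a hint, the patterns are assumptions: pc-0.*, pc-1.0 and pc-1.1 are taken as machine types of the qemu-kvm era (comment 15 later runs a UCS 4.1 VM with `-machine pc-1.1`), and the set names "historic" and "current" are hypothetical:

```shell
#!/bin/sh
# Sketch: choose a ROM set from the domain's machine type (a hint only).
rom_set_for_machine() {
    case $1 in
        pc-0.*|pc-1.0|pc-1.1) echo historic ;;  # assumed qemu-kvm era (UCS <= 4.1)
        *)                    echo current ;;   # e.g. pc-i440fx-2.1 and newer
    esac
}

rom_set_for_machine pc-1.1
rom_set_for_machine pc-i440fx-2.1
```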

For now I imported the Debian-Stretch versions, as they will get the longest maintenance from Debian (and Upstream):

Package: qemu
Version: 1:2.8+dfsg-2~bpo8+1A~4.2.0.201702161429
Branch: ucs_4.2-0

Package: libvirt
Version: 3.0.0-2A~4.2.0.201702172052
Branch: ucs_4.2-0
Comment 5 Philipp Hahn univentionstaff 2017-02-20 09:57:19 CET
r17202 | Bug #38877: libvirt systemd
 Use older systemd from Debian-Jessie - re-introduces  <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774237>

Package: libvirt
Version: 3.0.0-2A~4.2.0.201702200932
Branch: ucs_4.2-0
Comment 6 Philipp Hahn univentionstaff 2017-03-16 16:56:55 CET
While QEMU used a default of VGA_RAM_SIZE=8M until it was dropped by <https://www.redhat.com/archives/libvir-list/2014-August/msg00649.html> and/or got replaced by …, KVM used 9M, which gets rounded up to 16M.
There was a short period in 0.11-rc0, between <https://anonscm.debian.org/cgit/collab-maint/qemu-kvm.git/commit/?id=fbe1b5953d061c77c07b91e4eb555c92195308d0> and <https://anonscm.debian.org/cgit/collab-maint/qemu-kvm.git/commit/?id=b0136de5e33a64123392a1e3ffac611e6140b39a>, where that was broken and only 8M were used.
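A sketch of the rounding to the next power of two, assuming a qemu-kvm default of 9 MiB; `pow2ceil` is a hypothetical helper:

```shell
#!/bin/sh
# Round a size up to the next power of two (e.g. 9 MiB -> 16 MiB).
pow2ceil() {
    n=1
    while [ "$n" -lt "$1" ]; do
        n=$(( $n * 2 ))
    done
    echo "$n"
}

MiB=$(( 1024 * 1024 ))
echo "$(( $(pow2ceil $(( 9 * $MiB ))) / $MiB )) MiB"   # assumed qemu-kvm default: 16 MiB
echo "$(( $(pow2ceil $(( 8 * $MiB ))) / $MiB )) MiB"   # QEMU default: already a power of two
```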

The value of /domain/devices/video/model/@vram is ignored for type='cirrus' until <https://bugzilla.redhat.com/show_bug.cgi?id=1076098>.

I analysed all our saved VMs and only found vram_size=16 MiB and VirtIO-ROM=64 KiB:
 ldapsearch -LLLo ldif-wrap=no '(univentionService=KVM Testenv Host)' cn|
 sed -ne 's/^cn: //p'|
 xargs -r -I HOST ssh -l root HOST '
 for s in /var/lib/libvirt/qemu/save/*.save
 do
  ~phahn/src/VIRT/qemu-analyse-savevm "$s" 2>/dev/null|
  head -n50
 done'|
 sed -rne "s/.*idstr=.+(vga\.vram|virtio-net).*/\1/;T;N;s/[0-9a-f]{16}:[ +-]*//g;s/\n/\t/;p"|
 sort -u
#################
  virtio-net     length=0000000000010000 (65,536)
  vga.vram       length=0000000001000000 (16,777,216)

With a newer machine="pc-i440fx-2.1" the EFI images are used, which are 256 KiB!
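The "Length mismatch: ... 10000 in != 20000" error from comment 3 compares the ROM length recorded in the save stream with the size of the RAM block QEMU allocates for the ROM file it finds now. A sketch of that check, assuming QEMU rounds a ROM file's size up to a power of two; both helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: compare a saved ROM length with the block size QEMU would use now,
# assuming that size is the ROM file size rounded up to a power of two.
pow2ceil() {
    n=1
    while [ "$n" -lt "$1" ]; do n=$(( $n * 2 )); done
    echo "$n"
}

rom_length_check() {
    # $1 = ROM length from the save stream (hex without 0x)
    # $2 = size in bytes of the ROM file QEMU would load now
    saved=$(( 0x$1 ))
    current=$(pow2ceil "$2")
    if [ "$saved" -eq "$current" ]; then
        echo ok
    else
        printf 'Length mismatch: %x in != %x\n' "$saved" "$current"
        return 1
    fi
}

rom_length_check 10000 65536                      # old 64 KiB VirtIO ROM
rom_length_check 10000 $(( 256 * 1024 )) || true  # 256 KiB EFI image
```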

Now I'm stuck again with Bug #29355 comment 6:
> 25064@1489677264.463601:qemu_loadvm_state_section 48
> Unknown savevm section type 48


FYI: The following QEMU versions were used in the following UCS releases:
 0.11.1: ucs2.4-0+virtualization
 0.12.4: sec2.4-1 …
 0.14.0: ucs2.4-2 …
 0.14.1: ucs2.4-3 … ucs3.0-X
 1.1.2:  ucs3.1-0 … ucs4.1-X
 2.8.0:  ucs4.2.0 …

Package: univention-kvm-compat
Version: 1.0.0-1A~4.2.0.201703081659
Branch: ucs_4.2-0
Comment 7 Philipp Hahn univentionstaff 2017-03-20 16:31:00 CET
Oh "joy": "piix4_pm" serialized by kvm-1.1.2 can not be loaded by qemu-2.8

>qemu_loadvm_state_section 4@0x00364c3b
that is the correct start of 0000:00:01.3/piix4_pm
>qemu_loadvm_state_section 48@0x00364da2
that is 12 bytes into 0000:00:01.2/uhci
                            ^ ord('0')=48=0x30
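The bogus "section type 48" is just the ASCII code of the character '0': once the loader is out of step (here 12 bytes into the uhci section), it reads payload bytes as a section header. A small sketch showing the correspondence with POSIX `printf`, where a leading quote makes numeric conversions use the character's code:

```shell
#!/bin/sh
# Relate the bogus "section type 48" to the byte that produced it.
printf '%d\n' "'0"                 # prints 48
printf '0x%02x\n' "'0"             # prints 0x30
printf "\\$(printf '%03o' 48)\n"   # prints 0 (byte 48 as a character)
```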

<https://www.linux-kvm.org/images/6/6e/Kvm-forum-2013-migration-checker.pdf>

It's commit b0b873a07872f7ab7f66f259c73fb9dd42aa66a9, which is incompatible with qemu-kvm-1.1

r17415 | Bug #38877 qemu: Apply patches
r17417 | Bug #38877 qemu: Refresh patches
r17418 | Bug #38877 qemu: Split patch
r17420 | Bug #38877 qemu: Split patch

repo_admin.py -U -p qemu -d jessie-backports -r 4.2 # <http://metadata.ftp-master.debian.org/changelogs/main/q/qemu/qemu_2.8+dfsg-3~bpo8+1_changelog>

Package: qemu
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703201330
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703201346
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703201359
Branch: ucs_4.2-0

r77991 | Bug #38877 qemu/libvirt changelog
Comment 8 Philipp Hahn univentionstaff 2017-03-21 09:41:22 CET
(In reply to Philipp Hahn from comment #7)
> Oh "joy": "piix4_pm" serialized by kvm-1.1.2 can not be loaded by qemu-2.8
> 
> >qemu_loadvm_state_section 4@0x00364c3b
> that is the correct start of 0000:00:01.3/piix4_pm
> >qemu_loadvm_state_section 48@0x00364da2
> that is 12 bytes into 0000:00:01.2/uhci
>                             ^ ord('0')=48=0x30
> 
> <https://www.linux-kvm.org/images/6/6e/Kvm-forum-2013-migration-checker.pdf>
> 
> It's commit b0b873a07872f7ab7f66f259c73fb9dd42aa66a9, which is incompatible
> with qemu-kvm-1.1

The incompatibility was introduced by 23910d3f669d46073b403876e30a7314599633af in QEMU, which changed "gpe" from a single "struct gpe_regs" to an array[4] without changing the version number. I see no way to detect that incompatibility automatically, so I hard-coded compatibility with qemu-kvm-1.1 as previously used in UCS<=4.1.
Comment 9 Philipp Hahn univentionstaff 2017-03-22 11:08:25 CET
r78098 | Bug #38877 qemu: Force recommended package "univention-kvm-compat" to be maintained

r78101 | Bug #38877 dvd: Add "univention-kvm-compat"

Package: univention-dvd
Version: 2.0.0-9A~4.2.0.201703221015
Branch: ucs_4.2-0
Comment 10 Philipp Hahn univentionstaff 2017-03-22 12:33:38 CET
r78108 | Bug #38877 virtio: Update VirtIO windows driver

Package: univention-kvm-virtio
Version: 6.0.0-2A~4.2.0.201703221225
Branch: ucs_4.2-0
Comment 11 Philipp Hahn univentionstaff 2017-03-22 12:47:07 CET
r78114 | Bug #38877: VirtIO Changelog
Comment 12 Philipp Hahn univentionstaff 2017-03-22 14:02:47 CET
Erik observed a case where a UCS-4.1-4 VM did not reboot after installation from DVD:
[ 1691.343037] reboot: Restarting system

virsh # qemu-monitor-command --hmp --domain ucs41-64-test info registers
EAX=9e000000 EBX=80008086 ECX=00000030 EDX=00000cfc
ESI=00000cf8 EDI=00000000 EBP=00000cfc ESP=00006fb8
EIP=000ef78f EFL=00000097 [--S-APC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f7490 00000037
IDT=     000f74ce 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

virsh # qemu-monitor-command --hmp --domain ucs41-64-test info mtree
    00000000000ec000-00000000000effff (prio 1, RW): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
    00000000000ec000-00000000000effff (prio 1, RW): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, R-): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, RW): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]

virsh # qemu-monitor-command --hmp --domain ucs41-64-test xp /3i $eip
0x00000000000ef78f:  mov    $0xcf8,%esi
0x00000000000ef794:  mov    $0x9e000000,%eax
0x00000000000ef799:  jle    0xef78f
0x00000000000ef79b:

This clearly is an endless loop as nothing inside the loop modifies the flags.
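The loop can be checked from the raw bytes: 0x7e is JLE with a signed 8-bit displacement relative to the end of the 2-byte instruction, and with EFL=0x97 (ZF=0, SF=1, OF=0) its condition "ZF=1 or SF≠OF" holds. Since neither `mov` changes the flags, it holds forever. A sketch of both checks in POSIX sh arithmetic; the helper names are hypothetical:

```shell
#!/bin/sh
# Target of a short jump (opcode + rel8):
# address of the next instruction + sign-extended displacement.
rel8_target() {
    # $1 = address of the 2-byte jump, $2 = displacement byte (0x00..0xff)
    disp=$(( $2 ))
    if [ "$disp" -gt 127 ]; then
        disp=$(( $disp - 256 ))   # sign-extend the 8-bit displacement
    fi
    printf '0x%x\n' $(( $1 + 2 + $disp ))
}

# JLE is taken if ZF=1 or SF!=OF (EFLAGS bits 6, 7 and 11).
jle_taken() {
    fl=$(( $1 ))
    zf=$(( ($fl >> 6) & 1 ))
    sf=$(( ($fl >> 7) & 1 ))
    of=$(( ($fl >> 11) & 1 ))
    if [ "$zf" -eq 1 ] || [ "$sf" -ne "$of" ]; then
        echo taken
    else
        echo not-taken
    fi
}

rel8_target 0xef799 0xf4   # the "7e f4" at 0xef799 jumps back to 0xef78f
jle_taken 0x97             # EFL=00000097 from "info registers"
```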

virsh # qemu-monitor-command --hmp --domain ucs41-64-test xp /32xb 0xef780
00000000000ef780: 0x89 0xc3 0x8d 0x43 0xff 0x66 0x83 0xf8
00000000000ef788: 0xfd 0x0f 0x87 0xdc 0x00 0x00 0x00 0xbe
00000000000ef790: 0xf8 0x0c 0x00 0x00 0xb8 0x00 0x00 0x00
00000000000ef798: 0x9e 0x7e 0xf4 0xef 0xb2 0xfe 0x66 0xed
                  ^^^^^^^^^^^^^^

# xxd -s 0xf780 -l 32 -g 1 -c 8 /usr/share/seabios/bios.bin
000f780: 89 c3 8d 43 ff 66 83 f8  ...C.f..
000f788: fd 0f 87 dc 00 00 00 be  ........
000f790: f8 0c 00 00 b8 00 00 00  ........
000f798: 80 89 f2 ef b2 fe 66 ed  ......f.
         ^^^^^^^^

So something seems to have changed those 3 bytes in the shadow RAM copy of the BIOS ROM.
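The xxd offset follows from where the BIOS sits: assuming the 128 KiB (0x20000) bios.bin is shadowed so that it ends at 1 MiB, guest address 0xef780 maps to file offset 0xef780 - (0x100000 - 0x20000) = 0xf780. A sketch of that mapping plus a byte-wise comparison of the row at 0xef798; `bios_offset` and `diff_row` are hypothetical helpers:

```shell
#!/bin/sh
# Map a guest address inside the shadowed BIOS to an offset in bios.bin,
# assuming the image is mapped so that it ends at 1 MiB (0x100000).
bios_offset() {
    # $1 = guest physical address, $2 = size of bios.bin in bytes
    printf '0x%x\n' $(( $1 - (0x100000 - $2) ))
}

# Print "address: guest != file" for every byte that differs in one row.
diff_row() {
    # $1 = row address, $2 / $3 = guest / file bytes as space-separated hex
    addr=$(( $1 ))
    rest=$3
    for g in $2; do
        f=${rest%% *}     # next byte from the file dump
        rest=${rest#* }   # drop it from the remaining list
        if [ $(( 0x$g )) -ne $(( 0x$f )) ]; then
            printf '0x%x: %s != %s\n' "$addr" "$g" "$f"
        fi
        addr=$(( $addr + 1 ))
    done
}

bios_offset 0xef780 $(( 128 * 1024 ))   # assumed 128 KiB bios.bin
diff_row 0xef798 "9e 7e f4 ef b2 fe 66 ed" "80 89 f2 ef b2 fe 66 ed"
```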

# objdump -D -b binary -mi386 -Maddr32,data32 --start-address=0xf780 --stop-address=0xf7a0 /usr/share/seabios/bios.bin
0000f780 <.data+0xf780>:
    f780:       89 c3                   mov    %eax,%ebx
    f782:       8d 43 ff                lea    -0x1(%ebx),%eax
    f785:       66 83 f8 fd             cmp    $0xfffd,%ax
    f789:       0f 87 dc 00 00 00       ja     0xf86b
    f78f:       be f8 0c 00 00          mov    $0xcf8,%esi
    f794:       b8 00 00 00 80          mov    $0x80000000,%eax
    f799:       89 f2                   mov    %esi,%edx
    f79b:       ef                      out    %eax,(%dx)
    f79c:       b2 fe                   mov    $0xfe,%dl
    f79e:       66 ed                   in     (%dx),%ax

This disassembly looks much more sane.
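Opcode b8 loads a 32-bit immediate into %eax, stored least significant byte first. So the first changed byte (0x80 vs 0x9e) turns 0x80000000 into 0x9e000000, and the next two turn `mov %esi,%edx` (89 f2) into the `jle` (7e f4). A sketch of the little-endian decode; `le32` is a hypothetical helper:

```shell
#!/bin/sh
# Decode the imm32 of "mov $imm32,%eax" (opcode b8); the four immediate
# bytes are stored least significant first (little-endian).
le32() {
    # $1..$4 = immediate bytes in memory order, as hex without 0x
    printf '0x%08x\n' $(( 0x$1 | (0x$2 << 8) | (0x$3 << 16) | (0x$4 << 24) ))
}

le32 00 00 00 80   # bios.bin:  b8 00 00 00 80
le32 00 00 00 9e   # guest RAM: b8 00 00 00 9e
```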
Comment 13 Erik Damrose univentionstaff 2017-03-22 16:43:45 CET
Many tests already work, but I am reopening this due to the issue in the comment above.

The failing reboot happened on 3 different VMs on 2 different hosts. It does not happen on every reboot. We are now trying to see whether it also occurs if we use the bios.bin from UCS 4.1...
Comment 14 Erik Damrose univentionstaff 2017-03-22 17:06:58 CET
UCS 4.1 VM: Restoring a snapshot taken on UCS 4.1: The VM reboots
Win 7 VM (with virtio drivers): Restoring a snapshot taken on UCS 4.1: The VM hangs (no mouse movement), but the qemu process uses 100% CPU
Comment 15 Philipp Hahn univentionstaff 2017-03-24 17:22:50 CET
Using gdb with QEMU, this is my latest finding:
# kvm -name 4.1-sysetup-cloned-on-4.1 -machine pc-1.1,accel=kvm -m 1024 \
    -smp 1,sockets=1,cores=1,threads=1 -uuid 72ab1a74-b5e5-4846-b59a-c3faf7a86daf \
    -no-user-config -nodefaults \
    -chardev stdio,id=charmonitor -mon chardev=charmonitor,id=monitor,mode=readline \
    -rtc base=utc -no-shutdown \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
    -drive file=/var/lib/libvirt/images/4.1-sysetup-cloned-on-4.1_vda.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 \
    -netdev bridge,id=hostnet0,br=br0 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:fe:0a:a0,bus=pci.0,addr=0x3 \
    -chardev file,id=charserial0,path=/var/log/libvirt/qemu/cw4.1-sysetup-cloned-on-4.1.console \
    -device isa-serial,chardev=charserial0,id=serial0 \
    -device usb-tablet,id=input0 \
    -sdl -k en-us -vga cirrus \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
    -loadvm running-4.1 -S
(qemu) gdbserver
# univention-install linux-image-4.1.0-ucs207-amd64-dbg
# gdb /usr/lib/debug/lib/modules/4.1.0-ucs207-amd64/vmlinux -ex 'set architecture i386:x86-64:intel' -ex 'target remote :1234'
(gdb) thread apply all bt
#0  0xffffffff8105f1c2 in native_safe_halt ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/include/asm/irqflags.h:49
#1  0xffffffff81020bac in arch_safe_halt ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/include/asm/paravirt.h:111
#2  default_idle () at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/kernel/process.c:341
#3  0xffffffff810b7fb4 in cpuidle_idle_call ()
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/kernel/sched/idle.c:195
#4  cpu_idle_loop () at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/kernel/sched/idle.c:249
#5  cpu_startup_entry (state=<optimized out>)
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/kernel/sched/idle.c:297
#6  0xffffffff81590fba in rest_init () at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/init/main.c:409
#7  0xffffffff8191c093 in start_kernel () at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/init/main.c:678
#8  0xffffffff8191b5d6 in x86_64_start_reservations (real_mode_data=<optimized out>)
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/kernel/head64.c:195
#9  0xffffffff8191b720 in x86_64_start_kernel (real_mode_data=<optimized out>)
    at /var/build/temp/tmp.6yb3RZw8F4/pbuilder/linux-4.1.6/arch/x86/kernel/head64.c:184
#10 0x0000000000000000 in ?? ()

(qemu) info tlb
...
ffffffff81000000: 0000000001000000 -GPDA----
ffffffff81200000: 0000000001200000 -GPDA----
ffffffff81400000: 0000000001400000 -G-DA----

<https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt>
ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0

So $pc=ffffffff8105f1c2 translates to the physical address 0x105f1c2:
$ printf "0x%016x\n" $((0xffffffff8105f1c2 - 0xffffffff80000000))
0x000000000105f1c2

(qemu) info mtree
0000000000000000-7ffffffffffffffe (prio 0, RW): system
  0000000000000000-000000003fffffff (prio 0, RW): alias ram-below-4g @pc.ram 0000000000000000-000000003fffffff
Comment 16 Philipp Hahn univentionstaff 2017-03-27 11:19:22 CEST
Feedback from QEMU developer "davidgiluk": this might be an SMM-injected triple fault ("3fault") that the old SeaBIOS does not handle and that is still pending; maybe fixed by <git:fc3a1fd7>.
Comment 17 Philipp Hahn univentionstaff 2017-03-27 14:12:16 CEST
Created attachment 8656
triple fault caused by SMM

<http://www.linux-kvm.org/page/Perf_events#Tracing_events>

David confirmed that it looks like the bug he fixed.
A debug build confirms that it is fixed!

r17430 | Bug #38877 qemu: Fix 3fault by SMM

Package: qemu
Version: 1:2.8+dfsg-3~bpo8+1A~4.2.0.201703271321
Branch: ucs_4.2-0
Comment 18 Philipp Hahn univentionstaff 2017-03-27 17:11:52 CEST
(In reply to Erik Damrose from comment #13)
> Many tests already work, but reopen due to the issue in the comment above.
> 
> The failing reboot happened at 3 different VMs, on 2 different hosts. It
> does not happen on every reboot. We are now trying to see if it does occur
> if we use the bios.bin from ucs 4.1...
...
(In reply to Erik Damrose from comment #14)
> UCS 4.1 VM: Restoring a snapshot taken on UCS 4.1: The VM reboots
> Win 7 VM (with virtio drivers): Restoring a snapshot taken on UCS 4.1: The
> VM hangs (no mouse movement), but the qemu process uses 100% cpu

The issue about "vmstate not migrateable" has been split off into Bug #44083 for now and Bug #44804 for later.
Comment 19 Erik Damrose univentionstaff 2017-03-30 14:55:11 CEST
OK: All agreed features work, updates with adaptations according to bug #44086 work, and new domains installed with 4.2 work.
Verified

(In reply to Philipp Hahn from comment #18)
> The issue about "vmstate not migrateable" is split into Bug #44083 for now
> and Bug #44804 for later
Comment 20 Stefan Gohmann univentionstaff 2017-04-04 18:29:43 CEST
UCS 4.2 has been released:
 https://docs.software-univention.de/release-notes-4.2-0-en.html
 https://docs.software-univention.de/release-notes-4.2-0-de.html

If this error occurs again, please use "Clone This Bug".