Bug 31032 - libvirt SEGV during domain define
Status: CLOSED WONTFIX
Product: UCS
Classification: Unclassified
Component: Virtualization - Xen
Version: UCS 3.1
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 3.x
Assigned To: UCS maintainers
Duplicates: 34114
Depends on: 31371
Blocks:
Reported: 2013-04-11 15:57 CEST by Philipp Hahn
Modified: 2023-06-28 10:46 CEST

Bug group (optional): Troubleshooting


Attachments
LIBVIRT_DEBUG=1 virsh define ucs31-64-steuwer.xml (89.00 KB, text/plain)
2013-04-11 16:19 CEST, Philipp Hahn
script as workaround using python to import an xml (1.85 KB, application/octet-stream)
2013-05-21 14:30 CEST, Ingo Steuwer
keep-alive double-free with libvirt-1.2.3 (24.47 KB, text/plain)
2014-04-02 10:20 CEST, Philipp Hahn

Description Philipp Hahn univentionstaff 2013-04-11 15:57:53 CEST
Defining a (new or existing) domain crashes virsh with a glibc memory corruption:
# virsh define ucs31-64-steuwer.xml 
Domain ucs31-64-steuwer defined from ucs31-64-steuwer.xml

*** glibc detected *** virsh: malloc(): memory corruption: 0x00007f8b40017e60 ***
======= Backtrace: =========
/lib/libc.so.6(+0x71e16)[0x7f8b478dde16]
/lib/libc.so.6(+0x74ead)[0x7f8b478e0ead]
/lib/libc.so.6(__libc_malloc+0x70)[0x7f8b478e2c70]
/lib/libc.so.6(__vasprintf_chk+0x3b)[0x7f8b4794fbdb]
/usr/lib/libvirt.so.0(virVasprintf+0x28)[0x7f8b4aa62ba8]
/usr/lib/libvirt.so.0(virLogMessage+0x178)[0x7f8b4aa54208]
/usr/lib/libvirt.so.0(virNetTLSContextFree+0x5c)[0x7f8b4ab4e25c]
/usr/lib/libvirt.so.0(+0x1215bd)[0x7f8b4ab1c5bd]
/usr/lib/libvirt.so.0(+0x1216db)[0x7f8b4ab1c6db]
/usr/lib/libvirt.so.0(+0xe0a05)[0x7f8b4aadba05]
/usr/lib/libvirt.so.0(+0xe0bf8)[0x7f8b4aadbbf8]
/usr/lib/libvirt.so.0(virUnrefDomain+0xc8)[0x7f8b4aadbed8]
/usr/lib/libvirt.so.0(virDomainFree+0xbb)[0x7f8b4ab07feb]
/usr/lib/libvirt.so.0(+0x17010c)[0x7f8b4ab6b10c]
/usr/lib/libvirt.so.0(+0x1709bc)[0x7f8b4ab6b9bc]
/usr/lib/libvirt.so.0(+0x170d00)[0x7f8b4ab6bd00]
/usr/lib/libvirt.so.0(+0x51e15)[0x7f8b4aa4ce15]
/usr/lib/libvirt.so.0(virEventRunDefaultImpl+0x45)[0x7f8b4aa4ba45]
virsh[0x41eb52]
/usr/lib/libvirt.so.0(+0x64ba6)[0x7f8b4aa5fba6]
/lib/libpthread.so.0(+0x68ca)[0x7f8b485708ca]
/lib/libc.so.6(clone+0x6d)[0x7f8b4793bb6d]
======= Memory map: ========
00400000-00450000 r-xp 00000000 fe:00 1610687521                         /usr/bin/virsh
0064f000-00651000 rw-p 0004f000 fe:00 1610687521                         /usr/bin/virsh
008d7000-00920000 rw-p 00000000 00:00 0                                  [heap]
7f8b40000000-7f8b400a6000 rw-p 00000000 00:00 0 
7f8b400a6000-7f8b44000000 ---p 00000000 00:00 0 
7f8b45736000-7f8b4574c000 r-xp 00000000 fe:00 537081246                  /lib/libgcc_s.so.1
7f8b4574c000-7f8b4594b000 ---p 00016000 fe:00 537081246                  /lib/libgcc_s.so.1
7f8b4594b000-7f8b4594c000 rw-p 00015000 fe:00 537081246                  /lib/libgcc_s.so.1
7f8b4594c000-7f8b4594d000 ---p 00000000 00:00 0 
7f8b4594d000-7f8b4614d000 rw-p 00000000 00:00 0 
7f8b4614d000-7f8b4614e000 ---p 00000000 00:00 0 
7f8b4614e000-7f8b4694e000 rw-p 00000000 00:00 0 
7f8b4694e000-7f8b46990000 r-xp 00000000 fe:00 537081479                  /lib/libncurses.so.5.7
7f8b46990000-7f8b46b8f000 ---p 00042000 fe:00 537081479                  /lib/libncurses.so.5.7
7f8b46b8f000-7f8b46b94000 rw-p 00041000 fe:00 537081479                  /lib/libncurses.so.5.7
7f8b46b94000-7f8b46c14000 r-xp 00000000 fe:00 537081227                  /lib/libm-2.11.3.so
7f8b46c14000-7f8b46e14000 ---p 00080000 fe:00 537081227                  /lib/libm-2.11.3.so
7f8b46e14000-7f8b46e15000 r--p 00080000 fe:00 537081227                  /lib/libm-2.11.3.so
7f8b46e15000-7f8b46e16000 rw-p 00081000 fe:00 537081227                  /lib/libm-2.11.3.so
7f8b46e16000-7f8b46e19000 r-xp 00000000 fe:00 1074340695                 /usr/lib/libgpg-error.so.0.4.0
7f8b46e19000-7f8b47018000 ---p 00003000 fe:00 1074340695                 /usr/lib/libgpg-error.so.0.4.0
7f8b47018000-7f8b47019000 rw-p 00002000 fe:00 1074340695                 /usr/lib/libgpg-error.so.0.4.0
7f8b47019000-7f8b47030000 r-xp 00000000 fe:00 1074374365                 /usr/lib/libz.so.1.2.3.4
7f8b47030000-7f8b4722f000 ---p 00017000 fe:00 1074374365                 /usr/lib/libz.so.1.2.3.4
7f8b4722f000-7f8b47230000 rw-p 00016000 fe:00 1074374365                 /usr/lib/libz.so.1.2.3.4
7f8b47230000-7f8b47240000 r-xp 00000000 fe:00 1074341381                 /usr/lib/libtasn1.so.3.1.9
7f8b47240000-7f8b4743f000 ---p 00010000 fe:00 1074341381                 /usr/lib/libtasn1.so.3.1.9
7f8b4743f000-7f8b47440000 rw-p 0000f000 fe:00 1074341381                 /usr/lib/libtasn1.so.3.1.9
7f8b47440000-7f8b4744d000 r-xp 00000000 fe:00 537101844                  /lib/libudev.so.0.9.3
7f8b4744d000-7f8b4764c000 ---p 0000d000 fe:00 537101844                  /lib/libudev.so.0.9.3
7f8b4764c000-7f8b4764d000 r--p 0000c000 fe:00 537101844                  /lib/libudev.so.0.9.3
7f8b4764d000-7f8b4764e000 rw-p 0000d000 fe:00 537101844                  /lib/libudev.so.0.9.3
7f8b4764e000-7f8b4766a000 r-xp 00000000 fe:00 537101831                  /lib/libselinux.so.1
7f8b4766a000-7f8b47869000 ---p 0001c000 fe:00 537101831                  /lib/libselinux.so.1
7f8b47869000-7f8b4786a000 r--p 0001b000 fe:00 537101831                  /lib/libselinux.so.1
7f8b4786a000-7f8b4786b000 rw-p 0001c000 fe:00 537101831                  /lib/libselinux.so.1
7f8b4786b000-7f8b4786c000 rw-p 00000000 00:00 0 
7f8b4786c000-7f8b479c5000 r-xp 00000000 fe:00 537081225                  /lib/libc-2.11.3.so
7f8b479c5000-7f8b47bc4000 ---p 00159000 fe:00 537081225                  /lib/libc-2.11.3.so
7f8b47bc4000-7f8b47bc8000 r--p 00158000 fe:00 537081225                  /lib/libc-2.11.3.so
7f8b47bc8000-7f8b47bc9000 rw-p 0015c000 fe:00 537081225                  /lib/libc-2.11.3.so
7f8b47bc9000-7f8b47bce000 rw-p 00000000 00:00 0 
7f8b47bce000-7f8b47bd0000 r-xp 00000000 fe:00 537081226                  /lib/libdl-2.11.3.so
7f8b47bd0000-7f8b47dd0000 ---p 00002000 fe:00 537081226                  /lib/libdl-2.11.3.so
7f8b47dd0000-7f8b47dd1000 r--p 00002000 fe:00 537081226                  /lib/libdl-2.11.3.so
7f8b47dd1000-7f8b47dd2000 rw-p 00003000 fe:00 537081226                  /lib/libdl-2.11.3.so
7f8b47dd2000-7f8b47e0d000 r-xp 00000000 fe:00 537102672                  /lib/libreadline.so.6.1
7f8b47e0d000-7f8b4800d000 ---p 0003b000 fe:00 537102672                  /lib/libreadline.so.6.1
7f8b4800d000-7f8b48015000 rw-p 0003b000 fe:00 537102672                  /lib/libreadline.so.6.1
7f8b48015000-7f8b48016000 rw-p 00000000 00:00 0 
7f8b48016000-7f8b4815d000 r-xp 00000000 fe:00 1074366692                 /usr/lib/libxml2.so.2.7.8
7f8b4815d000-7f8b4835c000 ---p 00147000 fe:00 1074366692                 /usr/lib/libxml2.so.2.7.8
7f8b4835c000-7f8b48365000 rw-p 00146000 fe:00 1074366692                 /usr/lib/libxml2.so.2.7.8
7f8b48365000-7f8b48367000 rw-p 00000000 00:00 0 
7f8b48367000-7f8b48369000 r-xp 00000000 fe:00 537081230                  /lib/libutil-2.11.3.so
7f8b48369000-7f8b48568000 ---p 00002000 fe:00 537081230                  /lib/libutil-2.11.3.so
7f8b48568000-7f8b48569000 r--p 00001000 fe:00 537081230                  /lib/libutil-2.11.3.so
7f8b48569000-7f8b4856a000 rw-p 00002000 fe:00 537081230                  /lib/libutil-2.11.3.so
7f8b4856a000-7f8b48581000 r-xp 00000000 fe:00 537081231                  /lib/libpthread-2.11.3.so
7f8b48581000-7f8b48780000 ---p 00017000 fe:00 537081231                  /lib/libpthread-2.11.3.so
Aborted

The new definition takes effect, even when the crash occurs.
The same in Python does not crash:
 import libvirt
 xml = open('ucs31-64-steuwer.xml').read()
 conn = libvirt.open('xen+unix:///')
 dom = conn.defineXML(xml)

To me it looks like a bug in the default event loop implementation, which virsh registers and uses by default. The Python snippet on the other hand does not explicitly register one, so it doesn't see the "Domain Undefine" event, which can be seen in the stack trace above.
However, a first test that registers the default event loop in Python did not reproduce the crash:


# Based on <http://libvirt.org/git/?p=libvirt.git;a=blob;f=examples/domain-events/events-python/event-test.py>
import threading

import libvirt

# Register libvirt's default event loop before opening the connection.
libvirt.virEventRegisterDefaultImpl()

def virEventLoopNativeRun():
  # Drive the default event loop forever; runs in a background thread.
  while True:
    libvirt.virEventRunDefaultImpl()

eventLoopThread = threading.Thread(target=virEventLoopNativeRun, name="libvirtEventLoop")
eventLoopThread.daemon = True
eventLoopThread.start()

# Human-readable names, indexed by event type and by per-event detail code.
EVENTS = ("Defined", "Undefined", "Started", "Suspended", "Resumed", "Stopped", "Shutdown", "PMSuspended")
DETAIL = (("Added", "Updated"), ("Removed",), ("Booted", "Migrated", "Restored", "Snapshot", "Wakeup"), ("Paused", "Migrated", "IOError", "Watchdog", "Restored", "Snapshot", "API error"), ("Unpaused", "Migrated", "Snapshot"), ("Shutdown", "Destroyed", "Crashed", "Migrated", "Saved", "Failed", "Snapshot"), ("Finished",), ("Memory", "Disk"),)
REASONS = ("Error", "End-of-file", "Keepalive", "Client",)

def myDomainEventCallback(conn, dom, event, detail, opaque):
  # Print every domain lifecycle event that reaches this connection.
  print("myDomainEventCallback1 EVENT: Domain %s(%s) %s %s" % (
    dom.name(), dom.ID(), EVENTS[event], DETAIL[event][detail]))

with open('ucs31-64-steuwer.xml') as f:
  xml = f.read()
conn = libvirt.open('xen+unix:///')

conn.domainEventRegister(myDomainEventCallback, None)

dom = conn.defineXML(xml)
Comment 1 Philipp Hahn univentionstaff 2013-04-11 16:19:03 CEST
Created attachment 5167 [details]
LIBVIRT_DEBUG=1 virsh define ucs31-64-steuwer.xml
Comment 2 Philipp Hahn univentionstaff 2013-04-12 08:29:29 CEST
Similar: <https://www.redhat.com/archives/libvir-list/2011-July/msg00547.html>
Some possible ML candidates to look at for a fix:
  [PATCH 0/4] Some RPC event handler ref counting fixes
  [PATCH 0/5] Fix misc problems in the RPC code
Comment 3 Philipp Hahn univentionstaff 2013-05-21 14:06:33 CEST
(In reply to comment #0)
> To me it looks like a bug in the default event loop implementation, which virsh
> registers and uses by default.
> The Python snippet on the other hand does not
> explicitly register one, so it doesn't see the "Domain Undefine" event, which
> can be seen in the stack trace above.

See Bug #31371 for some issues with the Python event-loop-implementation.
See Bug #31370 for some issues with expired TLS UCS-2.4 certificates.
Comment 4 Stefan Gohmann univentionstaff 2013-05-21 14:27:29 CEST
The customer uses a workaround. An erratum is not required.
Comment 5 Ingo Steuwer univentionstaff 2013-05-21 14:30:39 CEST
Created attachment 5233 [details]
script as workaround using python to import an xml

The attached python script uses the "libvirt" python module to import a given XML file. The libvirt module doesn't crash as it doesn't handle the broken events...
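
For readers without access to the attachment, the following is a minimal sketch of what such an import script might look like. It is not the attached script: the helper name define_from_file and the command-line handling are assumptions made for illustration, and the connection URI is the one used in comment #0.

```python
import sys


def define_from_file(conn, path):
    """Read a domain XML file and define it on an already open connection.

    Hypothetical helper; 'conn' is expected to offer defineXML() like a
    libvirt virConnect object does.
    """
    with open(path) as handle:
        xml = handle.read()
    return conn.defineXML(xml)


def main(argv):
    # Imported here so the helper above can be exercised without libvirt.
    import libvirt

    if len(argv) != 2:
        sys.stderr.write("usage: %s DOMAIN.xml\n" % argv[0])
        return 2
    # URI taken from comment #0; adjust to the local setup.
    conn = libvirt.open('xen+unix:///')
    try:
        dom = define_from_file(conn, argv[1])
        print("Domain %s defined from %s" % (dom.name(), argv[1]))
    finally:
        conn.close()
    return 0
```

Calling main(sys.argv) from a small wrapper would make it usable as a replacement for "virsh define FILE.xml". No event loop is registered, so the problematic event is never processed in this process.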
Comment 6 Philipp Hahn univentionstaff 2014-02-11 16:56:00 CET
"virsh define" crashed (multiple times) at a different customer using Xen-4.1.3 from UCS-3.2:
traps: virsh[2812] general protection ip:7f93b5f1ff64 sp:7f93b42f9140 error:0 in libpthread-2.11.3.so[7f93b5f17000+17000]

This might be the same problem as originally reported here, but it might also be a different one.

Running exactly the same command a second time worked fine.
Comment 7 Philipp Hahn univentionstaff 2014-02-13 11:51:32 CET
*** Bug 34114 has been marked as a duplicate of this bug. ***
Comment 8 Tim Petersen univentionstaff 2014-02-21 09:30:51 CET
I also saw similar failures when using virsh edit; perhaps they are side effects of the same problem:


virsh: pthread_mutex_lock.c:62: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted
root@server:~# virsh edit instance
Domain instance XML configuration edited.

virsh: tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)' failed.
Aborted
root@server:~# virsh edit instance
Domain instance XML configuration edited.

Segmentation fault
root@server:~# virsh edit instance
Domain instance XML configuration edited.

root@server:~# virsh edit instance
Domain instance XML configuration edited.

Segmentation fault
---------------------------------------------------------------------------

[88321.330655] virsh[28556]: segfault at 51 ip 00007f6c5a9c7f64 sp 00007f6c58da1140 error 4 in libpthread-2.11.3.so[7f6c5a9bf000+17000]
[88359.397576] traps: virsh[28923] general protection ip:7fa8e85fbf64 sp:7fa8e69d5140 error:0 in libpthread-2.11.3.so[7fa8e85f3000+17000]
Comment 9 Stefan Gohmann univentionstaff 2014-02-25 07:32:57 CET
I think we should fix it even if it does not have a negative effect.
Comment 10 Philipp Hahn univentionstaff 2014-04-02 10:19:32 CEST
This only seems to happen when virsh interacts directly with XenD itself; so far I've been unable to reproduce it when an intermediate libvirtd is used:

export LIBVIRT_DEBUG=1 LIBVIRT_LOG_FILTERS=2:event LIBVIRT_LOG_OUTPUTS=1:stderr
OK: virsh undefine ucs32-64-segv ; virsh -c xen+unix:// define ucs32-64-segv.xml
FAIL: virsh undefine ucs32-64-segv ; virsh -c xen:// define ucs32-64-segv.xml

In that case it's caused by virsh using inotify to monitor /var/lib/xend/domains/, where XenD stores its data for managed domains: xenUnifiedOpen() calls xenInotifyOpen(), which registers an event handler using the generic libvirt event loop. The handler gets a *pointer* to the "conn"ection object, but the internal reference counter is *not incremented*, because libvirt uses reference counting to track the external users, but not the internal references.
After the new domain is defined, the last reference to the domain is removed, which then also removes the last *external* reference to the connection. libvirt takes this as the signal to close the connection and free its resources.
During shutdown all pending events are processed: here the inotify-event is received and now uses the already freed "conn"ection, which SEGVs.
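
The sequence described above can be modelled with a small toy program. This is only an illustrative sketch: Connection, EventLoop, on_inotify and simulate are hypothetical names, not libvirt code, and the variant with take_ref=True merely shows one conceivable way to avoid the problem, not necessarily what upstream did.

```python
class Connection(object):
    """Toy stand-in for a reference-counted connection (not libvirt's type)."""

    def __init__(self):
        self.refs = 1        # the caller's external reference
        self.freed = False

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True   # in C the memory would be released here


class EventLoop(object):
    """Toy event loop that stores (connection, callback) pairs."""

    def __init__(self, take_ref):
        self.take_ref = take_ref
        self.handlers = []

    def register(self, conn, callback):
        if self.take_ref:
            conn.ref()          # the handler owns a reference of its own
        self.handlers.append((conn, callback))

    def run_pending(self):
        results = []
        for conn, callback in self.handlers:
            results.append(callback(conn))
            if self.take_ref:
                conn.unref()    # drop the handler's reference afterwards
        self.handlers = []
        return results


def on_inotify(conn):
    # In C this would touch freed memory; the model only reports it.
    return "use-after-free" if conn.freed else "ok"


def simulate(take_ref):
    conn = Connection()
    loop = EventLoop(take_ref)
    loop.register(conn, on_inotify)   # models the inotify handler registration
    conn.unref()                      # last external reference goes away
    return loop.run_pending()         # pending events processed at shutdown


print(simulate(take_ref=False))
print(simulate(take_ref=True))
```

With take_ref=False the callback runs against an already freed connection, which corresponds to the crash; with take_ref=True the connection stays alive until the handler has run.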

Easiest workaround: export VIRSH_DEFAULT_CONNECT_URI=xen+unix://

The Xen driver in libvirt was changed to only execute inside libvirtd with v1.0.5-83-g61b7a87, but a similar bug still exists in current v1.2.3/GIT with keep-alive:
$ grep unref ~/BUG/31032_virsh-define-segv.log
virUnrefDomain:276 : unref domain 0x7f4ec4003fe0 ucs32-64-segv 1
virReleaseDomain:246 : unref connection 0x917650 2
virUnrefDomain:276 : unref domain 0x934460 ucs32-64-segv 1
virReleaseDomain:246 : unref connection 0x917650 2
virUnrefConnect:145 : unref connection 0x917650 1
virUnrefDomain:276 : unref domain 0x7f4ec4004060 ucs32-64-segv 1
virReleaseDomain:246 : unref connection 0x917650 1

Notice that there are two lines for "unref connection ... 1"!
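
The same check can be scripted. The following sketch assumes that the trailing number in each "unref connection" line is the reference count at the time of the call, so that a value of 1 marks the release of the last reference; final_unrefs is a hypothetical helper name, and the sample lines are copied from the grep output above.

```python
import re
from collections import Counter

# Matches e.g. "virUnrefConnect:145 : unref connection 0x917650 1"
UNREF_CONN = re.compile(r"unref connection (0x[0-9a-f]+) (\d+)")


def final_unrefs(lines):
    """Count, per connection address, the unref lines logged with count 1."""
    hits = Counter()
    for line in lines:
        match = UNREF_CONN.search(line)
        if match and int(match.group(2)) == 1:
            hits[match.group(1)] += 1
    return hits


LOG = """\
virUnrefDomain:276 : unref domain 0x7f4ec4003fe0 ucs32-64-segv 1
virReleaseDomain:246 : unref connection 0x917650 2
virUnrefDomain:276 : unref domain 0x934460 ucs32-64-segv 1
virReleaseDomain:246 : unref connection 0x917650 2
virUnrefConnect:145 : unref connection 0x917650 1
virUnrefDomain:276 : unref domain 0x7f4ec4004060 ucs32-64-segv 1
virReleaseDomain:246 : unref connection 0x917650 1
""".splitlines()

# Any address released more than once at count 1 hints at a double free.
suspicious = {addr: n for addr, n in final_unrefs(LOG).items() if n > 1}
print(suspicious)
```

For the log excerpt above this reports connection 0x917650 with two such lines.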
Comment 11 Philipp Hahn univentionstaff 2014-04-02 10:20:08 CEST
Created attachment 5847 [details]
keep-alive double-free with libvirt-1.2.3
Comment 12 Philipp Hahn univentionstaff 2014-04-02 10:38:59 CEST
tagging 3.x as the "virsh define" actually works and the SEGV is only cosmetic.
Comment 13 Philipp Hahn univentionstaff 2014-10-21 15:15:54 CEST
core-virsh-32741-1412776854:           ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'virsh define /nfs/xenconfig/vmxml/vmxml-xnti06951

$ LANG=C date -d @1412776854
Wed Oct  8 16:00:54 CEST 2014

$ dpkg-query -W libvirt\*
libvirt-bin     0.9.12-5.123.201303061845
libvirt0        0.9.12-5.123.201303061845
Comment 14 Philipp Hahn univentionstaff 2017-04-21 16:24:02 CEST
UCS-3.x is out of maintenance (OoM).
Xen is out of maintenance (OoM).