Bug 20476 - libvirt crash SIGSEGV remoteDomainEventQueueFlush
Status: CLOSED WONTFIX
Product: UCS
Classification: Unclassified
Component: Virtualization - UVMM
Version: UCS 2.4
Hardware: Other Linux
Importance: P5 normal
Assigned To: Bugzilla Mailingliste
Reported: 2010-10-21 15:55 CEST by Philipp Hahn
Modified: 2023-06-28 10:45 CEST (History)
Description Philipp Hahn univentionstaff 2010-10-21 15:55:25 CEST
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40efc950 (LWP 2043)]
remoteDomainEventQueueFlush (timer=<value optimized out>, opaque=<value optimized out>)
    at /tmp/buildd/libvirt-0.8.3/src/remote/remote_driver.c:10108
10108       priv->domainEventDispatching = 1;
(gdb) print priv
$1 = (struct private_data *) 0x43b130
(gdb) bt
#0  remoteDomainEventQueueFlush (timer=<value optimized out>, opaque=<value optimized out>)
    at /tmp/buildd/libvirt-0.8.3/src/remote/remote_driver.c:10108
#1  0x00007f1cdbd42668 in libvirt_virEventInvokeTimeoutCallback (self=<value optimized out>, 
    args=<value optimized out>) at /tmp/buildd/libvirt-0.8.3/python/libvirt-override.c:2970
#2  0x000000000047d914 in PyEval_EvalFrame (f=0x124ef50) at ../Python/ceval.c:3568
#3  0x000000000047de47 in PyEval_EvalFrame (f=0xc47eb0) at ../Python/ceval.c:3651
#4  0x000000000047de47 in PyEval_EvalFrame (f=0xa7c4e0) at ../Python/ceval.c:3651
#5  0x000000000047de47 in PyEval_EvalFrame (f=0xa979b0) at ../Python/ceval.c:3651
#6  0x000000000047de47 in PyEval_EvalFrame (f=0xa7c210) at ../Python/ceval.c:3651
#7  0x000000000047e7c5 in PyEval_EvalCodeEx (co=0x7f1cdc02d960, globals=<value optimized out>, 
    locals=<value optimized out>, args=0x7f1cde994068, argcount=0, kws=0xb318e0, kwcount=0, defs=0x0, defcount=0, 
    closure=0x0) at ../Python/ceval.c:2741
#8  0x00000000004ca5b6 in function_call (func=0x7f1cdc032c80, arg=0x7f1cde994050, kw=0xaa16f0)
    at ../Objects/funcobject.c:548
#9  0x0000000000416000 in PyObject_Call (func=0x43b130, arg=0x8948d024, kw=0x3de8ee8948df8948)
    at ../Objects/abstract.c:1795
#10 0x000000000047b7e0 in PyEval_EvalFrame (f=0xaba810) at ../Python/ceval.c:3845
#11 0x000000000047de47 in PyEval_EvalFrame (f=0xa2a980) at ../Python/ceval.c:3651
#12 0x000000000047e7c5 in PyEval_EvalCodeEx (co=0x7f1cde91aea0, globals=<value optimized out>, 
    locals=<value optimized out>, args=0x7f1cdc02f628, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, 
    closure=0x0) at ../Python/ceval.c:2741
#13 0x00000000004ca4d7 in function_call (func=0x7f1cde8b01b8, arg=0x7f1cdc02f610, kw=0x0)
    at ../Objects/funcobject.c:548
#14 0x0000000000416000 in PyObject_Call (func=0x43b130, arg=0x8948d024, kw=0x3de8ee8948df8948)
    at ../Objects/abstract.c:1795
#15 0x000000000041c5e7 in instancemethod_call (func=0x7f1cde8b01b8, arg=0x7f1cdc02f610, kw=0x0)
    at ../Objects/classobject.c:2532
#16 0x0000000000416000 in PyObject_Call (func=0x43b130, arg=0x8948d024, kw=0x3de8ee8948df8948)
    at ../Objects/abstract.c:1795
#17 0x0000000000477871 in PyEval_CallObjectWithKeywords (func=0x7f1cde8ee730, arg=0x7f1cde994050, kw=0x0)
    at ../Python/ceval.c:3435
#18 0x00000000004ad37d in t_bootstrap (boot_raw=0xb5aa30) at ../Modules/threadmodule.c:434
#19 0x00007f1cde5c1fc7 in start_thread () from /lib/libpthread.so.0
#20 0x00007f1cddcad59d in clone () from /lib/libc.so.6
#21 0x0000000000000000 in ?? ()

Presumably "priv" is invalid here, since its fields contain nonsensical data, in particular the sasl* pointers:
(gdb) print *priv
$16 = {lock = {lock = {__data = {__lock = 611092812, __count = 1955155176, __owner = -1991643100, 
        __nusers = 1552500981, __kind = -1991716828, __spins = 1238901868, __list = {__prev = 0x4ce02464894cd689, 
          __next = 0x48ec8348f8247c89}}, 
      __size = "L\211l$�L\211t$�I\211�H\211\\$�H\211l$�I\211�L\211d$�L\211|$�H\203�H", 
      __align = 8397327540136216908}}, sock = 142508360, errfd = 7369120, watch = 407341892, pid = 545229644, 
  uses_tls = -951568780, is_secure = -1375721401, session = 0x4818245c8b480043, 
  type = 0x24648b4c20246c8b <Address 0x24648b4c20246c8b out of bounds>, counter = 1821068328, 
  localUses = -1957941212, hostname = 0x40247c8b4c382474 <Address 0x40247c8b4c382474 out of bounds>, 
  debugLog = 0xfffc67e948c48348, saslconn = 0x801f0fff, 
  saslDecoded = 0xc36348d321c38944 <Address 0xc36348d321c38944 out of bounds>, saslDecodedLength = 1074040136, 
  saslDecodedOffset = 3341585737, saslEncoded = 0xfff8548087d8b48 <Address 0xfff8548087d8b48 out of bounds>, 
  saslEncodedLength = 61316, saslEncodedOffset = 4265166848, 
  buffer = "\017\204�\000\000\000H;=�\017/\000H\211l$\020t\023H9U\000\017\204�\000\000\000H�D$\020\000\000\000\000\215\004\233E\211�A\215T\004\001\211�D!�H\215\004@I\215,�H\213}\bH\205�\017\204\224\000\000\000I9�\211�u@�\224\000\000\000\017\037\204\000\000\000\000\000H;=a\017/\000\220t[\215\004\233A��\005A\215T\004\001\211�D!�H\215\004@I\215,�H\213}\bH\205�tTI9�t[\211�L9u\000u�H;=(\017/\000t�L\211�D\211D$\b��u\000\000\205�D\213D$\bu4H\213}\bH;=\005\017/\000u�H\203|"..., bufferLength = 2202664959, bufferOffset = 2336819437, callbackList = 0x6d8b49f024748d, 
  domainEvents = 0x3de8ee8948df8948, eventFlushTimer = 1224735199, domainEventDispatching = -266042231, 
  wakeupSendFD = -385875968, wakeupReadFD = -2604, waitDispatch = 0x245c8b4908ed8349, streams = 0x8b49f024748d4df0}
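
A rough plausibility check for the "priv is invalid" reading, using only addresses copied from the gdb session above. It assumes (this is not confirmed anywhere in the session) that the program counters of frames #2 to #18 all lie in one contiguous text segment of the Python interpreter. If "priv" falls inside that range, it points into code rather than at a heap-allocated struct, which would be consistent with the dump above showing byte strings instead of sensible field values. This is a sketch, not a proof.

```python
# Value of "priv" as printed by gdb.
PRIV = 0x43b130

# Program counters of frames #2 to #18, which gdb attributes to Python sources.
INTERPRETER_PCS = [
    0x47d914, 0x47de47, 0x47e7c5, 0x4ca5b6, 0x416000,
    0x47b7e0, 0x4ca4d7, 0x41c5e7, 0x477871, 0x4ad37d,
]

def lies_within(addr, pcs):
    """True if addr is between the lowest and highest observed program counter."""
    return min(pcs) <= addr <= max(pcs)

# Assumption: the interpreter's text segment covers at least this span.
priv_in_interpreter_text = lies_within(PRIV, INTERPRETER_PCS)
print(priv_in_interpreter_text)
```

The same value 0x43b130 also shows up as the "func" argument of the PyObject_Call frames, so it may simply be a stale register or stack value.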



Install the debugging symbols first:
  apt-get install libc6-dbg python2.4-dbg libvirt0-dbg
and unpack the sources into /tmp/buildd/:
  ucr set repository/online/sources=yes
  mkdir -p /tmp/buildd
  cd /tmp/buildd
  apt-get source libvirt
Debug with:
  univention-virtual-machine-manager-daemon -s 0.0.0.0 -v -v -v &
  gdb -p `pgrep -f '/usr/bin/python.*univention-virtual-machine-manager-daemon'`

libvirt is event-driven: several threads may be in use, but they queue their work items into one global event loop. This loop is registered via virEventRegisterImpl().
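
The "many threads, one loop" model can be illustrated with a toy event loop. This is only a sketch of the pattern; the EventLoop class and its methods are hypothetical and are not libvirt's API or implementation.

```python
import queue
import threading

class EventLoop:
    """Toy stand-in for a global event loop (hypothetical, not libvirt code)."""

    def __init__(self):
        self._work = queue.Queue()  # thread-safe hand-off to the loop

    def submit(self, func, *args):
        """Called from any thread: queue a work item instead of running it."""
        self._work.put((func, args))

    def run_pending(self):
        """Called from the loop thread only: run everything queued so far."""
        ran = 0
        while True:
            try:
                func, args = self._work.get_nowait()
            except queue.Empty:
                return ran
            func(*args)
            ran += 1

loop = EventLoop()
results = []

# Four worker threads each queue one work item.
workers = [threading.Thread(target=loop.submit, args=(results.append, i))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Only the loop executes the work, so results is touched by one thread.
handled = loop.run_pending()
```

The point of the design is that the queued callbacks run in a single place, which is also why a callback that outlives the object it refers to becomes a problem.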

Events from the remote side are multiplexed over the socket connection.
To handle them, the local libvirt remote driver registers the function remoteDomainEventFired() for the socket connection in doRemoteOpen(). As soon as something is readable there, remoteIOHandleInput() deserializes the data and passes it to processCallDispatch(), which forwards events to processCallDispatchMessage(). There, each event type is parsed by its own remoteDomainReadEvent*() function and appended to the "priv->domainEvents" queue for later processing.
That queue is then processed by remoteDomainEventQueueFlush() on the next pass through the main event loop.
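
The receive path described above (parse per event type, queue, flush on the next loop iteration) can be sketched as follows. All names here are hypothetical stand-ins chosen for illustration; the real code is C and considerably more involved.

```python
from collections import deque

# Hypothetical per-type parsers, standing in for the remoteDomainReadEvent*() family.
def parse_lifecycle(payload):
    return ("lifecycle", payload)

def parse_reboot(payload):
    return ("reboot", payload)

PARSERS = {"lifecycle": parse_lifecycle, "reboot": parse_reboot}

class Connection:
    def __init__(self):
        self.domain_events = deque()  # plays the role of priv->domainEvents
        self.callbacks = []           # user callbacks to notify on flush

def handle_input(conn, messages):
    """Socket became readable: parse each event message and queue it."""
    for kind, payload in messages:
        conn.domain_events.append(PARSERS[kind](payload))

def flush_queue(conn):
    """Next main-loop iteration: deliver queued events in arrival order."""
    delivered = 0
    while conn.domain_events:
        event = conn.domain_events.popleft()
        for callback in conn.callbacks:
            callback(event)
        delivered += 1
    return delivered

conn = Connection()
seen = []
conn.callbacks.append(seen.append)

handle_input(conn, [("lifecycle", "vm1 started"), ("reboot", "vm2")])
pending_before_flush = len(conn.domain_events)  # queued, not yet delivered
delivered = flush_queue(conn)
```

Note the gap between queuing and flushing: the connection object has to stay valid for that whole interval.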
Comment 1 Philipp Hahn univentionstaff 2010-10-21 16:24:05 CEST
Shortly before the crash, UVMMd prints the following messages, which suggests that the connection had been closed just beforehand (see the "... broken? ..." message).

libvir: error : Unknown failure
2010-10-21 16:13:25,175 - uvmmd.node - ERROR - Exception Unknown failure: Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/univention/uvmm/node.py", line 553, in domain_callback
    domStat.update( dom )
  File "/usr/lib/python2.4/site-packages/univention/uvmm/node.py", line 279, in update
    info = domain.info()
  File "/usr/lib/python2.4/site-packages/libvirt.py", line 766, in info
    if ret is None: raise libvirtError ('virDomainGetInfo() failed', dom=self)
libvirtError: Unknown failure

libvir: Xen error : Domain not found: xenUnifiedDomainLookupByName
2010-10-21 16:13:42,486 - uvmmd.node - WARNING - 'xen://xen4.opendvdi.local/' broken? next check in 0:00:30.000. Domain not found: xenUnifiedDomainLookupByName

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41a1a950 (LWP 6275)]
remoteDomainEventQueueFlush (timer=<value optimized out>, opaque=<value optimized out>)
    at /tmp/buildd/libvirt-0.8.3/src/remote/remote_driver.c:10108
10108       priv->domainEventDispatching = 1;

What I can imagine here is that, due to the change from Bug #20024, something is now still being transmitted over the channel, which libvirt then still queues. After that the connection is presumably closed, so that "priv" is then invalid.
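
The suspected ordering (event queued, connection closed, flush timer fires afterwards) can be modelled in a few lines. This only illustrates the hypothesis; it is not confirmed by the traces, and Python cannot reproduce a C use-after-free, so a queue set to None stands in for freed memory. TimerRegistry, Conn and flush are hypothetical names, and cancelling the timer on close is just one conceivable guard, not necessarily what libvirt does.

```python
class TimerRegistry:
    """Toy timer table; each timer keeps an opaque pointer to its owner."""

    def __init__(self):
        self._timers = {}
        self._next_id = 0

    def add(self, callback, opaque):
        self._next_id += 1
        self._timers[self._next_id] = (callback, opaque)
        return self._next_id

    def remove(self, timer_id):
        self._timers.pop(timer_id, None)

    def fire_all(self):
        for callback, opaque in list(self._timers.values()):
            callback(opaque)

def flush(conn):
    """Stand-in for the flush callback; fails if the connection is gone."""
    if conn.queue is None:
        raise RuntimeError("flush ran against a closed connection")
    conn.delivered.extend(conn.queue)
    conn.queue = []

class Conn:
    def __init__(self, timers):
        self.timers = timers
        self.queue = ["event"]   # one event already queued
        self.delivered = []
        self.timer_id = timers.add(flush, self)

    def close(self, cancel_timer):
        self.queue = None        # stands in for freeing "priv"
        if cancel_timer:
            self.timers.remove(self.timer_id)

def flush_after_close(cancel_timer):
    timers = TimerRegistry()
    conn = Conn(timers)
    conn.close(cancel_timer=cancel_timer)
    try:
        timers.fire_all()
    except RuntimeError:
        return "stale callback fired"
    return "no stale callback"

print(flush_after_close(cancel_timer=False))
print(flush_after_close(cancel_timer=True))
```

In C the first case would not raise a clean error; the callback would simply dereference whatever now lives at the old address.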
Comment 2 Stefan Gohmann univentionstaff 2010-10-25 14:00:54 CEST
(In reply to comment #1)
> What I can imagine here is that, due to the change from Bug #20024, something
> is now still being transmitted over the channel, which libvirt then still
> queues. After that the connection is presumably closed, so that "priv" is then
> invalid.

Since that change was not implemented, this bug can actually be closed again, right?
Comment 3 Stefan Gohmann univentionstaff 2016-04-25 07:52:00 CEST
This issue has been filed against UCS 2.4.

UCS 2.4 is out of maintenance and many UCS components have vastly changed in
later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug".
In this case please provide detailed information on how this issue is affecting
you.