Bug 28017 - Aktualisierung von Server bleibt aus - Thread stirbt
Aktualisierung von Server bleibt aus - Thread stirbt
Status: CLOSED WORKSFORME
Product: UCS
Classification: Unclassified
Component: Virtualization - UVMM
UCS 4.2
amd64 Linux
: P5 normal (vote)
: ---
Assigned To: UCS maintainers
:
Depends on: 33741
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-23 11:23 CEST by Philipp Hahn
Modified: 2023-06-28 10:45 CEST (History)
0 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.057
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): Troubleshooting
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Hahn univentionstaff 2012-07-23 11:23:22 CEST
Intern aufgetreten auf laiva/skepp, als mehrere Snapshot gelöscht wurden und die VM von innen heruntergefahren wurde. Danach wurde die VM nicht als beendet angezeigt, obwohl sie es war.

Ursache war, daß der UVMMd-Thread auf "laiva" für die Aktualisierung des Servers "skepp" sich beendet hatte:

/var/log/univention/virtual-machine-manager-daemon.log:
2012-07-23 11:01:18,793 - uvmmd.node - DEBUG - timer_callback: qemu://utby.knut.univention.de/system
2012-07-23 11:01:19,972 - uvmmd.node - DEBUG - timer_callback: qemu://skepp.knut.univention.de/system
2012-07-23 11:01:20,733 - uvmmd.node - DEBUG - timer_callback: qemu://sniglar.knut.univention.de/system
...
2012-07-23 11:01:29,431 - uvmmd.node - WARNING - Timeout waiting for update.
...
2012-07-23 11:01:34,669 - uvmmd.node - DEBUG - timer_callback: qemu://utby.knut.univention.de/system
2012-07-23 11:01:36,108 - uvmmd.node - DEBUG - timer_callback: qemu://sniglar.knut.univention.de/system
Comment 1 Philipp Hahn univentionstaff 2012-07-23 11:41:08 CEST
Vermutlich hervorgerufen durch die "Cannot aquire state change lock"-Problematik in libvirt/qemu.

(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f4ed8fcf9ee in _L_cond_lock_1028 () from /lib/libpthread.so.0
#2  0x00007f4ed8fcf81b in __pthread_mutex_cond_lock (mutex=0x7f4ec4028f40) at ../nptl/pthread_mutex_lock.c:61
#3  0x00007f4ed8fca296 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:236
#4  0x00007f4edaf67ed6 in virCondWait (c=0x7f4ec4028f40, m=0x80) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/src/util/threads-pthread.c:108
#5  0x0000000000469286 in qemuMonitorSend (mon=0x7f4ec4028f40, msg=<value optimized out>) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/src/qemu/qemu_monitor.c:729
#6  0x000000000046b293 in qemuMonitorCommandWithHandler (mon=<value optimized out>, cmd=0x4b9159 "info balloon", passwordHandler=0, passwordOpaque=<value optimized out>, 
    scm_fd=-1, reply=0x7f4ec1ffaa70) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/src/qemu/qemu_monitor_text.c:233
#7  0x000000000046ead3 in qemuMonitorTextGetBalloonInfo (mon=0x7f4ec4028f40, currmem=0x7f4ec1ffaad8)
    at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/src/qemu/qemu_monitor_text.c:564
#8  0x00000000004434a1 in qemudDomainGetInfo (dom=<value optimized out>, info=0x7f4ec1ffab60) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/src/qemu/qemu_driver.c:4011
#9  0x00007f4edafb716f in virDomainGetInfo (domain=0x7f4ecc029450, info=0x7f4ec1ffab60) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/src/libvirt.c:3191
#10 0x000000000042a6fe in remoteDispatchDomainGetInfo (server=<value optimized out>, client=<value optimized out>, conn=0x7f4ecc064aa0, hdr=<value optimized out>, 
    rerr=0x7f4ec1ffacc0, args=<value optimized out>, ret=0x7f4ec1ffac10) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/daemon/remote.c:1506
#11 0x000000000042d497 in remoteDispatchClientCall (server=0x1a123d0, client=0x1b93c40, msg=0x2473150)
    at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/daemon/dispatch.c:526
#12 remoteDispatchClientRequest (server=0x1a123d0, client=0x1b93c40, msg=0x2473150) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/daemon/dispatch.c:404
#13 0x000000000041d6b8 in qemudWorker (data=<value optimized out>) at /var/build/temp/tmp.un5h20lA25/pbuilder/libvirt-0.8.7/daemon/libvirtd.c:1588
#14 0x00007f4ed8fc58ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#15 0x00007f4ed8b2886d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#16 0x0000000000000000 in ?? ()

Vgl. Bug #22096 und Bug #22789
Comment 2 Philipp Hahn univentionstaff 2012-07-23 11:54:43 CEST
(gdb) print mon->vm->state
$3 = 5
(gdb) print mon->vm->pid
$4 = -1
Scheinbar hat sich die VM beendet (5==libvirt.VIR_DOMAIN_SHUTOFF), aber trotzdem probiert libvirtd den qemu-Prozeß zu kontaktieren.
(gdb) print mon->vm->def->name
$6 = 0x7f4ebc075fa0 "jwalkenh-QA-3.0-64"

Race-Condition?
An Bug #22789 ist vermerkt, daß das in neueren libvirt-Versionen im Locking-Bereich Anpassungen vorgenommen wurden.
Comment 3 Philipp Hahn univentionstaff 2013-05-23 16:33:55 CEST
Ebenfalls in meiner Entwicklungsumgebung mit Xen aufgetreten: Nach einigen Neustarts von xend hat UVMM die Verbindung zum Host "xen3" verloren:

2013-05-23 15:26:06,969 - uvmmd.node - DEBUG - timer_callback: qemu://xen12.phahn.dev/s
2013-05-23 15:26:14,122 - uvmmd.node - DEBUG - timer_callback: xen://xen3.phahn.dev/
2013-05-23 15:26:17,180 - uvmmd.node - DEBUG - timer_callback: xen://xen4.phahn.dev/
2013-05-23 15:26:22,248 - uvmmd.node - DEBUG - timer_callback: qemu://xen12.phahn.dev/s
2013-05-23 15:26:32,209 - uvmmd.node - DEBUG - timer_callback: xen://xen4.phahn.dev/

gdb info threads
3: /usr/lib/pymodules/python2.6/univention/uvmm/node.py:580 xen://xen3.phahn.dev/

gdb bt
(gdb) bt
#0  0x00007f9c5be90c13 in *__GI___poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9c5b769541 in virNetClientIOEventLoop (client=0x7f9c482fe180, 
    thiscall=0x2efc230)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclient.c:1356
#2  0x00007f9c5b769f31 in virNetClientIO (client=0x7f9c482fe180, msg=0x309bef0, 
    expectReply=true, nonBlock=false)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclient.c:1618
#3  virNetClientSendInternal (client=0x7f9c482fe180, msg=0x309bef0, expectReply=true, 
    nonBlock=false)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclient.c:1726
#4  0x00007f9c5b76a597 in virNetClientSendWithReply (client=0x7f9c482fe180, 
    msg=0x309bef0)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclient.c:1757
#5  0x00007f9c5b76ad71 in virNetClientProgramCall (prog=0x2fb15f0, 
    client=<value optimized out>, serial=2, proc=1, noutfds=<value optimized out>, 
    outfds=<value optimized out>, ninfds=0x0, infds=0x0, 
    args_filter=0x7f9c5b75f8e0 <xdr_remote_open_args>, args=0x7f9c56366f80, 
    ret_filter=0x7f9c5bec4ed0 <xdr_void>, ret=0x0)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclientprogram.c:330
#6  0x00007f9c5b744ffc in callWithFD (conn=<value optimized out>, priv=
    0x7f9c482f8000, flags=<value optimized out>, fd=<value optimized out>, proc_nr=1, 
    args_filter=0xfffffffffffffdfc, args=0x7f9c56366f80 "\030p6V\234\177", 
    ret_filter=0x7f9c5bec4ed0 <xdr_void>, ret=0x0)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:4736
#7  0x00007f9c5b74509c in call (conn=0x7f9c56366ac0, priv=0x2, flags=4294967295, 
    proc_nr=<value optimized out>, args_filter=<value optimized out>, 
    args=<value optimized out>, ret_filter=0x7f9c5bec4ed0 <xdr_void>, ret=0x0)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:4757
#8  0x00007f9c5b7570e7 in doRemoteOpen (conn=0x7f9c48451000, priv=0x7f9c482f8000, 
    auth=<value optimized out>, flags=<value optimized out>)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:711
#9  0x00007f9c5b758420 in remoteOpen (conn=0x7f9c48451000, auth=0x0, 
    flags=<value optimized out>)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:877
#10 0x00007f9c5b720f6f in do_open (name=<value optimized out>, auth=0x0, flags=0)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/libvirt.c:1227
#11 0x00007f9c5b722066 in virConnectOpen (name=0x338b694 "xen://xen3.phahn.dev/")
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/libvirt.c:1384
#12 0x00007f9c5baebf34 in libvirt_virConnectOpen (self=<value optimized out>, 
    args=<value optimized out>) at libvirt.c:3657
#13 0x00000000004a7d25 in call_function (f=
    Frame 0x327af40, for file /usr/lib/python2.6/dist-packages/libvirt.py, line 235, in open (name='xen://xen3.phahn.dev/'), throwflag=<value optimized out>)
    at ../Python/ceval.c:3750
#14 PyEval_EvalFrameEx (f=
    Frame 0x327af40, for file /usr/lib/python2.6/dist-packages/libvirt.py, line 235, in open (name='xen://xen3.phahn.dev/'), throwflag=<value optimized out>)
    at ../Python/ceval.c:2412
#15 0x00000000004a8620 in fast_function (f=
    Frame 0x329b890, for file /usr/lib/pymodules/python2.6/univention/uvmm/node.py, line 580, in update_autoreconnect (self=<Node(_cache_id=-6662418663863463106, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5d50>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5d50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5d50>) at remote 0x338b510>) at remote 0x338b4d0>, timer=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _T...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3836
#16 call_function (f=
    Frame 0x329b890, for file /usr/lib/pymodules/python2.6/univention/uvmm/node.py, line 580, in update_autoreconnect (self=<Node(_cache_id=-6662418663863463106, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5d50>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5d50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5d50>) at remote 0x338b510>) at remote 0x338b4d0>, timer=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _T...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3771
#17 PyEval_EvalFrameEx (f=
    Frame 0x329b890, for file /usr/lib/pymodules/python2.6/univention/uvmm/node.py, line 580, in update_autoreconnect (self=<Node(_cache_id=-6662418663863463106, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5d50>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5d50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5d50>) at remote 0x338b510>) at remote 0x338b4d0>, timer=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _T...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:2412
#18 0x00000000004a8620 in fast_function (f=
    Frame 0x33ed6b0, for file /usr/lib/pymodules/python2.6/univention/uvmm/node.py, line 567, in run (self=<Node(_cache_id=-6662418663863463106, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__fl
ag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5d50>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5d50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5d50>) at remote 0x338b510>) at remote 0x338b4d0>, timer=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3836
#19 call_function (f=
    Frame 0x33ed6b0, for file /usr/lib/pymodules/python2.6/univention/uvmm/node.py, line 567, in run (self=<Node(_cache_id=-6662418663863463106, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5d50>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5d50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5d50>) at remote 0x338b510>) at remote 0x338b4d0>, timer=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3771
#20 PyEval_EvalFrameEx (f=
    Frame 0x33ed6b0, for file /usr/lib/pymodules/python2.6/univention/uvmm/node.py, line 567, in run (self=<Node(_cache_id=-6662418663863463106, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5d50>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5d50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5d50>) at remote 0x338b510>) at remote 0x338b4d0>, timer=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:2412
#21 0x00000000004a9741 in PyEval_EvalCodeEx (co=0x2b84210, 
    globals=<value optimized out>, locals=<value optimized out>, args=0x1, 
    argcount=<value optimized out>, kws=<value optimized out>, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:3000
#22 0x00000000005391ed in function_call (func=<function at remote 0x30742a8>, arg=
    (<Node(_cache_id=-6662418663863463106, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5d50>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5d50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5d50>) at remote 0x338b510>) at remote 0x338b4d0>, timer=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _C...(truncated), kw={}) at ../Objects/funcobject.c:524
#23 0x000000000041f067 in PyObject_Call (func=<function at remote 0x30742a8>, arg=
    <unknown at remote 0x2>, kw=<unknown at remote 0x-1>)
    at ../Objects/abstract.c:2492
#24 0x00000000004a6405 in ext_do_call (f=
    Frame 0x33b1300, for file /usr/lib/python2.6/threading.py, line 484, in run (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Verbose__verbose=...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:4063
#25 PyEval_EvalFrameEx (f=
    Frame 0x33b1300, for file /usr/lib/python2.6/threading.py, line 484, in run (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Verbose__verbose=...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:2452
#26 0x00000000004a8620 in fast_function (f=
    Frame 0x3291230, for file /usr/lib/python2.6/threading.py, line 532, in __bootstrap_inner (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Ver...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3836
#27 call_function (f=
    Frame 0x3291230, for file /usr/lib/python2.6/threading.py, line 532, in __bootstrap_inner (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Ver...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3771
#28 PyEval_EvalFrameEx (f=
    Frame 0x3291230, for file /usr/lib/python2.6/threading.py, line 532, in __bootstrap_inner (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Ver...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:2412
#29 0x00000000004a8620 in fast_function (f=
    Frame 0x34bef10, for file /usr/lib/python2.6/threading.py, line 504, in __bootstrap (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Verbose__...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3836
#30 call_function (f=
    Frame 0x34bef10, for file /usr/lib/python2.6/threading.py, line 504, in __bootstrap (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Verbose__...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3771
#31 PyEval_EvalFrameEx (f=
    Frame 0x34bef10, for file /usr/lib/python2.6/threading.py, line 504, in __bootstrap (self=<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Verbose__...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:2412
#32 0x00000000004a9741 in PyEval_EvalCodeEx (co=0x29a8c60, 
    globals=<value optimized out>, locals=<value optimized out>, args=0x1, 
    argcount=<value optimized out>, kws=<value optimized out>, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:3000
#33 0x00000000005390f0 in function_call (func=<function at remote 0x29bb1b8>, arg=
    (<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at re...(truncated), kw=0x0) at ../Objects/funcobject.c:524
#34 0x000000000041f067 in PyObject_Call (func=<function at remote 0x29bb1b8>, arg=
    <unknown at remote 0x2>, kw=<unknown at remote 0x-1>)
    at ../Objects/abstract.c:2492
#35 0x0000000000427d3f in instancemethod_call (func=<function at remote 0x29bb1b8>, 
    arg=
    (<Thread(_Thread__ident=140309438039808, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a08>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a08>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a08>) at remote 0x338bc90>, _Thread__name='xen://xen3.phahn.dev/', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33a5a98>, acquire=<built-in method acquire of thread.lock object at remote 0x33a5a98>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33a5a98>) at remote 0x338b8d0>) at remote 0x338bcd0>, _Thread__stderr=<file at remote 0x7f9c5d3b91e0>, _Thread__target=<instancemethod at remote 0x2ebd5f0>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at re...(truncated), kw=0x0) at ../Objects/classobject.c:2579
#36 0x000000000041f067 in PyObject_Call (func=<instancemethod at remote 0x2f08140>, 
    arg=<unknown at remote 0x2>, kw=<unknown at remote 0x-1>)
    at ../Objects/abstract.c:2492
#37 0x00000000004a1bd3 in PyEval_CallObjectWithKeywords (func=
    <instancemethod at remote 0x2f08140>, arg=(), kw=<unknown at remote 0x-1>)
    at ../Python/ceval.c:3619
#38 0x00000000004d93fa in t_bootstrap (boot_raw=0x2f43620)
    at ../Modules/threadmodule.c:428
#39 0x00007f9c5cfcc8ca in start_thread (arg=<value optimized out>)
    at pthread_create.c:300
#40 0x00007f9c5be9bb6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#41 0x0000000000000000 in ?? ()
Comment 4 Philipp Hahn univentionstaff 2013-05-24 14:44:40 CEST
There's a discrepancy between the open connections between the UVMM server (xen12) and the Xen server (xen3):

xen12# netstat -tap | grep xen3
tcp        0      0 xen12.phahn.dev:51468   xen3.phahn.dev:16514    VERBUNDEN   12062/python2.6 
tcp        0      0 xen12.phahn.dev:34241   xen3.phahn.dev:16514    TIME_WAIT   -               
tcp        0      0 xen12.phahn.dev:51235   xen3.phahn.dev:16514    CLOSE_WAIT  12062/python2.6 
tcp6       0      0 xen12.phahn.dev:6669    xen3.phahn.dev:55692    VERBUNDEN   8556/univention-dir
tcp6       0      0 xen12.phahn.dev:6669    xen3.phahn.dev:34680    VERBUNDEN   8556/univention-dir

xen3#  netstat -tap | grep xen12
tcp        0      0 xen3.phahn.dev:55692    xen12.phahn.dev:6669    VERBUNDEN   2442/univention-dir

I think the following has happened:
1. UVMMd on xen12 tried to open a new connection to xen3
2. libvirtd on xen3 accepted that connection, but crashed while processing the REMOTE_PROC_OPEN call. (was renamed to REMOTE_PROC_CONNECT_OPEN later)
3. the connection on xen3 expired and was cleaned up automatically by the kernel
4. UVMMd on xen12 last sent the packet and is now still waiting for the answer.
5. Since the libvirt keepalive protocol is not enabled by default, UVMMd does not send regular packets and now still waits for the answer, which will never come.
6. Since the keepalive protocol can only be enabled for an opened connection, but the crash happened during doRemoteOpen(), UVMMd is now stuck.


The last "timer_callback" on xen12 roughly matches the process creation time of libvirtd on xen3:
2013-05-23 15:26:14,122 - uvmmd.node - DEBUG - timer_callback: xen://xen3.phahn.dev/

dr-xr-xr-x 8 root root 0 23. Mai 15:38 /proc/1855

That was the time when xen3 OOPSed due an OOM: Bug #25993 comment 2
May 23 15:23:19 xen3 kernel: [28591.129593] BUG: unable to handle kernel paging request at ffff8800cc86c000


There have been several changes to the keepalive protocol implementation in 0.9.14 and 0.10.0, but none of them seem relevant to solve this specific problem.
Comment 5 Philipp Hahn univentionstaff 2013-12-12 15:20:55 CET
This happened again on my QEMU test system xen12:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007feffa24de3a in virCondWait (c=0x16f57fc, m=0x80) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/util/threads-pthread.c:117
#2  0x00007feffa32b0bb in virNetClientIO (client=0x194b320, msg=0x14f8850, expectReply=true, nonBlock=false) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclient.c:1560
#3  virNetClientSendInternal (client=0x194b320, msg=0x14f8850, expectReply=true, nonBlock=false) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclient.c:1726
#4  0x00007feffa32b597 in virNetClientSendWithReply (client=0x194b320, msg=0x14f8850) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclient.c:1757
#5  0x00007feffa32bd71 in virNetClientProgramCall (prog=0x7fefec007bc0, client=<value optimized out>, serial=145097, proc=2, noutfds=<value optimized out>, outfds=<value optimized out>, ninfds=0x0, infds=0x0, 
    args_filter=0x7feffb2bded0 <xdr_void>, args=0x0, ret_filter=0x7feffb2bded0 <xdr_void>, ret=0x0) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/rpc/virnetclientprogram.c:330
#6  0x00007feffa305ffc in callWithFD (conn=<value optimized out>, priv=0xdf6750, flags=<value optimized out>, fd=<value optimized out>, proc_nr=2, args_filter=0xfffffffffffffe00, args=0x0, ret_filter=0x7feffb2bded0 <xdr_void>, ret=0x0)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:4736
#7  0x00007feffa30609c in call (conn=0x16f57fc, priv=0x80, flags=1, proc_nr=<value optimized out>, args_filter=<value optimized out>, args=<value optimized out>, ret_filter=0x7feffb2bded0 <xdr_void>, ret=0x0)
    at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:4757
#8  0x00007feffa30a5ae in doRemoteClose (conn=0x16f57fc, priv=0x80) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:906
#9  0x00007feffa30a8fb in remoteClose (conn=0x1788930) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/remote/remote_driver.c:941
#10 0x00007feffa2c9a6b in virReleaseConnect (conn=0x1788930) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/datatypes.c:114
#11 0x00007feffa2cb078 in virUnrefConnect (conn=0x1788930) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/datatypes.c:149
#12 0x00007feffa2e1b78 in virConnectClose (conn=0x1788930) at /var/build/temp/tmp.iA6vnKy5jZ/pbuilder/libvirt-0.9.12/./src/libvirt.c:1505
#13 0x00007feffa6aee36 in libvirt_virConnectClose (self=<value optimized out>, args=<value optimized out>) at libvirt.c:2874
#14 0x00000000004a7d25 in call_function (f=Frame 0x1826740, for file /usr/lib/python2.6/dist-packages/libvirt.py, line 2459, in close (self=<virConnect(_o=<PyCObject at remote 0x19060a8>) at remote 0x17dd9e0>), 
    throwflag=<value optimized out>) at ../Python/ceval.c:3750
#15 PyEval_EvalFrameEx (f=Frame 0x1826740, for file /usr/lib/python2.6/dist-packages/libvirt.py, line 2459, in close (self=<virConnect(_o=<PyCObject at remote 0x19060a8>) at remote 0x17dd9e0>), throwflag=<value optimized out>)
    at ../Python/ceval.c:2412
#16 0x00000000004a8620 in fast_function (f=
    Frame 0x18fe3b0, for file /usr/lib/pymodules/python2.6/univention/uvmm/node.py, line 613, in update_autoreconnect (self=<Node(_cache_id=2229004160368851862, current_frequency=30000, libvirt_version=9012, domainCB=None, timerEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7feffc6fc330>, acquire=<built-in method acquire of thread.lock object at remote 0x7feffc6fc330>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7feffc6fc330>) at remote 0x147af50>) at remote 0x147afd0>, timer=<Thread(_Thread__ident=140668572722944, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x180a5e8>, acquire=<built-in method acquire of thread.lock object at remote 0x180a5e8>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x180a5e8>) at remote 0x16ba3d0>, _Thread__name='qemu://xen12.phahn.dev/system', _Thre...(truncated), throwflag=<value optimized out>) at ../Python/ceval.c:3836
#17 call_function (f=


(In reply to Philipp Hahn from comment #4)
> I think the following has happened:
> 1. UVMMd on xen12 tried to open a new connection to xen3
> 2. libvirtd on xen3 accepted that connection, but crashed while processing
> the REMOTE_PROC_OPEN call. (was renamed to REMOTE_PROC_CONNECT_OPEN later)
> 3. the connection on xen3 expired and was cleaned up automatically by the
> kernel
> 4. UVMMd on xen12 last sent the packet and is now still waiting for the
> answer.
> 5. Since the libvirt keepalive protocol is not enabled by default, UVMMd
> does not send regular packets and now still waits for the answer, which will
> never come.
> 6. Since the keepalive protocol can only be enabled for an opened
> connection, but the crash happened during doRemoteOpen(), UVMMd is now stuck.

# ps www `pidof runsvdir`
XXXXXXXXXXXXXXXXXXXXXXXXX000: 27119: info : libvirt version: 0.9.12
2013-12-12 13:42:27.929+0000: 27119: error : qemuDomainSnapshotLoad:388 : Too many snapshots claiming to be current for domain win7-64-master
2013-12-12 13:44:28.252+0000: 27279: info : libvirt version: 0.9.12
2013-12-12 13:44:28.252+0000: 27279: error : qemuDomainSnapshotLoad:388 : Too many snapshots claiming to be current for domain win7-64-master

lsof does not show any open connection to libvirtd.
It looks like libvirtd was restarted, which forcefully closed the connection between UVMMd and libvirtd one-sided.
UVMM lost its connection and is now stuck in waiting for the answer.

My favorite current candidate is /usr/lib/univention-virtual-machine-manager-daemon/uvmmd-check.sh, which has Bug #33741: UVMMd != libvirtd
It restart libvirtd which may cause the connection to suddenly close.
Comment 6 Philipp Hahn univentionstaff 2017-05-02 15:34:50 CEST
The bug lust occurred 4 years ago; since then we've updated the uvmmd check scripts (Bug #33741). Since then I have not seen that bug again.