Univention Bugzilla – Full Text Bug Listing |
Summary: | xenstored crash SEGV | ||
---|---|---|---|
Product: | UCS | Reporter: | Philipp Hahn <hahn> |
Component: | Virtualization - Xen | Assignee: | Philipp Hahn <hahn> |
Status: | CLOSED FIXED | QA Contact: | Erik Damrose <damrose> |
Severity: | normal | ||
Priority: | P5 | CC: | birkefeld, gohmann, meybohm, thorp-hansen, walkenhorst |
Version: | UCS 3.2 | ||
Target Milestone: | UCS 3.2-7-errata | ||
Hardware: | Other | ||
OS: | Linux | ||
See Also: | https://forge.univention.org/bugzilla/show_bug.cgi?id=38565 | ||
What kind of report is it?: | --- | What type of bug is this?: | --- |
Who will be affected by this bug?: | --- | How will those affected feel about the bug?: | --- |
User Pain: | Enterprise Customer affected?: | ||
School Customer affected?: | ISV affected?: | ||
Waiting Support: | Flags outvoted (downgraded) after PO Review: | ||
Ticket number: | Bug group (optional): | ||
Max CVSS v3 score: | |||
Attachments: | full backtrace |
Ticket#: 2014090421000492 Occured again (same customer) xen-4.1 4.1.3-14.39.201401192234 [15031897.332879] univention-conf[23001]: segfault at ffffffff00000000 ip 000000000045e238 sp 00007fff22eba440 error 7 in python2.6[400000+21e000] [15032699.394026] traps: xenstored[3867] general protection ip:4075c4 sp:7fffdb8aca10 error:0 in xenstored[400000+12000] [15032906.264928] libvirtd D ffff880074dd2140 0 6800 2544 0x00000000 [15032906.264932] ffff880074dd2140 0000000000000286 ffff88007cf8c880 0000000000013ec0 [15032906.264935] ffff880073e39fd8 0000000000013ec0 ffff880073e39fd8 0000000000013ec0 [15032906.264937] ffff880074dd2140 0000000000013ec0 0000000000013ec0 ffff880073e38010 [15032906.264940] Call Trace: [15032906.264949] [<ffffffff813c8e36>] ? schedule_preempt_disabled+0x9/0xb [15032906.264952] [<ffffffff813c7c2e>] ? __mutex_lock_common+0x1ea/0x251 [15032906.264955] [<ffffffff813c7d5b>] ? mutex_lock+0x1a/0x2c [15032906.264959] [<ffffffff81268083>] ? xs_talkv+0x56/0x174 [15032906.264962] [<ffffffff81268221>] ? xs_watch+0x4a/0x57 [15032906.264965] [<ffffffff81268ba6>] ? register_xenbus_watch+0xa4/0x111 [15032906.264969] [<ffffffff8126ad78>] ? xenbus_file_write+0x4a0/0x4b7 [15032906.264973] [<ffffffff8111bcd4>] ? __sb_start_write+0xc5/0xf9 [15032906.264976] [<ffffffff8111ad69>] ? vfs_write+0xd3/0x14c [15032906.264979] [<ffffffff8111aec4>] ? SyS_write+0x58/0x83 [15032906.264983] [<ffffffff813cf0a9>] ? system_call_fastpath+0x16/0x1b #0 talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116 116 if ((tc->flags & ~0xF) != TALLOC_MAGIC) { warning: not using untrusted file "/root/xen-4.1-4.1.3/xen-4.1.3/tools/xenstore/.gdbinit" (gdb) bt #0 talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116 #1 0x0000000000407edf in talloc_free (ptr=0xff0000000000) at talloc.c:551 #2 0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958 #3 0x000000000040a684 in tdb_open (name=0xff0000000000 <Address 0xff0000000000 out of bounds>, hash_size=0, tdb_flags=4254928, open_flags=-1, mode=3974450184) at tdb.c:1773 #4 0x000000000040a70b in tdb_copy (tdb=0x16c9040, outfile=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0") at tdb.c:2124 #5 0x0000000000406c2d in do_transaction_start (conn=0x167e310, in=<value optimized out>) at xenstored_transaction.c:164 #6 0x00000000004045ca in process_message (conn=0x167e310) at xenstored_core.c:1214 #7 consider_message (conn=0x167e310) at xenstored_core.c:1261 #8 handle_input (conn=0x167e310) at xenstored_core.c:1308 #9 0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964 (gdb) frame 2 #2 0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958 1958 SAFE_FREE(tdb->locked); (gdb) print tdb->locked $3 = (struct tdb_lock_type *) 0xff0000000000 Thread 1 (Thread 4221): #0 0x00007fcbf066088d in _IO_vfprintf_internal (s=0x7fff46ac3010, format=<value optimized out>, ap=0x7fff46ac3170) at vfprintf.c:1617 #1 0x00007fcbf0682732 in _IO_vsnprintf (string=0x7fff46ac318f "", maxlen=<value optimized out>, format=0x40d4a4 "%.*s", args=0x7fff46ac3170) at vsnprintf.c:120 #2 0x000000000040855b in talloc_vasprintf (t=0x17aaf20, fmt=0x40d4a4 "%.*s", ap=0x7fff46ac31d0) at talloc.c:1104 #3 0x0000000000408666 in talloc_asprintf (t=0x1f, fmt=0xffffe938 <Address 0xffffe938 out of bounds>) at talloc.c:1129 #4 0x0000000000403a38 in ask_parents (conn=0x177a1f0, name=0x17aaf20 "/local/domain/0/backend/vif/1/0/accel", perm=XS_PERM_READ) at xenstored_core.c:492 #5 errno_from_parents (conn=0x177a1f0, name=0x17aaf20 "/local/domain/0/backend/vif/1/0/accel", perm=XS_PERM_READ) at xenstored_core.c:516 #6 get_node (conn=0x177a1f0, name=0x17aaf20 "/local/domain/0/backend/vif/1/0/accel", perm=XS_PERM_READ) at xenstored_core.c:543 #7 0x000000000040481d in do_read (conn=0x177a1f0) at xenstored_core.c:744 #8 process_message (conn=0x177a1f0) at xenstored_core.c:1178 #9 consider_message (conn=0x177a1f0) at xenstored_core.c:1261 #10 handle_input (conn=0x177a1f0) at xenstored_core.c:1308 #11 0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964 Created attachment 6246 [details]
full backtrace
Look like a memory corruption bug somewhere.
# cat .gdbinit
dir /root/XEN/eglibc-2.11.3/libio
dir /root/XEN/eglibc-2.11.3/stdio-common
file /usr/sbin/xenstored
target core /root/XEN/core-xenstored-4221-1414283702
set pagination 0
bt full
(In reply to Philipp Hahn from comment #2) Another crash is exactly the same location: # gdb /usr/sbin/xenstored ../../../../core-xenstored-3859-1415748063 --batch -ex "set pagination off" -ex 'bt full' -ex 'frame 2' -ex 'print tdb->locked' warning: Can't read pathname for load map: Input/output error. warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff48018000 Core was generated by `xenstored --pid-file=/var/run/xenstored.pid'. Program terminated with signal 11, Segmentation fault. #0 talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116 116 if ((tc->flags & ~0xF) != TALLOC_MAGIC) { #0 talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116 tc = <value optimized out> #1 0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551 tc = <value optimized out> #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958 tdb = 0x1921270 st = {st_dev = 17, st_ino = 816913342, st_nlink = 1, st_mode = 33184, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 303104, st_blksize = 4096, st_blocks = 592, st_atim = {tv_sec = 1415748063, tv_nsec = 87562634}, st_mtim = {tv_sec = 1415748063, tv_nsec = 87562634}, st_ctim = {tv_sec = 1415748063, tv_nsec = 87562634}, __unused = {0, 0, 0}} rev = <value optimized out> locked = 4232112 vp = <value optimized out> #3 0x000000000040a684 in tdb_open (name=0xff00000000 <Address 0xff00000000 out of bounds>, hash_size=0, tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773 No locals. #4 0x000000000040a70b in tdb_copy (tdb=0x192e540, outfile=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0") at tdb.c:2124 fd = <value optimized out> saved_errno = <value optimized out> copy = 0x0 #5 0x0000000000406c2d in do_transaction_start (conn=0x1939550, in=<value optimized out>) at xenstored_transaction.c:164 trans = 0x1935bb0 exists = <value optimized out> id_str = "\300L\222\001\000\000\000\000\330!@\000\000\000\000\000P\225\223\001" #6 0x00000000004045ca in process_message (conn=0x1939550) at xenstored_core.c:1214 trans = <value optimized out> #7 consider_message (conn=0x1939550) at xenstored_core.c:1261 No locals. #8 handle_input (conn=0x1939550) at xenstored_core.c:1308 bytes = <value optimized out> in = <value optimized out> #9 0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964 next = 0x1939550 opt = 26622528 sock = 0x19200c0 ro_sock = 0x1920120 max = <value optimized out> addr = {sun_family = 1, sun_path = "/var/run/xenstored/socket_ro\000\000\000\000\000\000\000\000\000\000\350\212q\272\214\177\000\000\001\000\000\000\377\177\000\000\000\000\000\000\000\000\000\000\001\000\000\000\214\177\000\000(1r\272\214\177\000\000\340k\327\271\214\177\000\000\377\262\360\000\000\000\000\000\277\000\000\000\000\000\000\000n~\000H\377\177"} inset = {__fds_bits = {8192, 0 <repeats 15 times>}} outset = {__fds_bits = {0 <repeats 16 times>}} dofork = <value optimized out> outputpid = <value optimized out> no_domain_init = <value optimized out> pidfile = <value optimized out> evtchn_fd = 11 timeout = 0x0 #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958 1958 SAFE_FREE(tdb->locked); $1 = (struct tdb_lock_type *) 0xff00000000 <http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg01173.html> Next idea: Try running xenstored under valgrind with xen support Probably a linux kernel <3.10.63 bug: <http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00191.html> Waiting for customer feedback. Happened again even with 3.10.0-ucs108-amd64 3.10.11-1.108.201412160733 <https://otrs.knut.univention.de/otrs/index.pl?Action=AgentTicketZoom;TicketID=880424;ArticleID=1756597> | [Xen-devel] [PATCH for-4.6] tools/xenstore: Correct use of va_end() after va_copy() <http://lists.xenproject.org/archives/html/xen-devel/2015-08/msg00572.html> This looks like a possible cause of memory/register corruption like we have seen. (In reply to Philipp Hahn from comment #9) > | [Xen-devel] [PATCH for-4.6] tools/xenstore: Correct use of va_end() after > va_copy() > <http://lists.xenproject.org/archives/html/xen-devel/2015-08/msg00572.html> > > This looks like a possible cause of memory/register corruption like we have > seen. Thanks. Please provide an erratum for UCS 3.2-7 with this patch. r63524 | Bug #35104 xen: Correct use of va_end() after va_copy() r63808 | Bug #35104: xenstored crash SEGV, Bug #38565: xen: Multiple issues 0072-tools-xenstore-Correct-use-of-va_end-after-va_copy.patch Package: xen-4.1 Version: 4.1.3-21.52.201509171449 Branch: ucs_3.2-0 Scope: errata3.2-7 r63809 | Bug #35104: xenstored crash SEGV, Bug #38565: xen: Multiple issues YAML 2015-09-17-xen-4.1.yaml See Bug #38565 for additional infos. OK: Patches adapted OK: Patches included in build OK: YAML (minor change in r63881) OK: Also release for UCS 3.2-6 OK: VM state after update @xen1; (UCS, w2k3, w2k12) OK: xm, virsh, uvmm -> Verified |
xenstored crashed again, after which all "xm", "xl" and "virsh" commands got stuck. A xenstored --pid-file=/var/run/xenstored.pid did resolve this, but this as not recommended, as xenstore is akin to /proc/ and is not designed to be restarted. Ticket #2014061121047372 > [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip 000000000045e238 sp 00007ffff68dfa30 error 6 in python2.6[400000+21e000] > [3891866.927834] xend[9267]: segfault at d ip 00007f7c7846088d sp 00007f7c70b23dd0 error 4 in libc-2.11.3.so[7f7c7841b000+159000] > [5271254.627382] traps: xm[425] general protection ip:44e30d sp:7fffd9508830 error:0 in python2.6[400000+21e000] > [6561020.267481] traps: xenstored[3854] general protection ip:403185 sp:7fffff051f88 error:0 in xenstored[400000+12000]