Bug 35104 - xenstored crash SEGV
xenstored crash SEGV
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Virtualization - Xen
UCS 3.2
Other Linux
: P5 normal (vote)
: UCS 3.2-7-errata
Assigned To: Philipp Hahn
Erik Damrose
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-06-11 16:03 CEST by Philipp Hahn
Modified: 2015-09-23 13:14 CEST (History)
5 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
full backtrace (7.77 KB, text/plain)
2014-10-27 22:57 CET, Philipp Hahn
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Hahn univentionstaff 2014-06-11 16:03:09 CEST
xenstored crashed again, after which all "xm", "xl" and "virsh" commands got stuck.

A
  xenstored --pid-file=/var/run/xenstored.pid
did resolve this, but this as not recommended, as xenstore is akin to /proc/ and is not designed to be restarted.

Ticket #2014061121047372

> [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip 000000000045e238 sp 00007ffff68dfa30 error 6 in python2.6[400000+21e000]
> [3891866.927834] xend[9267]: segfault at d ip 00007f7c7846088d sp 00007f7c70b23dd0 error 4 in libc-2.11.3.so[7f7c7841b000+159000]
> [5271254.627382] traps: xm[425] general protection ip:44e30d sp:7fffd9508830 error:0 in python2.6[400000+21e000]
> [6561020.267481] traps: xenstored[3854] general protection ip:403185 sp:7fffff051f88 error:0 in xenstored[400000+12000]
Comment 1 Janis Meybohm univentionstaff 2014-09-05 15:27:24 CEST
Ticket#: 2014090421000492

Occured again (same customer) xen-4.1 4.1.3-14.39.201401192234

[15031897.332879] univention-conf[23001]: segfault at ffffffff00000000 ip 000000000045e238 sp 00007fff22eba440 error 7 in python2.6[400000+21e000]
[15032699.394026] traps: xenstored[3867] general protection ip:4075c4 sp:7fffdb8aca10 error:0 in xenstored[400000+12000]
[15032906.264928] libvirtd        D ffff880074dd2140     0  6800   2544 0x00000000
[15032906.264932]  ffff880074dd2140 0000000000000286 ffff88007cf8c880 0000000000013ec0
[15032906.264935]  ffff880073e39fd8 0000000000013ec0 ffff880073e39fd8 0000000000013ec0
[15032906.264937]  ffff880074dd2140 0000000000013ec0 0000000000013ec0 ffff880073e38010
[15032906.264940] Call Trace:
[15032906.264949]  [<ffffffff813c8e36>] ? schedule_preempt_disabled+0x9/0xb
[15032906.264952]  [<ffffffff813c7c2e>] ? __mutex_lock_common+0x1ea/0x251
[15032906.264955]  [<ffffffff813c7d5b>] ? mutex_lock+0x1a/0x2c
[15032906.264959]  [<ffffffff81268083>] ? xs_talkv+0x56/0x174
[15032906.264962]  [<ffffffff81268221>] ? xs_watch+0x4a/0x57
[15032906.264965]  [<ffffffff81268ba6>] ? register_xenbus_watch+0xa4/0x111
[15032906.264969]  [<ffffffff8126ad78>] ? xenbus_file_write+0x4a0/0x4b7
[15032906.264973]  [<ffffffff8111bcd4>] ? __sb_start_write+0xc5/0xf9
[15032906.264976]  [<ffffffff8111ad69>] ? vfs_write+0xd3/0x14c
[15032906.264979]  [<ffffffff8111aec4>] ? SyS_write+0x58/0x83
[15032906.264983]  [<ffffffff813cf0a9>] ? system_call_fastpath+0x16/0x1b
Comment 2 Philipp Hahn univentionstaff 2014-10-13 14:47:05 CEST
#0  talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
116             if ((tc->flags & ~0xF) != TALLOC_MAGIC) { 
warning: not using untrusted file "/root/xen-4.1-4.1.3/xen-4.1.3/tools/xenstore/.gdbinit"
(gdb) bt
#0  talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
#1  0x0000000000407edf in talloc_free (ptr=0xff0000000000) at talloc.c:551
#2  0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0", 
    hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, 
    log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
#3  0x000000000040a684 in tdb_open (name=0xff0000000000 <Address 0xff0000000000 out of bounds>, hash_size=0, 
    tdb_flags=4254928, open_flags=-1, mode=3974450184) at tdb.c:1773
#4  0x000000000040a70b in tdb_copy (tdb=0x16c9040, outfile=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0")
    at tdb.c:2124
#5  0x0000000000406c2d in do_transaction_start (conn=0x167e310, in=<value optimized out>)
    at xenstored_transaction.c:164
#6  0x00000000004045ca in process_message (conn=0x167e310) at xenstored_core.c:1214
#7  consider_message (conn=0x167e310) at xenstored_core.c:1261
#8  handle_input (conn=0x167e310) at xenstored_core.c:1308
#9  0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964

(gdb) frame 2
#2  0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0", 
    hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, 
    log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
1958            SAFE_FREE(tdb->locked);
(gdb) print tdb->locked
$3 = (struct tdb_lock_type *) 0xff0000000000
Comment 3 Philipp Hahn univentionstaff 2014-10-27 19:32:30 CET
Thread 1 (Thread 4221):
#0  0x00007fcbf066088d in _IO_vfprintf_internal (s=0x7fff46ac3010, format=<value optimized out>, ap=0x7fff46ac3170)
    at vfprintf.c:1617
#1  0x00007fcbf0682732 in _IO_vsnprintf (string=0x7fff46ac318f "", maxlen=<value optimized out>, 
    format=0x40d4a4 "%.*s", args=0x7fff46ac3170) at vsnprintf.c:120
#2  0x000000000040855b in talloc_vasprintf (t=0x17aaf20, fmt=0x40d4a4 "%.*s", ap=0x7fff46ac31d0) at talloc.c:1104
#3  0x0000000000408666 in talloc_asprintf (t=0x1f, fmt=0xffffe938 <Address 0xffffe938 out of bounds>)
    at talloc.c:1129
#4  0x0000000000403a38 in ask_parents (conn=0x177a1f0, name=0x17aaf20 "/local/domain/0/backend/vif/1/0/accel", 
    perm=XS_PERM_READ) at xenstored_core.c:492
#5  errno_from_parents (conn=0x177a1f0, name=0x17aaf20 "/local/domain/0/backend/vif/1/0/accel", perm=XS_PERM_READ)
    at xenstored_core.c:516
#6  get_node (conn=0x177a1f0, name=0x17aaf20 "/local/domain/0/backend/vif/1/0/accel", perm=XS_PERM_READ)
    at xenstored_core.c:543
#7  0x000000000040481d in do_read (conn=0x177a1f0) at xenstored_core.c:744
#8  process_message (conn=0x177a1f0) at xenstored_core.c:1178
#9  consider_message (conn=0x177a1f0) at xenstored_core.c:1261
#10 handle_input (conn=0x177a1f0) at xenstored_core.c:1308
#11 0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964
Comment 4 Philipp Hahn univentionstaff 2014-10-27 22:57:58 CET
Created attachment 6246 [details]
full backtrace

Look like a memory corruption bug somewhere.

# cat .gdbinit 
dir /root/XEN/eglibc-2.11.3/libio
dir /root/XEN/eglibc-2.11.3/stdio-common
file /usr/sbin/xenstored
target core /root/XEN/core-xenstored-4221-1414283702
set pagination 0
bt full
Comment 5 Philipp Hahn univentionstaff 2014-11-12 12:45:26 CET
(In reply to Philipp Hahn from comment #2)

Another crash is exactly the same location:

# gdb /usr/sbin/xenstored ../../../../core-xenstored-3859-1415748063 --batch -ex "set pagination off" -ex 'bt full' -ex 'frame 2' -ex 'print tdb->locked'

warning: Can't read pathname for load map: Input/output error.

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff48018000
Core was generated by `xenstored --pid-file=/var/run/xenstored.pid'.
Program terminated with signal 11, Segmentation fault.
#0  talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
116             if ((tc->flags & ~0xF) != TALLOC_MAGIC) { 
#0  talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
        tc = <value optimized out>
#1  0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
        tc = <value optimized out>
#2  0x000000000040a348 in tdb_open_ex (name=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
        tdb = 0x1921270
        st = {st_dev = 17, st_ino = 816913342, st_nlink = 1, st_mode = 33184, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 303104, st_blksize = 4096, st_blocks = 592, st_atim = {tv_sec = 1415748063, tv_nsec = 87562634}, st_mtim = {tv_sec = 1415748063, tv_nsec = 87562634}, st_ctim = {tv_sec = 1415748063, tv_nsec = 87562634}, __unused = {0, 0, 0}}
        rev = <value optimized out>
        locked = 4232112
        vp = <value optimized out>
#3  0x000000000040a684 in tdb_open (name=0xff00000000 <Address 0xff00000000 out of bounds>, hash_size=0, tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
No locals.
#4  0x000000000040a70b in tdb_copy (tdb=0x192e540, outfile=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0") at tdb.c:2124
        fd = <value optimized out>
        saved_errno = <value optimized out>
        copy = 0x0
#5  0x0000000000406c2d in do_transaction_start (conn=0x1939550, in=<value optimized out>) at xenstored_transaction.c:164
        trans = 0x1935bb0
        exists = <value optimized out>
        id_str = "\300L\222\001\000\000\000\000\330!@\000\000\000\000\000P\225\223\001"
#6  0x00000000004045ca in process_message (conn=0x1939550) at xenstored_core.c:1214
        trans = <value optimized out>
#7  consider_message (conn=0x1939550) at xenstored_core.c:1261
No locals.
#8  handle_input (conn=0x1939550) at xenstored_core.c:1308
        bytes = <value optimized out>
        in = <value optimized out>
#9  0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964
        next = 0x1939550
        opt = 26622528
        sock = 0x19200c0
        ro_sock = 0x1920120
        max = <value optimized out>
        addr = {sun_family = 1, sun_path = "/var/run/xenstored/socket_ro\000\000\000\000\000\000\000\000\000\000\350\212q\272\214\177\000\000\001\000\000\000\377\177\000\000\000\000\000\000\000\000\000\000\001\000\000\000\214\177\000\000(1r\272\214\177\000\000\340k\327\271\214\177\000\000\377\262\360\000\000\000\000\000\277\000\000\000\000\000\000\000n~\000H\377\177"}
        inset = {__fds_bits = {8192, 0 <repeats 15 times>}}
        outset = {__fds_bits = {0 <repeats 16 times>}}
        dofork = <value optimized out>
        outputpid = <value optimized out>
        no_domain_init = <value optimized out>
        pidfile = <value optimized out>
        evtchn_fd = 11
        timeout = 0x0
#2  0x000000000040a348 in tdb_open_ex (name=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>, log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
1958            SAFE_FREE(tdb->locked);
$1 = (struct tdb_lock_type *) 0xff00000000
Comment 6 Philipp Hahn univentionstaff 2014-11-13 10:55:11 CET
<http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg01173.html>
Next idea: Try running xenstored under valgrind with xen support
Comment 7 Philipp Hahn univentionstaff 2015-01-09 17:13:50 CET
Probably a linux kernel <3.10.63 bug: <http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00191.html>

Waiting for customer feedback.
Comment 8 Philipp Hahn univentionstaff 2015-03-11 17:22:46 CET
Happened again even with 3.10.0-ucs108-amd64 3.10.11-1.108.201412160733
<https://otrs.knut.univention.de/otrs/index.pl?Action=AgentTicketZoom;TicketID=880424;ArticleID=1756597>
Comment 9 Philipp Hahn univentionstaff 2015-08-12 13:06:11 CEST
| [Xen-devel] [PATCH for-4.6] tools/xenstore: Correct use of va_end() after va_copy()
<http://lists.xenproject.org/archives/html/xen-devel/2015-08/msg00572.html>

This looks like a possible cause of memory/register corruption like we have seen.
Comment 10 Stefan Gohmann univentionstaff 2015-09-07 06:43:37 CEST
(In reply to Philipp Hahn from comment #9)
> | [Xen-devel] [PATCH for-4.6] tools/xenstore: Correct use of va_end() after
> va_copy()
> <http://lists.xenproject.org/archives/html/xen-devel/2015-08/msg00572.html>
> 
> This looks like a possible cause of memory/register corruption like we have
> seen.

Thanks. Please provide an erratum for UCS 3.2-7 with this patch.
Comment 11 Philipp Hahn univentionstaff 2015-09-08 17:31:52 CEST
r63524 | Bug #35104 xen: Correct use of va_end() after va_copy()
Comment 12 Philipp Hahn univentionstaff 2015-09-17 15:16:24 CEST
r63808 | Bug #35104: xenstored crash SEGV, Bug #38565: xen: Multiple issues
 0072-tools-xenstore-Correct-use-of-va_end-after-va_copy.patch

Package: xen-4.1
Version: 4.1.3-21.52.201509171449
Branch: ucs_3.2-0
Scope: errata3.2-7

r63809 | Bug #35104: xenstored crash SEGV, Bug #38565: xen: Multiple issues YAML
 2015-09-17-xen-4.1.yaml

See Bug #38565 for additional infos.
Comment 13 Erik Damrose univentionstaff 2015-09-22 16:13:07 CEST
OK: Patches adapted
OK: Patches included in build
OK: YAML (minor change in r63881)
OK: Also release for UCS 3.2-6
OK: VM state after update @xen1; (UCS, w2k3, w2k12)
OK: xm, virsh, uvmm
-> Verified
Comment 14 Janek Walkenhorst univentionstaff 2015-09-23 13:14:41 CEST
<http://errata.software-univention.de/ucs/3.2/372.html>