Univention Bugzilla – Bug 34013
univention-dire[3809]: segfault
Last modified: 2014-09-10 17:43:40 CEST
ucs-kt-get -A amd64 -V 3.2 Generic
# run through Univention-System-Setup-Boot
## Master
## FQDN=test.phahn.dev, PWD=univention
## IP=10.200.17.x, GW=10.200.17.1
## SW=ø
# login as root on console
root@test:~#
[  347.997015] univention-dire[3809]: segfault at 18 ip 00007f6f464112ba sp 00007fffb892d810 error 4 in libpython2.6.so.1.0[7f6f463c1000+240000]

Seen several times, but not reproducible right now.
again
Bug #24411 reports another SEGV in the listener.
(In reply to Philipp Hahn from comment #2)
Wrong bug number: Bug #25094

I mention it here because in system-setup mode the listener gets restarted, which would terminate the previous process.
Created attachment 5879
Proposed patch

During debugging for Bug #34355 I generated the following SIGSEGV from gdb:

667             if (dbp && (rv = dbp->close(dbp, 0)) != 0) {

In my case that happened because multiple signals were processed and the first SIGINT did not terminate the listener; only the subsequent SIGTERM did:

(gdb) print dbp
$1 = (DB *) 0x6471e0
(gdb) list
662     {
663             int rv;
664
665             if ( dbc_cur != NULL )
666                     cache_free_cursor(dbc_cur);
667             if (dbp && (rv = dbp->close(dbp, 0)) != 0) {
668                     dbp->err(dbp, rv, "close");
669             }
670     #ifdef WITH_DB42
671             if ((rv = dbenvp->close(dbenvp, 0)) != 0) {
(gdb) bt
#0  0x000000000040858d in cache_close () at cache.c:667
#1  0x000000000040f17a in exit_handler (sig=15) at signals.c:112
#2  <signal handler called>
#3  0x00007ffff6d783c3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#4  0x000000000040e471 in notifier_wait (client=0x6148c0, timeout=300) at network.c:498
#5  0x0000000000404872 in notifier_listen (lp=0x617110, kp=0x0, write_transaction_file=1, lp_local=0x617190) at notifier.c:120
#6  0x00000000004046c5 in main (argc=17, argv=0x7fffffffe4a8) at main.c:611

The attached patch does 3 things:
1. dbp=NULL after the close, to not double-free it. This should at least fix this issue.
2. Remove the global "dbc_cur", as the 3 callers of cache_first_entry() correctly free their cursor themselves.
3. Replace lockf(F_TEST)+lockf(F_LOCK) with lockf(F_TLOCK), as the former two system calls are not atomic.
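Points 1 and 3 of the patch description could look roughly like the sketch below. It is not the attached patch: struct fake_db, dbp_sketch, cache_close_sketch() and lock_file_sketch() are hypothetical stand-ins, and the real code calls dbp->close(dbp, 0) on a Berkeley DB handle instead of bumping a counter.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the Berkeley DB handle (not the real DB type). */
struct fake_db {
    int close_calls;
};

/* Hypothetical counterpart of the global "dbp" in cache.c. */
static struct fake_db *dbp_sketch = NULL;

/* Point 1: reset the global after closing, so a second call is a no-op.
 * Returns 1 if the handle was closed by this call, 0 if it was already closed. */
int cache_close_sketch(void)
{
    if (dbp_sketch == NULL)
        return 0;
    dbp_sketch->close_calls++;   /* stands in for dbp->close(dbp, 0) */
    dbp_sketch = NULL;           /* a later call no longer touches the handle */
    return 1;
}

/* Point 3: test and take the lock in a single call. With F_TEST followed by
 * F_LOCK, another process can grab the lock between the two calls.
 * Returns the locked file descriptor, or -1 if opening or locking failed. */
int lock_file_sketch(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (lockf(fd, F_TLOCK, 0) != 0) {   /* fails immediately if already locked */
        close(fd);
        return -1;
    }
    return fd;
}
```

Resetting the pointer only helps against a second call that starts after the first one has finished. It does not protect against a signal handler interrupting the close itself.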
(In reply to Philipp Hahn from comment #4)
> In my case that happened because multiple signals were processed and the
> first SIGINT did not terminate the listener; only the subsequent SIGTERM did:

While debugging Bug #34335 I noticed that SIGPIPE is used by libssl/gnutls/libldap to signal some timeout condition. If a second signal like SIGINT or SIGTERM arrives while the first signal handler is still running, this could explain the segmentation fault.

I also found a two-week-old core file showing the SEGV happening while processing a signal. (The backtrace was not reliable, because I no longer had the exact listener binary installed in my development environment.)

Please also note that not all functions are signal-safe; read "man 7 signal", section "Async-signal-safe functions".
(gdb) signal SIGINT
Continuing with signal SIGINT.
26.05.14 14:48:41.762 LISTENER ( WARN ) : received signal 2
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: nfs-shares (prepared=0)
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: nfs-homes (prepared=0)
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: keytab-member (prepared=0)
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: faillog (prepared=-1)
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: ldap_server (prepared=0)
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: license_uuid (prepared=0)
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: bind (prepared=0)
26.05.14 14:48:41.762 LISTENER ( INFO ) : postrun handler: well-known-sid-name-mapping (prepared=-1)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff73262ba in PyObject_Call () from /usr/lib/libpython2.6.so.1.0
(gdb) bt
#0  0x00007ffff73262ba in PyObject_Call () from /usr/lib/libpython2.6.so.1.0
#1  0x00007ffff73c6343 in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.6.so.1.0
#2  0x0000000000405f2c in handler_postrun (handler=0x9529a0) at handlers.c:348
#3  0x0000000000405fbd in handlers_postrun_all () at handlers.c:368
#4  0x000000000040f8f9 in exit_handler (sig=2) at signals.c:119
#5  <signal handler called>
#6  0x00007ffff6d72870 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
#7  0x00007ffff6d1ca1f in _IO_file_xsgetn (fp=0xf92ce0, data=0x1044b48, n=8192) at fileops.c:1465
#8  0x00007ffff6d12c42 in _IO_fread (buf=0x1044a94, size=1, count=8192, fp=0xffffffffffffffff) at iofread.c:44
#9  0x00007ffff734b7ac in ?? () from /usr/lib/libpython2.6.so.1.0
#10 0x00007ffff73cc1c0 in PyEval_EvalFrameEx () from /usr/lib/libpython2.6.so.1.0
#11 0x00007ffff73cdf00 in PyEval_EvalCodeEx () from /usr/lib/libpython2.6.so.1.0
#12 0x00007ffff73cc23b in PyEval_EvalFrameEx () from /usr/lib/libpython2.6.so.1.0
#13 0x00007ffff73cdf00 in PyEval_EvalCodeEx () from /usr/lib/libpython2.6.so.1.0
#14 0x00007ffff73cc23b in PyEval_EvalFrameEx () from /usr/lib/libpython2.6.so.1.0
#15 0x00007ffff73cdf00 in PyEval_EvalCodeEx () from /usr/lib/libpython2.6.so.1.0
#16 0x00007ffff7353b80 in ?? () from /usr/lib/libpython2.6.so.1.0
#17 0x00007ffff73262d3 in PyObject_Call () from /usr/lib/libpython2.6.so.1.0
#18 0x00007ffff73c6343 in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.6.so.1.0
#19 0x0000000000405f2c in handler_postrun (handler=0x9529a0) at handlers.c:348
#20 0x0000000000405fbd in handlers_postrun_all () at handlers.c:368
#21 0x0000000000404aa0 in notifier_listen (lp=0x617110, kp=0x0, write_transaction_file=0, lp_local=0x617190) at notifier.c:140
#22 0x00000000004047b3 in main (argc=16, argv=0x7fffffffe5f8) at main.c:612

Perhaps the signal handler should just set a global flag, so that the loop calling select() just exits.
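The flag-based approach suggested here might be sketched as follows. This is only an illustration under assumptions, not the listener's code: exit_requested, request_exit(), install_exit_flag_handlers() and wait_loop_sketch() are made-up names, and the loop body that would process notifier messages is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

/* Written by the handler, read by the main loop. A volatile sig_atomic_t
 * can be written safely from a signal handler. */
static volatile sig_atomic_t exit_requested = 0;

/* Hypothetical handler: only records the signal number and returns. No
 * Python calls, no database close, no logging inside the handler. */
void request_exit(int sig)
{
    exit_requested = sig;
}

/* Installs the handler for SIGINT and SIGTERM. Returns 0 on success. */
int install_exit_flag_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = request_exit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESTART: a blocked select() fails with EINTR */
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Waits in select() until the flag is set or max_rounds timeouts have passed.
 * Returns the recorded signal number, 0 if none arrived, -1 on select() error. */
int wait_loop_sketch(int max_rounds, int timeout_ms)
{
    int i;

    assert(max_rounds >= 0 && timeout_ms >= 0);
    for (i = 0; i < max_rounds && !exit_requested; i++) {
        struct timeval tv;

        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        if (select(0, NULL, NULL, NULL, &tv) < 0 && errno != EINTR)
            return -1;
        /* a real loop would read and process notifier messages here */
    }
    return exit_requested;
}
```

With this layout the postrun handlers and the cache close would run in normal context after the loop returns, so they cannot be re-entered by a second signal. A signal that arrives between the flag check and the select() call is only noticed after the timeout; pselect() with the signals blocked outside the call closes that window.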
OK, the patch has been applied and the exit_handler now checks if it is running already.

Advisory: 2014-09-08-univention-directory-listener.yaml
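The committed change is not shown in this report. A guard of the kind described, where the handler checks whether it is already running, might look like this sketch. All names are hypothetical, and the nested call inside cleanup_sketch() only simulates a SIGTERM arriving while the SIGINT cleanup is still in progress.

```c
#include <signal.h>

/* Set once the exit path has started. */
static volatile sig_atomic_t exit_in_progress = 0;
/* Counts how often the cleanup really ran (for illustration only). */
static volatile sig_atomic_t cleanup_runs = 0;

void exit_handler_sketch(int sig);

/* Hypothetical cleanup, standing in for the postrun handlers and the cache
 * close. The nested call simulates a second signal during cleanup. */
static void cleanup_sketch(void)
{
    cleanup_runs++;
    exit_handler_sketch(SIGTERM);
}

/* Runs the cleanup at most once, even if entered again while it is running. */
void exit_handler_sketch(int sig)
{
    (void)sig;
    if (exit_in_progress)
        return;              /* already shutting down: ignore the new signal */
    exit_in_progress = 1;
    cleanup_sketch();
    /* a real handler would terminate the process here */
}
```

The guard prevents the double close seen in the first backtrace, but the cleanup still runs in signal context and may interrupt code that is not async-signal-safe.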
Tests: I was unable to reproduce it, but the code looks good and ucs-test ran successfully on all system roles → verified
Code review: OK
YAML: OK
UCS 4.0 merge: OK
http://errata.univention.de/ucs/3.2/201.html