Bug 34013 - univention-dire[3809]: segfault
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
Version: UCS 3.2
Hardware: amd64 Linux
Importance: P5 normal
Target Milestone: UCS 3.2-3-errata
Assigned To: Arvid Requate
QA Contact: Stefan Gohmann
Reported: 2014-01-29 16:57 CET by Philipp Hahn
Modified: 2014-09-10 17:43 CEST (History)

Attachments
Proposed patch (1.80 KB, patch)
2014-04-17 15:19 CEST, Philipp Hahn

Description Philipp Hahn univentionstaff 2014-01-29 16:57:03 CET
ucs-kt-get -A amd64 -V 3.2 Generic
# run through Univention-System-Setup-Boot
## Master
## FQDN=test.phahn.dev, PWD=univention
## IP=10.200.17.x, GW=10.200.17.1
## SW=ø
# login as root on console

root@test:~# [  347.997015] univention-dire[3809]: segfault at 18 ip 00007f6f464112ba sp 00007fffb892d810 error 4 in libpython2.6.so.1.0[7f6f463c1000+240000]

Seen several times, but not reproducible right now.
Comment 1 Philipp Hahn univentionstaff 2014-02-11 09:26:44 CET
again
Comment 2 Philipp Hahn univentionstaff 2014-04-10 18:44:00 CEST
Bug #24411 reports another SEGV in the listener.
Comment 3 Philipp Hahn univentionstaff 2014-04-11 08:34:19 CEST
(In reply to Philipp Hahn from comment #2)
Wrong bug number: Bug #25094
I mention it here, because in system-setup mode the listener gets restarted, which would terminate the previous process.
Comment 4 Philipp Hahn univentionstaff 2014-04-17 15:19:36 CEST
Created attachment 5879
Proposed patch

During debugging for Bug #34355 I generated the following SIGSEGV from gdb:
667             if (dbp && (rv = dbp->close(dbp, 0)) != 0) {

In my case that happened because multiple signals were processed and the first SIGINT did not terminate the listener; only the subsequent SIGTERM did:

(gdb) print dbp
$1 = (DB *) 0x6471e0
(gdb) list
662     {
663             int rv;
664
665             if ( dbc_cur != NULL )
666                     cache_free_cursor(dbc_cur);
667             if (dbp && (rv = dbp->close(dbp, 0)) != 0) {
668                     dbp->err(dbp, rv, "close");
669             }
670     #ifdef WITH_DB42
671             if ((rv = dbenvp->close(dbenvp, 0)) != 0) {
(gdb) bt
#0  0x000000000040858d in cache_close () at cache.c:667
#1  0x000000000040f17a in exit_handler (sig=15) at signals.c:112
#2  <signal handler called>
#3  0x00007ffff6d783c3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#4  0x000000000040e471 in notifier_wait (client=0x6148c0, timeout=300) at network.c:498
#5  0x0000000000404872 in notifier_listen (lp=0x617110, kp=0x0, write_transaction_file=1, lp_local=0x617190)
    at notifier.c:120
#6  0x00000000004046c5 in main (argc=17, argv=0x7fffffffe4a8) at main.c:611

The attached patch does 3 things:
1. Set dbp=NULL after the close, so that a second call to cache_close() does not close (and free) the handle again. This should at least fix this issue.
2. Remove the global "dbc_cur", as the 3 callers of cache_first_entry() correctly free their cursor themselves.
3. Replace lockf(F_TEST)+lockf(F_LOCK) with lockf(F_TLOCK), as the former two system calls are not atomic.
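
For illustration, items 1 and 3 can be sketched in isolation as follows. This is not the attached patch: "struct handle", "dbp_demo", "close_once" and "open_and_try_lock" are hypothetical names, and the struct only stands in for the Berkeley DB handle.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the database handle (not the real DB type). */
struct handle {
    int (*close)(struct handle *self);
    int close_calls;
};

struct handle *dbp_demo = NULL; /* plays the role of the global "dbp" */

int handle_close(struct handle *self)
{
    assert(self != NULL);
    self->close_calls++;
    return 0;
}

/* Item 1: clear the global pointer before closing, so that a second
 * call (for example from a second signal) finds NULL and does nothing. */
int close_once(void)
{
    struct handle *h = dbp_demo;
    int rv = 0;

    if (h != NULL) {
        dbp_demo = NULL;
        rv = h->close(h);
    }
    return rv;
}

/* Item 3: test and take the lock in a single lockf(F_TLOCK) call.
 * Returns the locked file descriptor, or -1 if the file cannot be
 * opened or another process already holds the lock. */
int open_and_try_lock(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (lockf(fd, F_TLOCK, 0) != 0) { /* locks from offset 0 to EOF */
        close(fd);
        return -1;
    }
    return fd;
}
```

With lockf(F_TEST) followed by lockf(F_LOCK), another process can take the lock between the two calls; F_TLOCK either acquires the lock or fails immediately, so there is no such window.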
Comment 5 Philipp Hahn univentionstaff 2014-05-02 12:36:17 CEST
(In reply to Philipp Hahn from comment #4)
> In my case that happened because multiple signals were processed and the
> first SIGINT did not terminate the listener; only the subsequent SIGTERM did:

While debugging Bug #34335 I noticed that SIGPIPE is used by libssl/gnutls/libldap to signal some timeout condition. If a second signal like SIGINT or SIGTERM happens while the first signal handler is still running, this could explain the segmentation fault.

I also found a two-week-old core file showing the SEGV happening while processing a signal. (The backtrace was not reliable, because I no longer had the exact listener binary installed in my development environment.)

Please also note that not all functions are async-signal-safe; see the section "Async-signal-safe functions" in "man 7 signal".
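
A minimal sketch of both points, assuming a plain POSIX environment: the handler is limited to async-signal-safe operations, and sigaction()'s sa_mask keeps the other handled signals blocked while it runs, so a second signal cannot interrupt the first handler. "on_signal", "install_handlers" and "last_signal" are hypothetical names, not listener code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

volatile sig_atomic_t last_signal = 0;

/* Only async-signal-safe operations: a store to a sig_atomic_t and
 * write(2). No printf(), no malloc(), no calls into Python. */
static void on_signal(int sig)
{
    static const char msg[] = "signal received\n";
    int saved_errno = errno; /* write() may change errno */

    last_signal = sig;
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
        /* nothing safe to do about a failed write here */
    }
    errno = saved_errno;
}

/* Install on_signal() for every signal in the list. While the handler
 * runs for one of them, all others in the list stay blocked. */
int install_handlers(const int *signals, size_t count)
{
    struct sigaction sa;
    size_t i;

    assert(signals != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < count; i++)
        sigaddset(&sa.sa_mask, signals[i]);
    for (i = 0; i < count; i++)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

A blocked signal is not lost; it is delivered once the running handler returns.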
Comment 6 Philipp Hahn univentionstaff 2014-05-26 14:40:22 CEST
(gdb) signal SIGINT
Continuing with signal SIGINT.
26.05.14 14:48:41.762  LISTENER    ( WARN    ) : received signal 2
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: nfs-shares (prepared=0)
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: nfs-homes (prepared=0)
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: keytab-member (prepared=0)
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: faillog (prepared=-1)
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: ldap_server (prepared=0)
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: license_uuid (prepared=0)
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: bind (prepared=0)
26.05.14 14:48:41.762  LISTENER    ( INFO    ) : postrun handler: well-known-sid-name-mapping (prepared=-1)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff73262ba in PyObject_Call () from /usr/lib/libpython2.6.so.1.0
(gdb) bt
#0  0x00007ffff73262ba in PyObject_Call () from /usr/lib/libpython2.6.so.1.0
#1  0x00007ffff73c6343 in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.6.so.1.0
#2  0x0000000000405f2c in handler_postrun (handler=0x9529a0) at handlers.c:348
#3  0x0000000000405fbd in handlers_postrun_all () at handlers.c:368
#4  0x000000000040f8f9 in exit_handler (sig=2) at signals.c:119
#5  <signal handler called>
#6  0x00007ffff6d72870 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
#7  0x00007ffff6d1ca1f in _IO_file_xsgetn (fp=0xf92ce0, data=0x1044b48, n=8192) at fileops.c:1465
#8  0x00007ffff6d12c42 in _IO_fread (buf=0x1044a94, size=1, count=8192, fp=0xffffffffffffffff) at iofread.c:44
#9  0x00007ffff734b7ac in ?? () from /usr/lib/libpython2.6.so.1.0
#10 0x00007ffff73cc1c0 in PyEval_EvalFrameEx () from /usr/lib/libpython2.6.so.1.0
#11 0x00007ffff73cdf00 in PyEval_EvalCodeEx () from /usr/lib/libpython2.6.so.1.0
#12 0x00007ffff73cc23b in PyEval_EvalFrameEx () from /usr/lib/libpython2.6.so.1.0
#13 0x00007ffff73cdf00 in PyEval_EvalCodeEx () from /usr/lib/libpython2.6.so.1.0
#14 0x00007ffff73cc23b in PyEval_EvalFrameEx () from /usr/lib/libpython2.6.so.1.0
#15 0x00007ffff73cdf00 in PyEval_EvalCodeEx () from /usr/lib/libpython2.6.so.1.0
#16 0x00007ffff7353b80 in ?? () from /usr/lib/libpython2.6.so.1.0
#17 0x00007ffff73262d3 in PyObject_Call () from /usr/lib/libpython2.6.so.1.0
#18 0x00007ffff73c6343 in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.6.so.1.0
#19 0x0000000000405f2c in handler_postrun (handler=0x9529a0) at handlers.c:348
#20 0x0000000000405fbd in handlers_postrun_all () at handlers.c:368
#21 0x0000000000404aa0 in notifier_listen (lp=0x617110, kp=0x0, write_transaction_file=0, lp_local=0x617190)
    at notifier.c:140
#22 0x00000000004047b3 in main (argc=16, argv=0x7fffffffe5f8) at main.c:612

Perhaps the signal handler should just set a global flag, so that the loop calling select() just exits.
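
A rough sketch of that idea, not taken from the listener source: the handler only sets a flag, and the loop around select() checks it. "exit_requested", "install_exit_flag_handler" and "wait_readable" are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

volatile sig_atomic_t exit_requested = 0;

/* The handler does nothing except record the request. */
static void request_exit(int sig)
{
    (void)sig;
    exit_requested = 1;
}

int install_exit_flag_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = request_exit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* an interrupted select() returns -1 with EINTR */
    return sigaction(signo, &sa, NULL);
}

/* Wait until fd is readable, the timeout expires, or an exit was
 * requested. Returns 1 if readable, 0 on timeout, -1 on exit request
 * or error. All cleanup then happens in normal program context. */
int wait_readable(int fd, int timeout_sec)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    while (!exit_requested) {
        fd_set rfds;
        struct timeval tv = { timeout_sec, 0 };
        int rv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        rv = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rv > 0)
            return 1;
        if (rv == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* EINTR: go round again and re-check the flag */
    }
    return -1;
}
```

A small window remains: a signal that arrives after the flag check but before select() is entered is only noticed when select() returns. pselect(2) with the signal blocked outside the call closes that window.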
Comment 7 Arvid Requate univentionstaff 2014-09-08 18:39:43 CEST
Ok, the patch has been applied and the exit_handler now checks if it is running already. Advisory: 2014-09-08-univention-directory-listener.yaml
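
For readers without access to the commit, such a check could look roughly like the sketch below. This is an assumption about the general shape, not the code that was committed; "guarded_exit_handler", "set_cleanup" and "demo_cleanup" are hypothetical names, and the C11 atomic_flag is used only to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag exit_in_progress = ATOMIC_FLAG_INIT;
static void (*cleanup_fn)(void) = NULL;

int cleanup_calls = 0;

/* Demo cleanup routine that only counts how often it was run. */
void demo_cleanup(void)
{
    cleanup_calls++;
}

/* Register the cleanup routine; call this before installing the handler. */
void set_cleanup(void (*fn)(void))
{
    assert(fn != NULL);
    cleanup_fn = fn;
}

/* Only the first invocation runs the cleanup. A second signal arriving
 * while the cleanup is still in progress returns immediately. */
void guarded_exit_handler(int sig)
{
    (void)sig;
    if (atomic_flag_test_and_set(&exit_in_progress))
        return; /* already shutting down */
    if (cleanup_fn != NULL)
        cleanup_fn();
    /* A real handler would terminate the process at this point. */
}
```

The test-and-set is a single atomic step, so two nested invocations cannot both pass the check.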
Comment 8 Stefan Gohmann univentionstaff 2014-09-09 07:32:59 CEST
Tests: I was unable to reproduce it, but the code looks good and ucs-test ran successfully on all system roles → verified

Code review: OK

YAML: OK

UCS 4.0 merge: OK
Comment 9 Janek Walkenhorst univentionstaff 2014-09-10 17:43:40 CEST
http://errata.univention.de/ucs/3.2/201.html