Univention Bugzilla – Bug 52515
slapd segfaults/deadlock on ppolicy
Last modified: 2022-12-06 11:45:47 CET
+++ This bug was initially created as a clone of Bug #52163 +++ Same environment as in #37915 (comment 3), 4.4-4 errata698 (contains fix for issue described there) new segfaults Sep 29 09:28:43 server kernel: [1718146.046322] traps: slapd[4483] general protection ip:7f31020c34ab sp:7f2ed5ffb3c0 error:0 Sep 29 09:28:43 server kernel: [1718146.046329] in liblber-2.4.so.2.10.8[7f31020bb000+d000] Sep 29 09:28:50 server kernel: [1718153.316897] traps: auth[746] general protection ip:7fe21dba24fc sp:7ffddef78e70 error:0 Sep 29 09:28:50 server kernel: [1718153.316903] in libc-2.24.so[7fe21db29000+195000] Sep 29 13:49:38 server kernel: [1733801.154092] traps: slapd[30238] general protection ip:7ff0988bc4ab sp:7fee23ffe3c0 error:0 Sep 29 13:49:38 server kernel: [1733801.154099] in liblber-2.4.so.2.10.8[7ff0988b4000+d000] 2 of 3 comparable machines affected
We updated the patches on Bug #52163, but that did not completely resolve the problem. see Bug #52163 I can't see a segfault but with a little change on the 10_ldap/56ppolicy_account_lockout_concurrent i can reproduce the slapd "deadlock". ---------- --- a/test/ucs-test/tests/10_ldap/56ppolicy_account_lockout_concurrent +++ b/test/ucs-test/tests/10_ldap/56ppolicy_account_lockout_concurrent @@ -29,6 +29,7 @@ restart_slapd_if_it_hangs() { if pgrep -lf objectClass=OpenLDAProotDSE | grep -qe '\<ldapsearch\>'; then msg="slapd deadlock" echo "ERROR: slapd hangs, attempting kill and start" >&2 + exit 0 pid=$(lslocks -n -o PID,PATH | sed -n 's| /var/lib/univention-ldap/listener/listener.lock||p') if [ -n "$pid" ]; then echo "INFO: Process locking /var/lib/univention-ldap/listener/listener.lock: " >&2 @@ -59,7 +60,7 @@ deactivate_ppolicy() { ucr unset ldap/ppolicy ldap/ppolicy/enabled; /etc/init.d/slapd restart } -ucr set ldap/ppolicy=yes ldap/ppolicy/enabled=yes; /etc/init.d/slapd restart; undo deactivate_ppolicy +#ucr set ldap/ppolicy=yes ldap/ppolicy/enabled=yes; /etc/init.d/slapd restart; undo deactivate_ppolicy default_ppolicy_ldif=$(univention-ldapsearch -LLL -b "cn=default,cn=ppolicy,cn=univention,$ldap_base") @@ -94,7 +95,7 @@ pwdFailureCountInterval: $new_pwdFailureCountInterval test_username_list=() test_userdn_list=() -num_testusers=5 +num_testusers=50 for ((i=0;i<num_testusers;i++)); do test_username=$(user_randomname) user_create "$test_username" && ---------- -> ucr set ldap/ppolicy=yes ldap/ppolicy/enabled=yes -> /etc/init.d/slapd restart -> /usr/share/ucs-test/10_ldap/56ppolicy_account_lockout_concurrent -f modifying entry "cn=default,cn=ppolicy,cn=univention,dc=new,dc=test" info 2020-12-18 11:45:12 create user a7vrrdsc using udm-test users/user create --position=cn=users,dc=new,dc=test --set username=a7vrrdsc --set firstname=Max --set lastname=Muster --set organisation=firma.de_GmbH --set password=univention ... Check if slapd is still responsive: ERROR: slapd hangs, attempting kill and start info 2020-12-18 11:47:03 remove user gxs5fzzy Now slapd hangs, i don't know how to debug this situation, but it happens every time.
Please provide a core file, package version and/or gdb trace.
The first frames with symbols look like this: ======================================================================== (gdb) bt full #0 ber_bvarray_free_x (a=0x55255c2552455652, ctx=0x0) at ../../../../libraries/liblber/memory.c:727 i = 0 #1 0x00007f36866f3517 in ber_bvarray_free (a=<optimized out>) at ../../../../libraries/liblber/memory.c:741 No locals. #2 0x0000555617ad4c89 in slap_mod_free (mod=0x7f2c18002500, freeit=0) at ../../../../servers/slapd/mods.c:463 No locals. #3 0x0000555617ad4d0b in slap_mods_free (ml=ml@entry=0x7f2c18002500, freevals=freevals@entry=1) at ../../../../servers/slapd/mods.c:483 next = 0x74 #4 0x00007f367b1db811 in ppolicy_bind_response (op=<optimized out>, rs=0x7f2c1fffeab0) at ../../../../../servers/slapd/overlays/ppolicy.c:1345 ======================================================================== Source code libraries/liblber/memory.c +727: ======================================================================== void ber_bvarray_free_x( BerVarray a, void *ctx ) { int i; if (a) { BER_MEM_VALID( a ); /* count elements */ for (i=0; a[i].bv_val; i++) ; //<<<<here ======================================================================== Source code libraries/liblber/memory.c +741: ======================================================================== void ber_bvarray_free( BerVarray a ) { ber_bvarray_free_x(a, NULL); } ======================================================================== Source code servers/slapd/mods.c +463: ======================================================================== void slap_mod_free( Modification *mod, int freeit ) { if ( mod->sm_values != NULL ) ber_bvarray_free( mod->sm_values ) ======================================================================== Source code servers/slapd/mods.c +483: ======================================================================== void slap_mods_free( Modifications *ml, int freevals ) { Modifications *next; for ( ; ml != NULL; ml = next ) { next = ml->sml_next; if ( freevals ) slap_mod_free( &ml->sml_mod, 0 ); //<<<< here ======================================================================== Source code servers/slapd/overlays/ppolicy.c +1345 ======================================================================== static int ppolicy_bind_response( Operation *op, SlapReply *rs ) { [...] if ( mod ) { [...] rc = op2.o_bd->be_modify( &op2, &r2 ); slap_mods_free( mod, 1 ); //<<<< here } ========================================================================
The story has been moved to GitLab. Please find the story in the following GitLab issue: https://git.knut.univention.de/univention/ucs/-/issues/520
Info for added backtrace UCS: 4.4-8 errata1167 crash still occurring a few times per day
Ok, this is the most interesting part again: Thread 1 (Thread 0x7f173f7fe700 (LWP 7632)): #0 0x00007f218cc454ab in ber_bvarray_free_x () from /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 #1 0x0000556d5ff08ce9 in slap_mod_free (mod=0x7f17284e7410, freeit=0) at ../../../../servers/slapd/mods.c:463 #2 0x0000556d5ff08d6b in slap_mods_free (ml=ml@entry=0x7f17284e7410, freevals=freevals@entry=1) at ../../../../servers/slapd/mods.c:483 #3 0x00007f218172c811 in ppolicy_bind_response (op=<optimized out>, rs=0x7f173f7fdab0) at ../../../../../servers/slapd/overlays/ppolicy.c:1345 Looking at servers/slapd/overlays/ppolicy.c +1345 shows that this happens if an account is locked. Frame #1 0x0000556d5ff08ce9 in slap_mod_free (mod=0x7f17284e7410, freeit=0) at ../../../../servers/slapd/mods.c:463 has: > if ( mod->sm_values != NULL ) ber_bvarray_free( mod->sm_values ); so that looks like at least one of the sm_values is an invalid pointer at that time. Would be nice if we would be able to reproduce this again, collect a core and dig a bit deeper with gdb.
some additional remarks and findings the production system is using the following configuration: pwdAllowUserChange: TRUE pwdLockoutDuration: 3600 pwdFailureCountInterval: 900 pwdMaxFailure: 6 pwdLockout: TRUE pwdAttribute: userPassword I am not really sure if using "2.5.4.35" for pwdAttribute would really make a difference. the affected hosts are IMAP/SMTP-backend server, installed as replica nodes. crashes have been observed for the local slapd as well as for the configured fallback server. There should be no other user logins on services beside IMAP and SMTP. Given that the login behaviour for IMAP-clients is more agressive than mail submission, especially when using local mail-clients in conjunction with expired/invalid passwords there is a high chance that the cause is to be seen here. I could not correlate the crash timestamp to any failed login attempt (both Postfix and Dovecot). There is a chance that the crash may occur during the unlock of the account but there wasnt a timestamp exactly one hour before for pwdFailureTime or pwdAccountLockedTime. The production environment shows a mentionable number of accounts having older entries for pwdAccountLockedTime and pwdFailureTime. Some of them might be related to Bug 53072 During my tests in a comparable environment I have noticed a strange behavior regarding the evaluation of failed and succeful IMAP-logins into the ppolicy-attributes. on a failed login attempt there is not always a new entry in pwdFailureTime pwdFailureTime is seldom automatically reset once pwdFailureCountInterval without additional failed logins is reached. pwdFailureTime is also not always deleted after the next successful login even after pwdFailureCountInterval is reached and a successful login happened. Once pwdAccountLockedTime is set it is still possible to login with IMAP using valid credentials (LDAP-bind is locked).
The described behaviour regarding the unreliable setting of pwdFailureTime for failed Dovecot authentication appears to be related to the auth cache in Dovecot. Once this is deactivated using UCRV mail/dovecot/auth/cache* the updates of the ppolicy-attributes occur as expected I have opened Bug 54507 for the issue that Dovecot does not respect the Lockout-Status.
I was unable to reproduce this problem on my VM. OpenLDAP package was rebuild with "-ggdb3 -O0" so it can be used also with Valgrind, but no core was generated. What can/shall we do next?