Bug 52515 - slapd segfaults/deadlock on ppolicy
slapd segfaults/deadlock on ppolicy
Status: NEW
Product: UCS
Classification: Unclassified
Component: LDAP
UCS 4.4
Other Linux
: P5 normal (vote)
: ---
Assigned To: Nikola Radovanovic
UCS maintainers
https://git.openldap.org/openldap/ope...
:
Depends on: 52163
Blocks:
  Show dependency treegraph
 
Reported: 2020-12-18 14:24 CET by Felix Botner
Modified: 2022-12-06 11:45 CET (History)
10 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.200
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Felix Botner univentionstaff 2020-12-18 14:24:20 CET
+++ This bug was initially created as a clone of Bug #52163 +++

Same environment as in #37915 (comment 3), 4.4-4 errata698 (contains fix for issue described there)

new segfaults

Sep 29 09:28:43 server kernel: [1718146.046322] traps: slapd[4483] general protection ip:7f31020c34ab sp:7f2ed5ffb3c0 error:0
Sep 29 09:28:43 server kernel: [1718146.046329]  in liblber-2.4.so.2.10.8[7f31020bb000+d000]
Sep 29 09:28:50 server kernel: [1718153.316897] traps: auth[746] general protection ip:7fe21dba24fc sp:7ffddef78e70 error:0
Sep 29 09:28:50 server kernel: [1718153.316903]  in libc-2.24.so[7fe21db29000+195000]
Sep 29 13:49:38 server kernel: [1733801.154092] traps: slapd[30238] general protection ip:7ff0988bc4ab sp:7fee23ffe3c0 error:0
Sep 29 13:49:38 server kernel: [1733801.154099]  in liblber-2.4.so.2.10.8[7ff0988b4000+d000]

2 of 3 comparable machines affected
Comment 1 Felix Botner univentionstaff 2020-12-18 14:49:05 CET
We updated the patches on Bug #52163, but that did not completely resolve the problem.

see Bug #52163

I can't see a segfault but with a little change on the 10_ldap/56ppolicy_account_lockout_concurrent i can reproduce the slapd "deadlock".

----------

--- a/test/ucs-test/tests/10_ldap/56ppolicy_account_lockout_concurrent
+++ b/test/ucs-test/tests/10_ldap/56ppolicy_account_lockout_concurrent
@@ -29,6 +29,7 @@ restart_slapd_if_it_hangs() {
        if pgrep -lf objectClass=OpenLDAProotDSE | grep -qe '\<ldapsearch\>'; then
                msg="slapd deadlock"
                echo "ERROR: slapd hangs, attempting kill and start" >&2
+               exit 0
                pid=$(lslocks -n -o PID,PATH | sed -n 's| /var/lib/univention-ldap/listener/listener.lock||p')
                if [ -n "$pid" ]; then
                        echo "INFO: Process locking /var/lib/univention-ldap/listener/listener.lock: " >&2
@@ -59,7 +60,7 @@ deactivate_ppolicy() {
        ucr unset ldap/ppolicy ldap/ppolicy/enabled; /etc/init.d/slapd restart
 }
 
-ucr set ldap/ppolicy=yes ldap/ppolicy/enabled=yes; /etc/init.d/slapd restart; undo deactivate_ppolicy
+#ucr set ldap/ppolicy=yes ldap/ppolicy/enabled=yes; /etc/init.d/slapd restart; undo deactivate_ppolicy
 
 default_ppolicy_ldif=$(univention-ldapsearch -LLL -b "cn=default,cn=ppolicy,cn=univention,$ldap_base")
 
@@ -94,7 +95,7 @@ pwdFailureCountInterval: $new_pwdFailureCountInterval
 test_username_list=()
 test_userdn_list=()
 
-num_testusers=5
+num_testusers=50
 for ((i=0;i<num_testusers;i++)); do
        test_username=$(user_randomname)
        user_create "$test_username" &&
----------

-> ucr set ldap/ppolicy=yes ldap/ppolicy/enabled=yes
-> /etc/init.d/slapd restart
-> /usr/share/ucs-test/10_ldap/56ppolicy_account_lockout_concurrent  -f
modifying entry "cn=default,cn=ppolicy,cn=univention,dc=new,dc=test"

info 2020-12-18 11:45:12	 create user a7vrrdsc using udm-test users/user create --position=cn=users,dc=new,dc=test --set username=a7vrrdsc --set firstname=Max --set lastname=Muster --set organisation=firma.de_GmbH --set password=univention
...
Check if slapd is still responsive: ERROR: slapd hangs, attempting kill and start
info 2020-12-18 11:47:03	 remove user gxs5fzzy

Now slapd hangs, i don't know how to debug this situation, but it happens every time.
Comment 3 Arvid Requate univentionstaff 2021-01-12 18:40:08 CET
Please provide a core file, package version and/or gdb trace.
Comment 6 Arvid Requate univentionstaff 2021-01-13 15:48:58 CET
The first frames with symbols look like this:
========================================================================
(gdb) bt full 
#0  ber_bvarray_free_x (a=0x55255c2552455652, ctx=0x0) at ../../../../libraries/liblber/memory.c:727
        i = 0
#1  0x00007f36866f3517 in ber_bvarray_free (a=<optimized out>) at ../../../../libraries/liblber/memory.c:741
No locals.
#2  0x0000555617ad4c89 in slap_mod_free (mod=0x7f2c18002500, freeit=0) at ../../../../servers/slapd/mods.c:463
No locals.
#3  0x0000555617ad4d0b in slap_mods_free (ml=ml@entry=0x7f2c18002500, freevals=freevals@entry=1) at ../../../../servers/slapd/mods.c:483
        next = 0x74
#4  0x00007f367b1db811 in ppolicy_bind_response (op=<optimized out>, rs=0x7f2c1fffeab0) at ../../../../../servers/slapd/overlays/ppolicy.c:1345
========================================================================

Source code libraries/liblber/memory.c +727:
========================================================================
void
ber_bvarray_free_x( BerVarray a, void *ctx )
{
        int i;

        if (a) {
                BER_MEM_VALID( a );

                /* count elements */
                for (i=0; a[i].bv_val; i++) ;   //<<<<here
========================================================================

Source code libraries/liblber/memory.c +741:
========================================================================
void
ber_bvarray_free( BerVarray a )
{
        ber_bvarray_free_x(a, NULL);
}
========================================================================

Source code servers/slapd/mods.c +463:
========================================================================
void
slap_mod_free(
        Modification    *mod,
        int                             freeit )
{
        if ( mod->sm_values != NULL ) ber_bvarray_free( mod->sm_values )
========================================================================

Source code servers/slapd/mods.c +483:
========================================================================
void
slap_mods_free(
    Modifications       *ml,
    int                 freevals )
{
        Modifications *next;

        for ( ; ml != NULL; ml = next ) {
                next = ml->sml_next;

                if ( freevals )
                        slap_mod_free( &ml->sml_mod, 0 );  //<<<< here
========================================================================

Source code servers/slapd/overlays/ppolicy.c +1345
========================================================================
static int
ppolicy_bind_response( Operation *op, SlapReply *rs )
{
        [...]
        if ( mod ) {
        [...]
                rc = op2.o_bd->be_modify( &op2, &r2 ); 
                slap_mods_free( mod, 1 );           //<<<< here
        }
========================================================================
Comment 8 Riya Bhattacharjee univentionstaff 2021-11-11 11:44:45 CET
The story has been moved to GitLab.
Please find the story in the following GitLab issue:
https://git.knut.univention.de/univention/ucs/-/issues/520
Comment 10 Dirk Ahrnke univentionstaff 2022-03-01 11:39:46 CET
Info for added backtrace

UCS: 4.4-8 errata1167

crash still occurring a few times per day
Comment 11 Arvid Requate univentionstaff 2022-03-02 10:32:26 CET
Ok, this is the most interesting part again:

Thread 1 (Thread 0x7f173f7fe700 (LWP 7632)):
#0  0x00007f218cc454ab in ber_bvarray_free_x () from /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2
#1  0x0000556d5ff08ce9 in slap_mod_free (mod=0x7f17284e7410, freeit=0) at ../../../../servers/slapd/mods.c:463
#2  0x0000556d5ff08d6b in slap_mods_free (ml=ml@entry=0x7f17284e7410, freevals=freevals@entry=1) at ../../../../servers/slapd/mods.c:483
#3  0x00007f218172c811 in ppolicy_bind_response (op=<optimized out>, rs=0x7f173f7fdab0) at ../../../../../servers/slapd/overlays/ppolicy.c:1345

Looking at servers/slapd/overlays/ppolicy.c +1345 shows that this happens if an account is locked.

Frame #1  0x0000556d5ff08ce9 in slap_mod_free (mod=0x7f17284e7410, freeit=0) at ../../../../servers/slapd/mods.c:463

has:

> if ( mod->sm_values != NULL ) ber_bvarray_free( mod->sm_values );

so that looks like at least one of the sm_values is an invalid pointer at that time.

Would be nice if we would be able to reproduce this again, collect a core and dig a bit deeper with gdb.
Comment 12 Dirk Ahrnke univentionstaff 2022-03-03 16:53:36 CET
some additional remarks and findings

the production system is using the following configuration:

pwdAllowUserChange: TRUE
pwdLockoutDuration: 3600
pwdFailureCountInterval: 900
pwdMaxFailure: 6
pwdLockout: TRUE
pwdAttribute: userPassword

I am not really sure if using "2.5.4.35" for pwdAttribute would really make a difference.

the affected hosts are IMAP/SMTP-backend server, installed as replica nodes. crashes have been observed for the local slapd as well as for the configured fallback server. 
There should be no other user logins on services beside IMAP and SMTP. Given that the login behaviour for IMAP-clients is more agressive than mail submission, especially when using local mail-clients in conjunction with expired/invalid passwords there is a high chance that the cause is to be seen here.

I could not correlate the crash timestamp to any failed login attempt (both Postfix and Dovecot). There is a chance that the crash may occur during the unlock of the account but there wasnt a timestamp exactly one hour before for pwdFailureTime or pwdAccountLockedTime. 

The production environment shows a mentionable number of accounts having older entries for pwdAccountLockedTime and pwdFailureTime. Some of them might be related to Bug 53072

During my tests in a comparable environment I have noticed a strange behavior regarding the evaluation of failed and succeful IMAP-logins into the ppolicy-attributes.

on a failed login attempt there is not always a new entry in pwdFailureTime
pwdFailureTime is seldom automatically reset once pwdFailureCountInterval without additional failed logins is reached.

pwdFailureTime is also not always deleted after the next successful login even after pwdFailureCountInterval is reached and a successful login happened.

Once pwdAccountLockedTime is set it is still possible to login with IMAP using valid credentials (LDAP-bind is locked).
Comment 13 Dirk Ahrnke univentionstaff 2022-03-04 11:22:35 CET
The described behaviour regarding the unreliable setting of pwdFailureTime for failed Dovecot authentication appears to be related to the auth cache in Dovecot. Once this is deactivated using UCRV mail/dovecot/auth/cache* the updates of the ppolicy-attributes occur as expected

I have opened Bug 54507 for the issue that Dovecot does not respect the Lockout-Status.
Comment 14 Nikola Radovanovic univentionstaff 2022-04-28 12:06:56 CEST
I was unable to reproduce this problem on my VM. OpenLDAP package was rebuild with "-ggdb3 -O0" so it can be used also with Valgrind, but no core was generated. What can/shall we do next?