Univention Bugzilla – Bug 33993
LDAP server restart is not reliable
Last modified: 2016-08-03 15:56:26 CEST
See Bug #33992, the LDAP server restart is not reliable. +++ This bug was initially created as a clone of Bug #33992 +++ It happens on S3 master only: ------------------------------------------------------------------------------- *** BEGIN *** ['/bin/bash', '68univention-admingrp-user-passwordreset-protected-groups'] *** E: Object exists: cn=autotest090.local,cn=domain,cn=mail,dc=autotest090,dc=local info 2014-01-24 19:05:05 create user k704j4ok Object created: uid=k704j4ok,cn=users,dc=autotest090,dc=local Object modified: cn=User Password Admins,cn=groups,dc=autotest090,dc=local info 2014-01-24 19:05:06 create group lixucryo with Mailaddress 6qsp677a3unf8sszbarx Object created: cn=lixucryo,cn=groups,dc=autotest090,dc=local info 2014-01-24 19:05:07 create user d8goq5kb Object created: uid=d8goq5kb,cn=users,dc=autotest090,dc=local Object modified: cn=lixucryo,cn=groups,dc=autotest090,dc=local info 2014-01-24 19:05:08 create user da1üy6cö Object created: uid=da1üy6cö,cn=users,dc=autotest090,dc=local Object modified: uid=da1üy6cö,cn=users,dc=autotest090,dc=local info 2014-01-24 19:05:09 create user r3gmu7bh Object created: uid=r3gmu7bh,cn=users,dc=autotest090,dc=local Waiting for replication: OK: replication complete (nid=4112 lid=4112) Done: replication complete. Waiting for postrun Setting ldap/acl/user/passwordreset/protected/gid Script: /etc/univention/templates/scripts/admingrp-user-passwordreset Create ldap/acl/user/passwordreset/internal/groupmemberlist/lixucryo Multifile: /etc/ldap/slapd.conf Multifile: /etc/ldap/slapd.conf Initiating graceful reload of ldap server(s). Sending HUP to ldap server(s): slapd ...done. Check database: ...Could not determine BDB version of /var/lib/univention-ldap/ldap. Skipping /usr/bin/db4.8_recover to avoid damage. Starting ldap server(s): slapd ...failed. 52e2ffd1 OVER: Loading Translog Overlay 52e2ffd1 OVER: db_init 52e2ffd1 OVER: Configuring Translog Overlay 52e2ffd1 OVER: Configured Translog Overlay to use file "/var/lib/univention-ldap/listener/listener" 52e2ffd1 OVER: db_close 52e2ffd1 OVER: db_destroy. Multifile: /etc/ldap/slapd.conf Traceback (most recent call last): File "/usr/share/univention-directory-manager-tools/univention-cli-server", line 222, in doit output = univention.admincli.admin.doit(arglist) File "/usr/lib/pymodules/python2.6/univention/admincli/admin.py", line 393, in doit out=_doit(arglist) File "/usr/lib/pymodules/python2.6/univention/admincli/admin.py", line 540, in _doit co=univention.admin.config.config(configRegistry['ldap/master']) File "/usr/lib/pymodules/python2.6/univention/admin/config.py", line 38, in __init__ base=univention.admin.uldap.getBaseDN(host) File "/usr/lib/pymodules/python2.6/univention/admin/uldap.py", line 62, in getBaseDN return result[0][1]['namingContexts'][0] KeyError: 'namingContexts' ------------------------------------------------------------------------------- From this point all LDAP operations fail. The slapd can not be stopped through the init script: root@master090:~# pidof slapd 17993 root@master090:~# /etc/init.d/slapd stop Stopping ldap server(s): slapd ...done. root@master090:~# pidof slapd 17993 root@master090:~# cat /var/run/slapd/slapd.pid cat: /var/run/slapd/slapd.pid: No such file or directory root@master090:~# The LDAP server is restarted very often in the LDAP test.
Happened again: Ticket #2016050621000287
Please provide the Erratum for UCS 4.1-1 and 4.1-2.
On a i386 system with ldap mdb and a mdb maxsize of at least 2147483648, slapd sometimes refuses to start (exactly the same setup as on Tiket #2016050621000287). amd64 works fine. -> while true; do /etc/init.d/slapd restart; done Jun 28 00:31:54 slave2 slapd[23930]: mdb_db_open: database "dc=w2k12,dc=test" cannot be opened: Cannot allocate memory (12). Restore from backup! Jun 28 00:31:54 slave2 slapd[23930]: backend_startup_one (type=mdb, suffix="dc=w2k12,dc=test"): bi_db_open failed! (12) Jun 28 00:31:54 slave2 slapd[23930]: slapd stopped. -> strace 4006 open("/var/lib/univention-ldap/ldap/data.mdb", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 17 4006 fstatfs64(17, 84, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12486714, f_bfree=11816933, f_bavail=11176882, f_files=3180464, f_ffree=3059825, f_fsid={11195914, 1546682992}, f_namelen=255, f_frsize=4096}) = 0 4006 uname({sys="Linux", node="slave2", ...}) = 0 4006 pread64(17, "\0\0\0\0\0\0\10\0\0\0\0\0\336\300\357\276\1\0\0\0\0\0\0\0\0\0\0\200\0\20\0\0\10\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\r\0\0\0\26\0\0\0\0\0\0\0\0\0\2\0\1\0\0\0\2\0\0\0\0\0\0\0T\0\0\0a\1\0\0t\1\0\0P\2\0\0", 92, 0) = 92 4006 pread64(17, "\1\0\0\0\0\0\10\0\0\0\0\0\336\300\357\276\1\0\0\0\0\0\0\0\0\0\0\200\0\20\0\0\10\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\16\0\0\0v\0\0\0\0\0\0\0\0\0\2\0\1\0\0\0\2\0\0\0\0\0\0\0T\0\0\0\33\0\0\0t\1\0\0Q\2\0\0", 92, 4096) = 92 4006 mmap2(NULL, 2147483648, PROT_READ, MAP_SHARED, 17, 0) = -1 ENOMEM (Cannot allocate memory) -> gdb --args /usr/sbin/slapd -d -1 -h "ldapi:/// ldap://:7389/ ldap://:389/ ldaps://:7636/ ldaps://:636/" break mdb_env_map run 3739 { (gdb) step 3741 unsigned int flags = env->me_flags; (gdb) 3739 { (gdb) 3782 if (flags & MDB_WRITEMAP) { (gdb) 3739 { (gdb) 3782 if (flags & MDB_WRITEMAP) { (gdb) 3781 int prot = PROT_READ; (gdb) 3787 env->me_map = mmap(addr, env->me_mapsize, prot, MAP_SHARED, (gdb) print pro No symbol "pro" in current context. (gdb) print prot $1 = 1 (gdb) print flags $2 = 805306368 (gdb) print env->me_mapsize $3 = 2147483648 (gdb) print addr $4 = (void *) 0x0 (gdb) step 3789 if (env->me_map == MAP_FAILED) { (gdb) 3787 env->me_map = mmap(addr, env->me_mapsize, prot, MAP_SHARED, (gdb) 3789 if (env->me_map == MAP_FAILED) { (gdb) print env->me_map $5 = 0xffffffff <Address 0xffffffff out of bounds> (gdb) $6 = 0xffffffff <Address 0xffffffff out of bounds> (gdb) step 3790 env->me_map = NULL; (gdb) 3791 return ErrCode(); (gdb) 0xb7a81440 in __errno_location () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0 Seems that slapd can not allocate enough memory for the database memory map. Strange thing is, that this happens sporadically. And my test script #include <stdio.h> #include <fcntl.h> #include <sys/mman.h> int main (int argc, char *argv[]) { char *p; int fd; fd = open (argv[0], O_RDONLY); int prot = PROT_READ; int size = 2147483648; fd = open (argv[0], O_RDONLY); p = mmap (NULL, size, prot, MAP_SHARED, fd, 0); if (p == MAP_FAILED) { printf ("p: %d", p); perror ("mmap"); return 1; } close (fd); return 0; } -> gcc -g -D_FILE_OFFSET_BITS=64 test.c always works. But here https://symas.com/getting-down-and-dirty-with-lmdb-qa-with-symas-corporations-howard-chu-about-symass-lightning-memory-mapped-database/ is explained, that there is indeed a limit for the size of database on i386 systems (which happens to be our default). A: No. The maximum database size is constrained only by the amount of disk space available and by the size of the address space on the machine. For 32-bit implementations it’s restricted to approximately 2^31 bytes (2 GB), and for 64-bit implementations, which typically bring 48 address bits out of the CPU, it’s restricted to 2^47 bytes (128 TB). The operating system takes care of moving data in and out of available memory as needed. This means that database sizes can be many multiples of available physical memory. So maybe we should reduce the mdb maxsize default? All i can do for now, is to triple check slapd start in /etc/init.d/slapd on i686 systems. If the start fails, just try it again. univention-ldap: 12.1.6-24.815.201606281334
Customer reported that reducing the maxsize made his slapd restart reliable again. Maybe we want to change the maxsize by default?
+1 from my side. It's a bit unclear though which value to choose. So, it might not be bad to have this fallback in place too? My recommendation to Felix was: In case a failure has been detected decrease maxsize by 1 and try again. This should converge to the the optimal value :-)
Changed default for i686 system to 1.9GB in univention-ldap. Not changed documentation as the recommendation is to not migrate i386 system to MDB. "If BDB is still in use, a migration to MDB should be performed for amd64 systems (not i386).
Ok, works.
<http://errata.software-univention.de/ucs/4.1/222.html>