Univention Bugzilla – Bug 50114
MDB Maxsize too small during Join
Last modified: 2019-10-02 15:55:01 CEST
The customer has increased the MDB size on his servers because he has a lot of objects:

root@master:~# ucr set ldap/database/mdb/maxsize=4294967296

Now the join of any new slave fails with:

05.09.19 13:58:41.069 LISTENER ( ERROR ) : cache.c:469:cache_update_entry_in_transaction mdb_put: failed: MDB_MAP_FULL: Environment mapsize limit reached (-30792)

The reason is the default MDB value after installation, which has not been increased:

root@slave:~# ucr get ldap/database/mdb/maxsize
2147483648

We should somehow get the database size from the master and set the local size accordingly to improve the join. Setting the value manually after installation and joining afterwards is not a suitable approach, as it extremely slows down the installation of new systems!
For the customer the "workaround" (Install w/o join. Set UCRV. Reboot. Join) is not suitable as it hinders the automated setup procedure for EVERY school slave he is installing.
UCR settings can also be predefined by UCR policies in LDAP. Does a policy apply early enough in the join to be considered during the initial LDAP replication?
There are several settings we already get from the master (ssh ucr) and set in the local ucs db (windows/domain). I see no reason why we couldn't do that for the LDAP db settings.
Additionally, increase listener/cache/mdb/maxsize

This happens in a large customer environment (currently around 55,000 users):

======================================================
root@master:~ # mdb_stat -e /var/lib/univention-ldap/ldap/
Environment Info
  Map address: (nil)
  Map size: 4294967296
  Page size: 4096
  Max pages: 1048576
  Number of pages used: 696260
  Last transaction ID: 4033816
  Max readers: 126
  Number of readers used: 16
Status of Main DB
  Tree depth: 2
  Branch pages: 1
  Leaf pages: 3
  Overflow pages: 0
  Entries: 87

root@master:~ # mdb_stat -e /var/lib/univention-ldap/translog/
Environment Info
  Map address: (nil)
  Map size: 4294967296
  Page size: 4096
  Max pages: 1048576
  Number of pages used: 589316
  Last transaction ID: 2515346
  Max readers: 126
  Number of readers used: 15
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 3

root@master:~ # mdb_stat -e /var/lib/univention-directory-listener/cache/
Environment Info
  Map address: (nil)
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 398318
  Last transaction ID: 5441542
  Max readers: 126
  Number of readers used: 1
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 2

root@master:~ # ls -alh /var/lib/univention-ldap/ldap/
insgesamt 2,7G
drwxr-xr-x  2 openldap openldap 4,0K Sep 14 22:10 .
drwxr-xr-x 12 root     root     4,0K Feb 23  2019 ..
-rw-------  1 openldap openldap 2,7G Sep 16 10:25 data.mdb
-rw-r--r--  1 openldap openldap  445 Apr 29 10:19 DB_CONFIG
-rw-------  1 openldap openldap 8,0K Sep 16 10:25 lock.mdb

root@master:~ # ls -alh /var/lib/univention-ldap/translog/
insgesamt 2,3G
drwxr-xr-x  2 openldap openldap 4,0K Sep 14 22:10 .
drwxr-xr-x 12 root     root     4,0K Feb 23  2019 ..
-rw-------  1 openldap openldap 2,3G Sep 16 10:25 data.mdb
-rw-r--r--  1 openldap openldap  449 Apr 29 10:19 DB_CONFIG
-rw-------  1 openldap openldap 8,0K Sep 16 10:25 lock.mdb

root@master:~ # ls -alh /var/lib/univention-directory-listener/cache/
insgesamt 1,6G
drwx------ 2 listener nogroup 4,0K Nov 14  2017 .
drwxr-xr-x 6 listener nogroup 4,0K Sep 16 10:25 ..
-rw------- 1 listener nogroup 1,6G Sep 16 10:25 data.mdb
-rw------- 1 listener nogroup 8,0K Sep 16 10:25 lock.mdb

root@master:~ # du -hs /var/lib/univention-ldap/ldap/
2,7G /var/lib/univention-ldap/ldap/

Output of du:
2,7G /var/lib/univention-ldap/ldap
2,3G /var/lib/univention-ldap/translog
1,6G /var/lib/univention-directory-listener/cache/
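To put the listener cache numbers above into relation, the following minimal sketch parses the "Environment Info" fields of `mdb_stat -e` output and multiplies the used pages by the page size. The helper names `parse_mdb_stat` and `usage` are hypothetical and only serve this illustration; the sample text is copied from the listener cache output above. Note that "Number of pages used" alone does not tell how many of those pages are free for reuse, so the ratio is only a rough indicator.

```python
import re


def parse_mdb_stat(text):
    """Collect integer 'Key: value' fields from `mdb_stat -e` output.

    Hypothetical helper for illustration; non-numeric fields such as
    'Map address: (nil)' are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+):\s*(\d+)\s*$", line)
        if match:
            # Keep the first occurrence of each key only.
            fields.setdefault(match.group(1).strip(), int(match.group(2)))
    return fields


def usage(fields):
    """Return (used bytes, map size in bytes, used/map ratio)."""
    used = fields["Number of pages used"] * fields["Page size"]
    return used, fields["Map size"], used / fields["Map size"]


# Sample copied from the listener cache output in this comment.
SAMPLE = """\
Environment Info
  Map address: (nil)
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 398318
  Last transaction ID: 5441542
  Max readers: 126
  Number of readers used: 1
"""

fields = parse_mdb_stat(SAMPLE)
used_bytes, map_bytes, ratio = usage(fields)
print(used_bytes, map_bytes, round(ratio, 2))  # 1631510528 2147483648 0.76
```

For the listener cache this gives roughly three quarters of the 2 GiB map in use, which matches the size of data.mdb reported by ls.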
(In reply to Christian Völker from comment #0)
> Customer has increased mdb size on his servers as he has a lot of objects:
> root@master:~# ucr set ldap/database/mdb/maxsize=4294967296

Wrong UCRV - that one is for `slapd`.

> Now the join of any new slave fails with:
> 05.09.19 13:58:41.069 LISTENER ( ERROR ) :
> cache.c:469:cache_update_entry_in_transaction mdb_put: failed: MDB_MAP_FULL:
> Environment mapsize limit reached (-30792)

This is the UDL cache, not the LDAP-DB!

(In reply to Christian Völker from comment #4)
> Additionally, increase listener/cache/mdb/maxsize

That one is the correct UCRV.

> root@master:~ # mdb_stat -e /var/lib/univention-directory-listener/cache/
> Environment Info
> Map size: 2147483648
> Page size: 4096
> Max pages: 524288

524_288 * 4K -> 2 GiB

> Number of pages used: 398318

398_318 * 4K -> 1.6 GiB = filesize

Reading <https://www.openldap.org/lists/openldap-technical/201306/msg00098.html> and <https://www.openldap.org/lists/openldap-technical/201306/msg00116.html> we should call `mdb_stat -ef /var/lib/univention-directory-listener/cache/` to also get the freelist information.

Listening further to <https://www.infoq.com/presentations/lmdb-lighting-memory-mapped-database/>: LMDB does CoW with MVCC and keeps at least the last 2 transactions open, but maybe more if readers are still active. So extra space is needed, especially when doing large transactions. This might lead to MDB_MAP_FULL even if it looks like there is enough free space left.

Also reading <https://lmdb.readthedocs.io/en/release/#transaction-management>: the DB may grow without limits while a reader is still active. Data is never modified in-place as CoW is done. A long-running reader may keep older versions alive, so there might even exist more than the last 2 trees.

We should check if UDL (or some other process!) starts a long-running reader which keeps LMDB from reclaiming its free pages.
During a short look at udl/src/cache.c I already found several cases where LMDB transactions are not closed correctly in error cases: mdb_txn_begin() is not followed by mdb_txn_abort() or mdb_txn_commit(). We should consider converting that (and all the other UDL memory allocations) to __attribute__((cleanup(...))).

...
> root@master:~ # ls -alh /var/lib/univention-directory-listener/cache/
> -rw------- 1 listener nogroup 1,6G Sep 16 10:25 data.mdb
> -rw------- 1 listener nogroup 8,0K Sep 16 10:25 lock.mdb
>
> Output of du:
> 1,6G /var/lib/univention-directory-listener/cache/

Hint: you can use `ls -s` (`ls -AgGhs /var/lib/univention-directory-listener/cache/`) to get both the file size and block usage from `ls`.

Nit: See attachment 10079 for a request to enhance the documentation.
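The difference between file size and block usage mentioned in the `ls -s` hint can also be read programmatically. This is a minimal sketch, assuming a POSIX system where `os.stat()` exposes `st_blocks` in 512-byte units; the helper name `size_and_allocation` is hypothetical, and the demo uses a temporary file rather than the real data.mdb.

```python
import os
import tempfile


def size_and_allocation(path):
    """Return (apparent size, allocated bytes) for a file.

    The apparent size is what `ls -l` shows; the allocated bytes
    correspond to the block usage shown by `ls -s` or `du`.
    """
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512


# Demo on a temporary file extended to 1 MiB without writing data.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.truncate(1024 * 1024)
    demo_path = handle.name

apparent, allocated = size_and_allocation(demo_path)
os.remove(demo_path)
print(apparent, allocated)
```

The allocated value depends on the file system, so it is printed here without an expected result.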
I think comment #5 is a separate bug/issue. On this bug we only want to set the mdb maxsize settings from the master on the backup/slave during the join.

univention-join - bfed54136335e7af8d117da96bdfc59929acc1c8
yaml - e105d56f6ec07eab24d4d298c3b84ab7507bb73c

Get the settings from the master (univention-ssh) and set them locally (in the normal UCR scope, not forced or anything like that).
OK: during the join the values of ldap/database/mdb/maxsize and listener/cache/mdb/maxsize are copied from the master to the backup/slave
OK: if a value is unset on the master, the current value on the backup/slave is left untouched
OK: if ldap/database/mdb/maxsize or listener/cache/mdb/maxsize is set on the backup/slave with --force before the join, it is not overwritten with the value from the master
OK: yaml -> verified
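The three verified behaviours can be summarised in a small model. This is only a sketch under stated assumptions: the UCR layers are modelled as plain dictionaries, the function name `apply_master_maxsize` is hypothetical, and the actual change lives in the univention-join shell script, which this code does not reproduce.

```python
MAXSIZE_VARS = (
    "ldap/database/mdb/maxsize",
    "listener/cache/mdb/maxsize",
)


def apply_master_maxsize(master, normal, forced):
    """Model of the join-time sync (hypothetical, dictionaries only).

    master: values read from the master (missing or empty = unset)
    normal: local values in the normal UCR scope
    forced: local values set with --force, which take precedence
    Returns (new normal layer, effective local values).
    """
    new_normal = dict(normal)
    for key in MAXSIZE_VARS:
        value = master.get(key)
        if value:
            # Only write to the normal scope; never touch forced values.
            new_normal[key] = value
    # Forced values override the normal scope when reading.
    effective = {**new_normal, **forced}
    return new_normal, effective


master = {"ldap/database/mdb/maxsize": "4294967296"}
local = {
    "ldap/database/mdb/maxsize": "2147483648",
    "listener/cache/mdb/maxsize": "2147483648",
}

# Case 1 and 2: set on master is copied, unset on master stays untouched.
_, plain = apply_master_maxsize(master, local, {})
print(plain["ldap/database/mdb/maxsize"])   # 4294967296
print(plain["listener/cache/mdb/maxsize"])  # 2147483648

# Case 3: a locally forced value still wins after the join.
_, with_force = apply_master_maxsize(
    master, local, {"ldap/database/mdb/maxsize": "8589934592"}
)
print(with_force["ldap/database/mdb/maxsize"])  # 8589934592
```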
<http://errata.software-univention.de/ucs/4.4/296.html>