Bug 50114 - MDB Maxsize too small during Join
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Join (univention-join)
Version: UCS 4.3
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.4-2-errata
Assigned To: Felix Botner
QA Contact: Johannes Keiser
Depends on:
Blocks:
Reported: 2019-09-05 16:19 CEST by Christian Völker
Modified: 2019-10-02 15:55 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 6: Setup Problem: Issue for the setup process
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.514
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support: Yes
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019090521000874
Bug group (optional):
Max CVSS v3 score:


Description Christian Völker univentionstaff 2019-09-05 16:19:05 CEST
Customer has increased mdb size on his servers as he has a lot of objects:
root@master:~# ucr set ldap/database/mdb/maxsize=4294967296

Now the join of any new slave fails with:
05.09.19 13:58:41.069  LISTENER    ( ERROR   ) : cache.c:469:cache_update_entry_in_transaction mdb_put: failed: MDB_MAP_FULL: Environment mapsize limit reached (-30792)


Reason is the default value of MDB after installation which has not been increased:
root@slave:~# ucr get ldap/database/mdb/maxsize
2147483648


We should somehow get the database size from the master and set the local size accordingly to improve the join.

Setting the value manually after installation and only then joining is not a suitable workaround, as it considerably slows down the installation of new systems!
Comment 1 Christian Völker univentionstaff 2019-09-06 07:52:28 CEST
For the customer, the "workaround" (install without joining, set the UCRV, reboot, join) is not suitable, as it hinders the automated setup procedure for EVERY school slave he installs.
Comment 2 Ingo Steuwer univentionstaff 2019-09-10 09:43:58 CEST
UCR settings can also be predefined by UCR policies in LDAP. Does a policy apply early enough in the join to be considered during the initial LDAP replication?
Comment 3 Felix Botner univentionstaff 2019-09-10 09:58:39 CEST
There are several settings we already get from the master (ssh ucr) and set in the local ucs db (windows/domain). I see no reason why we couldn't do that for the LDAP db settings.
Comment 4 Christian Völker univentionstaff 2019-09-18 09:46:24 CEST
Additionally, increase listener/cache/mdb/maxsize

This happens in a large customer environment (currently around 55,000 users):
======================================================
root@master:~ # mdb_stat -e /var/lib/univention-ldap/ldap/
Environment Info
  Map address: (nil)
  Map size: 4294967296
  Page size: 4096
  Max pages: 1048576
  Number of pages used: 696260
  Last transaction ID: 4033816
  Max readers: 126
  Number of readers used: 16
Status of Main DB
  Tree depth: 2
  Branch pages: 1
  Leaf pages: 3
  Overflow pages: 0
  Entries: 87

root@master:~ # mdb_stat -e /var/lib/univention-ldap/translog/
Environment Info
  Map address: (nil)
  Map size: 4294967296
  Page size: 4096
  Max pages: 1048576
  Number of pages used: 589316
  Last transaction ID: 2515346
  Max readers: 126
  Number of readers used: 15
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 3

root@master:~ # mdb_stat -e /var/lib/univention-directory-listener/cache/
Environment Info
  Map address: (nil)
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 398318
  Last transaction ID: 5441542
  Max readers: 126
  Number of readers used: 1
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 2


root@master:~ # ls -alh /var/lib/univention-ldap/ldap/
insgesamt 2,7G
drwxr-xr-x  2 openldap openldap 4,0K Sep 14 22:10 .
drwxr-xr-x 12 root     root     4,0K Feb 23  2019 ..
-rw-------  1 openldap openldap 2,7G Sep 16 10:25 data.mdb
-rw-r--r--  1 openldap openldap  445 Apr 29 10:19 DB_CONFIG
-rw-------  1 openldap openldap 8,0K Sep 16 10:25 lock.mdb

root@master:~ # ls -alh /var/lib/univention-ldap/translog/
insgesamt 2,3G
drwxr-xr-x  2 openldap openldap 4,0K Sep 14 22:10 .
drwxr-xr-x 12 root     root     4,0K Feb 23  2019 ..
-rw-------  1 openldap openldap 2,3G Sep 16 10:25 data.mdb
-rw-r--r--  1 openldap openldap  449 Apr 29 10:19 DB_CONFIG
-rw-------  1 openldap openldap 8,0K Sep 16 10:25 lock.mdb

root@master:~ # ls -alh /var/lib/univention-directory-listener/cache/
insgesamt 1,6G
drwx------ 2 listener nogroup 4,0K Nov 14  2017 .
drwxr-xr-x 6 listener nogroup 4,0K Sep 16 10:25 ..
-rw------- 1 listener nogroup 1,6G Sep 16 10:25 data.mdb
-rw------- 1 listener nogroup 8,0K Sep 16 10:25 lock.mdb

root@master:~ # du -hs /var/lib/univention-ldap/ldap/
2,7G    /var/lib/univention-ldap/ldap/

Output of du:
2,7G    /var/lib/univention-ldap/ldap
2,3G    /var/lib/univention-ldap/translog
1,6G    /var/lib/univention-directory-listener/cache/
Comment 5 Philipp Hahn univentionstaff 2019-09-18 13:25:04 CEST
(In reply to Christian Völker from comment #0)
> Customer has increased mdb size on his servers as he has a lot of objects:
> root@master:~# ucr set ldap/database/mdb/maxsize=4294967296

Wrong UCRV - that one is for `slapd`.

> Now the join of any new slave fails with:
> 05.09.19 13:58:41.069  LISTENER    ( ERROR   ) :
> cache.c:469:cache_update_entry_in_transaction mdb_put: failed: MDB_MAP_FULL:
> Environment mapsize limit reached (-30792)

This is the UDL cache, not the LDAP-DB!

(In reply to Christian Völker from comment #4)
> Additionally, increase listener/cache/mdb/maxsize

That one is the correct UCRV.

> root@master:~ # mdb_stat -e /var/lib/univention-directory-listener/cache/
> Environment Info
>   Map size: 2147483648
>   Page size: 4096
>   Max pages: 524288
524_288 * 4K -> 2 GiB
>   Number of pages used: 398318
398_318 * 4K -> ~1.52 GiB (shown as 1,6G by `ls -h`, which rounds up) = file size
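
As a rough cross-check of the arithmetic above, the following sketch derives byte counts and a fill level from three fields of `mdb_stat -e` output. The function name `mdb_usage` is made up for illustration, and the sketch assumes the field labels appear exactly as in the output quoted in this report. It only looks at the page counters and does not account for free pages inside the file.

```shell
#!/bin/sh
# Summarise LMDB map usage from `mdb_stat -e` output read on stdin.
# Only the three fields quoted in this report are parsed.
mdb_usage() {
    LC_ALL=C awk -F': *' '
        /Page size:/            { page = $2 }
        /Max pages:/            { max  = $2 }
        /Number of pages used:/ { used = $2 }
        END {
            # %.0f avoids 32-bit integer limits in some awk implementations;
            # the percentage is truncated to a whole number by %d.
            printf "map_bytes=%.0f used_bytes=%.0f used_percent=%d\n",
                   max * page, used * page, used * 100 / max
        }'
}
```

Piping the listener cache statistics into it, for example `mdb_stat -e /var/lib/univention-directory-listener/cache/ | mdb_usage`, should give `map_bytes=2147483648 used_bytes=1631510528 used_percent=75` for the numbers quoted above.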

According to <https://www.openldap.org/lists/openldap-technical/201306/msg00098.html> and <https://www.openldap.org/lists/openldap-technical/201306/msg00116.html>, we should call `mdb_stat -ef /var/lib/univention-directory-listener/cache/` to also get the freelist information.

According to the talk at <https://www.infoq.com/presentations/lmdb-lighting-memory-mapped-database/>, LMDB does CoW with MVCC and keeps at least the last 2 transactions open, possibly more if readers are still active. So extra space is needed, especially for large transactions. This can lead to MDB_MAP_FULL even when there appears to be enough free space left.

According to <https://lmdb.readthedocs.io/en/release/#transaction-management>, the DB may also grow without limit while a reader is still active. Data is never modified in place because CoW is used. A long-running reader may keep older versions alive, so more than the last 2 trees may exist.

We should check if UDL (or some other process!) starts a long-running reader which keeps the LMDB from reclaiming its free pages.

During a short look at udl/src/cache.c I already found several cases where LMDB transactions are not closed correctly in error cases:
  mdb_txn_begin() is not followed by mdb_txn_abort() or mdb_txn_commit().
We should consider converting that (and all the other UDL memory allocations) to __attribute__((cleanup(...))).

...
> root@master:~ # ls -alh /var/lib/univention-directory-listener/cache/
> -rw------- 1 listener nogroup 1,6G Sep 16 10:25 data.mdb
> -rw------- 1 listener nogroup 8,0K Sep 16 10:25 lock.mdb
> 
> Output of du:
> 1,6G    /var/lib/univention-directory-listener/cache/

Hint: you can use `ls -s` (`ls -AgGhs /var/lib/univention-directory-listener/cache/`) to get both the file size and block usage from `ls`.

Nit: See attachment 10079 for a request to enhance the documentation.
Comment 6 Felix Botner univentionstaff 2019-09-24 17:36:28 CEST
I think comment #5 is a separate bug/issue. In this bug we only want to set the MDB maxsize settings from the master on the backup/slave during the join.

univention-join - bfed54136335e7af8d117da96bdfc59929acc1c8
yaml - e105d56f6ec07eab24d4d298c3b84ab7507bb73c

Get the settings from the master (univention-ssh) and set them locally (in the normal UCR scope, not forced or stuff like that).
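
The approach described here could look roughly like the sketch below. It is not the code from the commit referenced above. `master_get`, `local_set`, `MASTER_VALUES` and `APPLIED` are hypothetical stand-ins so the logic can run without a domain. In a real join script, `master_get` would presumably wrap a `ucr get` call executed on the master through univention-ssh, and `local_set` would call `ucr set` in the normal scope.

```shell
#!/bin/sh
# Hypothetical stand-in for reading a UCR variable on the master.
# MASTER_VALUES holds newline-separated "key=value" pairs for this demo.
master_get() {
    printf '%s\n' "$MASTER_VALUES" | sed -n "s|^$1=||p"
}

# Hypothetical stand-in for `ucr set "$1"` on the joining system;
# here it only records what would have been set.
local_set() {
    APPLIED="$APPLIED $1"
}

# Copy both MDB size limits from the master, skipping unset ones.
sync_mdb_maxsize() {
    for key in ldap/database/mdb/maxsize listener/cache/mdb/maxsize; do
        value=$(master_get "$key")
        # Unset on the master: leave the local value untouched.
        [ -n "$value" ] || continue
        local_set "$key=$value"
    done
}
```

Because the sketch only ever writes in the normal scope, a value that an administrator set with `--force` beforehand would not be replaced by it.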
Comment 7 Johannes Keiser univentionstaff 2019-10-02 11:23:28 CEST
OK: during the join, the values of
ldap/database/mdb/maxsize and listener/cache/mdb/maxsize
are copied from the master to the backup/slave

OK: if a value is unset on the master, the current value on the backup/slave
is left untouched

OK: if ldap/database/mdb/maxsize or listener/cache/mdb/maxsize
is set on the backup/slave with --force before the join,
it is not overwritten with the value from the master

OK: yaml
-> verified
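
The three verified cases can be summarised as a small model. This is only an illustration of the observed outcome, not UCR code. `value_after_join` and its arguments are hypothetical, and the model assumes that a value set with `--force` takes precedence over the normal scope, as the third case implies.

```shell
#!/bin/sh
# $1: value on the master ("" if unset)
# $2: normal-scope value on the backup/slave before the join
# $3: value set with --force on the backup/slave ("" if none)
# Prints the value the backup/slave is expected to use after the join.
value_after_join() {
    normal=$2
    # The join only writes the normal scope, and only if the master has a value.
    if [ -n "$1" ]; then
        normal=$1
    fi
    # A forced value shadows whatever the normal scope contains.
    if [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        printf '%s\n' "$normal"
    fi
}
```

For example, `value_after_join 4294967296 2147483648 ""` models the first case: the master's 4 GiB limit replaces the 2 GiB default on the joining system.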
Comment 8 Erik Damrose univentionstaff 2019-10-02 15:55:01 CEST
<http://errata.software-univention.de/ucs/4.4/296.html>