Bug 50256 - listener mdb database handling incorrect
Status: REOPENED
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
Version: UCS 4.4
Hardware: Other Linux
Importance: P2 major
Assigned To: UCS maintainers
Depends on:
Blocks:
Reported: 2019-09-24 17:34 CEST by Felix Botner
Modified: 2022-10-20 10:04 CEST

What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---


Attachments
bug50256.patch (801 bytes, patch)
2021-12-06 17:59 CET, Arvid Requate

Description Felix Botner univentionstaff 2019-09-24 17:34:37 CEST
from Bug #50114, comment 5

According to <https://www.openldap.org/lists/openldap-technical/201306/msg00098.html> and <https://www.openldap.org/lists/openldap-technical/201306/msg00116.html>, we should call `mdb_stat -ef /var/lib/univention-directory-listener/cache/` to also get the freelist information.

According to the talk at <https://www.infoq.com/presentations/lmdb-lighting-memory-mapped-database/>, LMDB does copy-on-write (CoW) with MVCC and keeps at least the last 2 transactions, possibly more if readers are still active. So extra space is needed, especially for large transactions. This can lead to MDB_MAP_FULL even if it looks like there is enough free space left.

Also, according to <https://lmdb.readthedocs.io/en/release/#transaction-management>, the DB may grow without limit while a reader is still active. Data is never modified in place because CoW is used. A long-running reader may keep older versions alive, so more than the last 2 trees may exist.
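
To make the effect of a long-running reader concrete, here is a toy model in C. It is not LMDB's code or data layout: `free_record` and `reclaimable_pages` are hypothetical names, and the rule "pages are reusable only if they were freed by a transaction older than the oldest active reader snapshot" is a simplification of what the links above describe.

```c
#include <stddef.h>

/* Toy model, NOT LMDB's data structures: one freelist record per write txn. */
typedef struct free_record {
    unsigned long freed_by_txn;  /* id of the write txn that freed the pages */
    size_t        pages;         /* number of pages freed by that txn */
} free_record;

/*
 * Sum the pages a writer could reuse in this simplified model.
 * oldest_reader_txn is the snapshot id of the oldest reader still active.
 * Pages freed at or after that snapshot may still be visible to the reader,
 * so they stay pinned and the data file has to grow instead.
 */
static size_t reclaimable_pages(const free_record *recs, size_t n,
                                unsigned long oldest_reader_txn)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].freed_by_txn < oldest_reader_txn)
            total += recs[i].pages;
    }
    return total;
}
```

With records for transactions 10, 11 and 12, a reader stuck at snapshot 11 leaves only the pages freed by transaction 10 reusable. Once no old reader remains, all three records become available again.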

We should check if UDL (or some other process!) starts a long-running reader which keeps the LMDB from reclaiming its free pages.

During a short look at udl/src/cache.c I already found several cases where LMDB transactions are not closed correctly in error cases:
  mdb_txn_begin() is not followed by mdb_txn_abort() or mdb_txn_commit().
We should consider converting that (and all the other UDL memory allocations) to __attribute__((cleanup(...))).
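
A minimal sketch of the `__attribute__((cleanup(...)))` idea follows. It assumes GCC or Clang, since the attribute is a compiler extension. To stay compilable without liblmdb it uses hypothetical stand-ins (`demo_txn`, `demo_txn_begin`, `demo_txn_abort`, `demo_txn_commit`) in place of the real `MDB_txn` and `mdb_txn_*` calls; it is not taken from udl/src/cache.c.

```c
#include <stdlib.h>

/* Hypothetical stand-in for MDB_txn so the sketch builds without liblmdb. */
typedef struct demo_txn {
    int open;                      /* 1 while the transaction is live */
} demo_txn;

static int demo_open_count = 0;    /* number of transactions currently open */

static int demo_txn_begin(demo_txn **out)
{
    demo_txn *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->open = 1;
    demo_open_count++;
    *out = t;
    return 0;
}

static void demo_txn_abort(demo_txn *t)
{
    demo_open_count--;
    free(t);
}

static int demo_txn_commit(demo_txn *t)
{
    demo_open_count--;
    free(t);
    return 0;
}

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void demo_txn_cleanup(demo_txn **tp)
{
    if (*tp != NULL) {
        demo_txn_abort(*tp);
        *tp = NULL;
    }
}

#define AUTO_TXN __attribute__((cleanup(demo_txn_cleanup)))

/* Returns 0 on success, -1 on failure; no transaction leaks on either path. */
static int demo_update(int fail_midway)
{
    AUTO_TXN demo_txn *txn = NULL;

    if (demo_txn_begin(&txn) != 0)
        return -1;

    if (fail_midway)
        return -1;                 /* early exit: the handler aborts txn */

    int rv = demo_txn_commit(txn);
    txn = NULL;                    /* committed: disarm the cleanup handler */
    return rv;
}
```

The handler runs whenever `txn` goes out of scope, so every early `return` aborts the transaction. Setting the variable to NULL after a commit keeps the handler from touching an already finished transaction.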

...
> root@master:~ # ls -alh /var/lib/univention-directory-listener/cache/
> -rw------- 1 listener nogroup 1,6G Sep 16 10:25 data.mdb
> -rw------- 1 listener nogroup 8,0K Sep 16 10:25 lock.mdb
> 
> Output of du:
> 1,6G    /var/lib/univention-directory-listener/cache/

Hint: you can use `ls -s` (`ls -AgGhs /var/lib/univention-directory-listener/cache/`) to get both the file size and block usage from `ls`.
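
The same two numbers can be read programmatically with `stat(2)`. The sketch below assumes Linux, where `st_blocks` is counted in 512-byte units; `file_usage`, `get_file_usage` and `print_file_usage` are hypothetical helper names, not part of UDL.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Apparent size versus allocated bytes of one file. */
typedef struct file_usage {
    long long apparent_bytes;   /* st_size, the size `ls -l` reports */
    long long allocated_bytes;  /* st_blocks * 512, the basis of `du` and `ls -s` */
} file_usage;

/* Fill *out for path; returns 0 on success, -1 if stat() fails. */
static int get_file_usage(const char *path, file_usage *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    out->apparent_bytes = (long long)st.st_size;
    /* Assumption: 512-byte units, as on Linux. */
    out->allocated_bytes = (long long)st.st_blocks * 512LL;
    return 0;
}

/* Print both values for path; returns 0 on success, -1 on error. */
static int print_file_usage(const char *path)
{
    file_usage u;

    if (get_file_usage(path, &u) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: apparent=%lld bytes, allocated=%lld bytes\n",
           path, u.apparent_bytes, u.allocated_bytes);
    return 0;
}
```

Calling `print_file_usage("/var/lib/univention-directory-listener/cache/data.mdb")` as a user who may read that directory would show whether the file is sparse: a sparse file has an allocated size well below its apparent size.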

Nit: See attachment 10079 [details] for a request to enhance the documentation.
Comment 1 Arvid Requate univentionstaff 2021-12-06 17:59:21 CET
Created attachment 10870 [details]
bug50256.patch

> During a short look at udl/src/cache.c I already found several cases where LMDB transactions are not closed correctly in error cases:
>   mdb_txn_begin() is not followed by mdb_txn_abort() or mdb_txn_commit().

The attached patch fixes two locations. Please note that the transaction opened in cache_first_entry gets closed later by calling cache_free_cursor from change_init_module.
Comment 2 Oliver Friedrich univentionstaff 2022-10-19 16:25:16 CEST
As this is going to be critical in big environments, I raised the importance of this.