Univention Bugzilla – Bug 49868
If MDB is full the Notifier may fill up notifier.log eternally with lots of log messages
Last modified: 2022-09-30 10:36:03 CEST
In the case of Ticket#: 2019071621000544 the notifier filled up the notifier.log with over 3000 log entries per second: ============================================================================ 17.07.19 18:15:47.622 DEBUG_INIT 17.07.19 19:10:37.044 TRANSFILE ( ERROR ) : 124790699 ldap_add(): Other (e.g., implementation specific) error 17.07.19 19:10:44.315 DEBUG_INIT 17.07.19 19:10:46.099 TRANSFILE ( WARN ) : 123682201 ldap_add() already exists 17.07.19 19:10:46.100 TRANSFILE ( WARN ) : 123682202 ldap_add() already exists 17.07.19 19:10:46.100 TRANSFILE ( WARN ) : 123682203 ldap_add() already exists 17.07.19 19:10:46.100 TRANSFILE ( WARN ) : 123682204 ldap_add() already exists 17.07.19 19:10:46.101 TRANSFILE ( WARN ) : 123682205 ldap_add() already exists 17.07.19 19:10:46.101 TRANSFILE ( WARN ) : 123682206 ldap_add() already exists 17.07.19 19:10:46.101 TRANSFILE ( WARN ) : 123682207 ldap_add() already exists 17.07.19 19:10:46.102 TRANSFILE ( WARN ) : 123682208 ldap_add() already exists 17.07.19 19:10:46.102 TRANSFILE ( WARN ) : 123682209 ldap_add() already exists [...] 17.07.19 19:16:11.553 TRANSFILE ( WARN ) : 124790698 ldap_add() already exists 17.07.19 19:16:11.553 TRANSFILE ( ERROR ) : 124790699 ldap_add(): Other (e.g., implementation specific) error 17.07.19 19:16:17.092 DEBUG_INIT 17.07.19 19:16:18.907 TRANSFILE ( WARN ) : 123682201 ldap_add() already exists 17.07.19 19:16:18.907 TRANSFILE ( WARN ) : 123682202 ldap_add() already exists 17.07.19 19:16:18.908 TRANSFILE ( WARN ) : 123682203 ldap_add() already exists 17.07.19 19:16:18.908 TRANSFILE ( WARN ) : 123682204 ldap_add() already exists 17.07.19 19:16:18.908 TRANSFILE ( WARN ) : 123682205 ldap_add() already exists [...] ============================================================================ I guess the MDB is full again due to all thos transactions (Bug #49867). The Notifier should be adjusted to not fill up the whole disc storage in a short time.
FYI: This was a UCS Master with ad/member=true.
A customer has this issue on two different backup servers. Cause was again, that the translog database was full. After I increased the ldap/database/mdb/maxsize and restarted slapd and listener, the notifier starts to fill up the notifier.log till the filesystem filled up to 100%. The logfile increased to 49GB The logmessages: 24.03.21 11:16:00.877 TRANSFILE ( WARN ) : 29261397 ldap_add() already exists 24.03.21 11:16:00.877 TRANSFILE ( WARN ) : 29261398 ldap_add() already exists 24.03.21 11:16:00.877 TRANSFILE ( WARN ) : 29261399 ldap_add() already exists 24.03.21 11:16:00.877 TRANSFILE ( WARN ) : 29261400 ldap_add() already exists 24.03.21 11:16:00.878 TRANSFILE ( WARN ) : 29261401 ldap_add() already exists 24.03.21 11:16:00.878 TRANSFILE ( WARN ) : 29261402 ldap_add() already exists 24.03.21 11:16:00.878 TRANSFILE ( WARN ) : 29261403 ldap_add() already exists
The customer has 3 Backupservers, and on all of them the translog database was full, and the notifier fills up with his warnings
(In reply to Arvid Requate from comment #0) > In the case of Ticket#: 2019071621000544 the notifier filled up the > notifier.log with over 3000 log entries per second: > > ============================================================================ > 17.07.19 18:15:47.622 DEBUG_INIT > 17.07.19 19:10:37.044 TRANSFILE ( ERROR ) : 124790699 ldap_add(): Other > (e.g., implementation specific) error > > 17.07.19 19:10:44.315 DEBUG_INIT > 17.07.19 19:10:46.099 TRANSFILE ( WARN ) : 123682201 ldap_add() > already exists > 17.07.19 19:10:46.100 TRANSFILE ( WARN ) : 123682202 ldap_add() > already exists > 17.07.19 19:10:46.100 TRANSFILE ( WARN ) : 123682203 ldap_add() > already exists > 17.07.19 19:10:46.100 TRANSFILE ( WARN ) : 123682204 ldap_add() > already exists > 17.07.19 19:10:46.101 TRANSFILE ( WARN ) : 123682205 ldap_add() > already exists > 17.07.19 19:10:46.101 TRANSFILE ( WARN ) : 123682206 ldap_add() > already exists > 17.07.19 19:10:46.101 TRANSFILE ( WARN ) : 123682207 ldap_add() > already exists > 17.07.19 19:10:46.102 TRANSFILE ( WARN ) : 123682208 ldap_add() > already exists > 17.07.19 19:10:46.102 TRANSFILE ( WARN ) : 123682209 ldap_add() > already exists > [...] > > 17.07.19 19:16:11.553 TRANSFILE ( WARN ) : 124790698 ldap_add() > already exists > 17.07.19 19:16:11.553 TRANSFILE ( ERROR ) : 124790699 ldap_add(): Other > (e.g., implementation specific) error > 17.07.19 19:16:17.092 DEBUG_INIT > 17.07.19 19:16:18.907 TRANSFILE ( WARN ) : 123682201 ldap_add() > already exists > 17.07.19 19:16:18.907 TRANSFILE ( WARN ) : 123682202 ldap_add() > already exists > 17.07.19 19:16:18.908 TRANSFILE ( WARN ) : 123682203 ldap_add() > already exists > 17.07.19 19:16:18.908 TRANSFILE ( WARN ) : 123682204 ldap_add() > already exists > 17.07.19 19:16:18.908 TRANSFILE ( WARN ) : 123682205 ldap_add() > already exists > [...] > ============================================================================ > > I guess the MDB is full again due to all thos transactions (Bug #49867). The > Notifier should be adjusted to not fill up the whole disc storage in a short > time. And so exactly now this happens on the backup servers, the translog DB is full again.
Maybe as a short addition. In a customer environment, not just the notifier.log filled up, but also the transactions are attached in the listener/listener file, so after the database was free again, all the repeating transactions are filled in the ldap. univention-app info UCS: 4.4-7 errata850 Installed: cups=2.2.1 dhcp-server=12.0 kde=5.8 samba4=4.10 squid=3.5 ucsschool=4.4 v8 Upgradable: ucsschool
Today we had a similar situation at Ticket #2021071221000191, where the cn=translog mdb file was full and after pruning entries from it and restarting the notifier they were added again. Where did they come from? While /var/lib/univention-ldap/listener/listener only contained four entries that the notifier had not processed yet, the notifier was reading 19M entries from /var/lib/univention-ldap/listener/listener.priv , and adding them one by one to cn=translog. The origin of the entries is not clear yet, but it may have been an ADC eternally attempting and failing to create a some objects in UDM (Bug #49867). But this is just an FYI, we should fokus this bug on taming the log messages.
The WARNING is there for a reason: AN entry already exists; the code could check that it already matches the expectation and then only issues a INFO mesage. I think the real problem is in our process documentation: we should check which transaction was last imported into cn=translog SUCCESSFULLY and update UDNs internal starte to start from there: that would save time and also would prevent this condition from occurring.
Changing the log level will not fix anything, but will only hide the underlying problem, which is this: U@S for example stops UDN to improve the performance while importing student records: Only after importing all records on the Primary is UDN turned back on to start replicating the imported changes to all other systems in the background. Looking as the log output we see the 1st transaction to be 123682201. The transaction last process before UDN is abort()ed is 124790699. So '/var/lib/univention-ldap/listener/listener.priv' contains of at least 1.108.498 transactions, each line probably requiring ~100 characters. The file is probably ~1 GiB minimum. src/notify.c:713 notify_listener_change_callback() reads the complete file into memory and calls split_transaction_buffer() on that large buffer (duplicating the required memory). Then *all* of these >1e7 transactions are passed to notify_dump_to_ldap() to be appended *at once*. After processing ~1e7 transactions and finding them already imported UDN abort()s itself because that "Other (e.g., implementation specific) error". systemd restarts UDN, UDN again reads the same list again and retries all transactions again. This process repeats until the disk is completely filled up. 'listener/listener.priv' only gets deleted if *all* transactions would have been added to 'cn=translog' and appended to 'notify/transactions' successfully. I propose processing 'listener/listener.priv' in smaller chunks, preferable *line-by-line*, and just skip all transactions which are <notify_last_id as they already have been appended successfully to 'notify/transactions' and 'cn=translog'.
We are now processing 'listener/listener.priv' in smaller chunks, *line-by-line*, and just skipping all transactions which are < notify_last_id that have already been appended to 'notify/transactions' and 'cn=translog'. univention-directory-notifier.yaml 0441057a0071 | Bug #49868 YAML 066c8244f98d | Bug #49868: Updated doc/staging univention-directory-notifier (14.0.5-8) bcf26bc422d2 | Bug #49868: Updated changelog univention-directory-notifier (14.0.5-7) efcc3e9e152e | style[UDN]: Remove empty lines d503f7981230 | fix[UDN]: dynamic client memory leak cf36313c708b | fix[UDN]: static client memory leaks 69fe615c1327 | fix[UDN]: static cache memory leaks 3e3397cb73fe | style[UDN]: Remove empty debug lines 1be96628bed2 | feat[UDN]: Debug to STDERR 916fef674d15 | feat[UDN]: Add signal handlers 3e755bad3bd9 | feat[UDN]: Re-enable security hardening d197f61fe5e1 | refactor[UDN]: Remove duplicate debug line 022cc5445226 | refactor[UDN]: Extract validation code 331dd492c94e | refactor[UDN]: Remove unneeded NULL check d6c60dc2d9dd | fix[UDN]: Free memory in error case 4b24bd9bcc9d | refactor[UDN]: Remove NULLing 432e61aa8ea9 | style[UDN]: Remove unneeded return cd9229e1a845 | fix[UDN]: processing 'listener/listener.priv' in smaller chunks
*** Bug 52716 has been marked as a duplicate of this bug. ***
OK: Code review OK: apt-get install -t apt univention-directory-notifier OK: valgrind OK: /var/log/univention/notifier.log OK: /var/log/univention/listener.log OK: journalctl -u univention-directory-notifier.service OK: Jenkins OK: gdb /usr/sbin/univention-directory-notifier + slapd stop OK: errata-announce -V --only univention-directory-notifier.yaml FIXED: univention-directory-notifier.yaml → 2a3cd10023
<https://errata.software-univention.de/#/?erratum=5.0x228>