Bug 49868 - If MDB is full the Notifier may fill up notifier.log eternally with lots of log messages
If MDB is full the Notifier may fill up notifier.log eternally with lots of l...
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Notifier (univention-directory-notifier)
UCS 4.4
Other Linux
: P5 normal (vote)
: UCS 5.0-1-errata
Assigned To: Esteban
Philipp Hahn
https://git.knut.univention.de/univen...
:
: 52716 (view as bug list)
Depends on:
Blocks: 55235
  Show dependency treegraph
 
Reported: 2019-07-17 19:26 CEST by Arvid Requate
Modified: 2022-09-30 10:36 CEST (History)
8 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.400
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019071621000544, 2021032221000339, 2021031121000387, 2021041321000229, 2021101821000281
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2019-07-17 19:26:02 CEST
In the case of Ticket#: 2019071621000544 the notifier filled up the notifier.log with over 3000 log entries per second:

============================================================================
17.07.19 18:15:47.622  DEBUG_INIT
17.07.19 19:10:37.044  TRANSFILE   ( ERROR   ) : 124790699 ldap_add(): Other (e.g., implementation specific) error

17.07.19 19:10:44.315  DEBUG_INIT
17.07.19 19:10:46.099  TRANSFILE   ( WARN    ) : 123682201 ldap_add() already exists
17.07.19 19:10:46.100  TRANSFILE   ( WARN    ) : 123682202 ldap_add() already exists
17.07.19 19:10:46.100  TRANSFILE   ( WARN    ) : 123682203 ldap_add() already exists
17.07.19 19:10:46.100  TRANSFILE   ( WARN    ) : 123682204 ldap_add() already exists
17.07.19 19:10:46.101  TRANSFILE   ( WARN    ) : 123682205 ldap_add() already exists
17.07.19 19:10:46.101  TRANSFILE   ( WARN    ) : 123682206 ldap_add() already exists
17.07.19 19:10:46.101  TRANSFILE   ( WARN    ) : 123682207 ldap_add() already exists
17.07.19 19:10:46.102  TRANSFILE   ( WARN    ) : 123682208 ldap_add() already exists
17.07.19 19:10:46.102  TRANSFILE   ( WARN    ) : 123682209 ldap_add() already exists
[...]

17.07.19 19:16:11.553  TRANSFILE   ( WARN    ) : 124790698 ldap_add() already exists
17.07.19 19:16:11.553  TRANSFILE   ( ERROR   ) : 124790699 ldap_add(): Other (e.g., implementation specific) error
17.07.19 19:16:17.092  DEBUG_INIT
17.07.19 19:16:18.907  TRANSFILE   ( WARN    ) : 123682201 ldap_add() already exists
17.07.19 19:16:18.907  TRANSFILE   ( WARN    ) : 123682202 ldap_add() already exists
17.07.19 19:16:18.908  TRANSFILE   ( WARN    ) : 123682203 ldap_add() already exists
17.07.19 19:16:18.908  TRANSFILE   ( WARN    ) : 123682204 ldap_add() already exists
17.07.19 19:16:18.908  TRANSFILE   ( WARN    ) : 123682205 ldap_add() already exists
[...]
============================================================================

I guess the MDB is full again due to all thos transactions (Bug #49867). The Notifier should be adjusted to not fill up the whole disc storage in a short time.
Comment 1 Arvid Requate univentionstaff 2019-07-17 19:44:41 CEST
FYI: This was a UCS Master with ad/member=true.
Comment 2 Christina Scheinig univentionstaff 2021-03-24 11:30:39 CET
A customer has this issue on two different backup servers. Cause was again, that the translog database was full.
After I increased the ldap/database/mdb/maxsize and restarted slapd and listener, the notifier starts to fill up the notifier.log till the filesystem filled up to 100%. The logfile increased to 49GB


The logmessages:

24.03.21 11:16:00.877  TRANSFILE   ( WARN    ) : 29261397 ldap_add() already exists
24.03.21 11:16:00.877  TRANSFILE   ( WARN    ) : 29261398 ldap_add() already exists
24.03.21 11:16:00.877  TRANSFILE   ( WARN    ) : 29261399 ldap_add() already exists
24.03.21 11:16:00.877  TRANSFILE   ( WARN    ) : 29261400 ldap_add() already exists
24.03.21 11:16:00.878  TRANSFILE   ( WARN    ) : 29261401 ldap_add() already exists
24.03.21 11:16:00.878  TRANSFILE   ( WARN    ) : 29261402 ldap_add() already exists
24.03.21 11:16:00.878  TRANSFILE   ( WARN    ) : 29261403 ldap_add() already exists
Comment 3 Christina Scheinig univentionstaff 2021-03-24 11:41:44 CET
The customer has 3 Backupservers, and on all of them the translog database was full, and the notifier fills up with his warnings
Comment 4 Christina Scheinig univentionstaff 2021-03-24 14:52:51 CET
(In reply to Arvid Requate from comment #0)
> In the case of Ticket#: 2019071621000544 the notifier filled up the
> notifier.log with over 3000 log entries per second:
> 
> ============================================================================
> 17.07.19 18:15:47.622  DEBUG_INIT
> 17.07.19 19:10:37.044  TRANSFILE   ( ERROR   ) : 124790699 ldap_add(): Other
> (e.g., implementation specific) error
> 
> 17.07.19 19:10:44.315  DEBUG_INIT
> 17.07.19 19:10:46.099  TRANSFILE   ( WARN    ) : 123682201 ldap_add()
> already exists
> 17.07.19 19:10:46.100  TRANSFILE   ( WARN    ) : 123682202 ldap_add()
> already exists
> 17.07.19 19:10:46.100  TRANSFILE   ( WARN    ) : 123682203 ldap_add()
> already exists
> 17.07.19 19:10:46.100  TRANSFILE   ( WARN    ) : 123682204 ldap_add()
> already exists
> 17.07.19 19:10:46.101  TRANSFILE   ( WARN    ) : 123682205 ldap_add()
> already exists
> 17.07.19 19:10:46.101  TRANSFILE   ( WARN    ) : 123682206 ldap_add()
> already exists
> 17.07.19 19:10:46.101  TRANSFILE   ( WARN    ) : 123682207 ldap_add()
> already exists
> 17.07.19 19:10:46.102  TRANSFILE   ( WARN    ) : 123682208 ldap_add()
> already exists
> 17.07.19 19:10:46.102  TRANSFILE   ( WARN    ) : 123682209 ldap_add()
> already exists
> [...]
> 
> 17.07.19 19:16:11.553  TRANSFILE   ( WARN    ) : 124790698 ldap_add()
> already exists
> 17.07.19 19:16:11.553  TRANSFILE   ( ERROR   ) : 124790699 ldap_add(): Other
> (e.g., implementation specific) error
> 17.07.19 19:16:17.092  DEBUG_INIT
> 17.07.19 19:16:18.907  TRANSFILE   ( WARN    ) : 123682201 ldap_add()
> already exists
> 17.07.19 19:16:18.907  TRANSFILE   ( WARN    ) : 123682202 ldap_add()
> already exists
> 17.07.19 19:16:18.908  TRANSFILE   ( WARN    ) : 123682203 ldap_add()
> already exists
> 17.07.19 19:16:18.908  TRANSFILE   ( WARN    ) : 123682204 ldap_add()
> already exists
> 17.07.19 19:16:18.908  TRANSFILE   ( WARN    ) : 123682205 ldap_add()
> already exists
> [...]
> ============================================================================
> 
> I guess the MDB is full again due to all thos transactions (Bug #49867). The
> Notifier should be adjusted to not fill up the whole disc storage in a short
> time.

And so exactly now this happens on the backup servers, the translog DB is full again.
Comment 6 Christina Scheinig univentionstaff 2021-04-14 15:20:52 CEST
Maybe as a short addition.
In a customer environment, not just the notifier.log filled up, but also the transactions are attached in the listener/listener file, so after the database was free again, all the repeating transactions are filled in the ldap.

univention-app info
UCS: 4.4-7 errata850
Installed: cups=2.2.1 dhcp-server=12.0 kde=5.8 samba4=4.10 squid=3.5 ucsschool=4.4 v8
Upgradable: ucsschool
Comment 7 Arvid Requate univentionstaff 2021-07-13 21:03:20 CEST
Today we had a similar situation at Ticket #2021071221000191, where the cn=translog mdb file was full and after pruning entries from it and restarting the notifier they were added again. Where did they come from? While /var/lib/univention-ldap/listener/listener only contained four entries that the notifier had not processed yet, the notifier was reading 19M entries from /var/lib/univention-ldap/listener/listener.priv , and adding them one by one to cn=translog. The origin of the entries is not clear yet, but it may have been an ADC eternally attempting and failing to create a some objects in UDM (Bug #49867).

But this is just an FYI, we should fokus this bug on taming the log messages.
Comment 9 Philipp Hahn univentionstaff 2021-12-17 10:46:27 CET
The WARNING is there for a reason: AN entry already exists; the code could check that it already matches the expectation and then only issues a INFO mesage.

I think the real problem is in our process documentation: we should check which transaction was last imported into cn=translog SUCCESSFULLY and update UDNs internal starte to start from there: that would save time and also would prevent this condition from occurring.
Comment 10 Philipp Hahn univentionstaff 2022-01-17 13:30:45 CET
Changing the log level will not fix anything, but will only hide the underlying problem, which is this:

U@S for example stops UDN to improve the performance while importing student records: Only after importing all records on the Primary is UDN turned back on to start replicating the imported changes to all other systems in the background.

Looking as the log output we see the 1st transaction to be 123682201.
The transaction last process before UDN is abort()ed is 124790699.

So '/var/lib/univention-ldap/listener/listener.priv' contains of at least 1.108.498 transactions, each line probably requiring ~100 characters. The file is probably ~1 GiB minimum.

src/notify.c:713 notify_listener_change_callback() reads the complete file into memory and calls split_transaction_buffer() on that large buffer (duplicating the required memory).
Then *all* of these >1e7 transactions are passed to notify_dump_to_ldap() to be appended *at once*. After processing ~1e7 transactions and finding them already imported UDN abort()s itself because that "Other (e.g., implementation specific) error".
systemd restarts UDN, UDN again reads the same list again and retries all transactions again.
This process repeats until the disk is completely filled up.

'listener/listener.priv' only gets deleted if *all* transactions would have been added to 'cn=translog' and appended to 'notify/transactions' successfully.

I propose processing 'listener/listener.priv' in smaller chunks, preferable *line-by-line*, and just skip all transactions which are <notify_last_id as they already have been appended successfully to 'notify/transactions' and 'cn=translog'.
Comment 11 Esteban univentionstaff 2022-02-21 10:00:40 CET
We are now processing 'listener/listener.priv' in smaller chunks, *line-by-line*, and just skipping all transactions which are < notify_last_id that have already been appended to 'notify/transactions' and 'cn=translog'.

univention-directory-notifier.yaml
0441057a0071 | Bug #49868 YAML
066c8244f98d | Bug #49868: Updated doc/staging

univention-directory-notifier (14.0.5-8)
bcf26bc422d2 | Bug #49868: Updated changelog

univention-directory-notifier (14.0.5-7)
efcc3e9e152e | style[UDN]: Remove empty lines
d503f7981230 | fix[UDN]: dynamic client memory leak
cf36313c708b | fix[UDN]: static client memory leaks
69fe615c1327 | fix[UDN]: static cache memory leaks
3e3397cb73fe | style[UDN]: Remove empty debug lines
1be96628bed2 | feat[UDN]: Debug to STDERR
916fef674d15 | feat[UDN]: Add signal handlers
3e755bad3bd9 | feat[UDN]: Re-enable security hardening
d197f61fe5e1 | refactor[UDN]: Remove duplicate debug line
022cc5445226 | refactor[UDN]: Extract validation code
331dd492c94e | refactor[UDN]: Remove unneeded NULL check
d6c60dc2d9dd | fix[UDN]: Free memory in error case
4b24bd9bcc9d | refactor[UDN]: Remove NULLing
432e61aa8ea9 | style[UDN]: Remove unneeded return
cd9229e1a845 | fix[UDN]: processing 'listener/listener.priv' in smaller chunks
Comment 12 Florian Best univentionstaff 2022-02-21 10:10:55 CET
*** Bug 52716 has been marked as a duplicate of this bug. ***
Comment 13 Philipp Hahn univentionstaff 2022-02-22 12:44:47 CET
OK: Code review
OK: apt-get install -t apt univention-directory-notifier
OK: valgrind
OK: /var/log/univention/notifier.log
OK: /var/log/univention/listener.log
OK: journalctl -u univention-directory-notifier.service
OK: Jenkins
OK: gdb /usr/sbin/univention-directory-notifier + slapd stop
OK: errata-announce  -V --only univention-directory-notifier.yaml
FIXED: univention-directory-notifier.yaml → 2a3cd10023