Univention Bugzilla – Bug 56366
univention-translog does not work properly for big ldap environments
Last modified: 2023-08-18 09:09:24 CEST
The translog pruning script /usr/share/univention-directory-notifier/univention-translog will not work for the customer. They have more than 40 million entries in the translog, and the script runs an ldapsearch with this filter:

    1266 filterstr = '(reqSession=*)'

The customer also noticed that the following part of the code prevents the LDAP from being purged when the translog file has already been purged. This happens when the prune_ldap function fails because of the ldapsearch.

    1234 if opt.trans <= translog.first:
    1235     log.fatal('Already purged.')
    1236     raise Abort()
    1237 if opt.trans >= translog.last:
    1238     log.fatal('Nothing to purge.')
    1239     raise Abort()
    1240 assert translog.first < opt.trans < translog.last

The customer suggests the following:

- Modify the prune_ldap function to loop over the transaction IDs, searching for a single transaction at a time and deleting it when found. Since the transaction IDs are sequential numbers, decrement the TID after each iteration and exit the loop when the search returns no result. Alternatively, loop over the transaction IDs and delete them directly while decrementing the TID, stopping on the first deletion error, which means the oldest transaction in the translog has been reached.
- Modify the prune_file function to continue when there is nothing to do, or make sure that prune_ldap always runs after prune_file.

Could you also add an option to the script to prune transactions that are older than a given number of days? For example, prune the transactions that are older than 40 days, i.e. keep only the last 40 days of transactions.

Please take these suggestions into account and redeliver the script to us.
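To make the customer's first suggestion concrete, here is a minimal Python sketch of the proposed decrementing loop. It assumes, purely for illustration, that the LDAP translog can be modelled as a dict mapping TID to entry; `prune_by_decrement` is a hypothetical name and not part of `univention-translog`. The second call shows the weak spot: a single missing TID ends the loop early and leaves all older entries behind.

```python
from typing import Dict


def prune_by_decrement(ldap: Dict[int, str], oldest_to_keep: int) -> int:
    """Delete entries below oldest_to_keep, walking down one TID at a time.

    Hypothetical model of the customer's proposal: stops at the first TID
    that is not found. Returns the number of deleted entries.
    """
    deleted = 0
    tid = oldest_to_keep - 1
    while tid in ldap:  # stands in for one LDAP search per TID
        del ldap[tid]   # stands in for one LDAP delete per TID
        deleted += 1
        tid -= 1
    return deleted


# Consistent data: TIDs 1..10 exist, everything below 8 is removed.
consistent = {tid: f"transaction {tid}" for tid in range(1, 11)}
print(prune_by_decrement(consistent, 8), sorted(consistent))  # 7 [8, 9, 10]

# Inconsistent data: TID 5 is missing, so TIDs 1..4 are never reached.
with_gap = {tid: f"transaction {tid}" for tid in range(1, 11) if tid != 5}
print(prune_by_decrement(with_gap, 8), sorted(with_gap))  # 2 [1, 2, 3, 4, 8, 9, 10]
```

Against a real directory, this approach also costs one LDAP round trip per transaction, so 40 million entries would mean 40 million searches or deletes.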
(In reply to Christina Scheinig from comment #0)
> The translog pruning script in
> /usr/share/univention-directory-notifier/univention-translog will not work
> for the customer because they have more than 40 million entries in translog

Please be more specific, as the script has no size limit by design.

> and the script has an ldapsearch with this filter:
> 1266 filterstr = '(reqSession=*)'

That is not a problem, as the search uses paging.

> The customer also noticed that this part of the code will not purge the ldap
> when the translog file is already purge.

Have you read `--help`?

    1370 parser_prune.add_argument("trans", metavar="TID", type=int, help="Oldest transaction number to keep (negative numbers: the number of transactions to keep)")

The script is idempotent: all TIDs before the specified TID are purged. If they have already been purged, there is nothing to do.

> This case happen as the prune_ldap function failed due to the ldapsearch.
>
> 1234 if opt.trans <= translog.first:
> 1235     log.fatal('Already purged.')
> 1236     raise Abort()
> 1237 if opt.trans >= translog.last:
> 1238     log.fatal('Nothing to purge.')
> 1239     raise Abort()
> 1240 assert translog.first < opt.trans < translog.last

That code is from `prune_file` and has nothing to do with `ldapsearch`.

> The customer suggests that:
<suggestions removed>

These suggestions will not work and will break as soon as the LDAP is inconsistent with the file.

> Could you add an option to the script so it will prune transaction that are
> older than some number of day.

That is impossible, as there are no timestamps for TIDs, neither in the file nor in LDAP.
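The idempotent behaviour described in the reply can be illustrated with a small Python sketch that only computes which TIDs would still need purging. Here `first` and `last` stand for the first and last TID in the translog file, and the two guards mirror the ones quoted from `prune_file`. The function names are hypothetical, and how a negative TID is turned into "the number of transactions to keep" is an assumption (the oldest kept TID is taken as `last - N + 1`); the real script may count differently.

```python
def resolve_oldest_to_keep(trans: int, last: int) -> int:
    """Turn the TID argument into the oldest TID to keep.

    Assumption: a negative value -N means "keep the newest N transactions",
    so the oldest kept TID is last - N + 1.
    """
    if trans < 0:
        return last + trans + 1
    return trans


def purge_range(first: int, last: int, trans: int) -> range:
    """Return the TIDs that would still be purged; empty means nothing to do."""
    keep = resolve_oldest_to_keep(trans, last)
    if keep <= first:  # mirrors 'Already purged.': older TIDs are gone
        return range(0)
    if keep >= last:  # mirrors 'Nothing to purge.'
        return range(0)
    return range(first, keep)


# First run purges TIDs 1..7; a rerun with the same argument is a no-op.
print(list(purge_range(1, 10, 8)))  # [1, 2, 3, 4, 5, 6, 7]
print(list(purge_range(8, 10, 8)))  # []
```

Because the result depends only on the current `first`/`last` and the requested TID, running the same command twice yields an empty second range instead of an error.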
The customer gave some new information: they ran the script, and the first time it got stuck because the ldapsearch in the code took a long time. The second time it exited with the error message
'Already purged.'
'Nothing to purge.'

Both cases are reproducible:
1) Put millions of entries into the translog LDAP database and run the prune.
2) Kill the process and rerun it with the same arguments.
Which exact commands were executed? Why was the script killed, and what is the system state?
(In reply to Christina Scheinig from comment #3)
> The customer gave some new information:
> he says, they run the script and the first time it was stuck because the
> ldapsearch in the code was taking time.
> The second time it exited with the error message
> 'Already purged.'
> 'Nothing to purge.'

By killing the script, they made their system state inconsistent! And now they are complaining that the system state is inconsistent?

UCS 4.4-9 uses an old version of `univention-translog`, while [UCS 5.0-0+e169](https://errata.software-univention.de/#/?erratum=5.0x169) contains an improved version, which is now idempotent.

The most important change is [Make univention-translog prune idempotent](https://git.knut.univention.de/univention/ucs/-/commit/13d04059989b16d96ebe4144945739e4d9ce8c9a), which changes those two `raise Abort()` calls into `return` statements, so `prune_ldap()` is still called even when `prune_file()` would otherwise abort. The equivalent change for the installed script is:

    sed -e '1236,1239s/raise Abort()/return/' -i /usr/share/univention-directory-notifier/univention-translog
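A minimal Python sketch of the control flow after that change, assuming simplified stand-ins: `Translog` only tracks the first and last TID, and `prune_file`, `prune_ldap` and `prune` append to a log list instead of touching files or LDAP. None of this is the actual `univention-translog` code. It shows that a rerun after an interrupted prune no longer stops at 'Already purged.' but still reaches the LDAP cleanup.

```python
from typing import List


class Translog:
    """Hypothetical stand-in for the translog file: only the TID bounds."""

    def __init__(self, first: int, last: int) -> None:
        self.first = first
        self.last = last


def prune_file(translog: Translog, trans: int, log: List[str]) -> None:
    # The two guards now return instead of raising Abort(),
    # so an already pruned file is not treated as a fatal error.
    if trans <= translog.first:
        log.append("file: already purged")
        return
    if trans >= translog.last:
        log.append("file: nothing to purge")
        return
    log.append(f"file: purged {translog.first}..{trans - 1}")
    translog.first = trans


def prune_ldap(trans: int, log: List[str]) -> None:
    # Placeholder for the LDAP cleanup of all TIDs below `trans`.
    log.append(f"ldap: purge below {trans}")


def prune(translog: Translog, trans: int) -> List[str]:
    log: List[str] = []
    prune_file(translog, trans, log)
    prune_ldap(trans, log)  # reached even if prune_file had nothing to do
    return log


# Simulate a rerun after an interrupted prune: the file is already pruned
# up to TID 8, but LDAP entries below TID 8 may still exist.
print(prune(Translog(first=8, last=10), 8))  # ['file: already purged', 'ldap: purge below 8']
```

With the old `raise Abort()`, the same rerun would have ended inside `prune_file` and never reached `prune_ldap`.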