Bug 56366 - univention-translog does not work properly for big ldap environments
Status: NEW
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Assigned To: UCS maintainers
Depends on: 53355
Blocks:
Reported: 2023-07-28 15:15 CEST by Christina Scheinig
Modified: 2023-08-18 09:09 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.143
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support: Yes
Flags outvoted (downgraded) after PO Review:
Ticket number: 2023072821000175
Bug group (optional):
Max CVSS v3 score:
hahn: Patch_Available+


Description Christina Scheinig univentionstaff 2023-07-28 15:15:08 CEST
The translog pruning script /usr/share/univention-directory-notifier/univention-translog does not work for the customer: they have more than 40 million entries in translog, and the script runs an ldapsearch with this filter:

1266 filterstr = '(reqSession=*)'

The customer also noticed that the following part of the code does not purge the LDAP when the translog file has already been purged.

This situation arises when the prune_ldap function has failed because of the ldapsearch.

1234        if opt.trans <= translog.first:
1235            log.fatal('Already purged.')
1236            raise Abort()
1237        if opt.trans >= translog.last:
1238            log.fatal('Nothing to purge.')
1239            raise Abort()
1240        assert translog.first < opt.trans < translog.last


The customer suggests that:

- Modify the prune_ldap function to loop over the transaction IDs: search for a
single transaction and delete it when found.
Since the transaction IDs are sequential numbers, decrement the TID after each
iteration.
Exit the loop when the search returns no result.

Alternatively, loop over the transaction IDs and delete them directly while decrementing the TID.

Stop on the first deletion error, which means
the oldest transaction in the translog has been reached.

- Modify the prune_file function to continue when there is nothing to do.
Alternatively, make sure that prune_ldap always runs after prune_file.
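
The loop suggested for prune_ldap could look roughly like the following Python sketch. It is only an illustration under stated assumptions: a plain dict stands in for the translog LDAP database, and `prune_by_decrement` is a hypothetical name, not a function of univention-translog.

```python
from typing import Dict


def prune_by_decrement(store: Dict[int, str], start_tid: int) -> int:
    """Delete start_tid, start_tid - 1, ... until a TID is missing.

    `store` is a hypothetical in-memory stand-in for the translog LDAP
    database, mapping a transaction ID to its entry.
    Returns the number of deleted entries.
    """
    deleted = 0
    tid = start_tid
    while tid in store:      # stands in for a single-entry LDAP search
        del store[tid]       # stands in for the LDAP delete
        deleted += 1
        tid -= 1             # TIDs are sequential, so walk downwards
    return deleted


if __name__ == "__main__":
    entries = {tid: "entry %d" % tid for tid in range(1, 11)}
    print(prune_by_decrement(entries, 7), sorted(entries))
```

Note that the loop stops at the first missing TID, so any gap in the sequence leaves all older entries in place.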

Could you add an option to the script so that it prunes transactions older than a given number of days?
E.g., prune the transactions that are older than 40 days, i.e. keep the transactions of the last 40 days.

Please take these suggestions into account and redeliver the script to us.
Comment 2 Philipp Hahn univentionstaff 2023-07-28 15:40:08 CEST
(In reply to Christina Scheinig from comment #0)
> The translog pruning script in
> /usr/share/univention-directory-notifier/univention-translog will not work
> for the customer because they have more than 40 million entries in translog

Please be more specific as the script has no size limit by design.

> and the script has an ldapsearch with this filter:
> 1266 filterstr = '(reqSession=*)'

Which is not a problem as the search uses paging.
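
To illustrate why paging keeps a broad filter like `(reqSession=*)` manageable: the client asks for one bounded page at a time and passes a cookie back to get the next one, so it never holds the full result set. In LDAP this is typically done with the simple paged results control (RFC 2696). The sketch below only shows the principle; `make_fake_server` and `paged_search` are hypothetical names, and an in-memory list stands in for the LDAP server. It is not the code of univention-translog.

```python
from typing import Callable, Iterator, List, Tuple

# (entries of this page, cookie for the next page; 0 means "no more pages")
Page = Tuple[List[int], int]


def make_fake_server(entries: List[int]) -> Callable[[int, int], Page]:
    """Return a fetch(cookie, size) function over an in-memory list.

    Hypothetical stand-in for an LDAP server that supports paged searches.
    """
    def fetch(cookie: int, size: int) -> Page:
        chunk = entries[cookie:cookie + size]
        next_cookie = cookie + size
        return chunk, (next_cookie if next_cookie < len(entries) else 0)
    return fetch


def paged_search(fetch: Callable[[int, int], Page], page_size: int) -> Iterator[int]:
    """Yield all entries while requesting at most page_size entries per call."""
    cookie = 0
    while True:
        chunk, cookie = fetch(cookie, page_size)
        yield from chunk
        if not cookie:       # the server signals the last page
            break


if __name__ == "__main__":
    server = make_fake_server(list(range(10)))
    print(list(paged_search(server, page_size=3)))
```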

> The customer also noticed that this part of the code will not purge the ldap
> when the translog file is already purge.

Have you read `--help`?
>   1370     parser_prune.add_argument("trans", metavar="TID", type=int, help="Oldest transaction number to keep (negative numbers: the number of transactions to keep)")

The script is idempotent: all TIDs before the specified TID are purged. If they are already purged, there is nothing to do.
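
A minimal sketch of what idempotent pruning by TID means, assuming an in-memory set of TIDs instead of the transaction file and LDAP. Both function names are hypothetical, and the way a negative TID is resolved here (keep the newest N transactions) is an assumption based on the help text, not the script's actual code.

```python
from typing import Set


def resolve_oldest_to_keep(trans: int, last: int) -> int:
    """Turn the TID argument into an absolute transaction number.

    Assumption: a negative value -N means "keep the newest N transactions",
    read here as keeping the TIDs last-N+1 .. last.
    """
    return trans if trans >= 0 else last + trans + 1


def prune(tids: Set[int], oldest_to_keep: int) -> int:
    """Remove every TID below oldest_to_keep and return how many were removed."""
    doomed = {tid for tid in tids if tid < oldest_to_keep}
    tids -= doomed
    return len(doomed)


if __name__ == "__main__":
    tids = set(range(1, 101))
    keep_from = resolve_oldest_to_keep(-10, last=100)
    print(prune(tids, keep_from))   # first run removes the old TIDs
    print(prune(tids, keep_from))   # second run finds nothing left to remove
```

Running the same prune twice leaves the same state as running it once, which is the property described above.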

> This case happen as the prune_ldap function failed due to the ldapsearch.
> 
> 1234        if opt.trans <= translog.first:
> 1235            log.fatal('Already purged.')
> 1236            raise Abort()
> 1237        if opt.trans >= translog.last:
> 1238            log.fatal('Nothing to purge.')
> 1239            raise Abort()
> 1240        assert translog.first < opt.trans < translog.last

That code is from `prune_file` and has nothing to do with `ldapsearch`.

> The customer suggests that:
<suggestions removed>

That will not work and will break as soon as the LDAP is inconsistent with the file.

> Could you add an option to the script so it will prune transaction that are
> older than some number of day.

Impossible, as there are no timestamps for TIDs, neither in the file nor in LDAP.
Comment 3 Christina Scheinig univentionstaff 2023-08-17 14:12:41 CEST
The customer provided some new information:
they ran the script, and the first time it got stuck because the ldapsearch in the code was taking a long time.
The second time it exited with the error messages
'Already purged.'
'Nothing to purge.'

Both cases are reproducible:

1) Put millions of entries into the translog LDAP database and run the prune.
2) Kill the process and rerun it with the same arguments.
Comment 4 Erik Damrose univentionstaff 2023-08-17 16:06:24 CEST
Which exact commands are executed?
What is the reason for killing the script, and what is the system state?
Comment 5 Philipp Hahn univentionstaff 2023-08-18 09:09:24 CEST
(In reply to Christina Scheinig from comment #3)
> The customer gave some new information:
> he says, they run the script and the first time it was stuck because the
> ldapsearch in the code was taking time.
> The second time it exited with the error message
> 'Already purged.'
> 'Nothing to purge.'

By killing the script they made their system state inconsistent!
Now they're complaining that their system state is inconsistent?

UCS 4.4-9 is using an old version of `univention-translog`, while [UCS 5.0-0+e169](https://errata.software-univention.de/#/?erratum=5.0x169) contains an improved version, which is now idempotent. The most important change is [Make univention-translog prune idempotent](https://git.knut.univention.de/univention/ucs/-/commit/13d04059989b16d96ebe4144945739e4d9ce8c9a), which changes those 2 `raise Abort()` into a `return`, so `prune_ldap()` is still called even when `prune_file()` would abort otherwise.

sed -e '1236,1239s/raise Abort()/return/' -i /usr/share/univention-directory-notifier/univention-translog
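
The effect of that change can be sketched as follows. This is a simplified model, not the real script: `Abort`, `prune_file` and `prune_ldap` mirror names mentioned in this bug, while the `prune` wrapper, the `idempotent` switch and the `steps` list are hypothetical and exist only to compare the old and the new control flow.

```python
from typing import List


class Abort(Exception):
    """Stand-in for the exception the script raises to stop early."""


def prune_file(first: int, last: int, trans: int,
               idempotent: bool, steps: List[str]) -> None:
    """Model of the two checks quoted in the description."""
    if trans <= first:
        steps.append("file: already purged")
        if idempotent:
            return           # new behaviour: just skip the file step
        raise Abort()        # old behaviour: stop everything
    if trans >= last:
        steps.append("file: nothing to purge")
        if idempotent:
            return
        raise Abort()
    steps.append("file: pruned")


def prune_ldap(trans: int, steps: List[str]) -> None:
    steps.append("ldap: pruned")


def prune(first: int, last: int, trans: int, idempotent: bool) -> List[str]:
    """Run the file step, then the LDAP step, and record what happened."""
    steps: List[str] = []
    try:
        prune_file(first, last, trans, idempotent, steps)
        prune_ldap(trans, steps)
    except Abort:
        steps.append("aborted")
    return steps


if __name__ == "__main__":
    # The file was already pruned up to TID 100 by an interrupted earlier run.
    print(prune(100, 200, 100, idempotent=False))
    print(prune(100, 200, 100, idempotent=True))
```

With the old behaviour a rerun after an interrupted prune never reaches the LDAP step; with the `return` variant it does.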