Bug 55342 - replication: failed.ldif generated from UDL cache incompatible with local LDAP server
Status: NEW
Product: UCS
Classification: Unclassified
Component: LDAP
UCS 4.4
Other Linux
Importance: P5 normal
Assigned To: UCS maintainers
Depends on:
Blocks:
Reported: 2022-10-25 12:53 CEST by Philipp Hahn
Modified: 2022-12-19 11:11 CET
CC: 4 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.143
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review: Yes
Ticket number: 2022102121000414
Bug group (optional):
Max CVSS v3 score:



Description Philipp Hahn univentionstaff 2022-10-25 12:53:26 CEST
When a large group was modified, there was a TIMEOUT after 5 minutes in the UDL.
During that time the replication.py module failed to contact the local LDAP server.
In that case ucs44/management/univention-directory-replication/replication.py:859 defaults to using the data from its cache:
> old = listener_old

For some unknown reason, the data in the local UDL cache was not synchronized with the data in the local LDAP server: the generated LDIF contained some users which no longer existed in the local LDAP server.
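The fallback quoted above can be sketched as follows. This is an illustrative reconstruction, not the actual replication.py code: the names `get_old_entry`, `fetch_from_ldap` and `ConnectionTimeout` are invented here; only the `old = listener_old` fallback itself comes from the source.

```python
class ConnectionTimeout(Exception):
    """Raised when the local LDAP server does not answer in time."""


def get_old_entry(dn, listener_old, fetch_from_ldap):
    """Return the previous state of *dn*.

    Prefer a live lookup against the local LDAP server; fall back to
    the UDL cache entry (``listener_old``) if the server is
    unreachable -- which is exactly where a stale cache produces a
    wrong LDIF.
    """
    try:
        return fetch_from_ldap(dn)
    except ConnectionTimeout:
        # replication.py:859 equivalent: old = listener_old
        return listener_old


# Example: the LDAP lookup times out, so the (possibly stale) cached
# entry wins.
stale = {"uniqueMember": [b"uid=gone,dc=example,dc=com"]}

def failing_lookup(dn):
    raise ConnectionTimeout(dn)

assert get_old_entry("cn=group,dc=example,dc=com", stale, failing_lookup) is stale
```

The point of the sketch: as long as the timeout path is taken, the correctness of the generated LDIF depends entirely on the cache being in sync with the server.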

Thus the generated `failed.ldif` could not be applied to the local LDAP server:
> /usr/sbin/univention-directory-replication-resync /var/lib/univention-directory-replication/failed.ldif

The "right" thing was to simply delete the `failed.ldif` file and let the UDL resume the last transaction; now able to connect to the local LDAP server again, it built and applied the correct LDIF and succeeded.
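The failure mode can be demonstrated with a toy model (plain Python, not python-ldap; all names here are invented for illustration): the cache-derived change set still references entries the local LDAP server has already deleted, so replaying it fails the same way slapd rejects an operation on a non-existent DN.

```python
class NoSuchObject(Exception):
    """Stands in for the LDAP noSuchObject result code."""


def apply_changes(directory, changes):
    """Apply (dn, new_attrs) pairs to *directory*; ``None`` deletes the DN."""
    for dn, new_attrs in changes:
        if dn not in directory:
            raise NoSuchObject(dn)
        if new_attrs is None:
            del directory[dn]
        else:
            directory[dn] = new_attrs


server = {"cn=group,dc=example,dc=com": {"cn": ["group"]}}
# The stale cache still believes uid=gone exists, so the generated
# change set touches it -- and the replay fails:
stale_changes = [("uid=gone,dc=example,dc=com", None)]
try:
    apply_changes(server, stale_changes)
except NoSuchObject as exc:
    print("resync fails:", exc)
```

Deleting `failed.ldif` and rebuilding the change set from live server state side-steps exactly this mismatch.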
Comment 1 Dirk Wiesenthal univentionstaff 2022-11-18 11:43:49 CET
I will downgrade the priority of this bug because, as I understand it:

- It happened only once.
- The reason is unknown.
- A workaround exists.
- The workaround (removing failed.ldif) seems to be the viable solution; I do not see a clear path to a product fix.

If this happens again, we should reinvestigate.
Comment 2 Philipp Hahn univentionstaff 2022-12-19 11:11:59 CET
Taking Bug 48627 comment 4 into account, the invalid `failed.ldif` generated by `replication.py` is not the only issue: `replication.py` also touches *all* LDAP entries, which are thus put into the UDL cache on any BDC and RDC. This massively increases the cache size and leads to errors during the join of large domains.

We already added the UCRV `listener/cache/filter` to exempt certain LDAP classes from being added to the UDL cache, but this does not provide enough infrastructure for `replication.py` to opt out of the caching.
In addition to `handle_every_delete`, a similar mechanism for _updating_ would also be needed.