Bug 55342 - replication: failed.ldif generated from UDL cache incompatible with local LDAP server
Status: NEW
Product: UCS
Classification: Unclassified
Component: LDAP
UCS 4.4
Other Linux
Importance: P5 normal
Assigned To: UCS maintainers
Depends on:
Blocks:
Reported: 2022-10-25 12:53 CEST by Philipp Hahn
Modified: 2022-12-19 11:11 CET
CC: 4 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.143
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review: Yes
Ticket number: 2022102121000414
Bug group (optional):
Max CVSS v3 score:



Description Philipp Hahn univentionstaff 2022-10-25 12:53:26 CEST
When a large group was modified, there was a TIMEOUT after 5 minutes in the UDL.
During that time the replication.py module failed to contact the local LDAP server.
In that case ucs44/management/univention-directory-replication/replication.py:859 defaults to using the data from its cache:
> old = listener_old

For some unknown reason, the data in the local UDL cache was not synchronized with the data in the local LDAP server: the generated LDIF contained some users which no longer existed in the local LDAP server.
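The fallback quoted above can be sketched as follows. This is an illustrative reconstruction, not the actual replication.py code: the names `get_old_entry`, `fetch_from_ldap` and `ConnectionTimeout` are invented here; only the `old = listener_old` fallback itself comes from the source.

```python
class ConnectionTimeout(Exception):
    """Raised when the local LDAP server does not answer in time."""


def get_old_entry(dn, listener_old, fetch_from_ldap):
    """Return the previous state of *dn*.

    Prefer a live lookup against the local LDAP server; fall back to
    the UDL cache entry (``listener_old``) if the server is
    unreachable -- which is exactly where a stale cache produces a
    wrong LDIF.
    """
    try:
        return fetch_from_ldap(dn)
    except ConnectionTimeout:
        # replication.py:859 equivalent: old = listener_old
        return listener_old


# Example: the LDAP lookup times out, so the (possibly stale) cached
# entry wins.
stale = {"uniqueMember": [b"uid=gone,dc=example,dc=com"]}

def failing_lookup(dn):
    raise ConnectionTimeout(dn)

assert get_old_entry("cn=group,dc=example,dc=com", stale, failing_lookup) is stale
```

The point of the sketch: as long as the timeout path is taken, the correctness of the generated LDIF depends entirely on the cache being in sync with the server.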

Thus the generated `failed.ldif` could not be applied to the local LDAP server:
> /usr/sbin/univention-directory-replication-resync /var/lib/univention-directory-replication/failed.ldif

The "right" thing was to simply delete the `failed.ldif` file and let the UDL resume the last transaction; now able to connect to the local LDAP server again, it built and applied the correct LDIF and succeeded.
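The failure mode can be demonstrated with a toy model (plain Python, not python-ldap; all names here are invented for illustration): the cache-derived change set still references entries the local LDAP server has already deleted, so replaying it fails the same way slapd rejects an operation on a non-existent DN.

```python
class NoSuchObject(Exception):
    """Stands in for the LDAP noSuchObject result code."""


def apply_changes(directory, changes):
    """Apply (dn, new_attrs) pairs to *directory*; ``None`` deletes the DN."""
    for dn, new_attrs in changes:
        if dn not in directory:
            raise NoSuchObject(dn)
        if new_attrs is None:
            del directory[dn]
        else:
            directory[dn] = new_attrs


server = {"cn=group,dc=example,dc=com": {"cn": ["group"]}}
# The stale cache still believes uid=gone exists, so the generated
# change set touches it -- and the replay fails:
stale_changes = [("uid=gone,dc=example,dc=com", None)]
try:
    apply_changes(server, stale_changes)
except NoSuchObject as exc:
    print("resync fails:", exc)
```

Deleting `failed.ldif` and rebuilding the change set from live server state side-steps exactly this mismatch.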
Comment 1 Dirk Wiesenthal univentionstaff 2022-11-18 11:43:49 CET
I will downgrade the priority of this bug because, as I understand it:

- It happened only once.
- The reason is unknown.
- A workaround exists.
- The workaround (removing failed.ldif) seems to be the viable solution; I do not see a clear path to a product fix.

If this happens again, we should reinvestigate.
Comment 2 Philipp Hahn univentionstaff 2022-12-19 11:11:59 CET
Taking Bug 48627 comment 4 into account, the invalid `failed.ldif` generated by `replication.py` is not the only issue: `replication.py` also touches *all* LDAP entries, which are thus put into the UDL cache on any BDC and RDC. This massively increases the cache size and leads to errors during the join of large domains.

We already added the UCRV `listener/cache/filter` to exempt certain LDAP classes from being added to the UDL cache, but this does not provide enough infrastructure for `replication.py` to opt out of the caching.
In addition to `handle_every_delete`, a similar mechanism for _updating_ would also be needed.