Bug 49194 - Improve robustness of UDN protocol
Status: CLOSED WORKSFORME
Product: UCS
Classification: Unclassified
Component: Notifier (univention-directory-notifier)
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Stefan Gohmann
QA Contact: Philipp Hahn
URL: https://etherpad-lite.knut.univention...
Keywords:
Depends on: 28233 49198 49199 49200 49201 49202
Blocks:
Reported: 2019-03-28 21:49 CET by Christian Völker
Modified: 2019-04-09 14:10 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 4: Will affect most installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.571
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support: Yes
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019032621001041, 2019032821000494, 2019031921001509, 2019030421001144
Bug group (optional):
Max CVSS v3 score:


Description Christian Völker univentionstaff 2019-03-28 21:49:51 CET
In the last 14 days, Support has received at least four reports of UDN protocol v3 failing.

Troubleshooting has been very time-consuming and, in part, not possible at all.

When the issue occurs, synchronization fails at the customer site with the well-known symptoms, such as users not being able to log in.

The forum also shows numerous reports of issues related to this.

It looks like there are some quirks in the implementation which cause these major issues.

We should make the protocol more robust, both regarding stability (it should not fail so frequently) and from a troubleshooting viewpoint (it should be possible to fix it without massive manual synchronization and renumbering of the involved files).
Comment 2 Arvid Requate univentionstaff 2019-03-29 00:22:01 CET
FYI: Ticket#2019032821000494 mentions this error message coming from slapd: "MDB_MAP_FULL: Environment mapsize limit reached".

This looks like the "standard" lmdb error message indicating that the virtual memory limit of the MDB database has been reached. That value can be configured via UCR ldap/database/mdb/maxsize. The default value of 2147483648 (= 2*1024*1024*1024 = 2 GB) has been chosen as the maximum possible value for i386 (IIRC). For amd64 it can be increased as desired and the new value applies directly when the slapd is started again.
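
To make the arithmetic above concrete, here is a minimal sketch that converts a size in GiB to the byte value expected by ldap/database/mdb/maxsize and formats the matching "ucr set" call. The helper names and the 4 GiB figure are illustrative assumptions only, not a recommendation; the new value would still need a slapd restart to take effect, as described above.

```python
GIB = 1024 ** 3  # bytes per GiB


def mdb_maxsize_bytes(gib: int) -> int:
    """Return the MDB map size in bytes for a limit given in GiB."""
    if gib <= 0:
        raise ValueError("size must be a positive number of GiB")
    return gib * GIB


def ucr_set_command(gib: int) -> str:
    """Build the 'ucr set' command line for the given limit (hypothetical helper)."""
    return f"ucr set ldap/database/mdb/maxsize={mdb_maxsize_bytes(gib)}"


# The default mentioned above: 2 GiB
print(mdb_maxsize_bytes(2))  # 2147483648

# An example of a larger value for amd64 (4 GiB is an arbitrary choice)
print(ucr_set_command(4))
```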

On amd64 systems the new cn=translog is part of the overall MDB database size (on i386 we use BDB instead for it), so there is an increased virtual memory footprint for the slapd.

Maybe all tickets noted here share a common issue, but it could also be that they have different or multiple issues. We should keep that in mind when analyzing further.
Comment 4 Stefan Gohmann univentionstaff 2019-04-04 16:53:25 CEST
We have released the following updates:

 - Bug #49198: Multiple entries in the transaction file
 - Bug #28233: Notifier should check free space
 - Bug #49201: Extend univention-translog by various consistency checks

 - Bug #49199: [4.3] Multiple entries in the transaction file
 - Bug #49200: [4.3] Notifier should check free space
 - Bug #49202: [4.3] Extend univention-translog by various consistency checks
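
For illustration only, the kind of consistency check referred to in the updates above could look like the following sketch. It is not the univention-translog implementation. It assumes each transaction line has the form "<id> <dn> <command>" with IDs that increase by one, and it reports duplicate IDs, gaps and malformed lines. The function names and the sample entries are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_line(line: str) -> Tuple[int, str, str]:
    """Split one entry into (id, dn, command).

    Assumed format: "<id> <dn> <command>". The DN may contain spaces,
    so the ID is taken from the left and the command from the right.
    Raises ValueError for anything that does not fit.
    """
    head, rest = line.split(" ", 1)
    dn, command = rest.rsplit(" ", 1)
    return int(head), dn, command


def check_transactions(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems found in the entries."""
    problems: List[str] = []
    previous = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        try:
            tid, _dn, _command = parse_line(line)
        except ValueError:
            problems.append(f"line {number}: malformed entry")
            continue
        if previous is not None:
            if tid == previous:
                problems.append(f"line {number}: duplicate id {tid}")
            elif tid != previous + 1:
                problems.append(
                    f"line {number}: expected id {previous + 1}, got {tid}"
                )
        previous = tid
    return problems


# Hypothetical sample data: one duplicated entry and one gap
sample = [
    "1 cn=a,dc=example a",
    "2 cn=b,dc=example m",
    "2 cn=b,dc=example m",
    "4 cn=c,dc=example d",
]
for problem in check_transactions(sample):
    print(problem)
```

A check like this only reads the entries and never rewrites them; any repair should follow the SDB articles listed below.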

And we have created two SDB articles:
 https://help.univention.com/t/problem-umc-diagnostic-module-complains-about-problems-with-udn-replication/11707
 https://help.univention.com/t/how-to-reset-listener-notifier-replication/11710

As discussed, the following articles should no longer be used since they are wrong:
 - https://help.univention.com/t/transaction-file-checking/6418
 - https://help.univention.com/t/fixing-translog-issues/11613