Bug 44497 - Missing dependency for wait_for_replication
Status: NEW
Product: UCS Test
Classification: Unclassified
Component: General
Version: unspecified
Hardware: Other Linux
Importance: P5 normal
Assigned To: UCS maintainers
Reported: 2017-04-28 08:24 CEST by Florian Best
Modified: 2023-07-06 15:00 CEST (History)
What kind of report is it?: Development Internal
Description Florian Best univentionstaff 2017-04-28 08:24:44 CEST
http://jenkins.knut.univention.de:8080/job/UCS-4.2/job/UCS-4.2-0/job/AutotestJoin/126/SambaVersion=s3,Systemrolle=member/testReport/59_udm/02_prohibitedObjectClasses/test/

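The test prints many of these errors (German locale: "Zeile" means "line", and "Datei oder Verzeichnis nicht gefunden" means "No such file or directory"):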
(2017-04-28 00:43:07.725794) /usr/share/ucs-test/lib/base.sh: Zeile 220: /usr/lib/nagios/plugins/check_univention_replication: Datei oder Verzeichnis nicht gefunden
Comment 1 Philipp Hahn univentionstaff 2023-07-06 15:00:30 CEST
# dpkg -S /usr/lib/nagios/plugins/check_univention_replication
univention-nagios-client: /usr/lib/nagios/plugins/check_univention_replication
# aptitude why univention-nagios-client
i   univention-server-master Recommends univention-nagios-client

A `Recommends` relationship is optional, so the recommended package can be removed. `wait_for_replication` should therefore not depend on an optional package.
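A minimal sketch of how a caller could fail loudly instead of printing "file not found" on every call. The path is the one from the error message above; the function and exception names are hypothetical, and this is not how `base.sh` currently does it:

```python
import os
import subprocess

# Path taken from the error message in the description.
PLUGIN = "/usr/lib/nagios/plugins/check_univention_replication"


class ReplicationCheckUnavailable(RuntimeError):
    """Hypothetical error: the replication check plugin is not installed."""


def run_replication_check(plugin=PLUGIN):
    """Run the plugin and return (exit_code, output).

    Raises ReplicationCheckUnavailable if the plugin is missing, so a test
    aborts with a clear reason instead of looping on a missing file.
    """
    if not (os.path.isfile(plugin) and os.access(plugin, os.X_OK)):
        raise ReplicationCheckUnavailable(
            f"{plugin} not found; is univention-nagios-client installed?"
        )
    proc = subprocess.run([plugin], capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()
```

This only makes the failure visible; it does not remove the dependency on the optional package.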

It is also unstable:

ucs-test [10_ldap/43_replication_ldif](https://jenkins2022.knut.univention.de/job/UCS-5.0/job/UCS-5.0-4/job/AutotestJoin/lastCompletedBuild/SambaVersion=s4,Systemrolle=slave/testReport/10_ldap/42replication_ldiff/slave095/) failed for 5.0-4 with Samba4 on ~Slave~Replica: the test (again) started too early, when replication and S4C synchronization were not yet complete:
```
01:47:11.39 Object modified: cn=qmcibzxx,cn=groups,dc=autotest095,dc=test
01:47:12.58 Waiting for replication...
01:47:12.73 OK: replication complete (nid=1316 lid=1316)
01:47:12.74 replication complete.
```
~1s looks way too fast for a change to go to OpenLDAP@Primary → S4C → Samba → S4C → OpenLDAP@Primary → UDN → UDL → OpenLDAP@Replica.
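For reference, the intervals in the log excerpt above can be computed from its timestamps. This is only arithmetic on the quoted log lines; the helper name is made up for illustration:

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS.ff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# "Object modified" until "OK: replication complete"
print(elapsed_seconds("01:47:11.39", "01:47:12.73"))  # 1.34
# "Waiting for replication..." until "OK: replication complete"
print(elapsed_seconds("01:47:12.58", "01:47:12.73"))  # 0.15
```

So the wait itself returned after 0.15s, which suggests the check compared the IDs before the S4C round trip had produced any new change.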
#bitflip We really should find a way to make (our 3 implementations of) `wait_for_replication` more robust and correct, as these _S4C-changes-behind-our-back_ have made and are still making many tests flaky. I know that @arequate et al. have already invested time in this, but maybe we should swarm on that issue again. Making UDL and UDN observable might be a first step.

@arequate added:

Yes, making UDL observable may be one of the useful ingredients. It may also help with things like Bug #55337. But there are a lot of different moving parts involved during replication. In the past I've thought about tracking the `entryCSN` to see how it propagates through the system. Ideally OpenLDAP would use the AD replication model (`uSNChanged` per object etc.), as it offers much more data for tracking changes, but that's not going to happen in this branch of reality.
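A rough sketch of what comparing `entryCSN` values could look like, assuming the usual OpenLDAP layout `timestamp#count#sid#mod` with hexadecimal counters. The function names and the sample values are hypothetical, and comparing CSNs that originate from different server IDs is more subtle than this:

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str  # fixed-width UTC timestamp, e.g. "20230706130030.000000Z"
    count: int      # change counter within the same timestamp
    sid: int        # server ID of the originating server
    mod: int        # modification counter


def parse_csn(value):
    """Split an entryCSN string into its four '#'-separated fields."""
    timestamp, count, sid, mod = value.split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def has_caught_up(primary_csn, replica_csn):
    """True if the replica's copy of an object is at least as new as the primary's."""
    return parse_csn(replica_csn) >= parse_csn(primary_csn)


# Hypothetical values for one object, read from the primary and from a replica.
primary = "20230706130030.000002Z#000000#000#000000"
replica = "20230706130030.000001Z#000000#000#000000"
print(has_caught_up(primary, replica))  # False
```

A test could read the `entryCSN` of the object it just modified on the primary and poll the replica until `has_caught_up` returns true for that object.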


Currently we have multiple implementations:
# git grep -n -e '\(def\|function\) wait_for_\(listener_\)\?replication\w*\>' -e '\<wait_for_\(listener_\)\?replication\w*\> *\(() *\)\?{'
management/univention-management-console-module-adtakeover/umc/python/adtakeover/takeover.py:2154:def wait_for_listener_replication(progress=None, max_time=None):
test/ucs-test/debian/changelog:15776:  * 10_ldap/41listener: use function wait_for_replication (Bug #32537)
test/ucs-test/lib/base.sh:233:wait_for_replication () { # wait for listener/notifier replication to complete (timeout 5m)
test/ucs-test/lib/base.sh:247:wait_for_replication_and_postrun () { #wait for listener/notifier replicaion and listener postrun delay
test/ucs-test/tests/59_udm/66_test_udm_computers.py:42:def wait_for_replication_cleanup(wait_for_replication):
test/ucs-test/tests/conftest.py:131:def wait_for_replication() -> Callable[..., None]:
test/ucs-test/univention/testing/utils.py:483:def wait_for_replication_from_master_openldap_to_local_samba(replication_postrun=False, ldap_filter=None, verbose=True):
test/ucs-test/univention/testing/utils.py:497:def wait_for_replication_from_local_samba_to_local_openldap(replication_postrun=False, ldap_filter=None, verbose=True):
test/ucs-test/univention/testing/utils.py:546:def wait_for_listener_replication(verbose=True):
test/ucs-test/univention/testing/utils.py:576:def wait_for_listener_replication_and_postrun(verbose=True):
test/utils/utils.sh:299:wait_for_replication () {


1. We should add some functionality to univention-lib or UDL to check and wait for the replication to finish.
2. See https://git.knut.univention.de/univention/ucs/-/issues/1435 for adding metrics to UDL and UDN
3. See https://git.knut.univention.de/univention/ucs/-/merge_requests/796/diffs#diff-content-69c92ea883a3a965e3ce2e7935b673695ad63f20 for some work on UDL.
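For point 1, a minimal sketch of a shared wait helper, under stated assumptions: `get_notifier_id` and `get_listener_id` are hypothetical callables returning the `nid` and `lid` values seen in the log above, the 5 minute default mirrors the timeout mentioned in `base.sh`, and requiring several consecutive identical readings is only one possible guard against late S4C changes:

```python
import time


def wait_for_replication(get_notifier_id, get_listener_id, timeout=300.0,
                         interval=1.0, stable_checks=3,
                         sleep=time.sleep, clock=time.monotonic):
    """Wait until nid == lid and the value stays unchanged for several checks.

    Returns the final ID, or raises TimeoutError after `timeout` seconds.
    """
    deadline = clock() + timeout
    stable = 0
    last = None
    while True:
        nid, lid = get_notifier_id(), get_listener_id()
        if nid != lid:
            stable = 0          # listener is behind the notifier
        elif nid == last:
            stable += 1         # same ID as on the previous check
        else:
            stable = 1          # in sync, but a new change arrived since last check
        last = nid
        if stable >= stable_checks:
            return nid
        if clock() >= deadline:
            raise TimeoutError(f"replication incomplete (nid={nid} lid={lid})")
        sleep(interval)
```

With `stable_checks=1` this reduces to a single `nid == lid` comparison, which the log above suggests can return too early.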