Bug 54504 - [4.4] replication.py priority incorrect in listener module order
[4.4] replication.py priority incorrect in listener module order
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
UCS 4.4
Other Linux
: P5 enhancement (vote)
: UCS 4.4-8-errata
Assigned To: Erik Damrose
Dirk Wiesenthal
:
: 54518 (view as bug list)
Depends on: 54061
Blocks:
  Show dependency treegraph
 
Reported: 2022-03-03 17:43 CET by Erik Damrose
Modified: 2022-03-14 10:30 CET (History)
4 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 6: Setup Problem: Issue for the setup process
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.343
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2022020921000257
Bug group (optional): Regression
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Erik Damrose univentionstaff 2022-03-03 17:43:20 CET
When backporting bug 26089 to UCS 4.4, an error was introduced. replication.py does not have the correct priority to run as the first module. The issue does not exist in UCS 5, no forward port necessary.

UCS 4 is missing the logic to explicitly set the replication.py priority in the file itself. On the other hand, the logic in the u-directory-listener which hardcodes the priority for replication.py is erroneous - the strcmp checks the loaded modules file basename, while the function gets the absolute path of the file loaded.

A fix would be to set the priority in replication.py, and remove the wrong check from u-d-l.

Impact: As replication.py is not run first, we see errors when joining UCS systems in large environments. Because the replication.py is not run first, all module initialization query the upstream notifier to get the latest schema. Normally the schema would be replicated once, by replication.py.

If i understand the module initialization code correctly, there is no retry logic if the notifier connection is closed due to a notifier restart on the DC master, of if the connection times out.

This leaves the modules uninitialized, and the join fails in 03univention-directory-listener.inst.

Indicator in join log:
27.02.22 18:15:46.352  LISTENER    ( INFO    ) : running handlers [udm_extension] for cn=Subschema
27.02.22 18:15:46.352  LISTENER    ( ALL     ) : handler: udm_extension considered
27.02.22 18:15:46.352  LISTENER    ( ALL     ) : handler: udm_extension (filter doesn't match)
27.02.22 18:15:46.352  LISTENER    ( ALL     ) : unblocking signals (was 2)
27.02.22 18:15:46.353  LISTENER    ( INFO    ) : module udm_extension for relating objects
27.02.22 18:15:46.357  LISTENER    ( ALL     ) : >>>MSGID: 8
GET_SCHEMA_ID

27.02.22 18:15:46.358  LISTENER    ( ALL     ) : RESULT: [(null)]
27.02.22 18:15:46.358  LISTENER    ( ERROR   ) : connection to notifier was closed
27.02.22 18:15:46.358  LISTENER    ( ERROR   ) : failed to get schema DN
27.02.22 18:15:46.359  LISTENER    ( WARN    ) : finished initializing module udm_extension with rv=80

while replication.py did not run until this point.

+++ This bug was initially created as a clone of Bug #54061 +++
Comment 1 Erik Damrose univentionstaff 2022-03-03 18:14:29 CET
2f04b6a62c Bug #54504: Explicitly set replication.py priority to 0.0 in listener module
817dc458a1 Bug #54504: Remove wrong code for replication.py priority
69e84b60dd Bug #54504: yaml

univention-directory-listener 13.0.3-7A~4.4.0.202203031804
univention-directory-replication 12.0.0-12A~4.4.0.202203031807

I could reproduce the issue in a DC master + DC slave environment with u-d-l package version 13.0.3-6A~4.4.0.202203020748 installed. While the system is joining, and joinscript 03univention-directory-listener.inst is executed, restart the u-d-notifier on the master multiple times. The join will fail, join.log will show failed module initializations as described above, with

... finished initializing module udm_extension with rv=80
Comment 2 Philipp Hahn univentionstaff 2022-03-04 08:42:07 CET
OK: 2f04b6a62c
OK: 817dc458a1
OK: 69e84b60dd
OK: errata-announce  -V --only univention-directory-replication.yaml
OK: errata-announce  -V --only univention-directory-listener.yaml
OK: apt -t apt install univention-directory-listener univention-directory-replication
OK: grep ^priority /usr/lib/univention-directory-listener/system/replication.py
OK: gdb print *handlers(->next)*

This might have been detected by failing ucs-test no-samba Backup: <https://jenkins.knut.univention.de:8181/job/UCS-5.0/job/UCS-5.0-1/job/AutotestUpgrade/SambaVersion=no-samba,Systemrolle=backup/ws/test/join.log>

Upgrade test while Master and Slave are still on 4.4-8

> 23:33:17 CET 2022: starting /usr/share/univention-join/univention-join> 23:34:13 CET 2022: Configure 01univention-ldap-server-init.inst> 23:34:15 CET 2022: Configure 02univention-directory-notifier.inst> 23:34:15 CET 2022: Configure 03univention-directory-listener.inst> 23:34:18.370  LISTENER    ( WARN    ) : handler: replication (not ready) (ignore)
> Restarting slapd (via systemctl): slapd.serviceWarning: slapd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
.
> 23:34:19.831  LISTENER    ( WARN    ) : handler: faillog (not ready) (ignore)
> 23:34:19.837  LISTENER    ( WARN    ) : Set Schema ID to 4> 23:34:27.096  LISTENER    ( WARN    ) : initializing module quota
> UNIVENTION_DEBUG_BEGIN  : uldap.__open host=backup072.autotest072.local port=7389 base=dc=autotest072,dc=local> Traceback (most recent call last):
>   File "/usr/lib/univention-directory-listener/system/quota.py", line 216, in handler
>     if _is_container_change_relevant(new, old):
>   File "/usr/lib/univention-directory-listener/system/quota.py", line 142, in _is_container_change_relevant
>     lo = _get_ldap_connection()
>   File "/usr/lib/univention-directory-listener/system/quota.py", line 119, in _get_ldap_connection
>     connection = univention.uldap.getMachineConnection(ldap_master=False)
                                                         ^^^^^^^^^^^^^^^^^
…
> ldap.INVALID_CREDENTIALS: {'desc': 'Invalid credentials'}
> 23:34:27.126  LISTENER    ( WARN    ) : handler: quota (failed)

(gdb) bt
#0  change_init_module (lp=0x555555588910, handler=0x5555556bb640) at change.c:202
#1  0x000055555555e795 in change_new_modules (lp=0x555555588910) at change.c:228
#2  0x000055555555a46e in main (argc=17, argv=0x7fffffffe2e8) at main.c:609
Comment 3 Dirk Wiesenthal univentionstaff 2022-03-09 10:45:53 CET
OK: Code
OK: YAML
OK: Upgrade Tests
OK: Join with 5k users
Comment 5 Philipp Hahn univentionstaff 2022-03-14 10:30:38 CET
*** Bug 54518 has been marked as a duplicate of this bug. ***