Univention Bugzilla – Bug 57024
Join fails if notifier gets restarted
Last modified: 2024-02-28 13:17:25 CET
I've tried to join a school slave in a big customer environment (180k users and 290 school servers). The join failed in three of three cases because the notifer has been restarted while the directory-listener join script was running. The notifer has been restarted due to different reasons: ---------------------------------------------------------------------------------- 03.02.24 08:59:15.356 TRANSFILE ( ERROR ) : ldap_sasl_interactive_bind_s(): Can't contact LDAP server 03.02.24 08:59:25.933 DEBUG_INIT -- 03.02.24 12:00:31.106 TRANSFILE ( ERROR ) : ldap_sasl_interactive_bind_s(): Can't contact LDAP server 03.02.24 12:00:41.625 DEBUG_INIT -- 03.02.24 12:01:03.497 TRANSFILE ( ERROR ) : ldap_sasl_interactive_bind_s(): Can't contact LDAP server 03.02.24 12:01:14.181 DEBUG_INIT -- 03.02.24 15:46:17.359 TRANSFILE ( ERROR ) : ldap_sasl_interactive_bind_s(): Can't contact LDAP server 03.02.24 15:46:27.846 DEBUG_INIT -- ---------------------------------------------------------------------------------- ---------------------------------------------------------------------------------- 04.02.24 09:02:26.895 TRANSFILE ( ERROR ) : Normal exit 04.02.24 09:02:26.895 DEBUG_EXIT 04.02.24 09:02:27.425 DEBUG_INIT ---------------------------------------------------------------------------------- I saw different cases on the school slave. In two cases the replication module was successful but the following modules failed: ---------------------------------------------------------------------------------- 03.02.24 08:32:12.403 LISTENER ( WARN ) : Set Schema ID to 72 03.02.24 08:32:12.403 LISTENER ( WARN ) : initializing module replication File: /var/lib/univention-ldap/ldap/DB_CONFIG slapd: Kein Prozess gefunden File: /var/lib/univention-ldap/ldap/DB_CONFIG Starting slapd (via systemctl): slapd.service. Restarting slapd (via systemctl): slapd.service. 03.02.24 10:52:20.215 LISTENER ( WARN ) : finished initializing module replication with rv=0 03.02.24 10:52:20.220 LISTENER ( WARN ) : initializing module ldap-cache-baa04df67e7af6bb0769f5cb7e72dba9 03.02.24 10:52:32.508 LISTENER ( ERROR ) : connection to notifier was closed 03.02.24 10:52:32.508 LISTENER ( ERROR ) : failed to get schema DN 03.02.24 10:52:32.508 LISTENER ( WARN ) : finished initializing module ldap-cache-baa04df67e7af6bb0769f5cb7e72dba9 with rv=80 03.02.24 10:52:32.508 LISTENER ( WARN ) : initializing module well-known-sid-name-mapping 03.02.24 11:06:50.608 LISTENER ( ERROR ) : connection to notifier was closed 03.02.24 11:06:50.608 LISTENER ( ERROR ) : failed to get schema DN 03.02.24 11:06:50.609 LISTENER ( WARN ) : finished initializing module well-known-sid-name-mapping with rv=80 ---------------------------------------------------------------------------------- In these cases the join failed because the listener modules were not initialized and the listener is busy with initializing the modules and doesn't register syntax extensions or similar things. And in one case it happened during the replication module: ---------------------------------------------------------------------------------- 04.02.24 08:53:22.918 LISTENER ( WARN ) : handler: s4-connector (not ready) (ignore) 04.02.24 08:53:22.929 LISTENER ( WARN ) : Set Schema ID to 72 04.02.24 08:53:22.930 LISTENER ( WARN ) : initializing module replication File: /var/lib/univention-ldap/ldap/DB_CONFIG slapd: Kein Prozess gefunden File: /var/lib/univention-ldap/ldap/DB_CONFIG Starting slapd (via systemctl): slapd.serviceWarning: The unit file, source configuration file or drop-ins of slapd.service changed on disk. Run 'systemctl daemon-reload' to reload units. . Restarting slapd (via systemctl): slapd.serviceWarning: The unit file, source configuration file or drop-ins of slapd.service changed on disk. Run 'systemctl daemon-reload' to reload units. . 04.02.24 09:05:13.838 LISTENER ( ERROR ) : connection to notifier was closed 04.02.24 09:05:13.838 LISTENER ( ERROR ) : failed to get schema DN 04.02.24 09:05:13.842 LISTENER ( WARN ) : finished initializing module replication with rv=80 ---------------------------------------------------------------------------------- One solution would be a reconnection if the notifier gets restarted. It is a really big environment and so the join could take more than a day and a restart could happen.
I've installed the test packages and two joins were successful: root@schoolserver:~# grep RETRY /var/log/univention/join.log 06.02.24 06:57:47.444 LISTENER ( WARN ) : NOTIFIER_RETRY rv=1 06.02.24 09:05:46.447 LISTENER ( WARN ) : NOTIFIER_RETRY rv=1 06.02.24 09:51:12.421 LISTENER ( WARN ) : NOTIFIER_RETRY rv=1 06.02.24 12:04:19.994 LISTENER ( WARN ) : NOTIFIER_RETRY rv=1 07.02.24 09:52:43.667 LISTENER ( WARN ) : NOTIFIER_RETRY rv=1 07.02.24 11:12:43.727 LISTENER ( WARN ) : NOTIFIER_RETRY rv=1 07.02.24 12:53:02.368 LISTENER ( WARN ) : NOTIFIER_RETRY rv=1 root@schoolserver:~#
0c10d73e96 | retry in case notifier doesn't answer Package: univention-directory-listener Version: 14.0.9-2 Branch: ucs_5.0-0 Scope: errata5.0-6 New UCR variable listener/notifier/retries with default 30.
OK: Join works from each server role when notifier is stopped and started every 2 seconds OK: Join works from ucs@school replicas when notifier is stopped and started every 2 seconds OK: Code review OK: Log messages understandable OK: Package build and install OK: YAML
<https://errata.software-univention.de/#/?erratum=5.0x967>