Univention Bugzilla – Bug 40230
Replication may run into an LDAP search timeout, causing the join to fail
Last modified: 2016-02-04 15:58:14 CET
Seen at Ticket#2015121121000514.

In a UCS@school scenario, the join of a DC slave failed because an LDAP search request (which presumably queried all LDAP objects) timed out. As a result, all subsequent actions carried out by the listener join script failed. After a restart of the DC master, the LDAP server was more responsive and the join succeeded.

========== join.log ==========
> [...]
> Configure 03univention-directory-listener.inst Fri Dec 11 16:20:57 CET 2015
> [...]
> 11.12.15 15:47:49.335 LISTENER ( WARN ) : initializing module replication
> File: /var/lib/univention-ldap/ldap/DB_CONFIG
> slapd: Kein Prozess gefunden
> File: /var/lib/univention-ldap/ldap/DB_CONFIG
> Starting ldap server(s): slapd ...done.
> Restarting ldap server(s).
> Stopping ldap server(s): slapd ...retry #1....done.
> Starting ldap server(s): slapd ...done.
> 11.12.15 15:53:04.076 LISTENER ( ERROR ) : could not get DNs when initializing replication: Timed out
> [...]
====================

In the source code, the timeout for this LDAP search is set to 5 minutes:

========== management/univention-directory-listener/src/change.c ==========
> [...]
> struct timeval timeout = {
>         .tv_sec = 5*60,
>         .tv_usec = 0,
> };
> int sizelimit0 = 0;
> if ((rv = ldap_search_ext_s(lp->ld, (*f)->base, (*f)->scope, (*f)->filter, _attrs, attrsonly1, serverctrls, clientctrls, &timeout, sizelimit0, &res)) != LDAP_SUCCESS) {
>         univention_debug(UV_DEBUG_LISTENER, UV_DEBUG_ERROR, "could not get DNs when initializing %s: %s", handler->name, ldap_err2string(rv));
>         return rv;
> }
> [...]
====================

It would be nice to use paging for the request in order to avoid these problems.
*** This bug has been marked as a duplicate of bug 34877 ***
(In reply to Alexander Kläser from comment #0)
> It would be nice to use paging for the request in order to avoid these
> problems.

Yes. Alternatively, as a first step, we could simply increase the timeout or make it configurable.
(In reply to Stefan Gohmann from comment #2)
> Yes. Alternatively, we could simply increase the timeout or make the timeout
> configurable in a first step.

IMHO, paging would have a real benefit because it allows progress information to be logged. Progress information would be especially helpful here, as it is difficult to tell at first glance whether a system is hanging or not.
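To illustrate the progress-logging argument, here is a minimal sketch of a paged fetch loop. It does not talk to a real LDAP server: `struct fake_directory`, `fetch_page()` and `fetch_all_dns()` are hypothetical stand-ins, and the integer cookie only mimics the opaque cookie of the LDAP paged results control (with OpenLDAP's libldap this would presumably be built around `ldap_create_page_control()` and `ldap_parse_pageresponse_control()`). It is not the listener's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PAGE_SIZE 1000

/* Hypothetical in-memory stand-in for the LDAP server. */
struct fake_directory {
	const char **dns;
	size_t count;
};

/*
 * Simulates one paged search request: copies up to page_size DNs starting
 * at *cookie into out[] and advances the cookie. Like the real control,
 * an "empty" cookie (0 here) after the call means there are no more pages.
 */
static size_t fetch_page(const struct fake_directory *dir, size_t *cookie,
                         size_t page_size, const char **out)
{
	size_t n = 0;

	while (n < page_size && *cookie < dir->count)
		out[n++] = dir->dns[(*cookie)++];
	if (*cookie >= dir->count)
		*cookie = 0;
	return n;
}

/*
 * Fetches all DNs page by page and writes one progress line per page,
 * so a slow initialization can be told apart from a hanging one.
 * Returns the total number of DNs received.
 */
static size_t fetch_all_dns(const struct fake_directory *dir, size_t page_size,
                            FILE *log)
{
	const char *page[MAX_PAGE_SIZE];
	size_t cookie = 0, total = 0;

	assert(page_size > 0 && page_size <= MAX_PAGE_SIZE);
	do {
		total += fetch_page(dir, &cookie, page_size, page);
		fprintf(log, "initializing replication: %zu DNs received so far\n", total);
	} while (cookie != 0);
	return total;
}
```

With paging, a timeout would also apply to each page request rather than to the whole result set, so a single short timeout could remain in place even for large directories.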
(In reply to Alexander Kläser from comment #3)
> IMHO, paging would have a real benefit as it allows to log progress
> information. Especially here where, progress information would be very
> helpful, as it is difficult to decide at a first glance whether a system
> might hang or not.

Yes, sure. That is what Bug #34877 is for. I think increasing the timeout or making it configurable is a first step that could help in such a support scenario. Implementing paging would take much more effort.
The problem occurred two more times on different slaves. With univention-directory-listener version 9.0.2-5.269.201506171450, the joins completed successfully.

(In reply to Stefan Gohmann from comment #4)
> I think increasing the timeout or make the timeout configurable is a first
> step which could help in such a support scenario. The implementation of
> paging would cost much more effort.

The problem seems to have been introduced with commit 63434, which added a hard timeout of 5 minutes. As discussed, the default timeout should be raised to 2 hours and must be configurable via UCR. The catch is that in problematic environments the timeout has to be set before the first join attempt, so setting the value via UCR policy is not possible.
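A minimal sketch of how a configured value could be turned into the `struct timeval` passed to `ldap_search_ext_s()`. It assumes the value is a plain number of seconds. The helper name `search_timeout_from_string()` and the macro `DEFAULT_SEARCH_TIMEOUT_SEC` are hypothetical, and how the string is read from UCR is left out. This is not the code from the actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical default: the 2 hours proposed above. */
#define DEFAULT_SEARCH_TIMEOUT_SEC (2L * 60 * 60)

/*
 * Converts a configuration string holding a number of seconds into a
 * struct timeval. NULL, empty, non-numeric, non-positive or out-of-range
 * input falls back to fallback_sec, so a broken setting never results in
 * a zero or negative timeout.
 */
static struct timeval search_timeout_from_string(const char *value, long fallback_sec)
{
	struct timeval timeout = { .tv_sec = fallback_sec, .tv_usec = 0 };
	char *end = NULL;
	long seconds;

	assert(fallback_sec > 0);
	if (value == NULL || *value == '\0')
		return timeout;

	errno = 0;
	seconds = strtol(value, &end, 10);
	/* Accept only a complete, positive, in-range decimal number. */
	if (errno == 0 && *end == '\0' && seconds > 0)
		timeout.tv_sec = seconds;
	return timeout;
}
```

The result would replace the hard-coded `5*60` initializer, for example `struct timeval timeout = search_timeout_from_string(value, DEFAULT_SEARCH_TIMEOUT_SEC);`, where `value` is whatever string the configuration lookup returned, or NULL if the variable is unset.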
The customer asked for a fix because slave systems cannot be joined unless the listener is downgraded beforehand.
Backport from Bug #40373:

r67091 | Bug #40230 UDL: Abort on out-of-memory
r67090 | Bug #40230 UDL: Fix memory leaks
r67089 | Bug #40230 UDL: Free LDAP memory
r67088 | Bug #40230 UDL: Remove pointless free()
r67087 | Bug #40230 UDL: Only retrieve DNs
r67086 | Bug #40230 UDL: Fix long search timeout
r67085 | Bug #40230 UDL: static change_init_module()
r67084 | Bug #40230 UDL: Declare extern
r67083 | Bug #40230 UDL: Remove self-include
r67082 | Bug #40230 UDL: Copyright 2016

Package: univention-directory-listener
Version: 9.0.2-9.295.201602011148
Branch: ucs_4.0-0
Scope: errata4.0-4

r67093 | Bug #40338 UDL: Fix long search timeout YAML
 univention-directory-listener.yaml
Verified:
* Code review Ok
* Advisory / Versioning Ok
* Function Ok

Could you adjust the UCR variable description (and YAML) to state which time units or, more generally, which value syntax is allowed / expected? See Bug 40373 Comment 3.
(In reply to Arvid Requate from comment #8)
> Could you adjust the UCR Variable description (and YAML) to tell which time
> units or more general which value syntax is allowed / expected? See Bug
> 40373 Comment 3.

r67110 | Bug #40230 UDL: Improve UCR variable description

Package: univention-directory-listener
Version: 9.0.2-10.297.201602020808
Branch: ucs_4.0-0
Scope: errata4.0-4

r67111 | Bug #40230,Bug #40373 UDL: Improve UCR variable description YAML
 univention-directory-listener.yaml
<http://errata.software-univention.de/ucs/4.0/395.html>