Univention Bugzilla – Bug 34763
Keep alive for LDAP/notifier connections
Last modified: 2018-07-23 18:54:53 CEST
Ticket#: 2014050721007513
At a customer, 5 DC slaves thought they had a valid/established TCP connection to the DC backup on port 7389 and were waiting for LDAP search results. But on the DC backup all those TCP connections were unknown and seem to have been shut down for some time. Since the slaves do not send any data (they are waiting for LDAP responses), they will not recognize that the TCP connection is broken and has to be reestablished.
We should activate the TCP keep-alive function within the listener for notifier/LDAP connections, so the listener is able to detect broken connections.
This happened again last week (Ticket# 2015070921000225) and most likely today (Ticket# 2015071521000179).
Version bump and new TM.
2014050721007513 is a hanging LDAP connection (TCP 7389).
2015070921000225 is a hanging notifier connection (TCP 6669).

Prototype testing done with Python:
 c.set_option(ldap.OPT_X_KEEPALIVE_IDLE, 30)
 c.set_option(ldap.OPT_X_KEEPALIVE_INTERVAL, 10)
 c.set_option(ldap.OPT_X_KEEPALIVE_PROBES, 3)
These can only be set between ldap_initialize() and the ldap_bind_(). As the Listener uses the shared implementation from univention-ldap, they can't be set on a per-connection basis. Setting them as the process-global defaults should work.

 ldap.set_option(ldap.OPT_NETWORK_TIMEOUT, 10.0)
seems to not change anything.

 ldap.set_option(ldap.OPT_TIMEOUT, 10.0)
sets a default timeout, which is needed because most ldap_search_s() calls use timeout=0, which is infinite!

Need to test how SSL/TLS changes the behavior, as SSL implements an additional layer between LDAP and the network; there are known cases/reports where SSL is blocked waiting for data, which makes timeout handling in the LDAP "Application Layer" impossible.
Added KA as an additional robustness layer. See <http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/>

r63443 | Bug #34763 Listener: Add notifier TCP keep-alive
r63442 | Bug #34763 Listener: Add notifier timeout
r63441 | Bug #34763 Listener: Add TCP keep-alive
r63440 | Bug #34763 Listener: Add default timeout
r63439 | Bug #34763 Listener: Add timeout

r63438 | Bug #34763 Listener: Add notifier TCP keep-alive
r63437 | Bug #34763 Listener: Add notifier timeout
r63436 | Bug #34763 Listener: Add TCP keep-alive
r63435 | Bug #34763 Listener: Add default timeout
r63434 | Bug #34763 Listener: Add timeout

Package: univention-directory-listener
Version: 9.0.2-6.278.201509031256
Branch: ucs_4.0-0
Scope: errata4.0-3

Package: univention-directory-listener
Version: 10.0.0-1.279.201509031315
Branch: ucs_4.1-0

r63444 | Bug #34763 Listener: timeout, filter
 2015-09-32-univention-directory-listener.yaml
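For illustration, the process-global defaults mentioned in the comment above could be set roughly as follows. This is only a sketch of the Python prototype idea, not the Listener change itself. The helper name apply_ldap_defaults is hypothetical, and the ldap module is passed in as a parameter so that the sketch has no hard dependency on python-ldap. The option names and values are the ones quoted in the comment.

```python
# Sketch only: process-wide python-ldap defaults, as described in the comment.
# `apply_ldap_defaults` is a hypothetical helper name, not part of univention-ldap.


def apply_ldap_defaults(ldap_mod, idle=30, interval=10, probes=3, timeout=10.0):
    """Set keep-alive and timeout defaults on the module, not on one connection."""
    # Seconds a connection may be idle before the first keep-alive probe
    ldap_mod.set_option(ldap_mod.OPT_X_KEEPALIVE_IDLE, idle)
    # Seconds between two keep-alive probes
    ldap_mod.set_option(ldap_mod.OPT_X_KEEPALIVE_INTERVAL, interval)
    # Number of unanswered probes before the connection is considered dead
    ldap_mod.set_option(ldap_mod.OPT_X_KEEPALIVE_PROBES, probes)
    # Default timeout for synchronous operations (instead of waiting forever)
    ldap_mod.set_option(ldap_mod.OPT_TIMEOUT, timeout)


# Usage (requires python-ldap, call before any connection is initialized):
#   import ldap
#   apply_ldap_defaults(ldap)
```

Whether connections created by the shared univention-ldap code actually pick up these defaults would still have to be verified, especially with SSL/TLS in between.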
Created attachment 7181 [details]
qa.patch

Please fix typo and bracing style according to the attached patch: LDAP_OPT_X_KEEPALIVE_PROBES is duplicated, LDAP_OPT_X_KEEPALIVE_INTERVAL is missing.

Otherwise this seems to work fine:

A) Notifier socket keep-alive timeout: ifdown on the master was recognized after (60+12*5) seconds for the notifier connection. After ifup, synchronization starts again.

B) LDAP timeout: 5*60 seconds after SIGSTOP on the master slapd, the listener on the DC backup logs:
==========================================================================
27.11.14 03:09:49.047 LDAP ( ERROR ) : start_tls: Timed out
27.11.14 03:09:49.047 LISTENER ( WARN ) : can not connect to ldap server (master50.ar40i1.qa)
27.11.14 03:09:49.048 LISTENER ( WARN ) : can not connect to any ldap server, retrying in 30 seconds
27.11.14 03:10:19.048 LISTENER ( WARN ) : chosen server: master50.ar40i1.qa:7389
27.11.14 03:15:19.100 LDAP ( ERROR ) : start_tls: Timed out
27.11.14 03:15:19.100 LISTENER ( WARN ) : can not connect to ldap server (master50.ar40i1.qa)
==========================================================================
When I SIGCONT the slapd within the LDAP_OPT_TIMEOUT interval, the replication starts immediately.

As far as I read the docs, LDAP_OPT_NETWORK_TIMEOUT is only for the initial connection, but you are right, changing (e.g. shortening) it doesn't seem to make any difference here.
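The (60+12*5) seconds observed in A) match the usual TCP keep-alive arithmetic: idle time before the first probe, plus probe interval times number of unanswered probes. A minimal Python sketch of this for a plain socket such as the notifier connection follows. The helper names are hypothetical, the TCP_KEEP* options are platform-specific (Linux), and reading 12*5 as a 12 second interval with 5 probes is an assumption, since the comment does not say which factor is which.

```python
import socket


def enable_keepalive(sock, idle=60, interval=12, probes=5):
    """Turn on TCP keep-alive for `sock` (hypothetical helper, Linux-oriented)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options do not exist on every platform, so look them up
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def detection_time(idle, interval, probes):
    """Approximate seconds until a silently dead peer is detected."""
    return idle + interval * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(detection_time(60, 12, 5))  # prints 120
    s.close()
```

With the prototype values from the earlier comment (30, 10, 3), the same formula gives 60 seconds.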
(In reply to Arvid Requate from comment #3)
> Created attachment 7181 [details]
> qa.patch
>
> Please fix typo and bracing style according to the attached patch:

Until we have a coding style, I'll follow linux/Documentation/CodingStyle:156
 Do not unnecessarily use braces where a single statement will do.

> LDAP_OPT_X_KEEPALIVE_PROBES is duplicated, LDAP_OPT_X_KEEPALIVE_INTERVAL is
> missing.

Fixed, thanks.

r63924 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter
r63923 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter

Package: univention-directory-listener
Version: 9.0.2-7.283.201509231400
Branch: ucs_4.0-0
Scope: errata4.0-3

Package: univention-directory-listener
Version: 10.0.0-2.282.201509231400
Branch: ucs_4.1-0

r63925 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter YAML
 2015-09-32-univention-directory-listener.yaml
> debian/control
> +Priority: standard

No: UDL should not be installed in all cases - think UCS base-system
Ok.

Btw: interesting date :-)
 2015-09-32-univention-directory-listener.yaml
<http://errata.software-univention.de/ucs/4.0/336.html>