Univention Bugzilla – Full Text Bug Listing |
Summary: | Keep alive for LDAP/notifier connections | ||
---|---|---|---|
Product: | UCS | Reporter: | Sönke Schwardt-Krummrich <schwardt> |
Component: | Listener (univention-directory-listener) | Assignee: | Philipp Hahn <hahn> |
Status: | CLOSED FIXED | QA Contact: | Arvid Requate <requate> |
Severity: | enhancement | ||
Priority: | P5 | CC: | gohmann, grandjean, petersen, walkenhorst |
Version: | UCS 4.0 | ||
Target Milestone: | UCS 4.0-3-errata | ||
Hardware: | Other | ||
OS: | Linux | ||
See Also: | https://forge.univention.org/bugzilla/show_bug.cgi?id=39439 https://forge.univention.org/bugzilla/show_bug.cgi?id=47389 | ||
What kind of report is it?: | --- | What type of bug is this?: | --- |
Who will be affected by this bug?: | --- | How will those affected feel about the bug?: | --- |
User Pain: | Enterprise Customer affected?: | ||
School Customer affected?: | ISV affected?: | ||
Waiting Support: | Flags outvoted (downgraded) after PO Review: | ||
Ticket number: | Bug group (optional): | ||
Max CVSS v3 score: | |||
Bug Depends on: | |||
Bug Blocks: | 41249 | ||
Attachments: | qa.patch |
Description
Sönke Schwardt-Krummrich
2014-05-07 12:19:50 CEST
This happened again last week at 2015070921000225 and most likely today at 2015071521000179.

Version bump and new TM.

2014050721007513 is a hanging LDAP connection (TCP 7389).
2015070921000225 is a hanging notifier connection (TCP 6669).

Prototype testing done with Python:

  c.set_option(ldap.OPT_X_KEEPALIVE_IDLE, 30)
  c.set_option(ldap.OPT_X_KEEPALIVE_INTERVAL, 10)
  c.set_option(ldap.OPT_X_KEEPALIVE_PROBES, 3)

These options can only be set between ldap_initialize() and the ldap_bind_(). As the Listener uses the shared implementation from univention-ldap, they can't be set on a per-connection basis. Setting them as the process-global defaults should work.

  ldap.set_option(ldap.OPT_NETWORK_TIMEOUT, 10.0)

seems to not change anything.

  ldap.set_option(ldap.OPT_TIMEOUT, 10.0)

sets a default timeout. This matters because most ldap_search_s() calls use timeout=0, which is infinite!

Need to test how SSL/TLS changes the behavior, as SSL implements an additional layer between LDAP and the network; there are known cases/reports where SSL is blocked waiting for data, which makes timeout handling in the LDAP "Application Layer" impossible. Added KA (TCP keep-alive) as an additional robustness layer.
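The idea of setting the prototype values as process-global defaults could look roughly like the sketch below. It uses only the python-ldap option names quoted in the comment above; the helper name `apply_global_defaults` and the two constants are hypothetical and not part of the Listener or univention-ldap. The module is passed in as a parameter so the helper itself does not depend on python-ldap being importable.

```python
# Values copied from the prototype above; whether they suit production is an assumption.
KEEPALIVE_DEFAULTS = {
    "OPT_X_KEEPALIVE_IDLE": 30,      # seconds of idle time before the first probe
    "OPT_X_KEEPALIVE_INTERVAL": 10,  # seconds between two probes
    "OPT_X_KEEPALIVE_PROBES": 3,     # unanswered probes before the connection is dropped
}
DEFAULT_TIMEOUT = 10.0  # seconds, used when an operation passes no explicit timeout


def apply_global_defaults(ldap_module):
    """Set keep-alive and timeout options on the module, not on one connection.

    `ldap_module` is expected to be python-ldap (`import ldap`). The intent is
    to call this once, before any connection is initialized and bound.
    """
    for name, value in KEEPALIVE_DEFAULTS.items():
        ldap_module.set_option(getattr(ldap_module, name), value)
    ldap_module.set_option(ldap_module.OPT_TIMEOUT, DEFAULT_TIMEOUT)


# Usage, assuming python-ldap is installed:
#   import ldap
#   apply_global_defaults(ldap)
```

OPT_NETWORK_TIMEOUT is left out on purpose, since the comment above reports that setting it made no observable difference.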
See <http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/>

r63443 | Bug #34763 Listener: Add notifier TCP keep-alive
r63442 | Bug #34763 Listener: Add notifier timeout
r63441 | Bug #34763 Listener: Add TCP keep-alive
r63440 | Bug #34763 Listener: Add default timeout
r63439 | Bug #34763 Listener: Add timeout
r63438 | Bug #34763 Listener: Add notifier TCP keep-alive
r63437 | Bug #34763 Listener: Add notifier timeout
r63436 | Bug #34763 Listener: Add TCP keep-alive
r63435 | Bug #34763 Listener: Add default timeout
r63434 | Bug #34763 Listener: Add timeout

Package: univention-directory-listener
Version: 9.0.2-6.278.201509031256
Branch: ucs_4.0-0
Scope: errata4.0-3

Package: univention-directory-listener
Version: 10.0.0-1.279.201509031315
Branch: ucs_4.1-0

r63444 | Bug #34763 Listener: timeout, filter
2015-09-32-univention-directory-listener.yaml

Created attachment 7181 [details]
qa.patch
Please fix typo and bracing style according to the attached patch: LDAP_OPT_X_KEEPALIVE_PROBES is duplicated, LDAP_OPT_X_KEEPALIVE_INTERVAL is missing.
Otherwise this seems to work fine:
A) Notifier socket keep-alive timeout: ifdown on the master was recognized after (60+12*5) = 120 seconds for the notifier connection. After ifup, synchronization starts again.
B) LDAP timeout: 5*60 seconds after SIGSTOP on master slapd the listener on the DC backup logs:
==========================================================================
27.11.14 03:09:49.047 LDAP ( ERROR ) : start_tls: Timed out
27.11.14 03:09:49.047 LISTENER ( WARN ) : can not connect to ldap server (master50.ar40i1.qa)
27.11.14 03:09:49.048 LISTENER ( WARN ) : can not connect to any ldap server, retrying in 30 seconds
27.11.14 03:10:19.048 LISTENER ( WARN ) : chosen server: master50.ar40i1.qa:7389
27.11.14 03:15:19.100 LDAP ( ERROR ) : start_tls: Timed out
27.11.14 03:15:19.100 LISTENER ( WARN ) : can not connect to ldap server (master50.ar40i1.qa)
==========================================================================
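As a quick cross-check of the 5*60 seconds figure, the gap between the "chosen server" line and the following "start_tls: Timed out" line can be computed from the timestamps in the excerpt. This is only a sketch: the format string is inferred from the excerpt (DD.MM.YY HH:MM:SS.mmm), and `parse_ts` is a hypothetical helper.

```python
from datetime import datetime

# Assumed from the log excerpt: day.month.two-digit-year, then time with milliseconds.
LOG_TS_FORMAT = "%d.%m.%y %H:%M:%S.%f"


def parse_ts(line):
    """Parse the leading timestamp (the first two whitespace-separated fields)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TS_FORMAT)


# Two lines copied from the excerpt above.
chosen = "27.11.14 03:10:19.048 LISTENER ( WARN ) : chosen server: master50.ar40i1.qa:7389"
timed_out = "27.11.14 03:15:19.100 LDAP ( ERROR ) : start_tls: Timed out"

elapsed = (parse_ts(timed_out) - parse_ts(chosen)).total_seconds()
print(f"{elapsed:.3f} s")  # 300.052 s, i.e. roughly 5*60 seconds
```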
When I SIGCONT the slapd within the LDAP_OPT_TIMEOUT interval, the replication starts immediately. As far as I read the docs, LDAP_OPT_NETWORK_TIMEOUT is only for the initial connection, but you are right: changing (e.g. shortening) it doesn't seem to make any difference here.
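The (60+12*5) figure from test A) matches the usual keep-alive estimate of idle time plus probe count times probe interval. The sketch below shows that arithmetic and how the same three knobs are set on a plain TCP socket in Python. It assumes Linux option names and assumes the reading "60 s idle, 12 probes, 5 s apart"; the report does not say which of 12 and 5 is the count. Both function names are hypothetical and unrelated to the Listener's C code.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time until a dead peer is noticed on an otherwise idle connection."""
    return idle + interval * probes


def enable_keepalive(sock, idle, interval, probes):
    """Turn on TCP keep-alive for `sock` using the Linux-specific socket options."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # idle seconds before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # failed probes before giving up


# Assumed interpretation of "(60+12*5)" from test A).
IDLE, INTERVAL, PROBES = 60, 5, 12
print(keepalive_detection_seconds(IDLE, INTERVAL, PROBES))  # 120
```

With the Python prototype values (30, 10, 3) the same estimate gives 60 seconds.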
(In reply to Arvid Requate from comment #3)
> Created attachment 7181 [details]
> qa.patch
>
> Please fix typo and bracing style according to the attached patch:

Until we have a coding style, I'll follow linux/Documentation/CodingStyle:156
  Do not unnecessarily use braces where a single statement will do.

> LDAP_OPT_X_KEEPALIVE_PROBES is duplicated, LDAP_OPT_X_KEEPALIVE_INTERVAL is
> missing.

Fixed, thanks.

r63924 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter
r63923 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter

Package: univention-directory-listener
Version: 9.0.2-7.283.201509231400
Branch: ucs_4.0-0
Scope: errata4.0-3

Package: univention-directory-listener
Version: 10.0.0-2.282.201509231400
Branch: ucs_4.1-0

r63925 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter YAML
2015-09-32-univention-directory-listener.yaml

> debian/control
> +Priority: standard
No: UDL should not be installed in all cases - think UCS base-system
Ok. Btw: interesting date :-) 2015-09-32-univention-directory-listener.yaml