Bug 34763 - Keep alive for LDAP/notifier connections
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
Version: UCS 4.0
Hardware: Other Linux
Importance: P5 enhancement
Target Milestone: UCS 4.0-3-errata
Assigned To: Philipp Hahn
QA Contact: Arvid Requate
Depends on:
Blocks: 41249
Reported: 2014-05-07 12:19 CEST by Sönke Schwardt-Krummrich
Modified: 2018-07-23 18:54 CEST (History)


Attachments
qa.patch (3.47 KB, patch)
2015-09-23 13:17 CEST, Arvid Requate

Description Sönke Schwardt-Krummrich univentionstaff 2014-05-07 12:19:50 CEST
Ticket#: 2014050721007513

At a customer site, 5 DC slaves believed they had a valid, established TCP connection to port 7389 on the DC backup and were waiting for LDAP search results. On the DC backup, however, all of those TCP connections were unknown and seemed to have been shut down for some time. Since the slaves do not send any data (they are waiting for LDAP responses), they will not notice that the TCP connection is broken and has to be re-established.


We should activate the TCP keep alive function within listener for notifier/LDAP connections, so the listener is able to detect broken connections.
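
For illustration, a minimal sketch of enabling TCP keep-alive on a client socket with Python's standard socket module. The function name enable_keepalive and the default values are assumptions chosen for the example, not the listener's actual code. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are platform-specific (Linux), so they are only applied where the socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, probes=3):
    """Turn on TCP keep-alive for an existing socket (hypothetical helper).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    probes:   unanswered probes before the kernel drops the connection
    """
    # SO_KEEPALIVE is portable; it only switches the mechanism on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are not available on every platform, so skip
    # any constant the socket module does not define.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With keep-alive enabled, a client blocked in a read on a dead connection eventually gets an error from the kernel instead of waiting forever, which is the failure mode described above.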
Comment 1 Tim Petersen univentionstaff 2015-07-15 11:12:58 CEST
This happened again last week at 2015070921000225 and most likely today at 2015071521000179.

Version bump and new TM
Comment 2 Philipp Hahn univentionstaff 2015-09-03 13:25:28 CEST
2014050721007513 is a hanging LDAP connection (TCP 7389)
2015070921000225 is a hanging notifier connection (TCP 6669)

Prototype testing done with Python:

	c.set_option(ldap.OPT_X_KEEPALIVE_IDLE, 30)
	c.set_option(ldap.OPT_X_KEEPALIVE_INTERVAL, 10)
	c.set_option(ldap.OPT_X_KEEPALIVE_PROBES, 3)
These options can only be set between ldap_initialize() and the ldap_bind_(). As the Listener uses the shared implementation from univention-ldap, they can't be set on a per-connection basis.
Setting them as process-global defaults should work.

    ldap.set_option(ldap.OPT_NETWORK_TIMEOUT, 10.0)
seems to not change anything.

    ldap.set_option(ldap.OPT_TIMEOUT, 10.0)
sets a default timeout, as most ldap_search_s() use timeout=0, which is infinite!
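
A rough sketch of how the process-global defaults discussed above could be applied in one place with python-ldap. The helper name apply_global_ldap_defaults is hypothetical, and the module is passed in as a parameter purely so the sketch can be exercised without python-ldap or a server; the option names and values are the ones from the prototype above.

```python
def apply_global_ldap_defaults(ldap_mod, idle=30, interval=10, probes=3,
                               timeout=10.0):
    """Set keep-alive and timeout defaults on a python-ldap style module.

    ldap_mod is expected to be the imported `ldap` module (or any object
    offering the same OPT_* constants and a set_option() function).
    Returns the list of (option, value) pairs that were applied.
    """
    settings = [
        (ldap_mod.OPT_X_KEEPALIVE_IDLE, idle),
        (ldap_mod.OPT_X_KEEPALIVE_INTERVAL, interval),
        (ldap_mod.OPT_X_KEEPALIVE_PROBES, probes),
        # Default timeout for operations that would otherwise wait forever.
        (ldap_mod.OPT_TIMEOUT, timeout),
    ]
    for option, value in settings:
        # Module-level set_option() changes the defaults that later
        # connections inherit, rather than a single connection.
        ldap_mod.set_option(option, value)
    return settings
```

With python-ldap installed, this would be called as apply_global_ldap_defaults(ldap) after "import ldap" and before any connection is initialized.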

Need to test how SSL/TLS changes the behavior, as SSL implements an additional layer between LDAP and the network; there are known cases/reports where SSL is blocked waiting for data, which makes timeout handling in the LDAP "Application Layer" impossible.
Added keep-alive (KA) as an additional robustness layer.

See <http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/>

r63443 | Bug #34763 Listener: Add notifier TCP keep-alive
r63442 | Bug #34763 Listener: Add notifier timeout
r63441 | Bug #34763 Listener: Add TCP keep-alive
r63440 | Bug #34763 Listener: Add default timeout
r63439 | Bug #34763 Listener: Add timeout
r63438 | Bug #34763 Listener: Add notifier TCP keep-alive
r63437 | Bug #34763 Listener: Add notifier timeout
r63436 | Bug #34763 Listener: Add TCP keep-alive
r63435 | Bug #34763 Listener: Add default timeout
r63434 | Bug #34763 Listener: Add timeout

Package: univention-directory-listener
Version: 9.0.2-6.278.201509031256
Branch: ucs_4.0-0
Scope: errata4.0-3

Package: univention-directory-listener
Version: 10.0.0-1.279.201509031315
Branch: ucs_4.1-0

r63444 | Bug #34763 Listener: timeout, filter
 2015-09-32-univention-directory-listener.yaml
Comment 3 Arvid Requate univentionstaff 2015-09-23 13:17:16 CEST
Created attachment 7181 [details]
qa.patch

Please fix typo and bracing style according to the attached patch: LDAP_OPT_X_KEEPALIVE_PROBES is duplicated, LDAP_OPT_X_KEEPALIVE_INTERVAL is missing.


Otherwise this seems to work fine:

A) Notifier socket keep-alive timeout: ifdown on the master was recognized after (60+12*5) seconds for the notifier connection. After ifup, synchronization starts again.

B) LDAP timeout: 5*60 seconds after SIGSTOP on master slapd the listener on the DC backup logs:
==========================================================================
27.11.14 03:09:49.047  LDAP        ( ERROR   ) : start_tls: Timed out
27.11.14 03:09:49.047  LISTENER    ( WARN    ) : can not connect to ldap server (master50.ar40i1.qa)
27.11.14 03:09:49.048  LISTENER    ( WARN    ) : can not connect to any ldap server, retrying in 30 seconds
27.11.14 03:10:19.048  LISTENER    ( WARN    ) : chosen server: master50.ar40i1.qa:7389
27.11.14 03:15:19.100  LDAP        ( ERROR   ) : start_tls: Timed out
27.11.14 03:15:19.100  LISTENER    ( WARN    ) : can not connect to ldap server (master50.ar40i1.qa)
==========================================================================
When I SIGCONT the slapd within the LDAP_OPT_TIMEOUT interval, the replication starts immediately. As far as I understand the docs, LDAP_OPT_NETWORK_TIMEOUT only applies to the initial connection, but you are right: changing (e.g. shortening) it doesn't seem to make any difference here.
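
As a worked example of the (60+12*5) figure in item A: with TCP keep-alive, a dead peer is detected roughly after the idle time plus one probe interval per unanswered probe. The function below is only an illustration of that arithmetic. The thread does not say which of 12 and 5 is the probe count and which is the interval, so the assignment here is an assumption; the result is the same either way.

```python
def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time until a silent, dead TCP peer is detected."""
    # The first probe goes out after `idle` seconds without traffic;
    # each unanswered probe then costs another `interval` seconds.
    return idle + interval * probes


# Values from the notifier test in item A (assumed: 12 probes, 5 s apart).
print(keepalive_detection_seconds(60, 5, 12))   # 120

# Values from the Python prototype in comment 2.
print(keepalive_detection_seconds(30, 10, 3))   # 60
```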
Comment 4 Philipp Hahn univentionstaff 2015-09-23 14:06:04 CEST
(In reply to Arvid Requate from comment #3)
> Created attachment 7181 [details]
> qa.patch
> 
> Please fix typo and bracing style according to the attached patch:

Until we have a coding style, I'll follow linux/Documentation/CodingStyle:156
 Do not unnecessarily use braces where a single statement will do.

> LDAP_OPT_X_KEEPALIVE_PROBES is duplicated, LDAP_OPT_X_KEEPALIVE_INTERVAL is
> missing.

Fixed, thanks.

r63924 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter
r63923 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter

Package: univention-directory-listener
Version: 9.0.2-7.283.201509231400
Branch: ucs_4.0-0
Scope: errata4.0-3

Package: univention-directory-listener
Version: 10.0.0-2.282.201509231400
Branch: ucs_4.1-0

r63925 | Bug #38823,Bug #34763 Listener: LDAP timeout,filter YAML
 2015-09-32-univention-directory-listener.yaml
Comment 5 Philipp Hahn univentionstaff 2015-09-23 14:23:32 CEST
> debian/control
> +Priority: standard

No: UDL should not be installed in all cases - think UCS base-system
Comment 6 Arvid Requate univentionstaff 2015-09-24 12:54:06 CEST
Ok.

Btw: interesting date :-) 2015-09-32-univention-directory-listener.yaml
Comment 7 Janek Walkenhorst univentionstaff 2015-09-30 12:55:57 CEST
<http://errata.software-univention.de/ucs/4.0/336.html>