Bug 40373 - Replication may run into LDAP search timeout and lets join fail
Replication may run into LDAP search timeout and lets join fail
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
UCS 4.1
Other Linux
: P5 normal (vote)
: UCS 4.1-0-errata
Assigned To: Philipp Hahn
Arvid Requate
:
Depends on: 40230
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-06 10:39 CET by Sönke Schwardt-Krummrich
Modified: 2016-02-04 13:46 CET (History)
2 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): External feedback, Large environments, UCS Performance
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sönke Schwardt-Krummrich univentionstaff 2016-01-06 10:39:48 CET
This has to be fixed for UCS 4.1, too.


+++ This bug was initially created as a clone of Bug #40230 +++

Seen at Ticket#2015121121000514.

In a UCS@school scenario, the join of a DC slave failed as an LDAP search request (which presumably queried all LDAP objects) timed out. Subsequently, all following actions carried out by the listener join script failed. After a restart of the DC master, the LDAP server was more responsive and allowed a join.

========== join.log ==========
> [...]
> Configure 03univention-directory-listener.inst Fri Dec 11 16:20:57 CET 2015
> [...]
> 11.12.15 15:47:49.335  LISTENER    ( WARN    ) : initializing module replication
> File: /var/lib/univention-ldap/ldap/DB_CONFIG
> slapd: Kein Prozess gefunden
> File: /var/lib/univention-ldap/ldap/DB_CONFIG
> Starting ldap server(s): slapd ...done.
> Restarting ldap server(s).
> Stopping ldap server(s): slapd ...retry #1....done.
> Starting ldap server(s): slapd ...done.
> 11.12.15 15:53:04.076  LISTENER    ( ERROR   ) : could not get DNs when initializing replication: Timed out
> [...]
====================

In the source code, the timeout for this LDAP search is set to 5min:

========== management/univention-directory-listener/src/change.c ==========
> [...]
>        struct timeval timeout = {
>            .tv_sec = 5*60,
>            .tv_usec = 0,
>        };
>        int sizelimit0 = 0;
>        if ((rv =  ldap_search_ext_s(lp->ld, (*f)->base, (*f)->scope, (*f)->filter, _attrs, attrsonly1,  serverctrls, clientctrls, &timeout, sizelimit0, &res)) != LDAP_SUCCESS) {
>            univention_debug(UV_DEBUG_LISTENER, UV_DEBUG_ERROR, "could not get DNs when initializing %s: %s", handler->name, ldap_err2string(rv));
>            return rv;
>        }
> [...]
====================

It would be nice to use paging for the request in order to avoid these problems.
Comment 1 Philipp Hahn univentionstaff 2016-01-07 15:54:16 CET
create-32k-users-in-groups after ~18k users

OLD:
22:03:20.610  initializing module replication
22:14:39.892  finished initializing module replication
22:14:39.893  initializing module keytab
22:14:42.054  finished initializing module keytab
22:14:42.054  initializing module nss
22:14:45.711  finished initializing module nss
22:14:45.711  initializing module umc-service-providers
22:14:48.274  finished initializing module umc-service-providers
22:14:48.274  initializing module nfs-homes
22:14:50.386  finished initializing module nfs-homes
22:14:50.387  initializing module udm_extension
22:14:52.847  finished initializing module udm_extension
22:14:52.847  initializing module univention-saml-servers
22:14:55.097  finished initializing module univention-saml-servers
22:14:55.098  initializing module nscd_update
22:14:59.575  finished initializing module nscd_update
22:14:59.576  initializing module ldap_server
22:15:06.060  finished initializing module ldap_server
22:15:06.061  initializing module faillog
22:17:41.604  finished initializing module faillog
22:17:41.604  initializing module bind
22:17:43.783  finished initializing module bind
22:17:43.784  initializing module nagios-client
22:17:45.981  finished initializing module nagios-client
22:17:45.982  initializing module well-known-sid-name-mapping
22:19:11.138  finished initializing module well-known-sid-name-mapping
22:19:11.139  initializing module license_uuid
22:19:13.515  finished initializing module license_uuid
22:19:13.515  initializing module ldap_extension
22:19:16.635  finished initializing module ldap_extension
22:19:16.635  initializing module pkgdb-watch
22:19:18.845  finished initializing module pkgdb-watch
22:19:18.845  initializing module gencertificate
22:19:21.010  finished initializing module gencertificate
22:19:21.010  initializing module hosteddomains
22:19:23.191  finished initializing module hosteddomains
22:19:23.191  initializing module quota
22:19:25.504  finished initializing module quota 
22:19:25.504  initializing module keytab-member
22:19:27.673  finished initializing module keytab-member
22:19:27.673  initializing module nfs-shares
22:19:29.891  finished initializing module nfs-shares

NEW:
21:40:40.270  initializing module replication
21:50:07.610  finished initializing module replication with rv=0
21:50:07.610  initializing module keytab
21:50:09.647  finished initializing module keytab with rv=0
21:50:09.647  initializing module nss
21:50:12.275  finished initializing module nss with rv=0
21:50:12.275  initializing module umc-service-providers
21:50:14.514  finished initializing module umc-service-providers with rv=0
21:50:14.514  initializing module nfs-homes
21:50:16.537  finished initializing module nfs-homes with rv=0
21:50:16.537  initializing module udm_extension
21:50:18.883  finished initializing module udm_extension with rv=0
21:50:18.883  initializing module univention-saml-servers
21:50:21.036  finished initializing module univention-saml-servers with rv=0
21:50:21.036  initializing module nscd_update
21:50:23.935  finished initializing module nscd_update with rv=0
21:50:23.936  initializing module ldap_server
21:50:26.754  finished initializing module ldap_server with rv=0
21:50:26.754  initializing module faillog
21:50:43.220  finished initializing module faillog with rv=0
21:50:43.221  initializing module bind
21:50:45.260  finished initializing module bind with rv=0
21:50:45.260  initializing module nagios-client
21:50:47.443  finished initializing module nagios-client with rv=0
21:50:47.443  initializing module well-known-sid-name-mapping
21:51:09.420  finished initializing module well-known-sid-name-mapping with rv=0
21:51:09.420  initializing module license_uuid
21:51:11.689  finished initializing module license_uuid with rv=0
21:51:11.689  initializing module ldap_extension
21:51:14.393  finished initializing module ldap_extension with rv=0
21:51:14.393  initializing module pkgdb-watch
21:51:16.505  finished initializing module pkgdb-watch with rv=0
21:51:16.505  initializing module gencertificate
21:51:18.610  finished initializing module gencertificate with rv=0
21:51:18.610  initializing module hosteddomains
21:51:20.700  finished initializing module hosteddomains with rv=0
21:51:20.700  initializing module quota
21:51:22.839  finished initializing module quota with rv=0
21:51:22.839  initializing module keytab-member
21:51:24.922  finished initializing module keytab-member with rv=0
21:51:24.922  initializing module nfs-shares
21:51:27.753  finished initializing module nfs-shares with rv=0

r66641 | Bug #40373 UDL: Abort on out-of-memory
r66640 | Bug #40373 UDL: Fix memory leaks
r66639 | Bug #40373 UDL: Free LDAP memory
r66638 | Bug #40373 UDL: Remove pointless free()
r66637 | Bug #40373 UDL: Only retrieve DNs
r66636 | Bug #40373 UDL: Fix long search timeout
r66635 | Bug #40373 UDL: static change_init_module()
r66634 | Bug #40373 UDL: Declare extern
r66633 | Bug #40373 UDL: Remove self-include
r66632 | Bug #40373 UDL: Copyright 2016

Package: univention-directory-listener
Version: 10.0.0-4.291.201601071541
Branch: ucs_4.1-0
Scope: errata4.1-0

r66642 | Bug #40373 UDL: Fix long search timeout YAML
 univention-directory-listener.yaml

QA:
 ucr set listener/timeout/scans=1 ;  univention-join
Comment 2 Arvid Requate univentionstaff 2016-02-01 21:01:13 CET
Verified:
* Code review Ok
* Advisory / Versioning Ok
* Function Ok

I did an A/B comparison with
  cpulimit to 10% on the master slapd w/o LDAP indexing and udm creating users
** Negative:
   With ucr listener/timeout/scans=1 I got this in the listener.log
============================================================================
23.11.15 19:53:18.809  LISTENER    ( ERROR   ) : could not get DNs when initializing replication
: Timed out
23.11.15 19:53:18.810  LISTENER    ( WARN    ) : initializing module nfs-homes
23.11.15 19:53:19.518  LISTENER    ( WARN    ) : finished initializing module nfs-homes with rv=0
23.11.15 19:53:19.518  LISTENER    ( WARN    ) : initializing module keytab-member
23.11.15 19:53:20.521  LISTENER    ( ERROR   ) : could not get DNs when initializing keytab-memb
er: Timed out
============================================================================
** Positive:
   With ucr listener/timeout/scans=10 the joinscript succeeded.
Comment 3 Arvid Requate univentionstaff 2016-02-01 21:22:37 CET
Could you adjust the UCR Variable description to tell which time units or more general which value syntax is allowed / expected?

listener/timeout/scans: <empty>
 Timeout for long running synchronous LDAP search queries. Default: 2h

That would lead me to think I can set "2h" as value. In fact that would end up getting interpreted as 2 seconds (I checked with univention_config_get_int).
Comment 4 Philipp Hahn univentionstaff 2016-02-02 08:12:16 CET
(In reply to Arvid Requate from comment #3)
> Could you adjust the UCR Variable description to tell which time units or
> more general which value syntax is allowed / expected?

r67109 | Bug #40373 UDL: Improve UCR variable description

Package: univention-directory-listener
Version: 10.0.0-5.296.201602020808
Branch: ucs_4.1-0
Scope: errata4.1-0

r67111 | Bug #40230,Bug #40373 UDL: Improve UCR variable description YAML
 univention-directory-listener.yaml
Comment 5 Janek Walkenhorst univentionstaff 2016-02-04 13:46:39 CET
<http://errata.software-univention.de/ucs/4.1/80.html>