Bug 34292 - uldap.py: Try again when the LDAP server is not reachable
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 3.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 3.2-1-errata
Assigned To: Stefan Gohmann
QA Contact: Janek Walkenhorst
Depends on:
Blocks: 34293 35852
Reported: 2014-03-10 09:23 CET by Stefan Gohmann
Modified: 2014-09-09 20:20 CEST
CC List: 2 users

Description Stefan Gohmann univentionstaff 2014-03-10 09:23:25 CET
Since UCS 3.2, the LDAP server is restarted in more and more cases, for example during schema registration. This leads to many "Can't contact LDAP server" messages and sometimes to aborts, for example during the join.

With Bug #32617 we switched to LDAPReconnectObject, which reconnects to the LDAP server if the connection is lost.

When the server is not reachable, we should retry the connection at a defined interval.

In this bug, we should not try to iterate over the different LDAP servers.

Suggested UCR variables:
 ldap/client/retry=(true|false)
 ldap/client/retry/number=<number of retries>
 ldap/client/retry/wait=<seconds waiting before next retry>

Defaults:
 ldap/client/retry=true
 ldap/client/retry/count=5
 ldap/client/retry/count=1
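A minimal sketch of how a client could read the three proposed variables. It assumes a plain dict in place of the real UCR object and reads the defaults as retry=true, 5 retries and a 1-second wait (the second `count` line taken as `wait`, as comment 1 points out). It uses the name `number` from the suggestion list, although the Defaults list writes `count`. `RetryConfig`, `read_retry_config` and `TRUE_VALUES` are hypothetical names, not part of univention.uldap.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical set of strings treated as "true" for ldap/client/retry.
TRUE_VALUES = {"yes", "true", "1", "enable", "enabled", "on"}


@dataclass(frozen=True)
class RetryConfig:
    enabled: bool  # ldap/client/retry
    number: int    # ldap/client/retry/number: retries after the first attempt
    wait: float    # ldap/client/retry/wait: seconds before the next retry


def read_retry_config(ucr: Mapping[str, str]) -> RetryConfig:
    """Read the proposed variables, falling back to the proposed defaults."""
    enabled = ucr.get("ldap/client/retry", "true").strip().lower() in TRUE_VALUES
    try:
        number = max(0, int(ucr.get("ldap/client/retry/number", "5")))
    except ValueError:
        number = 5  # malformed value: keep the default
    try:
        wait = max(0.0, float(ucr.get("ldap/client/retry/wait", "1")))
    except ValueError:
        wait = 1.0  # malformed value: keep the default
    if not enabled:
        number = 0  # retry disabled: connect exactly once
    return RetryConfig(enabled, number, wait)
```

With an empty mapping this yields `RetryConfig(enabled=True, number=5, wait=1.0)`, i.e. a worst case of about 5 seconds of waiting.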
Comment 1 Sönke Schwardt-Krummrich univentionstaff 2014-03-11 15:44:10 CET
> Defaults:
>  ldap/client/retry=true
>  ldap/client/retry/count=5
>  ldap/client/retry/count=1
                     ^^^^^ ==> wait

I doubt that slapd is ready within 5 seconds after a restart is initiated.
I would suggest waiting about 30 seconds. What about using an exponentially increasing timeout with a given minimum (e.g. 1) and maximum (e.g. 5) value?
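The exponential timeout proposed above could look roughly like this sketch: the wait starts at the minimum, doubles after each failed attempt and is capped at the maximum. `backoff_delays` and `delays_within` are hypothetical helpers; the 30-second budget is the value suggested above.

```python
from itertools import islice
from typing import Iterator, List


def backoff_delays(minimum: float = 1.0, maximum: float = 5.0) -> Iterator[float]:
    """Yield wait times: start at `minimum`, double each time, never exceed `maximum`."""
    if minimum <= 0 or maximum < minimum:
        raise ValueError("need 0 < minimum <= maximum")
    delay = minimum
    while True:
        yield delay
        delay = min(maximum, delay * 2)


def delays_within(budget: float, minimum: float = 1.0, maximum: float = 5.0) -> List[float]:
    """Collect delays until their sum first reaches `budget` seconds."""
    delays: List[float] = []
    total = 0.0
    for delay in backoff_delays(minimum, maximum):
        if total >= budget:
            break
        delays.append(delay)
        total += delay
    return delays


print(list(islice(backoff_delays(), 5)))  # [1.0, 2.0, 4.0, 5.0, 5.0]
```

With minimum 1 and maximum 5, covering about 30 seconds takes 8 retries (1 + 2 + 4 + 5 × 5 = 32 seconds), versus 30 retries with a fixed one-second wait.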
Comment 2 Stefan Gohmann univentionstaff 2014-03-12 16:20:49 CET
(In reply to Sönke Schwardt-Krummrich from comment #1)
> > Defaults:
> >  ldap/client/retry=true
> >  ldap/client/retry/count=5
> >  ldap/client/retry/count=1
>                      ^^^^^ ==> wait
> 
> I doubt that the slapd is ready within 5 seconds after initiating a restart.
> I would suggest to wait about 30 seconds. What about using an exponentially
> increasing timeout with given minimum (e.g. 1) and maximum value (e.g. 5)?

In my tests it took roughly one or two seconds. But you are right, we should use a higher value.

I would suggest using only one variable for the count and always waiting one second. I think this is much simpler:
 ldap/client/retry/count?30
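A sketch of the simplified scheme with one count variable and a fixed one-second wait (the `?` in `ldap/client/retry/count?30` is UCR's notation for setting a value only if the variable is not yet set). `connect_with_retry` and `ServerDown` are hypothetical; with python-ldap the exception to catch would be `ldap.SERVER_DOWN`. `count` is read here as the number of retries after the first attempt, and the injectable `sleep` parameter only exists so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerDown(Exception):
    """Stand-in for python-ldap's ldap.SERVER_DOWN ("Can't contact LDAP server")."""


def connect_with_retry(connect: Callable[[], T], count: int = 30, wait: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect`; on ServerDown, retry up to `count` more times, `wait` seconds apart."""
    count = max(0, count)  # always make at least one attempt
    for attempt in range(count + 1):
        try:
            return connect()
        except ServerDown:
            if attempt == count:
                raise  # retries exhausted: propagate the last error
            sleep(wait)
    raise AssertionError("unreachable")
```

With count=30 and a one-second wait, the worst case before giving up is about 30 seconds, which matches the value suggested in comment 1.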
Comment 3 Stefan Gohmann univentionstaff 2014-03-12 20:01:50 CET
Initial code was added to univention.uldap (r48474). The following things are still missing:
 - YAML
 - Cleanup
 - UCR description
 - Tests
Comment 4 Stefan Gohmann univentionstaff 2014-03-14 07:25:47 CET
(In reply to Stefan Gohmann from comment #3)
> First code was added to univention.uldap (r48474). The following things are
> still missing:
>  - YAML
>  - Cleanup
>  - UCR description
>  - Tests

The variable is now named ldap/client/retry/count and is set to 10 by default. This means uldap retries the connection up to 10 times. The value is also passed to the LDAPReconnectObject instance.

The UCR description was added to univention-base-files.

Code fixes:
 - univention-python: r48474, r48498, r48520
 - univention-base-files: r48479, r48519, r48539

YAML:
 - univention-python: r48540
 - univention-base-files: r48540

ucs-test:
 - r48477, r48478, r48487, r48507
Comment 5 Janek Walkenhorst univentionstaff 2014-03-14 14:35:27 CET
UMC still works when slapd runs only intermittently.
ldap/client/retry/count works as advertised.
Advisories: OK
Comment 6 Moritz Muehlenhoff univentionstaff 2014-03-17 12:59:56 CET
http://errata.univention.de/ucs/3.2/70.html