Bug 32475 - univention-directory-listener should rebind if LDAP disconnects during change_update_dn
univention-directory-listener should rebind if LDAP disconnects during change...
Status: CLOSED INVALID
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
UCS 3.2
Other Linux
: P5 normal (vote)
: UCS 4.1-2-errata
Assigned To: Philipp Hahn
Arvid Requate
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-09-04 15:30 CEST by Arvid Requate
Modified: 2016-09-02 11:58 CEST (History)
4 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): Error handling
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2013-09-04 15:30:09 CEST
If the connection of the univention-directory-listener to the LDAP server breaks (e.g. by restarting the slapd in a listener module) the function call change_update_dn fails with 

  LISTENER    ( ERROR   ) : change_update_dn: Can't contact LDAP server
  LDAP        ( INFO    ) : connecting to ldap://master110.arerr311i3.qa:7389/

Maybe the code could directly attempt a reconnect and avoid the alerting error message.
Comment 1 Philipp Hahn univentionstaff 2016-06-06 13:12:34 CEST
(In reply to Arvid Requate from comment #0)
> If the connection of the univention-directory-listener to the LDAP server
> breaks (e.g. by restarting the slapd in a listener module) the function call
> change_update_dn fails with 
> 
>   LISTENER    ( ERROR   ) : change_update_dn: Can't contact LDAP server
>   LDAP        ( INFO    ) : connecting to
> ldap://master110.arerr311i3.qa:7389/
> 
> Maybe the code could directly attempt a reconnect and avoid the alerting
> error message.

The above analysis is wrong: The update is already retried:

src/notifier.c:176
> while ((rv = change_update_dn(&trans)) != LDAP_SUCCESS) {
>	univention_debug(UV_DEBUG_LISTENER, UV_DEBUG_ERROR, "change_update_dn: %s", ldap_err2string(rv));
>	if (rv == LDAP_SERVER_DOWN)
>		if ((rv = connect_to_ldap(trans.lp, kp)) == 0)
>			continue;
>	goto out;
> }

Normally the function should not return an error, as the code before the call "ensures that the LDAP connection is open".
An error during change_update_dn() is sever enough to be logged, as it might break some atomicy assumptions and may lead to modules being called multiple time for the same change.
Comment 2 Arvid Requate univentionstaff 2016-07-13 20:39:00 CEST
> An error during change_update_dn() is sever enough to be logged, as it might break some atomicy assumptions and may lead to modules being called multiple time for the same change.

If that is the reason for the log message, i.e. if it's appearance indicates that we just ran into this problem, we should consider how we can fix this (e.g. by journaling the notifier id per handler module). Logging alone wouldn't be enough. Open for discussion.
Comment 3 Philipp Hahn univentionstaff 2016-07-18 16:06:53 CEST
(In reply to Arvid Requate from comment #2)
> > An error during change_update_dn() is sever enough to be logged, as it might break some atomicy assumptions and may lead to modules being called multiple time for the same change.
> 
> If that is the reason for the log message, i.e. if it's appearance indicates
> that we just ran into this problem, we should consider how we can fix this
> (e.g. by journaling the notifier id per handler module). Logging alone
> wouldn't be enough. Open for discussion.

1. Only if the error is caused by some LDAP functions failing to connect to the LDAP server, the function is retried again. This is very unlikely since Bug #40460 as all LDAP-searched by now use LDAP_RETRY() and re-try again themselves.

2. There are a lot more cases where change_update_dn() returns != LDAP_SUCCESS, as many low-level functions also return an "LDAP" error value when for example allocating memory fails or the cache is not working as expected. So actually calling ldap_err2string() on the return-value is mostly useless (expect when it is a real LDAP error). But the called functions then print a error message themselves, so changing that is IMHO minor.

Looking at the called function there already are lots of retries, so this is a fatal abort after everything else failed.
If that happens, printing an error message seems okay, especially as the UDL will terminate itself next (or may switch to a different LDAP server, which might fix the problem, but might lead otherwise hard-to-debug inconsistencies). So knowing about that case is better than guessing.


For better per-module-handling there are Bug #34341 and Bug #35193.




(In reply to Arvid Requate from comment #0)
> Maybe the code could directly attempt a reconnect
this is already done

> and avoid the alerting error message.
this is debatable, but I prefer the fatal message to be kept.
Comment 4 Arvid Requate univentionstaff 2016-07-18 16:13:41 CEST
Ok, understood.