Univention Bugzilla – Full Text Bug Listing |
Summary: | Listener doesn't reconnect to notifier after changing the LDAP server which leads to inconstencies | ||
---|---|---|---|
Product: | UCS | Reporter: | Stefan Gohmann <gohmann> |
Component: | Listener (univention-directory-listener) | Assignee: | Philipp Hahn <hahn> |
Status: | CLOSED FIXED | QA Contact: | Arvid Requate <requate> |
Severity: | normal | ||
Priority: | P5 | CC: | best, hahn, steuwer |
Version: | UCS 4.3 | ||
Target Milestone: | UCS 4.3-2-errata | ||
Hardware: | Other | ||
OS: | Linux | ||
URL: | https://trello.com/c/7jsYr4G2 | ||
What kind of report is it?: | Bug Report | What type of bug is this?: | 7: Crash: Bug causes crash or data loss |
Who will be affected by this bug?: | 1: Will affect a very few installed domains | How will those affected feel about the bug?: | 3: A User would likely not purchase the product |
User Pain: | 0.120 | Enterprise Customer affected?: | Yes |
School Customer affected?: | ISV affected?: | ||
Waiting Support: | Flags outvoted (downgraded) after PO Review: | ||
Ticket number: | 2018100121000666 | Bug group (optional): | |
Max CVSS v3 score: | |||
Bug Depends on: | |||
Bug Blocks: | 48088 | ||
Attachments: |
ucs-kvm-create.cfg
Log file revert_flow_refactoring.diff |
Description
Stefan Gohmann
2018-10-19 20:45:00 CEST
[4.3-2] 083de5b0ab Bug #48030 UDL: Remove wrong select_server() .../univention-directory-listener/debian/changelog | 6 +++++ .../univention-directory-listener/src/notifier.c | 28 ++++++++++++---------- .../src/select_server.c | 6 +++++ 3 files changed, 27 insertions(+), 13 deletions(-) [4.3-2] 6dc20d9efa Bug #48030 UDL: Print selected server management/univention-directory-listener/src/main.c | 8 +++----- management/univention-directory-listener/src/notifier.c | 1 - management/univention-directory-listener/src/select_server.c | 2 +- 3 files changed, 4 insertions(+), 7 deletions(-) [4.3-2] 4dda7ca8a5 Bug #48030 UDL: Remove broken connection attempt tracking management/univention-directory-listener/src/main.c | 2 -- management/univention-directory-listener/src/notifier.c | 2 -- .../univention-directory-listener/src/select_server.c | 15 --------------- .../univention-directory-listener/src/select_server.h | 2 -- 4 files changed, 21 deletions(-) Package: univention-directory-listener Version: 12.0.0-32A~4.3.0.201811051607 Branch: ucs_4.3-0 Scope: errata4.3-2 [4.3-2] eed92d5e59 Bug #48030 UDL: Remove wrong select_server() YAML doc/errata/staging/univention-directory-listener.yaml | 13 +++++++++++++ 1 file changed, 13 insertions(+) QA: I have not tested this, but the bug should be triggerable by this: @backup2: stop the listener on the 2nd backup (so it no longer updates its slapd) @master: create a new object @slave: Restart UDL until connected to UDN@backup1 @slave: wait for the 15s timeout period of UDL to close the LDAP connection @backup1: stop that upstream slapd @master: modify the object That should trigger UDL @ slave, it will try to reconnect to slapd@backup1, which is offline it will re-select to slapd@backup2, but will not find the object there @slave will remove the object instead of modifying it. With the new version UDL@slave will terminate eventually, get restarted by runit, will select UDN@backup2, wily only start doing things until UDL@backup2 is started again. Created attachment 9734 [details]
ucs-kvm-create.cfg
cfg file to setup test environment using out internal KVM test environment and IPs from my range.
Created attachment 9735 [details] Log file ... >[slave] 2018-11-09T10:19:55.558137 dn: uid=phahn,cn=users,dc=bug48030,dc=dev >[backup1] 2018-11-09T10:19:55.608117 dn: uid=phahn,cn=users,dc=bug48030,dc=dev >[backup1] 2018-11-09T10:19:55.608117 description: 1541755193 Entry is updated on both systems, but only the LDAP search on the backup1 shows the modified entry, as UDL@slave replicates from backup2, which is NOT updates and thus the old entry is replicated again. Created attachment 9736 [details]
revert_flow_refactoring.diff
Commit 083de5b0ab changes the program flow in notifier_listen(...). I don't think this change is functionally required for this bug and I also think that it could introduce an infinite loop in case change_update_dn returns something that is neither LDAP_SUCCESS nor LDAP_SERVER_DOWN. The attached patch shows exactly what I would like to see reverted.
There is another question: In case the currently selected LDAP server process becomes unavailable, the code would now retry 5 times and wait 30 seconds before each retry. So now we have (at least) 150 seconds failover delay where before we had (at least) 30 seconds. So my question would be: why did you choose to retry 5 times? Do you think it is too expensive to just retry once (or maximally twice) and then give up on the currently selected server? (In reply to Arvid Requate from comment #4) > Commit 083de5b0ab changes the program flow in notifier_listen(...). I don't > think this change is functionally required for this bug and I also think > that it could introduce an infinite loop in case change_update_dn returns > something that is neither LDAP_SUCCESS nor LDAP_SERVER_DOWN. The attached > patch shows exactly what I would like to see reverted. ... (In reply to Arvid Requate from comment #5) > There is another question: In case the currently selected LDAP server > process becomes unavailable, the code would now retry 5 times and wait 30 > seconds before each retry. So now we have (at least) 150 seconds failover > delay where before we had (at least) 30 seconds. So my question would be: > why did you choose to retry 5 times? Do you think it is too expensive to > just retry once (or maximally twice) and then give up on the currently > selected server? As discussed I removed the complete block as every ldap_search_ext_s() call is already wrapped by LDAP_RETRY(), which is used to handle LDAP_SERVER_DOWN. [4.3-2] 9464dfbe05 Bug #48030: Fix retry logic .../univention-directory-listener/debian/changelog | 6 +++++ .../univention-directory-listener/src/notifier.c | 30 +++------------------- 2 files changed, 10 insertions(+), 26 deletions(-) Package: univention-directory-listener Version: 12.0.0-33A~4.3.0.201811131559 Branch: ucs_4.3-0 Scope: errata4.3-2 [4.3-2] 63c1604429 Bug #48030 LDAP: Log connections at PROCESS level base/univention-policy/debian/changelog | 6 ++++++ base/univention-policy/lib/ldap.c | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) Package: univention-policy Version: 9.0.0-4A~4.3.0.201811131609 Branch: ucs_4.3-0 Scope: errata4.3-2 [4.3-2] 06c5a2a306 Bug #48030 LDAP: Log connections at PROCESS level YAML doc/errata/staging/univention-directory-listener.yaml | 2 +- doc/errata/staging/univention-policy.yaml | 10 ++++++++++ 2 files changed, 11 insertions(+), 1 deletion(-) QA: tail /var/log/univention/listener.log ... 13.11.18 16:28:36.578 LISTENER ( WARN ) : Notifier/LDAP server is ma43.phahn.qa:7389 13.11.18 16:28:36.578 LDAP ( PROCESS ) : connecting to ldap://ma43.phahn.qa:7389 Ok, works and the listener.log looks good. Now the LDAP_RETRY macro handles the retries. I saw 6 attempts in my logs, spread over 15 seconds before giving up. Then the listener is restarted. Tonights Jenkins run does't show replication related regressions. 3ef7d7a035 | Advisory: Fix version number in univention-policy.yaml 8edde6fe5d | Minimal wording adjustment in univention-directory-listener.yaml |