Bug 42389 - Debian-Jessie: bind9 1:9.9.5.dfsg-9+deb8u6
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: DNS
Version: UCS 4.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.2
Assigned To: Philipp Hahn
QA Contact: Arvid Requate
: interim-2
Duplicates: 41714
Depends on: 41929
Blocks: 41961
Reported: 2016-09-14 13:04 CEST by Philipp Hahn
Modified: 2017-04-04 18:29 CEST

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 2: Improvement: Would be a product improvement
Who will be affected by this bug?: 5: Will affect all installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.286
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Philipp Hahn univentionstaff 2016-09-14 13:04:35 CEST
Update bind9 for UCS-4.2 by merging patches

Consider Bug #41714

+++ This bug was initially created as a clone of Bug #41929 +++
+++ This bug was initially created as a clone of Bug #41608 +++
Packages newer in Debian-Jessie than in UCS-4.1, patches in UCS
1:  Review patches
2a: Cherry-pick patches if required and re-build package in UCS-4.2
2b: OR copy package from Debian-Jessie (drops UCS patches)
Comment 1 Philipp Hahn univentionstaff 2016-09-15 10:44:40 CEST
*** Bug 41714 has been marked as a duplicate of this bug. ***
Comment 2 Philipp Hahn univentionstaff 2016-09-15 12:30:32 CEST
r16741 | Bug #42389 bind9: UCS-4.2
 New upstream version: API changed
 New compiler: More warnings/errors
 New ldapdb version: Old changes no longer apply
  <http://bind9-ldap.bayour.com/>/<https://github.com/FransUrbo/bind9-ldap>

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201609151032
Branch: ucs_4.2-0
Comment 3 Arvid Requate univentionstaff 2017-02-01 16:38:51 CET
* OK: 070_bind9_restart.patch (Bug 29659) has been dropped. Bug 42380 has been commented to address this.

* FAIL: 0012-Bug-41714-Retry-search-in-case-of-closed-connections.patch
  replaces 061_bind9_ldap_idletimeout.patch but doesn't handle a possible
  ISC_R_FAILURE return code from ldapdb_search any more.

All other patches are OK and functional test was OK too.
Update not tested yet.
Comment 4 Philipp Hahn univentionstaff 2017-02-06 12:04:39 CET
(In reply to Arvid Requate from comment #3)
> * OK: 070_bind9_restart.patch (Bug 29659) has been dropped. Bug 42380 has
> been commented to address this.
> 
> * FAIL: 0012-Bug-41714-Retry-search-in-case-of-closed-connections.patch
>   replaces 061_bind9_ldap_idletimeout.patch but doesn't handle a possible
>   ISC_R_FAILURE return code from ldapdb_search any more.

The original approach no longer works with BIND 9.9, as the new version now supports wildcard domain names, which causes multiple LDAP searches to be performed internally.
So I moved the code into ldapdb_lookup() and fixed the original Bug #25138 there: it tries to re-connect if the connection to the LDAP server gets closed.

    ldapdb_bind(zone, data, ldp);
    if (*ldp == NULL)
      LDAPDB_FAILURE("bind failed");

ldapdb_bind() already tries to re-connect 5 times internally.
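
The bounded re-connect mentioned here can be pictured roughly as follows. This is only a sketch under assumptions: connect_fn, bind_with_retry(), flaky_connect() and MAX_BIND_ATTEMPTS are hypothetical names, the connection is modelled as an opaque pointer, and the real ldapdb_bind() code is not reproduced; only the figure of 5 attempts comes from the comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: the "5 times" mentioned in the comment. */
#define MAX_BIND_ATTEMPTS 5

/* Hypothetical callback that tries to open a connection once.
 * Returns an opaque handle, or NULL if the server could not be reached. */
typedef void *(*connect_fn)(void *ctx);

/* Try to (re-)connect up to MAX_BIND_ATTEMPTS times.
 * Afterwards *handle is non-NULL on success and NULL if every attempt
 * failed, which is what the "if (*ldp == NULL)" check tests for.
 * Returns the number of attempts made. */
int bind_with_retry(connect_fn try_connect, void *ctx, void **handle)
{
    int attempts = 0;

    assert(try_connect != NULL && handle != NULL);
    *handle = NULL;
    while (*handle == NULL && attempts < MAX_BIND_ATTEMPTS) {
        *handle = try_connect(ctx);
        attempts++;
    }
    return attempts;
}

/* Demo stand-in for a server that is down for the first *ctx attempts. */
void *flaky_connect(void *ctx)
{
    int *failures_left = ctx;
    static int dummy_connection;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    return &dummy_connection;
}
```

With a counter of 2 passed as ctx, flaky_connect() fails twice and bind_with_retry() returns after the third attempt with a valid handle; with a server that never comes back it gives up after five attempts and leaves the handle NULL.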

  ldap_search_ext(*ldp, data->base, LDAP_SCOPE_SUBTREE, fltr, NULL, 0, NULL, NULL, NULL /*timeout*/, 0, &msgid);
  if (msgid == -1) {
    ldapdb_bind(zone, data, ldp);
    if (*ldp != NULL)
      ldap_search_ext(*ldp, data->base, LDAP_SCOPE_SUBTREE, fltr, NULL, 0, NULL, NULL, NULL /*timeout*/, 0, &msgid);
This already re-connects if the first search fails.
Also, ldap_search_ext() is the *a*synchronous search operation, which does NOT return an error condition immediately, even if the server is unreachable at that moment; the error is only returned later through ldap_result().

  while ((rc = ldap_result(*ldp, msgid, 0, NULL, &res)) != LDAP_RES_SEARCH_RESULT ) {
...
  }

The case that ldap_result() returns -1 on error was missed here: this happens when the LDAP server goes down while a search is running.
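
A result loop that handles the -1 case might look roughly like this. It is a sketch, not the patched ldapdb code: result_fn stands in for ldap_result() so the example builds without libldap, and collect_results(), scripted_result(), struct script and the POLL_* names are hypothetical. The two message-type values are local copies of the ones in <ldap.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of message types from <ldap.h>. */
#define RES_SEARCH_ENTRY  0x64  /* LDAP_RES_SEARCH_ENTRY */
#define RES_SEARCH_RESULT 0x65  /* LDAP_RES_SEARCH_RESULT */

/* Hypothetical outcome codes for this sketch. */
enum poll_status { POLL_DONE, POLL_SERVER_DOWN, POLL_TIMEOUT };

/* Hypothetical stand-in for ldap_result(): returns -1 on error,
 * 0 on timeout, otherwise the type of the message received. */
typedef int (*result_fn)(void *ctx);

/* Fetch messages until the final search result arrives.
 * Counts the entries seen and reports why the loop ended. */
enum poll_status collect_results(result_fn next_result, void *ctx, int *entries)
{
    int rc;

    assert(next_result != NULL && entries != NULL);
    *entries = 0;
    while ((rc = next_result(ctx)) != RES_SEARCH_RESULT) {
        if (rc == -1)
            return POLL_SERVER_DOWN;  /* the case that was missed */
        if (rc == 0)
            return POLL_TIMEOUT;      /* only possible with a timeout set */
        if (rc == RES_SEARCH_ENTRY)
            (*entries)++;
    }
    return POLL_DONE;
}

/* Demo stand-in: replays a fixed sequence of return codes. */
struct script { const int *codes; size_t pos; };

int scripted_result(void *ctx)
{
    struct script *s = ctx;
    return s->codes[s->pos++];
}
```

Without the explicit rc == -1 branch, a loop that only compares against the final message type keeps going after the error instead of reporting it, which is the situation described above.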

All other cases where ISC_R_FAILURE is returned are not LDAP-related but internal errors like "out-of-memory" or "name-too-long", which an LDAP retry will NOT fix!
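
The distinction between failures worth retrying and failures that are not can be sketched like this. All names here (failure_cause, retry_might_help(), lookup_with_retry(), lookup_fn and the demo callbacks) are hypothetical and only illustrate the reasoning; the real code reports a plain ISC_R_FAILURE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical failure causes, named after the examples in the comment. */
enum failure_cause {
    CAUSE_NONE,
    CAUSE_SERVER_DOWN,
    CAUSE_OUT_OF_MEMORY,
    CAUSE_NAME_TOO_LONG
};

/* Only a lost LDAP connection can be cured by re-connecting. */
bool retry_might_help(enum failure_cause cause)
{
    return cause == CAUSE_SERVER_DOWN;
}

/* Hypothetical lookup callback: performs one attempt. */
typedef enum failure_cause (*lookup_fn)(void *ctx);

/* Repeat the lookup only while the failure is retryable, at most
 * max_attempts times. Returns the cause of the last attempt and
 * optionally the number of attempts made. */
enum failure_cause lookup_with_retry(lookup_fn lookup, void *ctx,
                                     int max_attempts, int *attempts_used)
{
    enum failure_cause cause = CAUSE_SERVER_DOWN;
    int attempts = 0;

    assert(lookup != NULL && max_attempts > 0);
    while (attempts < max_attempts) {
        cause = lookup(ctx);
        attempts++;
        if (!retry_might_help(cause))
            break;  /* success, or an internal error a retry cannot fix */
    }
    if (attempts_used != NULL)
        *attempts_used = attempts;
    return cause;
}

/* Demo stand-ins for the two kinds of failure. */
enum failure_cause always_oom(void *ctx)
{
    (void)ctx;
    return CAUSE_OUT_OF_MEMORY;
}

enum failure_cause down_then_ok(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return CAUSE_SERVER_DOWN;
    }
    return CAUSE_NONE;
}
```

An out-of-memory failure is returned after a single attempt, while a lost connection is retried until it succeeds or the attempt limit is reached.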


Also:
The fix for Bug #29977 was never applied to any version in 4.x.
 Updated 0002-Bug-25868-Save-debugging-symbols-in-bind9-dbg-packag.patch

For Bug #28748 a global LDAP timeout was added; this is a stab in the dark, as any previous core file is useless because of the bug above.
 Added 0013-Bug-28748-Default-LDAP-timeout-60s.patch


r17164 | Bug #42389,Bug #28748,Bug #29977: bind9
 Bug #42389: Detect LDAP_SERVER_DOWN
 Bug #29977: Fix debug symbol generation
 Bug #28748: Add default LDAP timeout

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702061124
Branch: ucs_4.2-0

QA:
 apt-cache show --no-all-versions bind9-dbg | grep -e Size: -e Filename: # previous version
  Installed-Size: 914
  Filename: ucs_4.2-0/amd64/bind9-dbg_9.9.5.dfsg-9+deb8u6A~4.2.0.201609151032_amd64.deb
  Size: 143986
 apt-cache show --no-all-versions bind9-dbg | grep -e Size -e Filename: # new version
  Installed-Size: 15848
  Filename: ucs_4.2-0/amd64/bind9-dbg_9.9.5.dfsg-9+deb8u6A~4.2.0.201702061124_amd64.deb
  Size: 3332154
 gdb -p `pidof named` --batch -ex 'thread apply all bt full'
 zless /usr/share/doc/bind9/changelog.Debian.gz
Comment 5 Arvid Requate univentionstaff 2017-02-06 19:28:29 CET
Verified:

* OK: Bug #42389: Detect LDAP_SERVER_DOWN
* OK: Bug #29977: Fix debug symbol generation
* OK: Bug #28748: Add default LDAP timeout

Timeout seems to take effect, but there is no retry, compare with the 2 seconds found in Bug 28748 Comment 6:

root@master20:~# ucr set dns/backend='ldap'; /etc/init.d/bind9 restart; \
                 pkill --signal STOP slapd; \
                 time dig -p 7777 @127.0.0.1 $(hostname -f) +time=300 +retry=0; \
                 pkill --signal CONT slapd
[...]
real    1m0.077s

I guess that should be 2 seconds? No clue what's going on here in the code.
Comment 6 Philipp Hahn univentionstaff 2017-02-07 21:39:09 CET
(In reply to Arvid Requate from comment #5)
> Verified:
> 
> * OK: Bug #42389: Detect LDAP_SERVER_DOWN
> * OK: Bug #29977: Fix debug symbol generation
> * OK: Bug #28748: Add default LDAP timeout
> 
> Timeout seems to take effect, but there is no retry, compare with the 2
> seconds found in Bug 28748 Comment 6:
> 
> root@master20:~# ucr set dns/backend='ldap'; /etc/init.d/bind9 restart; \
>                  pkill --signal STOP slapd; \
>                  time dig -p 7777 @127.0.0.1 $(hostname -f) +time=300
> +retry=0; \
>                  pkill --signal CONT slapd
> [...]
> real    1m0.077s
> 
> I guess that should be 2 seconds? No clue what's going on here in the code.

r17168 | Bug #42389 bind9: Handle timeout
r17169 | Bug #42389 bind9: Handle timeout 2

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702071811
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702072125
Branch: ucs_4.2-0


Seen SIGSEGV
Times out after 5m
Comment 7 Philipp Hahn univentionstaff 2017-02-08 14:20:50 CET
r17170 | Bug #42389 bind9: Handle timeout 3

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702081326
Branch: ucs_4.2-0

QA: real    2m0.125s
Comment 8 Arvid Requate univentionstaff 2017-02-09 18:56:59 CET
Ok, looks good and works. I've added a line to the UCS 4.2 changelog:

The timeout and retry handling of the Bind9 LDAP database plugin has been improved (<u:bug>42389</u:bug>).
Comment 9 Philipp Hahn univentionstaff 2017-02-15 09:05:04 CET
(In reply to Arvid Requate from comment #8)
> Ok, looks good and works. I've added a line to the UCS 4.2 changelog:
> 
> The timeout and retry handling of the Bind9 LDAP database plugin has been
> improved (<u:bug>42389</u:bug>).

It's called BIND - Berkeley Internet Name Domain → r76679
Comment 10 Philipp Hahn univentionstaff 2017-02-28 16:45:12 CET
r17226 | Bug #42389: Fix crash on shutdown
 dns/backend=ldap: named with the LDAP backend crashed on shutdown or on error conditions, because bindname/bindpwd were freed although they are only pointers into a larger string.

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702281603
Branch: ucs_4.2-0

QA: Change the password in /etc/bind/univention.conf.d/* and run
 gdb --args named -c /etc/bind/named.conf -p 7777 -u bind -f -d 65535
old version will crash, new will start.
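
The ownership problem behind the crash fixed in r17226 can be illustrated with a small sketch. It assumes a hypothetical "name password" argument format and hypothetical names (struct creds, creds_parse(), creds_release()); the actual parsing in the ldapdb driver is not shown in this bug. The point is only that pointers into a larger allocation must never be passed to free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct creds {
    char *buffer;    /* the only pointer obtained from malloc() */
    char *bindname;  /* points into buffer, must NOT be freed */
    char *bindpwd;   /* points into buffer, must NOT be freed */
};

/* Copy arg and split it in place at the first space.
 * Returns 0 on success, -1 on allocation failure or missing separator. */
int creds_parse(struct creds *c, const char *arg)
{
    size_t len;
    char *sep;

    assert(c != NULL && arg != NULL);
    c->buffer = c->bindname = c->bindpwd = NULL;

    len = strlen(arg) + 1;
    c->buffer = malloc(len);
    if (c->buffer == NULL)
        return -1;
    memcpy(c->buffer, arg, len);

    sep = strchr(c->buffer, ' ');
    if (sep == NULL) {
        free(c->buffer);
        c->buffer = NULL;
        return -1;
    }
    *sep = '\0';
    c->bindname = c->buffer;
    c->bindpwd = sep + 1;
    return 0;
}

/* Release the credentials. Calling free(c->bindpwd) here would be
 * undefined behaviour, because that address was never returned by
 * malloc(); only the owning buffer is freed. */
void creds_release(struct creds *c)
{
    assert(c != NULL);
    free(c->buffer);
    c->buffer = c->bindname = c->bindpwd = NULL;
}
```

Keeping the owning pointer in its own field makes it obvious which address may be freed, both on shutdown and on error paths.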
Comment 11 Arvid Requate univentionstaff 2017-02-28 17:39:48 CET
Ok, reproducible and fixed.
Comment 12 Stefan Gohmann univentionstaff 2017-04-04 18:29:51 CEST
UCS 4.2 has been released:
 https://docs.software-univention.de/release-notes-4.2-0-en.html
 https://docs.software-univention.de/release-notes-4.2-0-de.html

If this error occurs again, please use "Clone This Bug".