Univention Bugzilla – Bug 42389
Debian-Jessie: bind9 1:9.9.5.dfsg-9+deb8u6
Last modified: 2017-04-04 18:29:51 CEST
Update bind9 for UCS-4.2 by merging patches

Consider Bug #41714

+++ This bug was initially created as a clone of Bug #41929 +++
+++ This bug was initially created as a clone of Bug #41608 +++

Packages newer in Debian-Jessie than in UCS-4.1, patches in UCS

1: Review patches
2a: Cherry-pick patches if required and re-build package in UCS-4.2
2b: OR copy package from Debian-Jessie (drops UCS patches)
*** Bug 41714 has been marked as a duplicate of this bug. ***
r16741 | Bug #42389 bind9: UCS-4.2
 New upstream version: API changed
 New compiler: More warnings/errors
 New ldapdb version: Old changes no longer apply
 <http://bind9-ldap.bayour.com/>
 <https://github.com/FransUrbo/bind9-ldap>

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201609151032
Branch: ucs_4.2-0
* OK: 070_bind9_restart.patch (Bug 29659) has been dropped. Bug 42380 has been commented to address this.

* FAIL: 0012-Bug-41714-Retry-search-in-case-of-closed-connections.patch replaces 061_bind9_ldap_idletimeout.patch but no longer handles a possible ISC_R_FAILURE return code from ldapdb_search.

All other patches are OK and the functional test was OK too. The update has not been tested yet.
(In reply to Arvid Requate from comment #3)
> * OK: 070_bind9_restart.patch (Bug 29659) has been dropped. Bug 42380 has
> been commented to address this.
>
> * FAIL: 0012-Bug-41714-Retry-search-in-case-of-closed-connections.patch
> replaces 061_bind9_ldap_idletimeout.patch but doesn't handle a possible
> ISC_R_FAILURE return code from ldapdb_search any more.

The original approach no longer works with BIND 9.9, as the new version now supports wildcard domain names, which perform multiple LDAP searches internally. So I moved the code inside ldapdb_lookup() and fixed the original Bug #25138 there, which tries to re-connect if the LDAP server connection gets closed.

    ldapdb_bind(zone, data, ldp);
    if (*ldp == NULL)
        LDAPDB_FAILURE("bind failed");

ldapdb_bind() already tries to re-connect 5 times internally.

    ldap_search_ext(*ldp, data->base, LDAP_SCOPE_SUBTREE, fltr, NULL, 0, NULL, NULL, NULL /*timeout*/, 0, &msgid);
    if (msgid == -1) {
        ldapdb_bind(zone, data, ldp);
        if (*ldp != NULL)
            ldap_search_ext(*ldp, data->base, LDAP_SCOPE_SUBTREE, fltr, NULL, 0, NULL, NULL, NULL /*timeout*/, 0, &msgid);

This already is a re-connect if the first search fails. Also, ldap_search_ext() is the *a*synchronous search operation, which does NOT return an error condition immediately, even if the server is unreachable at that moment; the error is only returned later through ldap_result().

    while ((rc = ldap_result(*ldp, msgid, 0, NULL, &res)) != LDAP_RES_SEARCH_RESULT) {
        ...
    }

Here the case was missed that ldap_result() returns -1 on error. This happens when the LDAP server goes down while a search is running.

All other cases where ISC_R_FAILURE is returned are not LDAP related, but internal errors like "out-of-memory" or "name-too-long", which an LDAP retry will NOT fix!

Also: The fix for Bug #29977 was never applied to any version in 4.x.
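The distinction between the return values of ldap_result() described above can be sketched in isolation. This is a minimal sketch and not the patch itself: the enum, classify_result(), replay() and the max_reconnects parameter are hypothetical names, and the two message-type constants are redefined locally (with the values used in OpenLDAP's ldap.h) so that the block builds without libldap.

```c
#include <stddef.h>

/* Local copies of the OpenLDAP message types (values as in <ldap.h>);
 * redefined here only so that the sketch builds without libldap. */
#define SKETCH_RES_SEARCH_ENTRY  0x64
#define SKETCH_RES_SEARCH_RESULT 0x65

/* Hypothetical names, not taken from the ldapdb source. */
enum next_step {
    STEP_RECONNECT,  /* rc == -1: error, e.g. server went down; re-bind */
    STEP_TIMED_OUT,  /* rc ==  0: timeout expired, no message available */
    STEP_CONTINUE,   /* an entry or other message: keep reading         */
    STEP_DONE        /* final search result received                    */
};

/* Map one ldap_result()-style return code to the action to take. */
enum next_step classify_result(int rc)
{
    if (rc == -1)
        return STEP_RECONNECT;
    if (rc == 0)
        return STEP_TIMED_OUT;
    if (rc == SKETCH_RES_SEARCH_RESULT)
        return STEP_DONE;
    return STEP_CONTINUE;
}

/* Replay a recorded sequence of return codes and count the reconnects
 * that would be attempted. Returns 1 if the search completed, 0 if it
 * timed out, ran out of codes, or exceeded max_reconnects. */
int replay(const int *codes, size_t n, int max_reconnects, int *reconnects)
{
    *reconnects = 0;
    for (size_t i = 0; i < n; i++) {
        switch (classify_result(codes[i])) {
        case STEP_DONE:
            return 1;
        case STEP_TIMED_OUT:
            return 0;
        case STEP_RECONNECT:
            if (++*reconnects > max_reconnects)
                return 0;
            break;
        case STEP_CONTINUE:
            break;
        }
    }
    return 0;
}
```

The point of keeping -1 separate from every other value is that only a lost connection is worth a re-bind; internal failures such as out-of-memory would simply fail again.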
Updated 0002-Bug-25868-Save-debugging-symbols-in-bind9-dbg-packag.patch

For Bug #28748 a global LDAP timeout was added; this is a stab in the dark, as any previous core file is useless because of the bug above.
Added 0013-Bug-28748-Default-LDAP-timeout-60s.patch

r17164 | Bug #42389,Bug #28748,Bug #29977: bind9
 Bug #42389: Detect LDAP_SERVER_DOWN
 Bug #29977: Fix debug symbol generation
 Bug #28748: Add default LDAP timeout

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702061124
Branch: ucs_4.2-0

QA:
 apt-cache show --no-all-versions bind9-dbg | grep -e Size: -e Filename: # previous version
  Installed-Size: 914
  Filename: ucs_4.2-0/amd64/bind9-dbg_9.9.5.dfsg-9+deb8u6A~4.2.0.201609151032_amd64.deb
  Size: 143986
 apt-cache show --no-all-versions bind9-dbg | grep -e Size -e Filename: # new version
  Installed-Size: 15848
  Filename: ucs_4.2-0/amd64/bind9-dbg_9.9.5.dfsg-9+deb8u6A~4.2.0.201702061124_amd64.deb
  Size: 3332154

 gdb -p `pidof named` --batch -ex 'thread apply all bt full'

 zless /usr/share/doc/bind9/changelog.Debian.gz
Verified:

* OK: Bug #42389: Detect LDAP_SERVER_DOWN
* OK: Bug #29977: Fix debug symbol generation
* OK: Bug #28748: Add default LDAP timeout

The timeout seems to take effect, but there is no retry; compare with the 2 seconds found in Bug 28748 Comment 6:

root@master20:~# ucr set dns/backend='ldap'; /etc/init.d/bind9 restart; \
                 pkill --signal STOP slapd; \
                 time dig -p 7777 @127.0.0.1 $(hostname -f) +time=300 +retry=0; \
                 pkill --signal CONT slapd
[...]
real    1m0.077s

I guess that should be 2 seconds? No clue what's going on here in the code.
(In reply to Arvid Requate from comment #5)
> Verified:
>
> * OK: Bug #42389: Detect LDAP_SERVER_DOWN
> * OK: Bug #29977: Fix debug symbol generation
> * OK: Bug #28748: Add default LDAP timeout
>
> Timeout seems to take effect, but there is no retry, compare with the 2
> seconds found in Bug 28748 Comment 6:
>
> root@master20:~# ucr set dns/backend='ldap'; /etc/init.d/bind9 restart; \
>                  pkill --signal STOP slapd; \
>                  time dig -p 7777 @127.0.0.1 $(hostname -f) +time=300 +retry=0; \
>                  pkill --signal CONT slapd
> [...]
> real    1m0.077s
>
> I guess that should be 2 seconds? No clue what's going on here in the code.

r17168 | Bug #42389 bind9: Handle timeout
r17169 | Bug #42389 bind9: Handle timeout 2

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702071811
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702072125
Branch: ucs_4.2-0

Seen SIGSEGV
Times out after 5m
r17170 | Bug #42389 bind9: Handle timeout 3

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702081326
Branch: ucs_4.2-0

QA:
 real    2m0.125s
Ok, looks good and works. I've added a line to the UCS 4.2 changelog:

 The timeout and retry handling of the Bind9 LDAP database plugin has been improved (<u:bug>42389</u:bug>).
(In reply to Arvid Requate from comment #8)
> Ok, looks good and works. I've added a line to the UCS 4.2 changelog:
>
> The timeout and retry handling of the Bind9 LDAP database plugin has been
> improved (<u:bug>42389</u:bug>).

It's called BIND - Berkeley Internet Name Domain → r76679
r17226 | Bug #42389: Fix crash on shutdown dns/backend=ldap
 LDAP-named crashed on shutdown or on error conditions, because
 bindname/bindpwd were freed although they are only pointers into a
 larger string.

Package: bind9
Version: 1:9.9.5.dfsg-9+deb8u6A~4.2.0.201702281603
Branch: ucs_4.2-0

QA: Change the password in /etc/bind/univention.conf.d/* and run
 gdb --args named -c /etc/bind/named.conf -p 7777 -u bind -f -d 65535
The old version will crash, the new one will start.
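The class of crash described here, calling free() on pointers that point into the middle of another allocation, can be illustrated with a small sketch. The struct, its field layout, the function names and the "name&password" input format are all hypothetical and chosen for illustration only; this is not the ldapdb code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: 'buf' owns the memory, the other two fields
 * merely point into it and must never be passed to free(). */
struct bind_creds {
    char *buf;       /* owning pointer, result of malloc()   */
    char *bindname;  /* borrowed: points into buf            */
    char *bindpwd;   /* borrowed: points into buf, or NULL   */
};

/* Copy spec and split "name&password" in place.
 * Returns 0 on success, -1 if the allocation fails. */
int creds_parse(struct bind_creds *c, const char *spec)
{
    size_t len = strlen(spec);
    char *sep;

    c->buf = malloc(len + 1);
    if (c->buf == NULL)
        return -1;
    memcpy(c->buf, spec, len + 1);

    c->bindname = c->buf;
    c->bindpwd = NULL;
    sep = strchr(c->buf, '&');
    if (sep != NULL) {
        *sep = '\0';          /* terminate the name part */
        c->bindpwd = sep + 1; /* password starts right after it */
    }
    return 0;
}

/* Release only the owning pointer and clear the borrowed ones, so a
 * second call or a later access cannot touch freed memory. */
void creds_free(struct bind_creds *c)
{
    free(c->buf);
    c->buf = NULL;
    c->bindname = NULL;
    c->bindpwd = NULL;
}
```

Passing bindpwd to free() in this layout would hand the allocator an address it never returned, which is undefined behaviour and typically shows up as a crash during cleanup.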
Ok, reproducible and fixed.
UCS 4.2 has been released:
 https://docs.software-univention.de/release-notes-4.2-0-en.html
 https://docs.software-univention.de/release-notes-4.2-0-de.html

If this error occurs again, please use "Clone This Bug".