Bug 39139 - rndc reload against dlz_bind9 takes three minutes (plus segfault)
rndc reload against dlz_bind9 takes three minutes (plus segfault)
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Samba4
UCS 4.0
Other Linux
: P1 normal (vote)
: UCS 4.0-3-errata
Assigned To: Arvid Requate
Stefan Gohmann
https://bugzilla.samba.org/show_bug.c...
:
: 39080 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-08-10 22:25 CEST by Arvid Requate
Modified: 2018-02-05 23:59 CET (History)
3 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
dlz_bind9_rndc_reload.patch (3.29 KB, patch)
2015-08-11 21:57 CEST, Arvid Requate
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2015-08-10 22:25:52 CEST
During tests for Bug 34184 I noticed that the bind9 reload initiated by the listener module takes way too long, making e.g. 175sync_create_dns_alias fail. In that test case the reload is required to make dns_bind9 load a new forward zone created by the test.

syslog shows:
==============================================================
Aug  5 01:29:52 unassigned-hostname named[21800]: reloading configuration succeeded
Aug  5 01:29:52 unassigned-hostname named[21800]: reloading zones succeeded
Aug  5 01:29:52 unassigned-hostname named[21800]: samba_dlz: shutting down
[...]
Aug  5 01:32:53 unassigned-hostname kernel: [24143.199912] named[21807]: segfault at 0 ip 00007f43a9d67cca sp 00007f43ab7efe70 error 6 in dlz_bind9.so[7f43a9d65000+9000]
Aug  5 01:32:58 unassigned-hostname named[22711]: starting BIND 9.8.4-rpz2+rl005.12-P1 -c /etc/bind/named.conf.samba4 -f -d 0
==============================================================

listener.log shows (added a log message for timestamp)
==============================================================
05.08.15 01:29:52.547  LISTENER    ( PROCESS ) : DNS: Reloading bind9
server reload successful
==============================================================

When I rename /usr/sbin/rndc to that the listener module (bind9.py) doesn't find it, it restarts the DNS server instead and the test succeeds in time.

It's pretty strange that the segfault happens three minutes after the "samba_dlz: shutting down".
Comment 1 Arvid Requate univentionstaff 2015-08-11 21:57:02 CEST
Created attachment 7099 [details]
dlz_bind9_rndc_reload.patch

The attached patch (still raw debug version) fixes the issue.

For every "rndc reload" the dlz_create method of dlz_bind9 is called, then the dlz_configure method which apparently always needs to report all zones found as writeable, even though the module was not unloaded in the mean time. This seems to be necessary due to the way bind9 handles "views". The bind9 syslog output is pretty misleading here:

===============================================================
Aug  5 08:10:59 unassigned-hostname named[2916]: Loading 'samba4.zone' using driver dlopen
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: dlz_create (2)
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: starting configure
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: configured writeable zone '8.200.10.in-addr.arpa'
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: configured writeable zone 'arucs31ad1.qa'
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: configured writeable zone '_msdcs.arucs31ad1.qa'
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: Warning: 'empty-zones-enable/disable-empty-zone' not set: disabling RFC 1918 empty zones
Aug  5 08:10:59 unassigned-hostname named[2916]: zone 8.200.10.in-addr.arpa/NONE: (other) removed
Aug  5 08:10:59 unassigned-hostname named[2916]: zone arucs31ad1.qa/NONE: (other) removed
Aug  5 08:10:59 unassigned-hostname named[2916]: zone _msdcs.arucs31ad1.qa/NONE: (other) removed
Aug  5 08:10:59 unassigned-hostname named[2916]: reloading configuration succeeded
Aug  5 08:10:59 unassigned-hostname named[2916]: reloading zones succeeded
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: dlz_destroy called
===============================================================

But the "zone ... (other) removed" messages only seem to indicate that the zones have been removed from "the old production view list or our temporary list", quoting a comment from bind9-9.8.4.dfsg.P1/bin/named/server.c.

Also the "samba_dlz: dlz_destroy called" seems to follow this pattern:
* rndc reload
* dlz_create (second time / instance)
* dlz_configure (for new view)
* dlz_destroy (probably to get rid of the first instance)

Since the dlz_bind9 module behaves like a singleton, the dlz_destroy doesn't do anything as long as the reference count of the module has not reached zero.

Details you probably don't want to known. :-)
Comment 2 Arvid Requate univentionstaff 2015-08-11 21:57:09 CEST
*** Bug 39080 has been marked as a duplicate of this bug. ***
Comment 3 Arvid Requate univentionstaff 2015-08-12 23:24:00 CEST
The samba source package is rebuilt with adjusted/improved dlz_bind9 patch.

With the fix this doesn't hang and crash the named any more:

/usr/sbin/rndc reload; host $(hostname)

Advisory: 2015-08-12-samba.yaml
Comment 4 Arvid Requate univentionstaff 2015-08-12 23:25:17 CEST
*** Bug 39076 has been marked as a duplicate of this bug. ***
Comment 5 Stefan Gohmann univentionstaff 2015-08-27 12:04:48 CEST
Tests: OK (I was able to reproduce it with the old version)

YAML: OK
Comment 6 Janek Walkenhorst univentionstaff 2015-08-27 18:14:16 CEST
<http://errata.univention.de/ucs/4.0/289.html>