Bug 39139

Summary: rndc reload against dlz_bind9 takes three minutes (plus segfault)
Product: UCS Reporter: Arvid Requate <requate>
Component: Samba4Assignee: Arvid Requate <requate>
Status: CLOSED FIXED QA Contact: Stefan Gohmann <gohmann>
Severity: normal    
Priority: P1 CC: gohmann, roland.buser, walkenhorst
Version: UCS 4.0   
Target Milestone: UCS 4.0-3-errata   
Hardware: Other   
OS: Linux   
URL: https://bugzilla.samba.org/show_bug.cgi?id=13214
What kind of report is it?: --- What type of bug is this?: ---
Who will be affected by this bug?: --- How will those affected feel about the bug?: ---
User Pain: Enterprise Customer affected?:
School Customer affected?: ISV affected?:
Waiting Support: Flags outvoted (downgraded) after PO Review:
Ticket number: Bug group (optional):
Max CVSS v3 score:
Attachments: dlz_bind9_rndc_reload.patch

Description Arvid Requate univentionstaff 2015-08-10 22:25:52 CEST
During tests for Bug 34184 I noticed that the bind9 reload initiated by the listener module takes way too long, making e.g. 175sync_create_dns_alias fail. In that test case the reload is required to make dns_bind9 load a new forward zone created by the test.

syslog shows:
==============================================================
Aug  5 01:29:52 unassigned-hostname named[21800]: reloading configuration succeeded
Aug  5 01:29:52 unassigned-hostname named[21800]: reloading zones succeeded
Aug  5 01:29:52 unassigned-hostname named[21800]: samba_dlz: shutting down
[...]
Aug  5 01:32:53 unassigned-hostname kernel: [24143.199912] named[21807]: segfault at 0 ip 00007f43a9d67cca sp 00007f43ab7efe70 error 6 in dlz_bind9.so[7f43a9d65000+9000]
Aug  5 01:32:58 unassigned-hostname named[22711]: starting BIND 9.8.4-rpz2+rl005.12-P1 -c /etc/bind/named.conf.samba4 -f -d 0
==============================================================

listener.log shows (added a log message for timestamp)
==============================================================
05.08.15 01:29:52.547  LISTENER    ( PROCESS ) : DNS: Reloading bind9
server reload successful
==============================================================

When I rename /usr/sbin/rndc to that the listener module (bind9.py) doesn't find it, it restarts the DNS server instead and the test succeeds in time.

It's pretty strange that the segfault happens three minutes after the "samba_dlz: shutting down".
Comment 1 Arvid Requate univentionstaff 2015-08-11 21:57:02 CEST
Created attachment 7099 [details]
dlz_bind9_rndc_reload.patch

The attached patch (still raw debug version) fixes the issue.

For every "rndc reload" the dlz_create method of dlz_bind9 is called, then the dlz_configure method which apparently always needs to report all zones found as writeable, even though the module was not unloaded in the mean time. This seems to be necessary due to the way bind9 handles "views". The bind9 syslog output is pretty misleading here:

===============================================================
Aug  5 08:10:59 unassigned-hostname named[2916]: Loading 'samba4.zone' using driver dlopen
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: dlz_create (2)
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: starting configure
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: configured writeable zone '8.200.10.in-addr.arpa'
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: configured writeable zone 'arucs31ad1.qa'
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: configured writeable zone '_msdcs.arucs31ad1.qa'
[...]
Aug  5 08:10:59 unassigned-hostname named[2916]: Warning: 'empty-zones-enable/disable-empty-zone' not set: disabling RFC 1918 empty zones
Aug  5 08:10:59 unassigned-hostname named[2916]: zone 8.200.10.in-addr.arpa/NONE: (other) removed
Aug  5 08:10:59 unassigned-hostname named[2916]: zone arucs31ad1.qa/NONE: (other) removed
Aug  5 08:10:59 unassigned-hostname named[2916]: zone _msdcs.arucs31ad1.qa/NONE: (other) removed
Aug  5 08:10:59 unassigned-hostname named[2916]: reloading configuration succeeded
Aug  5 08:10:59 unassigned-hostname named[2916]: reloading zones succeeded
Aug  5 08:10:59 unassigned-hostname named[2916]: samba_dlz: dlz_destroy called
===============================================================

But the "zone ... (other) removed" messages only seem to indicate that the zones have been removed from "the old production view list or our temporary list", quoting a comment from bind9-9.8.4.dfsg.P1/bin/named/server.c.

Also the "samba_dlz: dlz_destroy called" seems to follow this pattern:
* rndc reload
* dlz_create (second time / instance)
* dlz_configure (for new view)
* dlz_destroy (probably to get rid of the first instance)

Since the dlz_bind9 module behaves like a singleton, the dlz_destroy doesn't do anything as long as the reference count of the module has not reached zero.

Details you probably don't want to known. :-)
Comment 2 Arvid Requate univentionstaff 2015-08-11 21:57:09 CEST
*** Bug 39080 has been marked as a duplicate of this bug. ***
Comment 3 Arvid Requate univentionstaff 2015-08-12 23:24:00 CEST
The samba source package is rebuilt with adjusted/improved dlz_bind9 patch.

With the fix this doesn't hang and crash the named any more:

/usr/sbin/rndc reload; host $(hostname)

Advisory: 2015-08-12-samba.yaml
Comment 4 Arvid Requate univentionstaff 2015-08-12 23:25:17 CEST
*** Bug 39076 has been marked as a duplicate of this bug. ***
Comment 5 Stefan Gohmann univentionstaff 2015-08-27 12:04:48 CEST
Tests: OK (I was able to reproduce it with the old version)

YAML: OK
Comment 6 Janek Walkenhorst univentionstaff 2015-08-27 18:14:16 CEST
<http://errata.univention.de/ucs/4.0/289.html>