Bug 46526 - BIND9 crashes - memory leak or corruption- /var/cache/bind/core
BIND9 crashes - memory leak or corruption- /var/cache/bind/core
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: DNS
UCS 4.3
Other Linux
: P5 normal (vote)
: UCS 4.3
Assigned To: Philipp Hahn
Stefan Gohmann
: interim-4
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2018-03-07 12:33 CET by Arvid Requate
Modified: 2018-03-14 14:38 CET (History)
3 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 5: Will affect all installed domains
How will those affected feel about the bug?: 1: Nuisance – not a big deal but noticeable
User Pain: 0.200
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
Valgrind - nothing interesting (19.65 KB, text/plain)
2018-03-09 09:23 CET, Philipp Hahn
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2018-03-07 12:33:36 CET
There is a /var/cache/bind/core after installation of UCS 4.3-0 Samba/AD DC Master:

root@master70:~# file /var/cache/bind/core
/var/cache/bind/core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/named -c /etc/bind/named.conf -p 7777 -u bind -f -d 0', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/named', platform: 'x86_64'

gdb says:
===============================================================================
Core was generated by `/usr/sbin/named -c /etc/bind/named.conf -p 7777 -u bind -f -d 0'.
Program terminated with signal SIGABRT, Aborted.
===============================================================================

===============================================================================
#3  0x00007f666c7d693a in isc_assertion_failed (file=file@entry=0x7f666c81b158 "../../../lib/isc/mem.c", line=line@entry=1080, type=type@entry=isc_assertiontype_insist, 
    cond=cond@entry=0x7f666c81b24f "ctx->stats[i].gets == 0U") at ../../../lib/isc/assertions.c:58
No locals.
#4  0x00007f666c7e6934 in destroy (ctx=ctx@entry=0x5595e784fde0) at ../../../lib/isc/mem.c:1080
        i = 183
        ondest = {magic = 0, events = {head = 0x5595e785f0f0, tail = 0x5595e60b72c8 <ns_g_lctx>}}
#5  0x00007f666c7e6e0c in isc__mem_destroy (ctxp=0x5595e60b7328 <ns_g_mctx>) at ../../../lib/isc/mem.c:1233
        ctx = 0x5595e784fde0
#6  0x00007f666c7ea935 in isc_mem_destroy (mctxp=0x5595e60b7328 <ns_g_mctx>) at ../../../lib/isc/mem.c:2770
#7  0x00005595e5e3c8c3 in main (argc=<optimized out>, argv=<optimized out>) at ../../../bin/named/main.c:1352
        result = <optimized out>
===============================================================================

root@master70:~/bind9-9.10.3.dfsg.P4# dpkg -l bind9 | tail -1
ii  bind9          1:9.10.3.dfsg.P4-12.3+deb9u4A~4.3.0.201802151445 amd64
Comment 1 Philipp Hahn univentionstaff 2018-03-08 13:43:45 CET
I also see similar and repeated corruption on my test system (without Samba4):

> Failing assertion due to probable leaked memory in context 0x555fefe44de0 ("main") (stats[174].gets == 1).
> ../../../lib/isc/mem.c:1080: INSIST(ctx->stats[i].gets == 0U) failed .

ucr set repository/online/unmaintained=yes repository/online/sources=yes
apt-get -qq update
apt-get -qq install libc6-dbg bind9-dbg gdb dpkg-dev debhelper
apt-get source bind9
cd bind9-*/
mkdir -p build/bin/named
gdb --cd build/bin/named -q --batch -ex 'thread apply all backtrace full -5' /usr/sbin/named /var/cache/bind/core 

Core was generated by `/usr/sbin/named -c /etc/bind/named.conf -p 7777 -u bind -f -d 0'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden.
[Current thread is 1 (Thread 0x7f0c2e96f740 (LWP 30040))]

Thread 2 (LWP 30052):
#0  0x00007f0c2be51556 in __exit_thread () at ../sysdeps/unix/sysv/linux/exit-thread.h:36
        __arg1 = 0
        _a1 = 0
#1  start_thread (arg=0x7f0c26dd4700) at pthread_create.c:478
        pd = 0x7f0c26dd4700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139690168370944, -9108688101467719748, 0, 140727266607759, 0, 139690298110016, 9191633345236203452, 9191608951266506684}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = 0x7f0c26dd3f20 ""
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#2  0x00007f0c2b2a2aff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.

Thread 1 (Thread 0x7f0c2e96f740 (LWP 30040)):
#3  0x00007f0c2cafe93a in isc_assertion_failed (file=file@entry=0x7f0c2cb43158 "../../../lib/isc/mem.c", line=line@entry=1080, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7f0c2cb4324f "ctx->stats[i].gets == 0U") at ../../../lib/isc/assertions.c:58
No locals.
#4  0x00007f0c2cb0e934 in destroy (ctx=ctx@entry=0x55ce101adde0) at ../../../lib/isc/mem.c:1080
        i = 174
        ondest = {magic = 0, events = {head = 0x55ce101bd0f0, tail = 0x55ce0e3902c8 <ns_g_lctx>}}
#5  0x00007f0c2cb0ee0c in isc__mem_destroy (ctxp=0x55ce0e390328 <ns_g_mctx>) at ../../../lib/isc/mem.c:1233
        ctx = 0x55ce101adde0
#6  0x00007f0c2cb12935 in isc_mem_destroy (mctxp=0x55ce0e390328 <ns_g_mctx>) at ../../../lib/isc/mem.c:2770
No locals.
#7  0x000055ce0e1158c3 in main (argc=<optimized out>, argv=<optimized out>) at ../../../bin/named/main.c:1352
        result = <optimized out>


frame 4
set print pretty on
print *ctx

$9 = {
  common = {
    impmagic = 0, 
    magic = 0, 
    methods = 0x7f0c2cd5c240 <memmethods>
  }, 
  ondestroy = {
    magic = 1147491188, 
    events = {
      head = 0x0, 
      tail = 0x0
    }
  }, 
  flags = 2, 
  lock = {
    __data = {
      __lock = 0, 
      __count = 0, 
      __owner = 0, 
      __nusers = 0, 
      __kind = 3, 
      __spins = 0, 
      __elision = 0, 
      __list = {
        __prev = 0x0, 
        __next = 0x0
      }
    }, 
    __size =       '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>, 
    __align = 0
  }, 
  memalloc = 0x7f0c2cb0d8c0 <default_memalloc>, 
  memfree = 0x7f0c2cb0d8b0 <default_memfree>, 
  arg = 0x0, 
  max_size = 1100, 
  checkfree = isc_boolean_true, 
  stats = 0x55ce101adf50, 
  references = 0, 
  name =     "main", '\000' <repeats 11 times>, 
  tag = 0x0, 
  quota = 0, 
  total = 2396967, 
  inuse = 368, 
  maxinuse = 1447159, 
  hi_water = 0, 
  lo_water = 0, 
  hi_called = isc_boolean_false, 
  is_overmem = isc_boolean_false, 
  water = 0x0, 
  water_arg = 0x0, 
  pools = {
    head = 0x0, 
    tail = 0x0
  }, 
  poolcnt = 0, 
  mem_target = 4096, 
  freelists = 0x55ce101b6900, 
  basic_blocks = 0x7f0c25363010, 
  basic_table = 0x55ce101bf170, 
  basic_table_count = 3, 
  basic_table_size = 1024, 
  lowest = 0x7f0c25348010 "\020ߏ.\f\177", 
  highest = 0x7f0c2e96e00f <incomplete sequence \336>, 
  debuglist = 0x0, 
  debuglistcnt = 0, 
  memalloc_failures = 0, 
  link = {
    prev = 0xffffffffffffffff, 
    next = 0xffffffffffffffff
  }
}
Comment 2 Stefan Gohmann univentionstaff 2018-03-08 21:07:41 CET
It happened on another system at the end of the installation.

Maybe we should ensure in the 4.3 postup.sh that bind is started correctly?
Comment 3 Arvid Requate univentionstaff 2018-03-08 21:35:59 CET
All cores from the named running on port 7777  ? Can be checked quickly via
file /var/cache/bind/core

The processes are supervised by runsv. I can
 pkill -11 -f /usr/sbin/named
and 5 seconds later there is a new one.
Comment 4 Philipp Hahn univentionstaff 2018-03-09 09:13:45 CET
(In reply to Stefan Gohmann from comment #2)
> Maybe we should ensure in the 4.3 postup.sh that bind is started correctly?

What do you mean by "correctly"? The crash is happening (only) on shutdown.

(In reply to Arvid Requate from comment #3)
> All cores from the named running on port 7777

I have checked the code quickly and the crash happens only on shutdown: BIND checks for leaked memory there and find (at least) one allocation there not yet freed. An "assert" finds this and aborts the process.

Strictly speaking this is not critical as it only happens shutdown, but there might be a bitter problem with memory leaks.

As the crash only happens in the LDAP-BIND, the memory leak is probably introduced by one of our patches:
0001-Bug-22478-build-bind-with-libdb4.8.patch
0002-Bug-25868-Save-debugging-symbols-in-bind9-dbg-packag.patch
0003-Bug-24160-check-if-bind9-init-script-is-available-be.patch
0004-Bug-41714-Add-LDAP-support.patch
0004-Bug-41714-Add-LDAP-support.quilt
0005-Bug-41714-conditional-compiler-error.quilt
0006-Bug-41714-Adapt-to-new-APIs.quilt
0007-Bug-41714-Fix-illegal-return-value.quilt
0008-Bug-41714-Clone-URL.quilt
0009-Bug-41714-Check-for-allocation-error.quilt
0010-Bug-41714-Replace-deprecated-libldap-API.quilt
0011-Bug-41714-rename-errno-to-rc.quilt
0012-Bug-41714-Retry-search-in-case-of-closed-connections.quilt
0013-Bug-28748-Default-LDAP-timeout-60s.quilt
0014-Bug-42389-Fix-crash-on-shutdown.quilt
0015-Bug-45341-restart-bind9-only-if-dns-backend-is-set.patch
0015-Bug-46072-Tolerate-restart-failure-in-postinst.patch

Otherwise I would expect the crash to also happen with SAMBA-BIND or the PROXY-BIND.
Comment 5 Philipp Hahn univentionstaff 2018-03-09 09:23:28 CET
Created attachment 9456 [details]
Valgrind - nothing interesting
Comment 6 Stefan Gohmann univentionstaff 2018-03-09 09:29:23 CET
(In reply to Philipp Hahn from comment #4)
> (In reply to Stefan Gohmann from comment #2)
> > Maybe we should ensure in the 4.3 postup.sh that bind is started correctly?
> 
> What do you mean by "correctly"? The crash is happening (only) on shutdown.

At least on one test system, bind didn't response. We will try to reproduce it.
Comment 7 Philipp Hahn univentionstaff 2018-03-09 10:54:22 CET
r18047 | Bug #46526 bind9: Fix memory leak
r18047 | Bug #46526 bind9: Fix memory leak 2

Package: bind9
Version: 1:9.10.3.dfsg.P4-12.3+deb9u4A~4.3.0.201803091039
Branch: ucs_4.3-0

[4.3-0] 047ff6bafc Bug #46526: bind9 crash and memory leak
Comment 8 Philipp Hahn univentionstaff 2018-03-09 11:39:47 CET
OK: dpkg-query -W bind9 # 1:9.10.3.dfsg.P4-12.3+deb9u4A~4.3.0.201803091039
OK: rm -f /var/cache/bind/core && rndc -p 55555 stop && pidof named && sleep 5 && pidof named && [ ! -e /var/cache/bind/core ]
Comment 9 Stefan Gohmann univentionstaff 2018-03-10 13:01:57 CET
Code review: OK 

Tests: OK

Changelog: OK
Comment 10 Stefan Gohmann univentionstaff 2018-03-14 14:38:23 CET
UCS 4.3 has been released:
 https://docs.software-univention.de/release-notes-4.3-0-en.html
 https://docs.software-univention.de/release-notes-4.3-0-de.html

If this error occurs again, please use "Clone This Bug".