Univention Bugzilla – Bug 46526
BIND9 crashes - memory leak or corruption- /var/cache/bind/core
Last modified: 2018-03-14 14:38:23 CET
There is a /var/cache/bind/core after installation of UCS 4.3-0 Samba/AD DC Master: root@master70:~# file /var/cache/bind/core /var/cache/bind/core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/named -c /etc/bind/named.conf -p 7777 -u bind -f -d 0', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/named', platform: 'x86_64' gdb says: =============================================================================== Core was generated by `/usr/sbin/named -c /etc/bind/named.conf -p 7777 -u bind -f -d 0'. Program terminated with signal SIGABRT, Aborted. =============================================================================== =============================================================================== #3 0x00007f666c7d693a in isc_assertion_failed (file=file@entry=0x7f666c81b158 "../../../lib/isc/mem.c", line=line@entry=1080, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7f666c81b24f "ctx->stats[i].gets == 0U") at ../../../lib/isc/assertions.c:58 No locals. #4 0x00007f666c7e6934 in destroy (ctx=ctx@entry=0x5595e784fde0) at ../../../lib/isc/mem.c:1080 i = 183 ondest = {magic = 0, events = {head = 0x5595e785f0f0, tail = 0x5595e60b72c8 <ns_g_lctx>}} #5 0x00007f666c7e6e0c in isc__mem_destroy (ctxp=0x5595e60b7328 <ns_g_mctx>) at ../../../lib/isc/mem.c:1233 ctx = 0x5595e784fde0 #6 0x00007f666c7ea935 in isc_mem_destroy (mctxp=0x5595e60b7328 <ns_g_mctx>) at ../../../lib/isc/mem.c:2770 #7 0x00005595e5e3c8c3 in main (argc=<optimized out>, argv=<optimized out>) at ../../../bin/named/main.c:1352 result = <optimized out> =============================================================================== root@master70:~/bind9-9.10.3.dfsg.P4# dpkg -l bind9 | tail -1 ii bind9 1:9.10.3.dfsg.P4-12.3+deb9u4A~4.3.0.201802151445 amd64
I also see similar and repeated corruption on my test system (without Samba4): > Failing assertion due to probable leaked memory in context 0x555fefe44de0 ("main") (stats[174].gets == 1). > ../../../lib/isc/mem.c:1080: INSIST(ctx->stats[i].gets == 0U) failed . ucr set repository/online/unmaintained=yes repository/online/sources=yes apt-get -qq update apt-get -qq install libc6-dbg bind9-dbg gdb dpkg-dev debhelper apt-get source bind9 cd bind9-*/ mkdir -p build/bin/named gdb --cd build/bin/named -q --batch -ex 'thread apply all backtrace full -5' /usr/sbin/named /var/cache/bind/core Core was generated by `/usr/sbin/named -c /etc/bind/named.conf -p 7777 -u bind -f -d 0'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden. [Current thread is 1 (Thread 0x7f0c2e96f740 (LWP 30040))] Thread 2 (LWP 30052): #0 0x00007f0c2be51556 in __exit_thread () at ../sysdeps/unix/sysv/linux/exit-thread.h:36 __arg1 = 0 _a1 = 0 #1 start_thread (arg=0x7f0c26dd4700) at pthread_create.c:478 pd = 0x7f0c26dd4700 now = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139690168370944, -9108688101467719748, 0, 140727266607759, 0, 139690298110016, 9191633345236203452, 9191608951266506684}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> pagesize_m1 = <optimized out> sp = 0x7f0c26dd3f20 "" freesize = <optimized out> __PRETTY_FUNCTION__ = "start_thread" #2 0x00007f0c2b2a2aff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97 No locals. Thread 1 (Thread 0x7f0c2e96f740 (LWP 30040)): #3 0x00007f0c2cafe93a in isc_assertion_failed (file=file@entry=0x7f0c2cb43158 "../../../lib/isc/mem.c", line=line@entry=1080, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7f0c2cb4324f "ctx->stats[i].gets == 0U") at ../../../lib/isc/assertions.c:58 No locals. #4 0x00007f0c2cb0e934 in destroy (ctx=ctx@entry=0x55ce101adde0) at ../../../lib/isc/mem.c:1080 i = 174 ondest = {magic = 0, events = {head = 0x55ce101bd0f0, tail = 0x55ce0e3902c8 <ns_g_lctx>}} #5 0x00007f0c2cb0ee0c in isc__mem_destroy (ctxp=0x55ce0e390328 <ns_g_mctx>) at ../../../lib/isc/mem.c:1233 ctx = 0x55ce101adde0 #6 0x00007f0c2cb12935 in isc_mem_destroy (mctxp=0x55ce0e390328 <ns_g_mctx>) at ../../../lib/isc/mem.c:2770 No locals. #7 0x000055ce0e1158c3 in main (argc=<optimized out>, argv=<optimized out>) at ../../../bin/named/main.c:1352 result = <optimized out> frame 4 set print pretty on print *ctx $9 = { common = { impmagic = 0, magic = 0, methods = 0x7f0c2cd5c240 <memmethods> }, ondestroy = { magic = 1147491188, events = { head = 0x0, tail = 0x0 } }, flags = 2, lock = { __data = { __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3, __spins = 0, __elision = 0, __list = { __prev = 0x0, __next = 0x0 } }, __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>, __align = 0 }, memalloc = 0x7f0c2cb0d8c0 <default_memalloc>, memfree = 0x7f0c2cb0d8b0 <default_memfree>, arg = 0x0, max_size = 1100, checkfree = isc_boolean_true, stats = 0x55ce101adf50, references = 0, name = "main", '\000' <repeats 11 times>, tag = 0x0, quota = 0, total = 2396967, inuse = 368, maxinuse = 1447159, hi_water = 0, lo_water = 0, hi_called = isc_boolean_false, is_overmem = isc_boolean_false, water = 0x0, water_arg = 0x0, pools = { head = 0x0, tail = 0x0 }, poolcnt = 0, mem_target = 4096, freelists = 0x55ce101b6900, basic_blocks = 0x7f0c25363010, basic_table = 0x55ce101bf170, basic_table_count = 3, basic_table_size = 1024, lowest = 0x7f0c25348010 "\020ߏ.\f\177", highest = 0x7f0c2e96e00f <incomplete sequence \336>, debuglist = 0x0, debuglistcnt = 0, memalloc_failures = 0, link = { prev = 0xffffffffffffffff, next = 0xffffffffffffffff } }
It happened on another system at the end of the installation. Maybe we should ensure in the 4.3 postup.sh that bind is started correctly?
All cores from the named running on port 7777 ? Can be checked quickly via file /var/cache/bind/core The processes are supervised by runsv. I can pkill -11 -f /usr/sbin/named and 5 seconds later there is a new one.
(In reply to Stefan Gohmann from comment #2) > Maybe we should ensure in the 4.3 postup.sh that bind is started correctly? What do you mean by "correctly"? The crash is happening (only) on shutdown. (In reply to Arvid Requate from comment #3) > All cores from the named running on port 7777 I have checked the code quickly and the crash happens only on shutdown: BIND checks for leaked memory there and find (at least) one allocation there not yet freed. An "assert" finds this and aborts the process. Strictly speaking this is not critical as it only happens shutdown, but there might be a bitter problem with memory leaks. As the crash only happens in the LDAP-BIND, the memory leak is probably introduced by one of our patches: 0001-Bug-22478-build-bind-with-libdb4.8.patch 0002-Bug-25868-Save-debugging-symbols-in-bind9-dbg-packag.patch 0003-Bug-24160-check-if-bind9-init-script-is-available-be.patch 0004-Bug-41714-Add-LDAP-support.patch 0004-Bug-41714-Add-LDAP-support.quilt 0005-Bug-41714-conditional-compiler-error.quilt 0006-Bug-41714-Adapt-to-new-APIs.quilt 0007-Bug-41714-Fix-illegal-return-value.quilt 0008-Bug-41714-Clone-URL.quilt 0009-Bug-41714-Check-for-allocation-error.quilt 0010-Bug-41714-Replace-deprecated-libldap-API.quilt 0011-Bug-41714-rename-errno-to-rc.quilt 0012-Bug-41714-Retry-search-in-case-of-closed-connections.quilt 0013-Bug-28748-Default-LDAP-timeout-60s.quilt 0014-Bug-42389-Fix-crash-on-shutdown.quilt 0015-Bug-45341-restart-bind9-only-if-dns-backend-is-set.patch 0015-Bug-46072-Tolerate-restart-failure-in-postinst.patch Otherwise I would expect the crash to also happen with SAMBA-BIND or the PROXY-BIND.
Created attachment 9456 [details] Valgrind - nothing interesting
(In reply to Philipp Hahn from comment #4) > (In reply to Stefan Gohmann from comment #2) > > Maybe we should ensure in the 4.3 postup.sh that bind is started correctly? > > What do you mean by "correctly"? The crash is happening (only) on shutdown. At least on one test system, bind didn't response. We will try to reproduce it.
r18047 | Bug #46526 bind9: Fix memory leak r18047 | Bug #46526 bind9: Fix memory leak 2 Package: bind9 Version: 1:9.10.3.dfsg.P4-12.3+deb9u4A~4.3.0.201803091039 Branch: ucs_4.3-0 [4.3-0] 047ff6bafc Bug #46526: bind9 crash and memory leak
OK: dpkg-query -W bind9 # 1:9.10.3.dfsg.P4-12.3+deb9u4A~4.3.0.201803091039 OK: rm -f /var/cache/bind/core && rndc -p 55555 stop && pidof named && sleep 5 && pidof named && [ ! -e /var/cache/bind/core ]
Code review: OK Tests: OK Changelog: OK
UCS 4.3 has been released: https://docs.software-univention.de/release-notes-4.3-0-en.html https://docs.software-univention.de/release-notes-4.3-0-de.html If this error occurs again, please use "Clone This Bug".