Bug 39832 - /etc/network/if-up(down).d/bind9 -> univention-bind reload why?
/etc/network/if-up(down).d/bind9 -> univention-bind reload why?
Status: RESOLVED FIXED
Product: UCS
Classification: Unclassified
Component: DNS
UCS 4.1
Other Linux
: P5 normal (vote)
: UCS 4.1-x
Assigned To: Philipp Hahn
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-11-09 17:30 CET by Felix Botner
Modified: 2015-12-29 11:06 CET (History)
1 user (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Felix Botner univentionstaff 2015-11-09 17:30:43 CET
The if up/down scripts for bind9 calling

univention-bind reload 

which is basically "rndc -p 55555 reload".


rndc reload reloads the configuration and zones.

Is this really what we need if interface are removed/added?
(i think we should restart bind9)
Comment 1 Philipp Hahn univentionstaff 2015-11-10 08:29:41 CET
(In reply to Felix Botner from comment #0)
> The if up/down scripts for bind9 calling
> 
> univention-bind reload 
> 
> which is basically "rndc -p 55555 reload".
> 
> 
> rndc reload reloads the configuration and zones.
> 
> Is this really what we need if interface are removed/added?

Yes: if network interfaces are added (ethX, vlan, bridge, bonding), BIND9 need to be told about that to start listening on those interfaces.

> (i think we should restart bind9)

No: Before Bug #36532 is was even slower when BIND9 was even restarted instead of being told to reload its configuration and look for changed interfaces.
Comment 2 Felix Botner univentionstaff 2015-11-10 09:18:35 CET
(In reply to Philipp Hahn from comment #1)
> (In reply to Felix Botner from comment #0)
> > The if up/down scripts for bind9 calling
> > 
> > univention-bind reload 
> > 
> > which is basically "rndc -p 55555 reload".
> > 
> > 
> > rndc reload reloads the configuration and zones.
> > 
> > Is this really what we need if interface are removed/added?
> 
> Yes: if network interfaces are added (ethX, vlan, bridge, bonding), BIND9
> need to be told about that to start listening on those interfaces.

OK, but i have a system which hangs during "ifdown -a" because /etc/network/if-down.d/bind9 -> "rndc -p 55555 reload" is not coming back.
This system has been installed with UCS 2.x and was updated up to 4.1-0, so there are maybe some outdated config files/setting ... (10.200.7.200)

restart in  if-down/up.d/bind9 works.

> 
> > (i think we should restart bind9)
> 
> No: Before Bug #36532 is was even slower when BIND9 was even restarted
> instead of being told to reload its configuration and look for changed
> interfaces.
Comment 3 Philipp Hahn univentionstaff 2015-11-18 10:46:29 CET
Another hang:

 7636 tty1     Ss     0:00 /bin/login --     
 7669 tty1     S      0:00  \_ -bash
16065 tty1     S+     0:00      \_ /usr/bin/python2.7 /usr/sbin/univention-upgrade
18978 tty1     S+     0:00          \_ /usr/bin/python2.7 /usr/share/univention-updater/univention-updater net --updateto 4.1-0 --silent
19126 tty1     S+     0:00              \_ sh -c touch /var/run/apt-get.lock && DEBIAN_FRONTEND=noninteractive apt-get -o DPkg::Options::=--force-confold -o DPkg::Options::=--force-overwrite -o DPkg::Options::=--force-overwrite-dir --tri
19128 tty1     S+     0:02                  \_ apt-get -o DPkg::Options::=--force-confold -o DPkg::Options::=--force-overwrite -o DPkg::Options::=--force-overwrite-dir --trivial-only=no --assume-yes --quiet=1 -u dist-upgrade
27574 pts/0    Ss+    0:00                      \_ /usr/bin/dpkg --force-confold --force-overwrite --force-overwrite-dir --status-fd 39 --configure libunivention-debug1:amd64 python-univention-debug:amd64 python-univention:all python-uni
28285 pts/0    S+     0:00                          \_ /bin/sh /var/lib/dpkg/info/univention-base-files.postinst configure 4.0.8-11.202.201509161525
28304 pts/0    S+     0:00                              \_ /usr/bin/python2.7 /usr/sbin/univention-config-registry register univention-base-files
28341 pts/0    S      0:00                                  \_ ifdown -a
28342 pts/0    S      0:00                                      \_ /bin/sh -c run-parts  /etc/network/if-down.d
28343 pts/0    S      0:00                                          \_ run-parts /etc/network/if-down.d
28345 pts/0    S      0:00                                              \_ /bin/sh -e /etc/network/if-down.d/bind9
28346 pts/0    S      0:00                                                  \_ /bin/sh /usr/sbin/invoke-rc.d univention-bind reload
28364 pts/0    S      0:00                                                      \_ /bin/sh /etc/init.d/univention-bind reload
28369 pts/0    Sl     0:00                                                          \_ rndc -p 55555 reload


# gdb -p 28369 --batch  -ex "set pagination off" -ex "thread apply all bt"
...
0x00007f1c4c59a514 in do_sigsuspend (set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
63      ../sysdeps/unix/sysv/linux/sigsuspend.c: Datei oder Verzeichnis nicht gefunden.

Thread 3 (Thread 0x7f1c4ad52700 (LWP 28370)):
#0  0x00007f1c4c643fb3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f1c4d2b5636 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f1c4cc57b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f1c4c64395d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f1c4a551700 (LWP 28371)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f1c4d2a3cd1 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f1c4cc57b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f1c4c64395d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f1c4e737720 (LWP 28369)):
#0  0x00007f1c4c59a514 in do_sigsuspend (set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
#1  *__GI___sigsuspend (set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:78
#2  0x00007f1c4d2a8750 in isc__app_ctxrun () from /usr/lib/libisc.so.84
#3  0x000055c115fee5ea in main ()



# lsof -i :55555
COMMAND   PID USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
named    6218 bind   30u  IPv4      19852      0t0  TCP localhost:55555 (LISTEN)
rndc    28369 root   20u  IPv4 1031668698      0t0  TCP localhost:40886->localhost:55555 (ESTABLISHED)



# gdb -p 6218 --batch  -ex "set pagination off" -ex "thread apply all bt"
...
0x00007f0d21a2bee5 in pthread_join (threadid=139694327658240, thread_return=0x0) at pthread_join.c:89
89      pthread_join.c: Datei oder Verzeichnis nicht gefunden.
No symbol "paginate" in current context.

Thread 11 (Thread 0x7f0d1ec6f700 (LWP 6270)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f0d226eccd1 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f0d1e46e700 (LWP 6273)):
#0  0x00007f0d2141787d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f0d21437fcc in make_request (fd=11, pid=6218, seen_ipv4=<optimized out>, seen_ipv6=<optimized out>, in6ai=<optimized out>, in6ailen=<optimized out>) at ../sysdeps/unix/sysv/linux/check_pf.c:119
#2  0x00007f0d2143844a in __check_pf (seen_ipv4=0x7f0d1e46cbef, seen_ipv6=0x7f0d1e46cbee, in6ai=0x7f0d1e46cbd0, in6ailen=0x7f0d1e46cbc8) at ../sysdeps/unix/sysv/linux/check_pf.c:271
#3  0x00007f0d213f5007 in *__GI_getaddrinfo (name=0x7f0d14000990 "127.0.0.1", service=0x7f0d1e46cc70 "7389", hints=0x7f0d1e46ccb0, pai=0x7f0d1e46cc78) at ../sysdeps/posix/getaddrinfo.c:2389
#4  0x00007f0d22495525 in ldap_connect_to_host () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#5  0x00007f0d2247ffce in ldap_int_open_connection () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#6  0x00007f0d22492ce5 in ldap_new_connection () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#7  0x00007f0d2247f65f in ldap_create () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#8  0x00007f0d224940f0 in ?? () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#9  0x00007f0d23d1a796 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f0d1dc6d700 (LWP 6275)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f0d226ecabb in isc__task_beginexclusive () from /usr/lib/libisc.so.84
#2  0x00005603a0e63a07 in ?? ()
#3  0x00007f0d226ece1d in ?? () from /usr/lib/libisc.so.84
#4  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#5  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f0d1d46c700 (LWP 6278)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f0d226eccd1 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f0d1cc6b700 (LWP 6280)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f0d226eccd1 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f0d1c46a700 (LWP 6282)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f0d226eccd1 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f0d1bc69700 (LWP 6284)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f0d226eccd1 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f0d1b468700 (LWP 6285)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f0d226eccd1 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f0d1ac67700 (LWP 6286)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
#1  0x00007f0d227030ae in isc_condition_waituntil () from /usr/lib/libisc.so.84
#2  0x00007f0d226efe14 in ?? () from /usr/lib/libisc.so.84
#3  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#4  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f0d1a466700 (LWP 6288)):
#0  0x00007f0d21416fb3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f0d226fe636 in ?? () from /usr/lib/libisc.so.84
#2  0x00007f0d21a2ab50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#3  0x00007f0d2141695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f0d23d28720 (LWP 6218)):
#0  0x00007f0d21a2bee5 in pthread_join (threadid=139694327658240, thread_return=0x0) at pthread_join.c:89
#1  0x00007f0d226ee0a4 in isc__taskmgr_destroy () from /usr/lib/libisc.so.84
#2  0x00005603a0e426a1 in main ()




Nov 17 14:09:28 xen12 logger: /etc/init.d/slapd stop (pid: 20816, ppid:20799 invoke-rc.d)
Nov 17 14:09:28 xen12 slapd[6166]: daemon: shutdown requested and initiated.
Nov 17 14:09:28 xen12 slapd[6166]: slapd shutdown: waiting for 0 operations/tasks to finish
Nov 17 14:09:28 xen12 slapd[6166]: OVER: db_close
Nov 17 14:09:28 xen12 slapd[6166]: OVER: db_destroy
Nov 17 14:09:28 xen12 slapd[6166]: slapd stopped.


# dpkg -l slapd univention-base-files
...
iU  slapd                                                 2.4.42+dfsg-2.210.201511060842  amd64                           OpenLDAP server (slapd)
iF  univention-base-files                                 5.0.1-1.207.201511130613        all                             UCS - base configuration files



# less /var/log/apt/term.log
...
Vorbereitung zum Ersetzen von slapd 2.4.40-1.202.201509141730 (durch .../slapd_2.4.42+dfsg-2.210.201511060842_amd64.deb) ...
[....] Stopping ldap server(s): slapd ...[1G[ ok [?25hdone.
Ersatz für slapd wird entpackt ...
...
univention-base-files (5.0.1-1.207.201511130613) wird eingerichtet ...
...
Multifile: /etc/network/interfaces
Reloading bind9 daemon: ...ldap...



1. During the upgrade OpenLDAP is upgraded - slapd gets stopped during that
2. Also univention-base-files is upgraded, which triggers the BIND reload
3. BIND tried to contact the LDAP daemon and hangs there.

Recommendations / ideas:
- I think we should switch to "interfaces/restart/auto=no" by default or at least during updates, as this currently is the only exception where we restart services automatically after changing some UCR variables.
- Check BIND if a timeout can be specified
- Add a "timeout 10 ..." before the "rndc reload"


FYI: I've seen similar hangs of processes accessing a stopped / not-responding slapd in the past weeks. I even had the situation, where slapd tried to reverse-lookup the connection IP address, which failed because of BIND not responding, because of LDAP not responding ...
Comment 4 Philipp Hahn univentionstaff 2015-12-04 13:13:40 CET
(In reply to Philipp Hahn from comment #3)
> Another hang:
...
> # gdb -p 6218 --batch  -ex "set pagination off" -ex "thread apply all bt"
> ...
> Thread 10 (Thread 0x7f0d1e46e700 (LWP 6273)):
> #0  0x00007f0d2141787d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #1  0x00007f0d21437fcc in make_request (fd=11, pid=6218,
> seen_ipv4=<optimized out>, seen_ipv6=<optimized out>, in6ai=<optimized out>,
> in6ailen=<optimized out>) at ../sysdeps/unix/sysv/linux/check_pf.c:119
> #2  0x00007f0d2143844a in __check_pf (seen_ipv4=0x7f0d1e46cbef,
> seen_ipv6=0x7f0d1e46cbee, in6ai=0x7f0d1e46cbd0, in6ailen=0x7f0d1e46cbc8) at
> ../sysdeps/unix/sysv/linux/check_pf.c:271
> #3  0x00007f0d213f5007 in *__GI_getaddrinfo (name=0x7f0d14000990
> "127.0.0.1", service=0x7f0d1e46cc70 "7389", hints=0x7f0d1e46ccb0,
> pai=0x7f0d1e46cc78) at ../sysdeps/posix/getaddrinfo.c:2389
> #4  0x00007f0d22495525 in ldap_connect_to_host () from
> /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
> #5  0x00007f0d2247ffce in ldap_int_open_connection () from
> /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
> #6  0x00007f0d22492ce5 in ldap_new_connection () from
> /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
> #7  0x00007f0d2247f65f in ldap_create () from
> /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
> #8  0x00007f0d224940f0 in ?? () from
> /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
> #9  0x00007f0d23d1a796 in ?? ()
> #10 0x0000000000000000 in ?? ()

That is Bug #40059

As the underlying problem is fixed, is this report still relevant / are there any issued left?