Bug 40968 - UMC server hangs when changing network settings
UMC server hangs when changing network settings
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.1
Other Linux
: P5 normal (vote)
: UCS 4.1-1-errata
Assigned To: Florian Best
Alexander Kläser
:
Depends on:
Blocks: 40006
  Show dependency treegraph
 
Reported: 2016-03-31 17:01 CEST by Alexander Kläser
Modified: 2016-08-03 15:56 CEST (History)
5 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Ticket number:
Bug group (optional): Troubleshooting, Usability
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Kläser univentionstaff 2016-03-31 17:01:41 CEST
In the setup wizard (tested with a new app [=branded] appliance), I observed that the UMC server seemed to have troubles when changing the IP settings:

* Initially, DHCP settings are activated, but failed
* I change the bridge settings to be connected, e.g., with the WLAN interface, 
  thus yielding a different IP address range via DHCP
* A first DHCP request fails, a second DHCP request succeeds; however, after
  applying the network settings, the progress bar hangs.

It seems that the UMC server does not accept any request no more. A ctrl+R within in the setup wizard will yield a hanging session as the auth request will not be answered by the UMC server.

The UMC server log says:
---------- 8< ----------
31.03.16 16:48:13.859  SSL         ( WARN    ) : SSL error in _receive: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]. Probably the socket was closed by the client.
---------- 8< ----------

ps shows where the process is hanging:
---------- 8< ----------
> 6935 ?        S      0:00      \_ /usr/bin/python2.7 /usr/lib/univention-system-setup/scripts/30_net/10interfaces --network-only --appliance-mode
> 7190 ?        S      0:00          \_ ifup --all
> 7348 ?        S      0:00              \_ /bin/sh -c run-parts  /etc/network/if-up.d
> 7349 ?        S      0:00                  \_ run-parts /etc/network/if-up.d
> 7425 ?        S      0:00                      \_ /bin/sh /etc/network/if-up.d/90_dns_update
> 7447 ?        S      0:00                          \_ /usr/bin/python2.7 /usr/sbin/univention-register-network-address
> 7448 ?        S      0:00                              \_ /usr/bin/python2.7 /usr/sbin/umc-command -U master$ -y /etc/machine.secret -s master.ucs.example ip/change 
>                                                                              -o ip=172.16.0.213 -o oldip=None -o netmask=255.255.255.0 -o role=domaincontroller_master
---------- 8< ----------

As a workaround, I need to reboot the VM.
Comment 1 Florian Best univentionstaff 2016-04-11 14:46:17 CEST
(In reply to Alexander Kläser from comment #0)
> It seems that the UMC server does not accept any request no more. A ctrl+R
> within in the setup wizard will yield a hanging session as the auth request
> will not be answered by the UMC server.
Yes. The system-setup-script first sets all interfaces, then sets the gateway (from '' to any IP) and then triggers another request (ip/change) in the if-up / if-down scripts during ucr-set.
The request ip/change was successfully executed and the response was sent from the UMC-Module to the UMC-Server. The UMC-Server then responded back to the UMC-Client but hangs there - maybe because there is no route to that client anymore. socket.accept() for new connection is not called anymore as it hangs now in the main-loop (The UMC-Server is an asynchronous select/poll based system which doesn't start threads per connection so everything is executed in the main loop).
I manually killed the umc-command process which does the ip/change request and the UMC-Server immediately did respond again to new requests.

It can be triggered with the following command:
umc-command -U Administrator -P univention setup/net/apply -f wizard -e -o {"values": {"interfaces": {"eth0": {"name": "eth0", "interfaceType": "Ethernet", "ip4": [], "ip6": [], "ip4dynamic": True}}, "interfaces/primary": "eth0", "gateway":"172.16.0.1"}} -d4

The UMC-Server process then shows the following stack trace:
Thread 2 (Thread 0x7f7e1d05b700 (LWP 8720)):
#0  0x00007f7e371726b3 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f7e34a07ef8 in ?? () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#2  0x00007f7e34a0920c in ldap_result () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#3  0x00007f7e34a0abbf in ldap_search_st () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#4  0x00007f7e1dba4311 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#5  0x00007f7e1dba59a5 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#6  0x00007f7e1dba6584 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#7  0x00007f7e1dba7c17 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#8  0x00007f7e1dba81a7 in _nss_ldap_getpwnam_r () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#9  0x00007f7e3714bf6d in getpwnam_r () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007f7e21ecd162 in pam_sm_authenticate () from /lib/security/pam_saml.so
#11 0x00007f7e317bf224 in ?? () from /lib/x86_64-linux-gnu/libpam.so.0
#12 0x00007f7e317bea5f in pam_authenticate () from /lib/x86_64-linux-gnu/libpam.so.0
#13 0x00007f7e319cb6a7 in ?? () from /usr/lib/python2.7/dist-packages/PAMmodule.so
#14 0x00000000004aaf56 in PyEval_EvalFrameEx ()
#15 0x00000000004b2718 in PyEval_EvalCodeEx ()
#16 0x0000000000470277 in ?? ()
#17 0x00000000004ac909 in PyEval_EvalFrameEx ()
#18 0x00000000004b2718 in PyEval_EvalCodeEx ()
#19 0x0000000000470277 in ?? ()
#20 0x00000000004ac909 in PyEval_EvalFrameEx ()
#21 0x00000000004b2718 in PyEval_EvalCodeEx ()
#22 0x000000000047014c in ?? ()
#23 0x000000000046fcc4 in ?? ()
#24 0x00000000004fe6aa in ?? ()
#25 0x00000000004ab5e0 in PyEval_EvalFrameEx ()
#26 0x00000000004b2718 in PyEval_EvalCodeEx ()
#27 0x000000000047014c in ?? ()
#28 0x000000000046fcc4 in ?? ()
#29 0x000000000046df09 in PyEval_CallObjectWithKeywords ()
#30 0x000000000051d028 in ?? ()
#31 0x00007f7e37cd3b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#32 0x00007f7e3717d30d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#33 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f7e380f8700 (LWP 3669)):
#0  0x00007f7e37cda490 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000005637b9 in ?? ()
#2  0x00000000004aaf56 in PyEval_EvalFrameEx ()
#3  0x00000000004b2718 in PyEval_EvalCodeEx ()
#4  0x00000000004ab508 in PyEval_EvalFrameEx ()
#5  0x00000000004ab780 in PyEval_EvalFrameEx ()
#6  0x00000000004ab780 in PyEval_EvalFrameEx ()
#7  0x00000000004ab780 in PyEval_EvalFrameEx ()
#8  0x00000000004ab780 in PyEval_EvalFrameEx ()
#9  0x00000000004b2718 in PyEval_EvalCodeEx ()
#10 0x0000000000514f93 in ?? ()
#11 0x000000000044061e in PyRun_FileExFlags ()
#12 0x000000000043c1b7 in PyRun_SimpleFileExFlags ()
#13 0x000000000043f234 in Py_Main ()
#14 0x00007f7e370bdead in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x00000000004c7151 in _start ()
Comment 2 Florian Best univentionstaff 2016-04-11 15:13:44 CEST
More detailed stack of thread 1:
#0  0x00007f7e37cda490 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000004c888b in PyThread_acquire_lock (waitflag=1, lock=0x1e37320) at ../Python/thread_pthread.h:321
#2  PyEval_RestoreThread (tstate=0x1abd0a0) at ../Python/ceval.c:357
#3  PyGILState_Ensure () at ../Python/pystate.c:608
#4  0x00000000004c7128 in ?? ()
#5  0x0000000000000020 in ?? ()
#6  0x00000000004807a9 in float_mul.14119 (v=<unknown at remote 0x2c41240>, w=1000) at ../Objects/floatobject.c:656
#7  0x0000000000000064 in ?? ()
#8  0x0000000000000004 in ?? ()
#9  0x00007f7e37cda47e in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00000000005637b9 in PyThread_acquire_lock (waitflag=1, lock=0x7f7e35bc05a8) at ../Python/thread_pthread.h:321
#11 PyEval_RestoreThread (tstate=0x1abd0a0) at ../Python/ceval.c:357
#12 poll_poll (self=0x7f7e371726cb, args=<optimized out>) at ../Modules/selectmodule.c:523
#13 0x00000000004aaf56 in call_function (oparg=<optimized out>, pp_stack=0x7ffc55b624d0) at ../Python/ceval.c:4021
#14 PyEval_EvalFrameEx (f=f@entry=
    Frame 0x1d270d0, for file /usr/lib/pymodules/python2.7/notifier/nf_generic.py, line 219, in step (sleep=True, external=True, timeout=100, now=1460385457426, interval=1000, timestamp=1460385458354, callback=<Callback(_args=(<ModuleProcess(_Client__socket=None, _Client__
authenticated=True, _connect_retries=6, _Client__unfinishedRequests={'146038545185241-0': <Message(body={'flavor': 'wizard', 'options': {'values': {'interfaces/primary': 'eth0', 'interfaces': {'eth0': {'interfaceType': 'Ethernet', 'ip6': [], 'name': 'eth0', 'ip4dynamic': T
rue, 'ip4': []}}, 'gateway': '172.16.0.1'}}}, mimetype='application/json', _type=1, command='COMMAND', arguments=['setup/net/apply'], _length=213, _id='146038545185241-0', ptions={...}) at remote 0x2a773d0>}, _Client__unix='/var/run/univention-management-console/3669-14603
85452049.socket', _Client__resend_queue={}, _Client__ssl=False, _Provider__signals={'authenticated': <Signal(_Signal__callbacks=[], name='authenticated') at remote 0x2aa8dd0>, 'finished': <Signal(_Signal__call...(truncated), throwflag=throwflag@entry=0)
    at ../Python/ceval.c:2666
Comment 3 Florian Best univentionstaff 2016-04-11 15:16:21 CEST
It hangs in the poll() call in /usr/lib/pymodules/python2.7/notifier/nf_generic.py (with timeout==100):
                        try:
                                fds = __poll.poll( timeout )
                        except select.error, e:
                                log.error( 'error: poll system call interrupted: %s' % str( e ) )
                                if not _options[ 'catch_select_errors' ]:
                                        raise
Comment 4 Alexander Kläser univentionstaff 2016-04-13 16:57:18 CEST
We traced down the problem to libnss. In fact, libnss (itself, not via nscd) will connect to the LDAP server upon the first login to UMC (as user __systemsetup__). The LDAP connection will be cached and will not be updated when the network settings are changed. This leads to an endless wait when trying to execute an LDAP search which is carried out without any timeout. Below the backtrace. After some research, it seems that this timeout can be specified via:

> echo 'timelimit 10' >>/etc/libnss-ldap.conf

A wrapping function will automatically establish a new connection when receiving the timeout.

This resolved the problem for me in a test setup:
* Have a joined UCS system with a static IP address
* Configure via UMC network settings eth0 to be DHCP, leave gateway empty, save
The system can be reset in a working stage by setting via the CLI in the VM by setting manually, e.g.:
> ucr set gateway=10.200.26.1 interfaces/eth0/{type=static,address=10.200.26.110,netmask=255.255.255.0}

---------- 8< ----------
#0  0x00007f1297b0c6b3 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f129543a13d in ldap_int_select (ld=ld@entry=0x232dce0, timeout=timeout@entry=0x0) at os-ip.c:1133
#2  0x00007f1295424ef8 in wait4msg (ld=ld@entry=0x232dce0, msgid=msgid@entry=5, all=all@entry=1, timeout=<optimized out>, timeout@entry=0x0, 
    result=result@entry=0x7f127f1de690) at result.c:312
#3  0x00007f129542620c in ldap_result (ld=ld@entry=0x232dce0, msgid=msgid@entry=5, all=all@entry=1, timeout=timeout@entry=0x0, 
    result=result@entry=0x7f127f1de690) at result.c:117
#4  0x00007f1295427bbf in ldap_search_st (ld=0x232dce0, base=<optimized out>, scope=<optimized out>, filter=<optimized out>, 
    attrs=<optimized out>, attrsonly=<optimized out>, timeout=0x0, res=0x7f127f1de690) at search.c:409
#5  0x00007f127e7c4311 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#6  0x00007f127e7c59a5 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#7  0x00007f127e7c6584 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#8  0x00007f127e7c7c17 in ?? () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#9  0x00007f127e7c81a7 in _nss_ldap_getpwnam_r () from /lib/x86_64-linux-gnu/libnss_ldap.so.2
#10 0x00007f1297ae5f6d in __getpwnam_r (name=0x232afd0 "master40$", resbuf=resbuf@entry=0x7f127f1de840, buffer=buffer@entry=
    0x7f127f1de870 "kdm", buflen=buflen@entry=1024, result=<optimized out>, result@entry=0x7f127f1de7d8) at ../nss/getXXbyYY_r.c:259
#11 0x00007f12828eb162 in pam_sm_authenticate (pamh=0x2297970, flags=<optimized out>, ac=4, av=0x2390970) at pam_saml.c:389
#12 0x00007f12921dc224 in ?? () from /lib/x86_64-linux-gnu/libpam.so.0
#13 0x00007f12921dba5f in pam_authenticate () from /lib/x86_64-linux-gnu/libpam.so.0
#14 0x00007f12923e86a7 in ?? () from /usr/lib/python2.7/dist-packages/PAMmodule.so
#15 0x00000000004aaf56 in call_function (oparg=<optimized out>, pp_stack=0x7f127f1deef0) at ../Python/ceval.c:4021
#16 PyEval_EvalFrameEx (f=f@entry=
    Frame 0x229b810, for file /usr/lib/pymodules/python2.7/univention/management/console/pam.py, line 131, in authenticate (self=<PamAuth(pam=<pam at remote 0x23c2870>, _=<instancemethod at remote 0x22cd3c0>) at remote 0x2236850>, username='master40$', password='MSpc6IUqmAHDATX1n6xI', answers={1: 'MSpc6IUqmAHDATX1n6xI', 2: 'master40$', 3: '', 4: '', 'options': {'ip': '169.254.95.220', 'netmask': '255.255.0.0', 'role': 'domaincontroller_master', 'oldip': 'None'}}, missing=[]), throwflag=throwflag@entry=0) at ../Python/ceval.c:2666
#17 0x00000000004b2718 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, 
    argcount=<optimized out>, kws=kws@entry=0x23d4fe0, kwcount=1, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=0x0)
    at ../Python/ceval.c:3253
#18 0x0000000000470277 in function_call.15262 (func=<function at remote 0x1cc27d0>, arg=<optimized out>, kw=<optimized out>)
    at ../Objects/funcobject.c:526
#19 0x00000000004ac909 in PyObject_Call (kw=
    {'options': {'ip': '169.254.95.220', 'netmask': '255.255.0.0', 'role': 'domaincontroller_master', 'oldip': 'None'}}, arg=
    (<PamAuth(pam=<pam at remote 0x23c2870>, _=<instancemethod at remote 0x22cd3c0>) at remote 0x2236850>, 'master40$', 'MSpc6IUqmAHDATX1n6xI'), func=<function at remote 0x1cc27d0>) at ../Objects/abstract.c:2529
#20 ext_do_call (nk=24093552, na=3, flags=<optimized out>, pp_stack=0x7f127f1df200, func=<function at remote 0x1cc27d0>)
    at ../Python/ceval.c:4334
#21 PyEval_EvalFrameEx (f=f@entry=
    Frame 0x2375640, for file /usr/lib/pymodules/python2.7/univention/management/console/auth.py, line 107, in __authenticate_thread (self=<AuthHandler(_Provider__signals={'authenticated': <Signal(_Signal__callbacks=[<instancemethod at remote 0x22cd410>], name='authenticated') at remote 0x23dfd90>}, pam=<PamAuth(pam=<pam at remote 0x23c2870>, _=<instancemethod at remote 0x22cd3c0>) at remote 0x2236850>) at remote 0x23dfd50>, username='master40$', password='MSpc6IUqmAHDATX1n6xI', new_password=None, custom_prompts={'options': {'ip': '169.254.95.220', 'netmask': '255.255.0.0', 'role': 'domaincontroller_master', 'oldip': 'None'}}), throwflag=throwflag@entry=0) at ../Python/ceval.c:2705
---------- 8< ----------
Comment 5 Florian Best univentionstaff 2016-04-14 08:36:05 CEST
univention-pam.yaml:
r68627 | YAML Bug #40968

univention-pam (9.0.0-6):
r68626 | Bug #40968: Make timelimit for ldap searches in nss-ldap configurable via the UCR variable nssldap/timelimit
Comment 6 Florian Best univentionstaff 2016-04-14 08:37:23 CEST
The UCR variable nssldap/timelimit has been added with a default value of 30 seconds.
Comment 7 Alexander Kläser univentionstaff 2016-04-15 11:58:43 CEST
(In reply to Florian Best from comment #6)
> The UCR variable nssldap/timelimit has been added with a default value of 30
> seconds.

IMHO 30 sec is quite long. I would rather go for 10 sec.
Comment 8 Stefan Gohmann univentionstaff 2016-04-18 08:44:53 CEST
(In reply to Alexander Kläser from comment #7)
> IMHO 30 sec is quite long. I would rather go for 10 sec.

If we go down to 10 seconds, please apply it only for new installed systems.
Comment 9 Florian Best univentionstaff 2016-04-18 16:09:34 CEST
(In reply to Stefan Gohmann from comment #8)
> (In reply to Alexander Kläser from comment #7)
> > IMHO 30 sec is quite long. I would rather go for 10 sec.
> 
> If we go down to 10 seconds, please apply it only for new installed systems.

Only setting it to 10 seconds temporarily during system-setup-boot:

univention-system-setup (9.1.4-9):
r68734 | Bug #40968: adjust nssldap/timelimit during system setup

Package: univention-system-setup
Version: 9.1.4-9.959.201604181607
Branch: ucs_4.1-0
Scope: appliance
Comment 10 Alexander Kläser univentionstaff 2016-04-20 17:56:18 CEST
Changes: OK, the hang is fixed, timelimit is set to 30 after setup process.
YAML file: OK

→ VERIFIED
Comment 11 Janek Walkenhorst univentionstaff 2016-04-27 17:32:10 CEST
<http://errata.software-univention.de/ucs/4.1/159.html>
Comment 12 Janek Walkenhorst univentionstaff 2016-08-03 15:56:45 CEST
<http://errata.software-univention.de/ucs/4.1/234.html>