Bug 50598 - Stuck univention-cli-server process after update from UCS 4.4-2 errata350 to errata385
Stuck univention-cli-server process after update from UCS 4.4-2 errata350 to ...
Status: NEW
Product: UCS
Classification: Unclassified
Component: UDM - CLI
UCS 4.4
Other Linux
: P5 normal (vote)
: ---
Assigned To: UMC maintainers
UMC maintainers
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2019-12-04 18:59 CET by Arvid Requate
Modified: 2020-01-08 12:17 CET (History)
3 users (show)

See Also:
What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
lsof.log (24.60 KB, text/x-log)
2019-12-04 22:13 CET, Arvid Requate
Details
/proc/$PID/status (1.18 KB, text/x-log)
2019-12-04 22:30 CET, Arvid Requate
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2019-12-04 18:59:11 CET
After updating from UCS 4.4-2 errata350 to errata385 there is a stuck univention-cli-server process, which refuses to terminate by normal means:

root@ucs-2009:~# ps ax | grep cli
19725 ?        Ss     0:00 /usr/bin/python2.7 /usr/share/univention-directory-manager-tools/univention-cli-server
19728 ?        Z      0:00 [univention-cli-] <defunct>
25456 pts/1    S+     0:00 grep cli

root@ucs-2009:~# kill 19725
root@ucs-2009:~# ps ax | grep cli
19725 ?        Ss     0:00 /usr/bin/python2.7 /usr/share/univention-directory-manager-tools/univention-cli-server
19728 ?        Z      0:00 [univention-cli-] <defunct>
25459 pts/1    S+     0:00 grep cli
root@ucs-2009:~# 

This causes issues e.g. for joinscripts that install new udm modules and cannot use them afterwards.

Note: While writing this Bug report the process terminated itself apparently.

Jürn just told me that he recently observed a similar situation, that's why I think we should check this.
Comment 1 Jürn Brodersen univentionstaff 2019-12-04 19:13:53 CET
Some previous discussion can be found on bug 50403.
I closed that one because I thought it was an isolated incident, seems I was wrong.
Comment 2 Florian Best univentionstaff 2019-12-04 19:16:55 CET
#0  0x00007f77c9bf65e3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055ee6804e119 in select_select.lto_priv () at ../Modules/selectmodule.c:267
#2  0x000055ee67f7a84a in call_function (oparg=<optimized out>, pp_stack=0x7ffd81072148) at ../Python/ceval.c:4352
#3  PyEval_EvalFrameEx () at ../Python/ceval.c:2989
#4  0x000055ee67f789f5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3584
#5  0x000055ee67f80b78 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd81072358, func=<function at remote 0x7f77c9099c80>) at ../Python/ceval.c:4447
#6  call_function (oparg=<optimized out>, pp_stack=0x7ffd81072358) at ../Python/ceval.c:4372
#7  PyEval_EvalFrameEx () at ../Python/ceval.c:2989
#8  0x000055ee67f789f5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3584
#9  0x000055ee67f80b78 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd81072568, func=<function at remote 0x7f77c9099ed8>) at ../Python/ceval.c:4447
#10 call_function (oparg=<optimized out>, pp_stack=0x7ffd81072568) at ../Python/ceval.c:4372
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:2989
#12 0x000055ee67f789f5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3584
#13 0x000055ee67f787b9 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:669
#14 0x000055ee67fa8bff in run_mod.lto_priv () at ../Python/pythonrun.c:1376
#15 0x000055ee67fa3b52 in PyRun_FileExFlags () at ../Python/pythonrun.c:1362
#16 0x000055ee67fa369e in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#17 0x000055ee67f54771 in Py_Main () at ../Modules/main.c:640
#18 0x00007f77c9b352e1 in __libc_start_main (main=0x55ee67f540a0 <main>, argc=2, argv=0x7ffd810729a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd81072998) at ../csu/libc-start.c:291
#19 0x000055ee67f53f9a in _start ()
Comment 3 Florian Best univentionstaff 2019-12-04 19:24:19 CET
Broken socket?
lsof -p $PID output would be nice.
I will investigate deeper somewhen.

We should fix Bug #34836 and Bug #34812.
Comment 4 Arvid Requate univentionstaff 2019-12-04 22:04:31 CET
VM: arequate_o365multi.8.100 (spoka)
Snapshot: office365-app-connection-e385-udm-cli

=======================================================================
root@ucs-2009:~# ps ax | grep cli
19777 ?        Ss     0:00 /usr/bin/python2.7 /usr/share/univention-directory-manager-tools/univention-cli-server
19780 ?        Z      0:00 [univention-cli-] <defunct>
=======================================================================


In this situation the update of the office365 App fails:
=======================================================================
root@ucs-2009:~# univention-app update
root@ucs-2009:~# univention-app upgrade office365
[...]
Die folgenden Pakete werden aktualisiert (Upgrade):
  univention-management-console-module-office365 univention-office365
[...]
Waiting for file /usr/share/univention-office365/syntax.d/office365.py: OK
Object exists: cn=udm_module,cn=univention,dc=o365multi,dc=net
Object created: cn=office365/profile,cn=udm_module,cn=univention,dc=o365multi,dc=net

Waiting for activation of the extension object office365/profile: OK
Waiting for file /usr/share/univention-office365/handler.d/profile.py: OK
Terminating running univention-cli-server processes.
Object exists: cn=udm_module,cn=univention,dc=o365multi,dc=net
Object created: cn=office365/ad-connection,cn=udm_module,cn=univention,dc=o365multi,dc=net

Waiting for activation of the extension object office365/ad-connection: OK
Waiting for file /usr/share/univention-office365/handler.d/ad-connection.py: OK
Terminating running univention-cli-server processes.
Object exists: cn=office365,dc=o365multi,dc=net
Object exists: cn=profiles,cn=office365,dc=o365multi,dc=net
Object created: cn=ad-connections,cn=office365,dc=o365multi,dc=net
unknown module office365/profile.

Available Modules are:
  appcenter/app
  [...]

root@ucs-2009:~# univention-check-join-status 
Warning: 'univention-office365' is not configured.
Error: Not all install files configured: 1 missing
=======================================================================
Comment 5 Arvid Requate univentionstaff 2019-12-04 22:13:14 CET
Created attachment 10251 [details]
lsof.log
Comment 6 Arvid Requate univentionstaff 2019-12-04 22:30:43 CET
Created attachment 10252 [details]
/proc/$PID/status

SigBlk: 0000000000015027

>>> bin(0x5027)
'0b101000000100111'
   ^ SIGTERM (bit 15) blocked.
Comment 7 Florian Best univentionstaff 2019-12-04 22:56:12 CET
lsof shows the following:
univentio 19777 root    5u  IPv4             198036      0t0     TCP ucs-2009.o365multi.net:59286->ucs-2009.o365multi.net:6669 (ESTABLISHED)
univentio 19777 root    6u   REG              253,0   163360 5758749 /var/log/univention/directory-manager-cmd.log
univentio 19777 root    7u   REG              253,0  1372160 5770779 /var/lib/univention-directory-listener/cache/data.mdb
univentio 19777 root    8u  unix 0xffff92b63bd43800      0t0  198340 /tmp/admincli_0/sock type=STREAM
univentio 19777 root   10w   REG              253,0        0 5770781 /var/log/univention/listener_modules/portal_groups.log
univentio 19777 root   11w   REG              253,0      644 5756199 /var/log/univention/listener_modules/app_attributes.log.2019-11-04
univentio 19777 root   12r   CHR                1,9      0t0    6508 /dev/urandom

Bug #50403 had the same problem, as well when installing the office356 app.

There we see, that the signal handling ignores the kill:
(In reply to Jürn Brodersen from comment #3)
> Turns out this was not the problem and is not really a problem generally.
> 
> ucs_registerLDAPExtension --udm_syntax does wait for the activation
> attribute to be set to true, which is set only after the listener wrote out
> the syntax.
> 
> The problem was that the univention-cli-server was blocking signals and was
> therefore not restarted!
> 
> '''
> root@qa442:~# ps -eo blocked,pid,cmd | egrep -v '^0+ '
>          BLOCKED   PID CMD
> 7be3c0fe28014a03     1 /sbin/init splash
> 0000000400004a02   238 /lib/systemd/systemd-journald
> 0000000000014003   282 /lib/systemd/systemd-udevd
> 0000000200000001   402 /lib/systemd/systemd-logind
> 0000000000010000   451 runsvdir -P /etc/service log: 
> fffffffe3bfa2800   700 /usr/bin/containerd
> fffffffe3bfa2800   701 /usr/bin/dockerd -H fd://
> --containerd=/run/containerd/containerd.sock
> 0000000000015027 17363 [univention-cli-] <defunct>
> 0000000000010000 17495 -bash
> 0000000000010000 25224 /bin/bash /usr/bin/univention-welcome-screen
> 0000000000015027 25574 /usr/bin/python2.7
> /usr/share/univention-directory-manager-tools/univention-cli-server
> '''
> 
> Keyword: SIGPROCMASK
> 
> I have no clue how that happened and I can't reproduce this. (I do have a
> snapshot of this though)

(In reply to Sönke Schwardt-Krummrich from comment #5)
> (In reply to Jürn Brodersen from comment #4)
> > The univention-cli-server just ignored that signal completely :( see comment
> > 3.
> 
> AFAIR signals are ignored by the listener, if the listener is currently
> initializing/resync a listener module.

We also know that due to Bug #34812 file descriptors are inherited instead of being closed.
The process has at least some listener logfiles inherited.

→ Can a process also inherit some signal handling settings?
Comment 8 Arvid Requate univentionstaff 2019-12-04 23:12:35 CET
> → Can a process also inherit some signal handling settings?

man 7 signal (http://man7.org/linux/man-pages/man7/signal.7.html) says:

       A child created via fork(2) inherits a copy of its parent's signal
       dispositions.  During an execve(2), the dispositions of handled
       signals are reset to the default; the dispositions of ignored signals
       are left unchanged.
Comment 9 Florian Best univentionstaff 2019-12-05 11:07:07 CET
(In reply to Arvid Requate from comment #8)
> > → Can a process also inherit some signal handling settings?
> 
> man 7 signal (http://man7.org/linux/man-pages/man7/signal.7.html) says:
> 
>        A child created via fork(2) inherits a copy of its parent's signal
>        dispositions.  During an execve(2), the dispositions of handled
>        signals are reset to the default; the dispositions of ignored signals
>        are left unchanged.

The univention-cli-client makes a fork() to create the univention-cli-server process.
But the listener probably only calls UDM-CLI without fork().
Comment 10 Florian Best univentionstaff 2019-12-12 09:43:10 CET
From code review I found a bug/crash, which might occur very seldom:

diff --git management/univention-directory-manager-modules/univention-cli-server management/univention-directory-manager-modules/univention-cli-server
index 8c036e807d..ede4655e19 100755
--- management/univention-directory-manager-modules/univention-cli-server
+++ management/univention-directory-manager-modules/univention-cli-server
@@ -60,7 +60,7 @@ class MyRequestHandler(SocketServer.BaseRequestHandler):
                sarglist = ''
                while True:
                        buf = self.request.recv(1024)
-                       if buf[-1] == '\0':
+                       if buf.endswith('\0'):
                                buf = buf[:-1]
                                sarglist += buf
                                break

Not sure, if it's the reason.
Comment 11 Ingo Steuwer univentionstaff 2020-01-08 12:17:56 CET
Is this bug reproducable / does it happen frequently when installing the O365 App or in other situations?