Bug 50598 - Stuck univention-cli-server process after update from UCS 4.4-2 errata350 to errata385
Summary: Stuck univention-cli-server process after update from UCS 4.4-2 errata350 to ...
Status: RESOLVED WONTFIX
Alias: None
Product: UCS
Classification: Unclassified
Component: UDM - CLI
Version: UCS 4.4
Hardware: Other Linux
: P5 normal
Target Milestone: ---
Assignee: UMC maintainers
QA Contact: UMC maintainers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-12-04 18:59 CET by Arvid Requate
Modified: 2025-02-05 09:28 CET (History)
3 users (show)

See Also:
What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Customer ID:
Max CVSS v3 score:


Attachments
lsof.log (24.60 KB, text/x-log)
2019-12-04 22:13 CET, Arvid Requate
Details
/proc/$PID/status (1.18 KB, text/x-log)
2019-12-04 22:30 CET, Arvid Requate
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2019-12-04 18:59:11 CET
After updating from UCS 4.4-2 errata350 to errata385 there is a stuck univention-cli-server process, which refuses to terminate by normal means:

root@ucs-2009:~# ps ax | grep cli
19725 ?        Ss     0:00 /usr/bin/python2.7 /usr/share/univention-directory-manager-tools/univention-cli-server
19728 ?        Z      0:00 [univention-cli-] <defunct>
25456 pts/1    S+     0:00 grep cli

root@ucs-2009:~# kill 19725
root@ucs-2009:~# ps ax | grep cli
19725 ?        Ss     0:00 /usr/bin/python2.7 /usr/share/univention-directory-manager-tools/univention-cli-server
19728 ?        Z      0:00 [univention-cli-] <defunct>
25459 pts/1    S+     0:00 grep cli
root@ucs-2009:~# 

This causes issues e.g. for joinscripts that install new udm modules and cannot use them afterwards.

Note: While writing this Bug report the process terminated itself apparently.

Jürn just told me that he recently observed a similar situation, that's why I think we should check this.
Comment 1 Jürn Brodersen univentionstaff 2019-12-04 19:13:53 CET
Some previous discussion can be found on bug 50403.
I closed that one because I thought it was an isolated incident, seems I was wrong.
Comment 2 Florian Best univentionstaff 2019-12-04 19:16:55 CET
#0  0x00007f77c9bf65e3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055ee6804e119 in select_select.lto_priv () at ../Modules/selectmodule.c:267
#2  0x000055ee67f7a84a in call_function (oparg=<optimized out>, pp_stack=0x7ffd81072148) at ../Python/ceval.c:4352
#3  PyEval_EvalFrameEx () at ../Python/ceval.c:2989
#4  0x000055ee67f789f5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3584
#5  0x000055ee67f80b78 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd81072358, func=<function at remote 0x7f77c9099c80>) at ../Python/ceval.c:4447
#6  call_function (oparg=<optimized out>, pp_stack=0x7ffd81072358) at ../Python/ceval.c:4372
#7  PyEval_EvalFrameEx () at ../Python/ceval.c:2989
#8  0x000055ee67f789f5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3584
#9  0x000055ee67f80b78 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd81072568, func=<function at remote 0x7f77c9099ed8>) at ../Python/ceval.c:4447
#10 call_function (oparg=<optimized out>, pp_stack=0x7ffd81072568) at ../Python/ceval.c:4372
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:2989
#12 0x000055ee67f789f5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3584
#13 0x000055ee67f787b9 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:669
#14 0x000055ee67fa8bff in run_mod.lto_priv () at ../Python/pythonrun.c:1376
#15 0x000055ee67fa3b52 in PyRun_FileExFlags () at ../Python/pythonrun.c:1362
#16 0x000055ee67fa369e in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#17 0x000055ee67f54771 in Py_Main () at ../Modules/main.c:640
#18 0x00007f77c9b352e1 in __libc_start_main (main=0x55ee67f540a0 <main>, argc=2, argv=0x7ffd810729a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd81072998) at ../csu/libc-start.c:291
#19 0x000055ee67f53f9a in _start ()
Comment 3 Florian Best univentionstaff 2019-12-04 19:24:19 CET
Broken socket?
lsof -p $PID output would be nice.
I will investigate deeper somewhen.

We should fix Bug #34836 and Bug #34812.
Comment 4 Arvid Requate univentionstaff 2019-12-04 22:04:31 CET
VM: arequate_o365multi.8.100 (spoka)
Snapshot: office365-app-connection-e385-udm-cli

=======================================================================
root@ucs-2009:~# ps ax | grep cli
19777 ?        Ss     0:00 /usr/bin/python2.7 /usr/share/univention-directory-manager-tools/univention-cli-server
19780 ?        Z      0:00 [univention-cli-] <defunct>
=======================================================================


In this situation the update of the office365 App fails:
=======================================================================
root@ucs-2009:~# univention-app update
root@ucs-2009:~# univention-app upgrade office365
[...]
Die folgenden Pakete werden aktualisiert (Upgrade):
  univention-management-console-module-office365 univention-office365
[...]
Waiting for file /usr/share/univention-office365/syntax.d/office365.py: OK
Object exists: cn=udm_module,cn=univention,dc=o365multi,dc=net
Object created: cn=office365/profile,cn=udm_module,cn=univention,dc=o365multi,dc=net

Waiting for activation of the extension object office365/profile: OK
Waiting for file /usr/share/univention-office365/handler.d/profile.py: OK
Terminating running univention-cli-server processes.
Object exists: cn=udm_module,cn=univention,dc=o365multi,dc=net
Object created: cn=office365/ad-connection,cn=udm_module,cn=univention,dc=o365multi,dc=net

Waiting for activation of the extension object office365/ad-connection: OK
Waiting for file /usr/share/univention-office365/handler.d/ad-connection.py: OK
Terminating running univention-cli-server processes.
Object exists: cn=office365,dc=o365multi,dc=net
Object exists: cn=profiles,cn=office365,dc=o365multi,dc=net
Object created: cn=ad-connections,cn=office365,dc=o365multi,dc=net
unknown module office365/profile.

Available Modules are:
  appcenter/app
  [...]

root@ucs-2009:~# univention-check-join-status 
Warning: 'univention-office365' is not configured.
Error: Not all install files configured: 1 missing
=======================================================================
Comment 5 Arvid Requate univentionstaff 2019-12-04 22:13:14 CET
Created attachment 10251 [details]
lsof.log
Comment 6 Arvid Requate univentionstaff 2019-12-04 22:30:43 CET
Created attachment 10252 [details]
/proc/$PID/status

SigBlk: 0000000000015027

>>> bin(0x5027)
'0b101000000100111'
   ^ SIGTERM (bit 15) blocked.
Comment 7 Florian Best univentionstaff 2019-12-04 22:56:12 CET
lsof shows the following:
univentio 19777 root    5u  IPv4             198036      0t0     TCP ucs-2009.o365multi.net:59286->ucs-2009.o365multi.net:6669 (ESTABLISHED)
univentio 19777 root    6u   REG              253,0   163360 5758749 /var/log/univention/directory-manager-cmd.log
univentio 19777 root    7u   REG              253,0  1372160 5770779 /var/lib/univention-directory-listener/cache/data.mdb
univentio 19777 root    8u  unix 0xffff92b63bd43800      0t0  198340 /tmp/admincli_0/sock type=STREAM
univentio 19777 root   10w   REG              253,0        0 5770781 /var/log/univention/listener_modules/portal_groups.log
univentio 19777 root   11w   REG              253,0      644 5756199 /var/log/univention/listener_modules/app_attributes.log.2019-11-04
univentio 19777 root   12r   CHR                1,9      0t0    6508 /dev/urandom

Bug #50403 had the same problem, as well when installing the office356 app.

There we see, that the signal handling ignores the kill:
(In reply to Jürn Brodersen from comment #3)
> Turns out this was not the problem and is not really a problem generally.
> 
> ucs_registerLDAPExtension --udm_syntax does wait for the activation
> attribute to be set to true, which is set only after the listener wrote out
> the syntax.
> 
> The problem was that the univention-cli-server was blocking signals and was
> therefore not restarted!
> 
> '''
> root@qa442:~# ps -eo blocked,pid,cmd | egrep -v '^0+ '
>          BLOCKED   PID CMD
> 7be3c0fe28014a03     1 /sbin/init splash
> 0000000400004a02   238 /lib/systemd/systemd-journald
> 0000000000014003   282 /lib/systemd/systemd-udevd
> 0000000200000001   402 /lib/systemd/systemd-logind
> 0000000000010000   451 runsvdir -P /etc/service log: 
> fffffffe3bfa2800   700 /usr/bin/containerd
> fffffffe3bfa2800   701 /usr/bin/dockerd -H fd://
> --containerd=/run/containerd/containerd.sock
> 0000000000015027 17363 [univention-cli-] <defunct>
> 0000000000010000 17495 -bash
> 0000000000010000 25224 /bin/bash /usr/bin/univention-welcome-screen
> 0000000000015027 25574 /usr/bin/python2.7
> /usr/share/univention-directory-manager-tools/univention-cli-server
> '''
> 
> Keyword: SIGPROCMASK
> 
> I have no clue how that happened and I can't reproduce this. (I do have a
> snapshot of this though)

(In reply to Sönke Schwardt-Krummrich from comment #5)
> (In reply to Jürn Brodersen from comment #4)
> > The univention-cli-server just ignored that signal completely :( see comment
> > 3.
> 
> AFAIR signals are ignored by the listener, if the listener is currently
> initializing/resync a listener module.

We also know that due to Bug #34812 file descriptors are inherited instead of being closed.
The process has at least some listener logfiles inherited.

→ Can a process also inherit some signal handling settings?
Comment 8 Arvid Requate univentionstaff 2019-12-04 23:12:35 CET
> → Can a process also inherit some signal handling settings?

man 7 signal (http://man7.org/linux/man-pages/man7/signal.7.html) says:

       A child created via fork(2) inherits a copy of its parent's signal
       dispositions.  During an execve(2), the dispositions of handled
       signals are reset to the default; the dispositions of ignored signals
       are left unchanged.
Comment 9 Florian Best univentionstaff 2019-12-05 11:07:07 CET
(In reply to Arvid Requate from comment #8)
> > → Can a process also inherit some signal handling settings?
> 
> man 7 signal (http://man7.org/linux/man-pages/man7/signal.7.html) says:
> 
>        A child created via fork(2) inherits a copy of its parent's signal
>        dispositions.  During an execve(2), the dispositions of handled
>        signals are reset to the default; the dispositions of ignored signals
>        are left unchanged.

The univention-cli-client makes a fork() to create the univention-cli-server process.
But the listener probably only calls UDM-CLI without fork().
Comment 10 Florian Best univentionstaff 2019-12-12 09:43:10 CET
From code review I found a bug/crash, which might occur very seldom:

diff --git management/univention-directory-manager-modules/univention-cli-server management/univention-directory-manager-modules/univention-cli-server
index 8c036e807d..ede4655e19 100755
--- management/univention-directory-manager-modules/univention-cli-server
+++ management/univention-directory-manager-modules/univention-cli-server
@@ -60,7 +60,7 @@ class MyRequestHandler(SocketServer.BaseRequestHandler):
                sarglist = ''
                while True:
                        buf = self.request.recv(1024)
-                       if buf[-1] == '\0':
+                       if buf.endswith('\0'):
                                buf = buf[:-1]
                                sarglist += buf
                                break

Not sure, if it's the reason.
Comment 11 Ingo Steuwer univentionstaff 2020-01-08 12:17:56 CET
Is this bug reproducable / does it happen frequently when installing the O365 App or in other situations?
Comment 12 Jan-Luca Kiok univentionstaff 2025-02-05 09:28:51 CET
This issue has been filed against UCS 4.4.

UCS 4.4 is out of maintenance and components may have vastly changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer versions, please use "Clone this bug" or reopen this issue. In this case please provide information on how this issue is affecting you.