Bug 52518 - python-notifier: Shutdown of several services (s4-conn, portal, console-web, console-serv) using python2.7
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.4-7-errata
Assigned To: Philipp Hahn
QA Contact: Felix Botner
URL: https://luns.knut.univention.de/ether...
Depends on:
Blocks:
Reported: 2020-12-18 17:57 CET by Dirk Schnick
Modified: 2021-01-15 12:41 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.171
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2020102821000549, 2020121521000293, 2020112621000569
Bug group (optional):
Max CVSS v3 score:


Attachments
Patches (1.59 KB, patch)
2020-12-18 17:57 CET, Dirk Schnick

Description Dirk Schnick univentionstaff 2020-12-18 17:57:37 CET
Created attachment 10581 [details]
Patches

This problem has now been reported by 4 customers, all of them school customers. One of these reports came from Marc in our chat; there is currently no ticket for it. The log files show a "normal" shutdown of these services:

univention-management-console-web-server
univention-management-console-server
univention-portal-server
univention-s4-connector
and in one case also the rest-api was shut down at the same time. 

Log example of webserver shutdown:
13.11.20 02:30:31.033  MAIN        ( PROCESS ) : CPCommand (10.0.0.10:47208) response message: The connection to the Univention Management Console Server broke up unexpectedly. 
If you have root permissions on the system you can restart UMC by executing the following commands:
 * service univention-management-console-server restart
 * service univention-management-console-web-server restart
Otherwise please contact an administrator or try again later.

Log example of s4 shutdown:
Nov 13 02:30:32 e142-sl01 univention-s4-connector[28114]: Stopping Univention S4 Connector: univention-s4-connectorstart-stop-daemon: warning: failed to kill 2086: No such process
Nov 13 02:30:32 e142-sl01 univention-s4-connector[28114]: .

A log example of console-server is not really meaningful. The portal log shows nothing. 

Sönke already wrote a patch for s4 and main.py that logs the output of `ps fax` whenever the services are shut down or restarted. We installed the patch in two customer environments to get more information about the system state at shutdown.

This seems to happen with UCS 4.4-6 and later.
Comment 2 Dirk Schnick univentionstaff 2020-12-22 13:40:23 CET
The problem has occurred again. Logs are attached to the first ticket (2020102821000549).
Comment 6 Erik Damrose univentionstaff 2021-01-13 08:58:48 CET
*** Bug 52511 has been marked as a duplicate of this bug. ***
Comment 7 Philipp Hahn univentionstaff 2021-01-14 18:21:36 CET
Bug in python-notifier: <https://git.knut.univention.de/univention/python-notifier/-/blob/ucs-5.0-0/notifier/popen.py#L249>

Preconditions:
1. The system is busy.
2. Because of this, notifier.popen._watcher() is not called regularly. Alternatively, something else (for example too many pending timers or connections) prevents the Notifier from detecting dead children and setting their .__dead to True.

Sequence of events:
1. UMC-server tries to start a new module process
2. module process does not start up within 3s
3. UMC-server tries to kill the new module process with .stop()
4. UMC-server schedules an immediate callback for _kill(15)
5. UMC-server runs _kill(15) to kill the child process with SIGTERM
6. child module process survives this signal
7. UMC-server sets up a timer to call _kill(9) 3s later
8. UMC-server runs _kill(9) to kill the child process with SIGKILL
9. child cannot ignore this and dies
10. UMC-server sets up a timer to call __killall(15) 3s later
11. UMC-server runs __killall(15) to kill all `/usr/bin/python2.7` processes
12. This kills the UMC-server itself and every other python2.7 process as well!
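
To illustrate why the __killall(15) step above is so destructive, here is a small Python sketch. It does not reproduce the code in notifier/popen.py; the process table, the PIDs, the script names and both helper functions are made up for illustration. It only compares which processes are selected when matching on argv[0] versus matching on the PID of the one child that should be stopped.

```python
from collections import namedtuple

# Hypothetical process table entry: PID plus argument vector.
Process = namedtuple("Process", ["pid", "argv"])

# Made-up snapshot of a busy system; PIDs and script names are illustrative only.
TABLE = [
    Process(100, ["/usr/sbin/slapd"]),
    Process(101, ["/usr/bin/python2.7", "umc-server"]),
    Process(102, ["/usr/bin/python2.7", "umc-web-server"]),
    Process(103, ["/usr/bin/python2.7", "portal-server"]),
    Process(104, ["/usr/bin/python2.7", "s4-connector"]),
    Process(105, ["/usr/bin/python2.7", "umc-module"]),  # the child to stop
]


def victims_by_name(table, name):
    """PIDs a name-based kill would signal: every process whose argv[0] matches."""
    return [p.pid for p in table if p.argv[0] == name]


def victims_by_pid(table, pid):
    """PIDs a PID-based kill would signal: at most the one process asked for."""
    return [p.pid for p in table if p.pid == pid]


if __name__ == "__main__":
    print("by name:", victims_by_name(TABLE, "/usr/bin/python2.7"))
    print("by pid: ", victims_by_pid(TABLE, 105))
```

Matching by name selects all five interpreter processes, including the server that issued the kill, while matching by PID selects only the module process.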

Fix:
Remove __killall() and the call to it: the function is fundamentally broken because it assumes that argv[0] == 'python2.7' is unique to the child, and killing every process that matches is plain wrong.
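
For comparison, a minimal sketch of a stop routine that only ever signals the one child it started. This is not the patch attached to this bug (the patch just removes __killall()); it uses the Python 3 subprocess module rather than python-notifier, and stop_child(), demo() and the grace value are hypothetical names chosen for this example. It escalates from SIGTERM to SIGKILL on a single Popen object and never matches by process name.

```python
import signal
import subprocess
import sys


def stop_child(proc, grace=3.0):
    """Stop exactly one child: SIGTERM, wait up to `grace` seconds, then SIGKILL."""
    if proc.poll() is not None:
        return proc.returncode  # already dead, nothing to signal
    proc.terminate()  # SIGTERM to this PID only
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL to this PID only
        return proc.wait()


def demo():
    """Stop a child that ignores SIGTERM and check that a bystander survives."""
    stubborn_code = (
        "import signal, sys, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready'); sys.stdout.flush()\n"
        "time.sleep(60)\n"
    )
    stubborn = subprocess.Popen(
        [sys.executable, "-c", stubborn_code], stdout=subprocess.PIPE
    )
    bystander = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        stubborn.stdout.readline()  # wait until SIGTERM is really being ignored
        returncode = stop_child(stubborn, grace=0.5)
        bystander_alive = bystander.poll() is None
    finally:
        stubborn.stdout.close()
        if stubborn.poll() is None:
            stubborn.kill()
            stubborn.wait()
        bystander.kill()
        bystander.wait()
    return returncode, bystander_alive


if __name__ == "__main__":
    print(demo())
```

On a POSIX system the stubborn child ends with a negative return code equal to -signal.SIGKILL, and the unrelated bystander, which also runs under the same interpreter binary, is still alive afterwards.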


[ucs-4.4-7] 5357dd6 Bug #52518: Remove __killall() function killing wrong processes
 debian/changelog  |  6 ++++++
 notifier/popen.py | 48 +-----------------------------------------------
 2 files changed, 7 insertions(+), 47 deletions(-)
[ucs-4.4-7] 7ecac11 Bug #52518: Remove __killall() function killing wrong processes
 debian/changelog | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

ssh -t omar repo_admin.py -G "git@git.knut.univention.de:univention/python-notifier" -b ucs-4.4-7 -P . -p python-notifier -r 4.4 -s errata4.4-7 && ssh -t dimma build-package-ng -r 4.4 -s errata4.4-7 -p python-notifier

Package: python-notifier
Version: 0.9.7-11A~4.4.0.202101141808
Branch: ucs_4.4-0
Scope: errata4.4-7

[4.4-7] 860820e962 Bug #52518: python-notifier 0.9.7-11A~4.4.0.202101141808
 doc/errata/staging/python-notifier.yaml | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)




[ucs-5.0-0] 374993e Bug #52518: Remove __killall() function killing wrong processes
 debian/changelog  |  6 ++++++
 notifier/popen.py | 48 +-----------------------------------------------
 2 files changed, 7 insertions(+), 47 deletions(-)

ssh -t omar repo_admin.py -G "git@git.knut.univention.de:univention/python-notifier" -b ucs-5.0-0 -P . -p python-notifier -r 5.0 && ssh -t ladda build-package-ng -r 5.0 -p python-notifier

Package: python-notifier
Version: 0.9.8-5A~5.0.0.202101141747
Branch: ucs_5.0-0
Comment 8 Felix Botner univentionstaff 2021-01-15 10:45:06 CET
FAIL - Erik noticed that UMC is not restarted during the update.

       Should p-n restart the umc-server during this update,
       or should there be a separate umc errata update?
      

TODO - wait for customer tests

OK - yaml
OK - 4.4-7
OK - jenkins/manual tests
OK - 5.0-0
Comment 9 Philipp Hahn univentionstaff 2021-01-15 11:43:12 CET
(In reply to Felix Botner from comment #8)
> FAIL - Erik noticed that UMC is not restarted during the update.
> 
>        Should p-n restart the umc-server during this update,
>        or should there be a separate umc errata update?

[4.4-7] 2f8b8577b2 Bug #52518: Force UMC-server restart after python-notifier update
 doc/errata/staging/univention-management-console.yaml     | 24 ++++++++++++++++++++++++
 management/univention-management-console/debian/changelog |  6 ++++++
 management/univention-management-console/debian/control   |  4 ++--
 3 files changed, 32 insertions(+), 2 deletions(-)

Package: univention-management-console
Version: 11.0.6-4A~4.4.0.202101151138
Branch: ucs_4.4-0
Scope: errata4.4-7

[4.4-7] abf4ba78d2 Bug #52518: univention-management-console 11.0.6-4A~4.4.0.202101151138
 doc/errata/staging/univention-management-console.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


> TODO - wait for customer tests

As discussed we will not wait for feedback from them as other customers are affected as well.
Comment 10 Felix Botner univentionstaff 2021-01-15 11:47:36 CET
OK - umc update (with correct p-n dependencies) restarts the umc server
     after the notifier update
OK - yaml