Univention Bugzilla – Bug 40510
UMC Server crashes in UCS@school ucs-test
Last modified: 2020-11-26 23:33:53 CET
It seems the UMC-Server crashes during the UCS@school ucs-tests. http://jenkins.knut.univention.de:8080/job/UCSschool%204.1/job/UCSschool%204.1%20Singleserver/lastCompletedBuild/SambaVersion=s3/testReport/90_ucsschool/40_schoolwizard_school_create/test/ All following tests which interact with UMC are failing then. We need to find out why/if it crashes and fix it.
Okay, I got the error: python-notifier crashes with a KeyError for a file descriptor it is listening on after it got a broken pipe error. The socket cleanup seems not to be correctly done and python-notifier is not graceful. Maybe it is caused because from starting the module process until it listens to the UNIX socket 3 seconds pass. After the crash the module process still runs. /var/log/univention/management-console-server.log 27.01.16 09:26:55.549 MODULE ( INFO ) : Executing ['schoolwizards/classes/add'] 27.01.16 09:26:55.549 RESOURCES ( INFO ) : Searching for module providing command schoolwizards/classes/add 27.01.16 09:26:55.550 RESOURCES ( INFO ) : Found module schoolwizards 27.01.16 09:26:55.550 MAIN ( INFO ) : Starting new module process and passing new request to module schoolwizards: 145390481485352-455 27.01.16 09:26:55.550 MAIN ( PROCESS ) : running: ['/usr/sbin/univention-management-console-module', '-m', 'schoolwizards', '-s', '/var/run/univention-management-console/6071-1453904815550.socket', '-d', '4', '-l', 'de_DE.UTF-8'] 27.01.16 09:26:55.557 MAIN ( INFO ) : The module schoolusers is inactive for too long. Sending EXIT request to module 27.01.16 09:26:55.557 PROTOCOL ( INFO ) : Sending UMCP EXIT REQUEST 145390481555774-69 27.01.16 09:26:55.557 MAIN ( INFO ) : Client: _resend: socket is damaged: [Errno 32] Daten?bergabe unterbrochen (broken pipe) 27.01.16 09:26:55.557 MAIN ( WARN ) : Socket died (module=schoolusers) 27.01.16 09:26:55.557 MAIN ( WARN ) : Module process schoolusers died (pid: 28150, exit status: -1, signal: -1, status: -1) 27.01.16 09:26:55.557 MAIN ( INFO ) : Checking for kill timer () 27.01.16 09:26:55.557 MAIN ( WARN ) : Cleaning up requests 27.01.16 09:26:55.557 MAIN ( INFO ) : No pending requests found 27.01.16 09:26:55.557 MAIN ( WARN ) : Remove inactivity timer 27.01.16 09:26:55.559 MAIN ( INFO ) : Ask module schoolusers to shutdown gracefully 27.01.16 09:26:55.592 MAIN ( ERROR ) : Traceback (most recent call last): File "/usr/sbin/univention-management-console-server", line 226, in <module> umc_daemon.do_action() File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action func(self) File "/usr/sbin/univention-management-console-server", line 153, in _restart self._start() File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start self.app.run() File "/usr/sbin/univention-management-console-server", line 203, in run notifier.loop() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 283, in loop step() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 258, in step sock_obj = __sock_objects[ fd ] KeyError: 61 root@master200:~# ps aufx | grep wizards root 30000 0.2 1.2 397884 47456 ? S 09:26 0:00 /usr/bin/python2.7 /usr/sbin/univention-management-console-module -m schoolwizards -s /var/run/univention-management-console/6071-1453904815550.socket -d 4 -l de_DE.UTF-8 management-console-module-schoolwizards.log: 27.01.16 09:26:57.132 DEBUG_INIT 27.01.16 09:26:57.725 ADMIN ( INFO ) : admin.syntax.import_syntax_files: importing "/usr/lib/pymodules/python2.7/univention/admin/syntax.d/mp0bb4qtti.py" … 27.01.16 09:26:58.734 ADMIN ( INFO ) : admin.modules.update: importing "univention/admin/handlers/saml/serviceprovider" 27.01.16 09:26:58.862 MAIN ( INFO ) : Initialising server process 27.01.16 09:26:58.862 MAIN ( INFO ) : Using a UNIX socket 27.01.16 09:26:58.862 SSL ( INFO ) : Server listening to connections
Ok, the problem is python-notifier: step() in nf_generic does the following: Shortened pseudocode: """ fds = poll() for timer in timers: timer() for fd in fds: sock_obj = __sock_objects[fd] """ If a timer now removes this socket by calling notifier.socket_remove() this crash occurs. The condition is very rare as there must be "something" in the socket which triggers that poll() returns the file descriptor. Here is the UMC stack, it seems we can't enhance anything here: in _mod_inactive() > MAIN ( INFO ) : The module schoolusers is inactive for too long. Sending EXIT request to module in request() (called by handle_request_exit()) > PROTOCOL ( INFO ) : Sending UMCP EXIT REQUEST 145390481555774-69 in client.py:_resend() > MAIN ( INFO ) : Client: _resend: socket is damaged: [Errno 32] Daten?bergabe unterbrochen (broken pipe) in _socket_died() > MAIN ( WARN ) : Socket died (module=schoolusers) in _mod_died() > MAIN ( WARN ) : Module process schoolusers died (pid: 28150, exit status: -1, signal: -1, status: -1) in _mod_died() > MAIN ( INFO ) : Checking for kill timer () in _mod_died() > MAIN ( WARN ) : Cleaning up requests in invalidate_all_requests() > MAIN ( INFO ) : No pending requests found in _mod_died() > MAIN ( WARN ) : Remove inactivity timer in handle_request_exit() > MAIN ( INFO ) : Ask module schoolusers to shutdown gracefully
Created attachment 7432 [details] reproducer
python-notifier.yaml: r67028 | YAML Bug #40510 python-notifier (0.9.7-4): r67027 | Bug #40510: fix KeyError in step() which was caused by socket removal in a timer()
Added also to ucs-test. ucs-test (6.0.31-23): r67067 | Bug #40510: add test
OK: code change OK: functional test OK: YAML (updated) OK: ucs-test
<http://errata.software-univention.de/ucs/4.1/77.html>
I added a test case for this issue: abd65b76a263 | Bug #40510: add test case test_socket_remove_in_timer()