Univention Bugzilla – Full Text Bug Listing |
Summary: | UMC Server crashes in UCS@school ucs-test | ||
---|---|---|---|
Product: | UCS | Reporter: | Florian Best <best> |
Component: | UMC (Generic) | Assignee: | Florian Best <best> |
Status: | CLOSED FIXED | QA Contact: | Sönke Schwardt-Krummrich <schwardt> |
Severity: | normal | ||
Priority: | P5 | CC: | gohmann, walkenhorst |
Version: | UCS 4.1 | ||
Target Milestone: | UCS 4.1-0-errata | ||
Hardware: | Other | ||
OS: | Linux | ||
URL: | https://github.com/crunchy-github/python-notifier/pull/11 | ||
What kind of report is it?: | --- | What type of bug is this?: | --- |
Who will be affected by this bug?: | --- | How will those affected feel about the bug?: | --- |
User Pain: | Enterprise Customer affected?: | ||
School Customer affected?: | ISV affected?: | ||
Waiting Support: | Flags outvoted (downgraded) after PO Review: | ||
Ticket number: | Bug group (optional): | Error handling, Troubleshooting | |
Max CVSS v3 score: | |||
Attachments: | reproducer |
Description
Florian Best
2016-01-27 11:32:56 CET
Okay, I got the error: python-notifier crashes with a KeyError for a file descriptor it is listening on after it got a broken pipe error. The socket cleanup seems not to be correctly done and python-notifier is not graceful. Maybe it is caused because from starting the module process until it listens to the UNIX socket 3 seconds pass. After the crash the module process still runs. /var/log/univention/management-console-server.log 27.01.16 09:26:55.549 MODULE ( INFO ) : Executing ['schoolwizards/classes/add'] 27.01.16 09:26:55.549 RESOURCES ( INFO ) : Searching for module providing command schoolwizards/classes/add 27.01.16 09:26:55.550 RESOURCES ( INFO ) : Found module schoolwizards 27.01.16 09:26:55.550 MAIN ( INFO ) : Starting new module process and passing new request to module schoolwizards: 145390481485352-455 27.01.16 09:26:55.550 MAIN ( PROCESS ) : running: ['/usr/sbin/univention-management-console-module', '-m', 'schoolwizards', '-s', '/var/run/univention-management-console/6071-1453904815550.socket', '-d', '4', '-l', 'de_DE.UTF-8'] 27.01.16 09:26:55.557 MAIN ( INFO ) : The module schoolusers is inactive for too long. Sending EXIT request to module 27.01.16 09:26:55.557 PROTOCOL ( INFO ) : Sending UMCP EXIT REQUEST 145390481555774-69 27.01.16 09:26:55.557 MAIN ( INFO ) : Client: _resend: socket is damaged: [Errno 32] Daten?bergabe unterbrochen (broken pipe) 27.01.16 09:26:55.557 MAIN ( WARN ) : Socket died (module=schoolusers) 27.01.16 09:26:55.557 MAIN ( WARN ) : Module process schoolusers died (pid: 28150, exit status: -1, signal: -1, status: -1) 27.01.16 09:26:55.557 MAIN ( INFO ) : Checking for kill timer () 27.01.16 09:26:55.557 MAIN ( WARN ) : Cleaning up requests 27.01.16 09:26:55.557 MAIN ( INFO ) : No pending requests found 27.01.16 09:26:55.557 MAIN ( WARN ) : Remove inactivity timer 27.01.16 09:26:55.559 MAIN ( INFO ) : Ask module schoolusers to shutdown gracefully 27.01.16 09:26:55.592 MAIN ( ERROR ) : Traceback (most recent call last): File "/usr/sbin/univention-management-console-server", line 226, in <module> umc_daemon.do_action() File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action func(self) File "/usr/sbin/univention-management-console-server", line 153, in _restart self._start() File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start self.app.run() File "/usr/sbin/univention-management-console-server", line 203, in run notifier.loop() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 283, in loop step() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 258, in step sock_obj = __sock_objects[ fd ] KeyError: 61 root@master200:~# ps aufx | grep wizards root 30000 0.2 1.2 397884 47456 ? S 09:26 0:00 /usr/bin/python2.7 /usr/sbin/univention-management-console-module -m schoolwizards -s /var/run/univention-management-console/6071-1453904815550.socket -d 4 -l de_DE.UTF-8 management-console-module-schoolwizards.log: 27.01.16 09:26:57.132 DEBUG_INIT 27.01.16 09:26:57.725 ADMIN ( INFO ) : admin.syntax.import_syntax_files: importing "/usr/lib/pymodules/python2.7/univention/admin/syntax.d/mp0bb4qtti.py" … 27.01.16 09:26:58.734 ADMIN ( INFO ) : admin.modules.update: importing "univention/admin/handlers/saml/serviceprovider" 27.01.16 09:26:58.862 MAIN ( INFO ) : Initialising server process 27.01.16 09:26:58.862 MAIN ( INFO ) : Using a UNIX socket 27.01.16 09:26:58.862 SSL ( INFO ) : Server listening to connections Ok, the problem is python-notifier: step() in nf_generic does the following: Shortened pseudocode: """ fds = poll() for timer in timers: timer() for fd in fds: sock_obj = __sock_objects[fd] """ If a timer now removes this socket by calling notifier.socket_remove() this crash occurs. The condition is very rare as there must be "something" in the socket which triggers that poll() returns the file descriptor. Here is the UMC stack, it seems we can't enhance anything here: in _mod_inactive() > MAIN ( INFO ) : The module schoolusers is inactive for too long. Sending EXIT request to module in request() (called by handle_request_exit()) > PROTOCOL ( INFO ) : Sending UMCP EXIT REQUEST 145390481555774-69 in client.py:_resend() > MAIN ( INFO ) : Client: _resend: socket is damaged: [Errno 32] Daten?bergabe unterbrochen (broken pipe) in _socket_died() > MAIN ( WARN ) : Socket died (module=schoolusers) in _mod_died() > MAIN ( WARN ) : Module process schoolusers died (pid: 28150, exit status: -1, signal: -1, status: -1) in _mod_died() > MAIN ( INFO ) : Checking for kill timer () in _mod_died() > MAIN ( WARN ) : Cleaning up requests in invalidate_all_requests() > MAIN ( INFO ) : No pending requests found in _mod_died() > MAIN ( WARN ) : Remove inactivity timer in handle_request_exit() > MAIN ( INFO ) : Ask module schoolusers to shutdown gracefully Created attachment 7432 [details]
reproducer
python-notifier.yaml: r67028 | YAML Bug #40510 python-notifier (0.9.7-4): r67027 | Bug #40510: fix KeyError in step() which was caused by socket removal in a timer() Added also to ucs-test. ucs-test (6.0.31-23): r67067 | Bug #40510: add test OK: code change OK: functional test OK: YAML (updated) OK: ucs-test I added a test case for this issue: abd65b76a263 | Bug #40510: add test case test_socket_remove_in_timer() |