Bug 40510 - UMC Server crashes in UCS@school ucs-test
UMC Server crashes in UCS@school ucs-test
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.1
Other Linux
: P5 normal (vote)
: UCS 4.1-0-errata
Assigned To: Florian Best
Sönke Schwardt-Krummrich
https://github.com/crunchy-github/pyt...
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-27 11:32 CET by Florian Best
Modified: 2020-11-26 23:33 CET (History)
2 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): Error handling, Troubleshooting
Max CVSS v3 score:


Attachments
reproducer (418 bytes, text/x-python)
2016-01-27 17:58 CET, Florian Best
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Florian Best univentionstaff 2016-01-27 11:32:56 CET
It seems the UMC-Server crashes during the UCS@school ucs-tests.

http://jenkins.knut.univention.de:8080/job/UCSschool%204.1/job/UCSschool%204.1%20Singleserver/lastCompletedBuild/SambaVersion=s3/testReport/90_ucsschool/40_schoolwizard_school_create/test/

All following tests which interact with UMC are failing then.
We need to find out why/if it crashes and fix it.
Comment 1 Florian Best univentionstaff 2016-01-27 15:42:23 CET
Okay, I got the error: python-notifier crashes with a KeyError for a file descriptor it is listening on after it got a broken pipe error.
The socket cleanup seems not to be correctly done and python-notifier is not graceful.
Maybe it is caused because from starting the module process until it listens to the UNIX socket 3 seconds pass.
After the crash the module process still runs.

/var/log/univention/management-console-server.log
27.01.16 09:26:55.549  MODULE      ( INFO    ) : Executing ['schoolwizards/classes/add']
27.01.16 09:26:55.549  RESOURCES   ( INFO    ) : Searching for module providing command schoolwizards/classes/add
27.01.16 09:26:55.550  RESOURCES   ( INFO    ) : Found module schoolwizards
27.01.16 09:26:55.550  MAIN        ( INFO    ) : Starting new module process and passing new request to module schoolwizards: 145390481485352-455
27.01.16 09:26:55.550  MAIN        ( PROCESS ) : running: ['/usr/sbin/univention-management-console-module', '-m', 'schoolwizards', '-s', '/var/run/univention-management-console/6071-1453904815550.socket', '-d', '4', '-l', 'de_DE.UTF-8']
27.01.16 09:26:55.557  MAIN        ( INFO    ) : The module schoolusers is inactive for too long. Sending EXIT request to module
27.01.16 09:26:55.557  PROTOCOL    ( INFO    ) : Sending UMCP EXIT REQUEST 145390481555774-69
27.01.16 09:26:55.557  MAIN        ( INFO    ) : Client: _resend: socket is damaged: [Errno 32] Daten?bergabe unterbrochen (broken pipe)
27.01.16 09:26:55.557  MAIN        ( WARN    ) : Socket died (module=schoolusers)
27.01.16 09:26:55.557  MAIN        ( WARN    ) : Module process schoolusers died (pid: 28150, exit status: -1, signal: -1, status: -1)
27.01.16 09:26:55.557  MAIN        ( INFO    ) : Checking for kill timer ()
27.01.16 09:26:55.557  MAIN        ( WARN    ) : Cleaning up requests
27.01.16 09:26:55.557  MAIN        ( INFO    ) : No pending requests found
27.01.16 09:26:55.557  MAIN        ( WARN    ) : Remove inactivity timer
27.01.16 09:26:55.559  MAIN        ( INFO    ) : Ask module schoolusers to shutdown gracefully
27.01.16 09:26:55.592  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 226, in <module>
    umc_daemon.do_action()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
    func(self)
  File "/usr/sbin/univention-management-console-server", line 153, in _restart
    self._start()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 203, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 283, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 258, in step
    sock_obj = __sock_objects[ fd ]
KeyError: 61

root@master200:~# ps aufx | grep wizards
root     30000  0.2  1.2 397884 47456 ?        S    09:26   0:00 /usr/bin/python2.7 /usr/sbin/univention-management-console-module -m schoolwizards -s /var/run/univention-management-console/6071-1453904815550.socket -d 4 -l de_DE.UTF-8

management-console-module-schoolwizards.log:
27.01.16 09:26:57.132  DEBUG_INIT
27.01.16 09:26:57.725  ADMIN       ( INFO    ) : admin.syntax.import_syntax_files: importing "/usr/lib/pymodules/python2.7/univention/admin/syntax.d/mp0bb4qtti.py"
…
27.01.16 09:26:58.734  ADMIN       ( INFO    ) : admin.modules.update: importing "univention/admin/handlers/saml/serviceprovider"
27.01.16 09:26:58.862  MAIN        ( INFO    ) : Initialising server process
27.01.16 09:26:58.862  MAIN        ( INFO    ) : Using a UNIX socket
27.01.16 09:26:58.862  SSL         ( INFO    ) : Server listening to connections
Comment 2 Florian Best univentionstaff 2016-01-27 17:31:33 CET
Ok, the problem is python-notifier: step() in nf_generic does the following:

Shortened pseudocode:
"""
fds = poll()
for timer in timers:
    timer()
for fd in fds:
    sock_obj = __sock_objects[fd]
"""

If a timer now removes this socket by calling notifier.socket_remove() this crash occurs. The condition is very rare as there must be "something" in the socket which triggers that poll() returns the file descriptor.

Here is the UMC stack, it seems we can't enhance anything here:

in _mod_inactive()
> MAIN        ( INFO    ) : The module schoolusers is inactive for too long. Sending EXIT request to module
in request() (called by handle_request_exit())
> PROTOCOL    ( INFO    ) : Sending UMCP EXIT REQUEST 145390481555774-69
in client.py:_resend()
> MAIN        ( INFO    ) : Client: _resend: socket is damaged: [Errno 32] Daten?bergabe unterbrochen (broken pipe)
in _socket_died()
> MAIN        ( WARN    ) : Socket died (module=schoolusers)
in _mod_died()
> MAIN        ( WARN    ) : Module process schoolusers died (pid: 28150, exit status: -1, signal: -1, status: -1)
in _mod_died()
> MAIN        ( INFO    ) : Checking for kill timer ()
in _mod_died()
> MAIN        ( WARN    ) : Cleaning up requests
in invalidate_all_requests()
> MAIN        ( INFO    ) : No pending requests found
in _mod_died()
> MAIN        ( WARN    ) : Remove inactivity timer
in handle_request_exit()
> MAIN        ( INFO    ) : Ask module schoolusers to shutdown gracefully
Comment 3 Florian Best univentionstaff 2016-01-27 17:58:36 CET
Created attachment 7432 [details]
reproducer
Comment 4 Florian Best univentionstaff 2016-01-27 18:19:49 CET
python-notifier.yaml:
r67028 | YAML Bug #40510

python-notifier (0.9.7-4):
r67027 | Bug #40510: fix KeyError in step() which was caused by socket removal in a timer()
Comment 5 Florian Best univentionstaff 2016-01-29 14:44:43 CET
Added also to ucs-test.

ucs-test (6.0.31-23):
r67067 | Bug #40510: add test
Comment 6 Sönke Schwardt-Krummrich univentionstaff 2016-02-03 22:20:27 CET
OK: code change
OK: functional test
OK: YAML (updated)
OK: ucs-test
Comment 7 Janek Walkenhorst univentionstaff 2016-02-04 13:46:03 CET
<http://errata.software-univention.de/ucs/4.1/77.html>
Comment 8 Florian Best univentionstaff 2020-11-26 23:33:53 CET
I added a test case for this issue:
abd65b76a263 | Bug #40510: add test case test_socket_remove_in_timer()