Bug 39909 - UMC-Server: crashes if file limits are reached
UMC-Server: crashes if file limits are reached
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.0
amd64 Linux
: P5 critical (vote)
: UCS 4.1-1-errata
Assigned To: Florian Best
Felix Botner
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-11-12 13:35 CET by Andreas Lemke
Modified: 2016-05-04 18:11 CEST (History)
4 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Ticket number:
Bug group (optional): Error handling, External feedback, Large environments
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Lemke 2015-11-12 13:35:06 CET
12.11.15 13:30:06.297  ACL         ( PROCESS ) : Allowed UMC operations:
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  User  | Host                 | Flavor          | Command              | Options             
12.11.15 13:30:06.297  ACL         ( PROCESS ) : ******************************************************************************
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | appcenter/*          | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | apps/*               | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | diagnostic/*         | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | ip/change            | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | join/*               | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | lib/*                | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | lib/server/*         | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | lib/server/reboot    | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | lib/server/shutdown  | {}                  
12.11.15 13:30:06.297  ACL         ( PROCESS ) :  False | zeus                 | *               | lib/sso/*            | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | mrtg/*               | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | passwordchange/*     | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | pkgdb/*              | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | printers/*           | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | quota/*              | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | reboot/*             | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | services/*           | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | setup/*              | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | sysinfo/*            | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | top/*                | {}                  
12.11.15 13:30:06.298  ACL         ( PROCESS ) :  False | zeus                 | *               | ucr/*                | {}                  
12.11.15 13:30:06.299  ACL         ( PROCESS ) :  False | zeus                 | *               | udm/*                | {}                  
12.11.15 13:30:06.299  ACL         ( PROCESS ) :  False | zeus                 | *               | udm/request_new_license | {}                  
12.11.15 13:30:06.299  ACL         ( PROCESS ) :  False | zeus                 | *               | updater/*            | {}                  
12.11.15 13:30:06.299  ACL         ( PROCESS ) :  True  | zeus                 | *               | udm/*                | {}                  
12.11.15 13:30:06.299  ACL         ( PROCESS ) : 
12.11.15 13:30:06.376  MAIN        ( PROCESS ) : running: ['/usr/sbin/univention-management-console-module', '-m', 'udm', '-s', '/var/run/univention-management-console/24495-1447331406376.socket', '-d', '2', '-l', 'de_DE.UTF-8']
12.11.15 13:30:06.426  MODULE      ( PROCESS ) : Die Ausführung des Kommandos udm/get users/user ist fehlgeschlagen:

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/management/console/base.py", line 282, in execute
    function(self, request)
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 785, in handle_request_command
    mod_proc = ModuleProcess(module_name, debug=MODULE_DEBUG_LEVEL, locale=self.i18n.locale)
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 153, in __init__
    self.__pid = self.__process.start()
  File "/usr/lib/pymodules/python2.7/notifier/popen.py", line 135, in start
    self.pid =  subprocess.Popen( cmd, shell = self._shell ).pid
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1145, in _execute_child
    errpipe_read, errpipe_write = self.pipe_cloexec()
  File "/usr/lib/python2.7/subprocess.py", line 1101, in pipe_cloexec
    r, w = os.pipe()
OSError: [Errno 24] Zu viele offene Dateien

12.11.15 13:30:06.829  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 212, in <module>
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
  File "/usr/sbin/univention-management-console-server", line 144, in _restart
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
  File "/usr/sbin/univention-management-console-server", line 194, in run
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 283, in loop
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 270, in step
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 152, in _receive
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 193, in _handle
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 426, in reload
  File "/usr/lib/pymodules/python2.7/univention/management/console/module.py", line 354, in load
    for filename in os.listdir(Manager.DIRECTORY):
OSError: [Errno 24] Zu viele offene Dateien: '/usr/share/univention-management-console/modules'
Comment 1 Andreas Lemke 2015-11-12 13:44:52 CET
Das Problem tritt erst auf wenn mehrere hundert oder tausend Verbindungen nacheinander hergestellt werden.
Comment 2 Florian Best univentionstaff 2015-11-12 13:49:43 CET
Danke für das Feedback. Wenn ich das richtig sehe stürzt der UMC-Dienst komplett ab in dem Fall? Oder werden die Anfragen nur nicht beantwortet?
Comment 3 Andreas Lemke 2015-11-12 13:51:05 CET
Der Dienst ist danach komplett weg. Wenn er sich wenigsten neu starten würde...
Comment 4 Alexander Kläser univentionstaff 2015-11-12 13:52:34 CET
(In reply to Andreas Lemke from comment #3)
> Der Dienst ist danach komplett weg. Wenn er sich wenigsten neu starten
> würde...

This has also been requested at Bug 33427.
Comment 5 Stefan Gohmann univentionstaff 2016-04-19 15:35:47 CEST
Move to 4.0-5-errata.
Comment 6 Florian Best univentionstaff 2016-04-22 10:30:11 CEST
Tagging to UCS 4.1-1 errata.

Playing around with some different max open file limits reveals crashes of the UMC-Server in various places.
By default the UMC-Server has 8 opened file descriptors.
PID=$(pgrep -f 'python.*console-server');
ls "/proc/$PID/fd/" | wc -l;
8
grep files "/proc/$PID/limits"
Max open files            1024                 4096                 files

With a limit of the current number of opened fds a crash during accepting new socket connections occur.
echo 'import resource; resource.setrlimit(resource.RLIMIT_NOFILE, (8, 8))' > /tmp/payload; pyrasite "$PID" /tmp/payload

Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 235, in <module>
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
  File "/usr/sbin/univention-management-console-server", line 153, in _restart
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
  File "/usr/sbin/univention-management-console-server", line 203, in run
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 273, in step
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 404, in _connection
  File "/usr/lib/python2.7/socket.py", line 202, in accept
error: [Errno 24] Zu viele offene Dateien

With a limit of 2 + the number of opened file descriptors PAM fails to authenticate the user in a new socket connection:
echo 'import resource; resource.setrlimit(resource.RLIMIT_NOFILE, (10, 10))' > /tmp/payload; pyrasite "$PID" /tmp/payload

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 151, in _receive
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 202, in _handle
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 134, in authenticate
  File "/usr/lib/pymodules/python2.7/univention/management/console/auth.py", line 99, in authenticate
  File "/usr/lib/pymodules/python2.7/univention/management/console/pam.py", line 118, in __init__
  File "/usr/lib/pymodules/python2.7/univention/management/console/pam.py", line 165, in init
error: ('Critical error - immediate abort', 26)

The signal handler for SIGHUP and SIGUSR1 are causing a crash during reloading modules/categories:
kill -10 "$PID"; kill -1 "$PID"

Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 226, in <module>
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
  File "/usr/sbin/univention-management-console-server", line 203, in run
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step
  File "/usr/sbin/univention-management-console-server", line 212, in signal_user1
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 428, in reload
  File "/usr/lib/pymodules/python2.7/univention/management/console/module.py", line 390, in load
OSError: [Errno 24] Zu viele offene Dateien: '/usr/share/univention-management-console/modules'

The same is the case when having limit = open_fd + 1 and a new connection with authentication comes in:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 151, in _receive
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 200, in _handle
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 436, in reload
  File "/usr/lib/pymodules/python2.7/univention/management/console/module.py", line 390, in load
OSError: [Errno 24] Zu viele offene Dateien: '/usr/share/univention-management-console/modules

The hard limit of maximum opened files has been raised to 64512.
All places where the server could crash it now either sends the traceback to the client or closes the socket connection.

univention-management-console.yaml:
r68871 | YAML Bug #39909

univention-management-console (8.0.28-14):
r68870 | Bug #39909: autopep8
r68869 | Bug #39909: enhance robustness of UMC-Server during request handling.  Raise limit of max open files to a higher value.
Comment 7 Felix Botner univentionstaff 2016-04-29 15:18:51 CEST
Still get a traceback in umc after

PID=$(pgrep -f 'python.*console-server');
echo 'import resource; resource.setrlimit(resource.RLIMIT_NOFILE, (8, 8))' > /tmp/payload; pyrasite "$PID" /tmp/payload
kill -10 "$PID"; kill -1 "$PID"

29.04.16 15:16:55.064  MODULE      ( PROCESS ) : Die Ausführung des Kommandos setup/load network ist fehlgeschlagen:

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/management/console/base.py", line 283, in execute
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 791, in handle_request_command
  File "/usr/lib/pymodules/python2.7/univention/management/console/module.py", line 390, in load
OSError: [Errno 24] Zu viele offene Dateien: '/usr/share/univention-management-console/modules'

OK - new limit of umc server
Comment 8 Florian Best univentionstaff 2016-04-29 15:45:13 CEST
(In reply to Felix Botner from comment #7)
> Still get a traceback in umc after
This is okay, as the traceback is presented to the user in a dialog but doesn't cause the UMC-Server to crash anymore.
The error handling in general will be improved in the future but we can't add a try-except around every line in UMC :)
Comment 9 Felix Botner univentionstaff 2016-05-02 11:37:28 CEST
(In reply to Florian Best from comment #6) 
> The hard limit of maximum opened files has been raised to 64512.
> All places where the server could crash it now either sends the traceback to
> the client or closes the socket connection.

OK 
 
> univention-management-console.yaml:
> r68871 | YAML Bug #39909

OK
 
> univention-management-console (8.0.28-14):
> r68870 | Bug #39909: autopep8
> r68869 | Bug #39909: enhance robustness of UMC-Server during request
> handling.  Raise limit of max open files to a higher value.

OK
Comment 10 Janek Walkenhorst univentionstaff 2016-05-04 18:11:03 CEST
<http://errata.software-univention.de/ucs/4.1/170.html>