Bug 48157 - UMC: Internal server error after opening UMC portal
UMC: Internal server error after opening UMC portal
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.3
Other Linux
: P5 normal (vote)
: UCS 4.4-0-errata
Assigned To: Florian Best
Ole Schwiegert
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2018-11-16 09:06 CET by Felix Botner
Modified: 2019-05-08 13:26 CEST (History)
1 user (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 6: Setup Problem: Issue for the setup process
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.206
Enterprise Customer affected?:
School Customer affected?:
ISV affected?: Yes
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
portal.jpg (802.25 KB, image/jpeg)
2018-11-16 09:07 CET, Felix Botner
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Felix Botner univentionstaff 2018-11-16 09:06:55 CET
management-console-server.log:

  File "/usr/sbin/univention-management-console-server", line 243, in <module>
    umc_daemon.do_action()
  File "/usr/lib/python2.7/dist-packages/daemon/runner.py", line 267, in do_action
    func(self)
  File "/usr/lib/python2.7/dist-packages/daemon/runner.py", line 186, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 210, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 287, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 243, in step
    if not timer[ CALLBACK ]():
  File "/usr/lib/pymodules/python2.7/notifier/__init__.py", line 104, in __call__
    return self._function( *tmp, **self._kwargs )
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 544, in _mod_connect
    _send_error()
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 530, in _send_error
    self.result(res)
  File "/usr/lib/pymodules/python2.7/univention/management/console/base.py", line 466, in result
    self.signal_emit('success', response)
  File "/usr/lib/pymodules/python2.7/notifier/signals.py", line 75, in signal_emit
    self.__signals[ signal ].emit( *args )
  File "/usr/lib/pymodules/python2.7/notifier/signals.py", line 41, in emit
    cb( *args )
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 1004, in _response
    self.signal_emit('success', response)
  File "/usr/lib/pymodules/python2.7/notifier/signals.py", line 75, in signal_emit
    self.__signals[ signal ].emit( *args )
  File "/usr/lib/pymodules/python2.7/notifier/signals.py", line 41, in emit
    cb( *args )
  File "/usr/lib/pymodules/python2.7/notifier/__init__.py", line 104, in __call__
    return self._function( *tmp, **self._kwargs )
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 218, in _response
    notifier.socket_add(state.socket, self._do_send, notifier.IO_WRITE)
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 88, in socket_add
    fd = _get_fd( id )
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 72, in _get_fd
    return obj.fileno()
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
Comment 1 Felix Botner univentionstaff 2018-11-16 09:07:15 CET
Created attachment 9747 [details]
portal.jpg
Comment 2 Felix Botner univentionstaff 2019-01-17 17:18:40 CET
Also happens in branch tests, added workaround in restart_umc_bug_48157 in run_workarounds_before_starting_the_tests, please remove if this is fixed
Comment 3 Florian Best univentionstaff 2019-05-07 10:51:39 CEST
Let's analyze the stacktrace a little bit:

>   File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 544, in _mod_connect
>     _send_error()
This means, we tried connecting to a module process but the module process did not start a UNIX socket after 200 connection retries.
Most often this means that the server has not enough RAM. Sometimes it also occurrs if the module process crashed due to an ImportError.

>   File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 530, in _send_error
>     self.result(res)
Here we are trying to send an error message to the connected client - in our case the UMC-Webserver.

>   File "/usr/lib/pymodules/python2.7/univention/management/console/base.py", line 466, in result
>     self.signal_emit('success', response)
This is done by emiting the notifier signal "success".
It then calls all signal handlers for "success".
…
>   File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/session.py", line 1004, in _response
>     self.signal_emit('success', response)
In the signal handler for "success" another signal "success" is emited, which again calls some other handlers.

>   File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 218, in _response
>     notifier.socket_add(state.socket, self._do_send, notifier.IO_WRITE)
Now we are in the actual signal handler which wants to write the response back to the UMC-Webserver client.
There we are inside any "SSL.WantReadError, SSL.WantWriteError, SSL.WantX509LookupError" exception handler.
Meaning: We are working with non-blocking sockets.
Something needs to be done e.g. the TLS handshake or resending some data, etc.

Therefore we are trying to re-add the socket.
Here the exception in the lower frames should probably be handled.
We are getting socket.error(errno=EBADF, "Bad file descriptor").
This probably means the socket was already closed by another instance, (e.g. garbage collector?).

Therefore I am suggesting this patch:

diff --git a/management/univention-management-console/src/univention/management/console/protocol/server.py b/management/univention-management-console/src/univention/management/console/protocol/server.py
index 195a30bfd4..2876a141cb 100644
--- a/management/univention-management-console/src/univention/management/console/protocol/server.py
+++ b/management/univention-management-console/src/univention/management/console/protocol/server.py
@@ -215,8 +215,12 @@ class MagicBucket(object):
                                        del state.requests[msg.id]
                except (SSL.WantReadError, SSL.WantWriteError, SSL.WantX509LookupError):
                        CRYPT.info('UMCP: SSL error need to re-send chunk')
-                       notifier.socket_add(state.socket, self._do_send, notifier.IO_WRITE)
-                       state.resend_queue.append(data)
+                       try:
+                               notifier.socket_add(state.socket, self._do_send, notifier.IO_WRITE)
+                               state.resend_queue.append(data)
+                       except socket.error as error:
+                               CRYPT.warn('Socket error in _response: %s. Probably the socket was closed by the client.' % str(error))
+                               self._cleanup(state.socket)
                except (SSL.SysCallError, SSL.Error, socket.error) as error:
                        CRYPT.warn('SSL error in _response: %s. Probably the socket was closed by the client.' % str(error))
                        self._cleanup(state.socket)
Comment 4 Florian Best univentionstaff 2019-05-07 11:59:41 CEST
I tried a lot of things to reproduce it but could only by manipulating some code.
The error condition is very hard to trigger, you have to make sure that the socket needs to resend data.
There was a follow up error, so even if that "Bad file descriptor" error occurred the server would crash anyway.

To trigger this:
Add "import time; time.sleep(10)" to the first line in /usr/share/pyshared/univention/management/console/modules/appcenter/__init__.py.
Modify server.py in _response after the line "data = str(msg)"
if msg.status == 511:
    state.socket.close()
    raise SSL.WantReadError()

Call umc-command -U Administrator -P univention appcenter/packages/query -o section=all -o key=package -o pattern='*'
→ TADA: same traceback

My fix causes that the connection is closed, causing the error message in umc-client:
"Error: The connection to UMC was closed unexpectedly. Make sure the server is running"

And UMC displays a pop up:
"The connection to the Univention Management Console Server broke up unexpectedly."

This is similar to the Screenshot in attachment 9747 [details] but better than a server crash.

The traceback which would occurr if the socket was not closed would be:
04.05.19 03:14:34.379  MAIN        ( INFO    ) : No connection to module process yet
04.05.19 03:14:34.429  MAIN        ( INFO    ) : Connection to module appcenter process failed
04.05.19 03:14:34.431  SSL         ( INFO   ) : UMCP: SSL error need to re-send chunk
04.05.19 03:14:34.431  MAIN        ( PROCESS ) : ModuleProcess: dying
04.05.19 03:14:34.431  MAIN        ( PROCESS ) : ModuleProcess: child stopped
04.05.19 03:14:34.433  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 243, in <module>
    umc_daemon.do_action()
  File "/usr/lib/python2.7/dist-packages/daemon/runner.py", line 267, in do_action
    func(self)
  File "/usr/lib/python2.7/dist-packages/daemon/runner.py", line 186, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 210, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 287, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 274, in step
    not __sockets[ cond ][ fd ]( sock_obj ):
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 170, in _do_send
    id, first = state.resend_queue.pop(0)
ValueError: too many values to unpack

Fixed in:

univention-management-console.yaml
f28d88b15d57 | YAML Bug #48157
ee99cae6806e | YAML Bug #48157

univention-management-console (11.0.4-12)
d90da5d96cf4 | Bug #48157: prevent UMC server crash in rare situations
Comment 5 Ole Schwiegert univentionstaff 2019-05-08 09:52:24 CEST
Changelog&Advisory: OK
Package installs: OK

I could reproduce the 'original' error as described with the manual code changes. After updating to the test packages and reintroducing the same code modifications to cause the error again I encountered the handling introduced in the fix as expected. The exception is handled and the error popup shown. The server crash is prevented.

I will deactivate the workaround for the jenkins tests mentioned in comment #2
Comment 6 Arvid Requate univentionstaff 2019-05-08 13:26:17 CEST
<http://errata.software-univention.de/ucs/4.4/84.html>