Bug 37367 - umc-webserver hangs
umc-webserver hangs
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.0
Other Linux
: P5 normal (vote)
: UCS 4.0-0-errata
Assigned To: Florian Best
Sönke Schwardt-Krummrich
:
: 27586 (view as bug list)
Depends on:
Blocks: 37379 37380 37485 37487
  Show dependency treegraph
 
Reported: 2014-12-16 18:01 CET by Florian Best
Modified: 2015-01-22 11:54 CET (History)
4 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
test_umc_webserver_queue.sh (751 bytes, application/x-shellscript)
2014-12-16 18:01 CET, Florian Best
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Florian Best univentionstaff 2014-12-16 18:01:20 CET
Created attachment 6541 [details]
test_umc_webserver_queue.sh

The UMC-Webserver hangs when module processes does not answer to many requests.

Attached is a script which can provoke this. Just execute the script, wait 5 minutes (5 = the values provided in umc/http/session/timeout).

After this the server does not respond to any request anymore.
Comment 1 Florian Best univentionstaff 2014-12-16 18:04:53 CET
After the timeout apache answers with:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/umcp/command/ucr/query">POST&nbsp;/umcp/command/ucr/query</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
<hr>
<address>Apache/2.2.16 (Univention) Server at 10.200.27.10 Port 80</address>
</body></html>
Comment 2 Florian Best univentionstaff 2014-12-16 18:09:16 CET
Of course a lot of dead sockets + symlinks:

ls -l "/proc/$(cat /var/run/umc-web-server.pid)/fd"
lrwx------ 1 root root 64 16. Dez 18:07 20 -> socket:[171939]
lrwx------ 1 root root 64 16. Dez 18:07 21 -> /tmp/tmpu5r0Xr (deleted)
lrwx------ 1 root root 64 16. Dez 18:07 22 -> socket:[171960]
lrwx------ 1 root root 64 16. Dez 18:07 23 -> /tmp/tmpuQCYg3 (deleted)
lrwx------ 1 root root 64 16. Dez 18:07 24 -> socket:[172078]
lrwx------ 1 root root 64 16. Dez 18:07 25 -> /tmp/tmpc107Ff (deleted)
lrwx------ 1 root root 64 16. Dez 18:07 26 -> socket:[172082]
lrwx------ 1 root root 64 16. Dez 18:07 27 -> /tmp/tmpGGb4QI (deleted)
lrwx------ 1 root root 64 16. Dez 18:07 28 -> socket:[172086]
lrwx------ 1 root root 64 16. Dez 18:07 29 -> /tmp/tmp8wFjXj (deleted)
Comment 3 Florian Best univentionstaff 2014-12-16 18:13:07 CET
The last lines of the logfile:
16.12.14 17:52:15.817  PROTOCOL    ( INFO    ) : Sending UMCP COMMAND REQUEST 141874873580320-1687
16.12.14 17:53:35.222  MAIN        ( INFO    ) : SessionClient(0x2439a90): timed out!
16.12.14 17:53:35.232  MAIN        ( INFO    ) : SessionClient(0x2439a90): There are open requests (141874821073182-1664,141874821077562-1665). Postpone session shutdown
16.12.14 17:55:55.249  MAIN        ( INFO    ) : SessionClient(0x242a610): timed out!
16.12.14 17:55:55.249  MAIN        ( INFO    ) : SessionClient(0x242a610): Closing connection to UMC server
16.12.14 17:55:55.249  MAIN        ( INFO    ) : Open sessions: 9f62a8b8-49b8-443b-a12c-5e484c4d1e1b,9f11bd94-27d9-495f-b178-8003e506f3b1,92245ca6-a84d-448a-a649-26d07d29110d
16.12.14 17:55:55.249  MAIN        ( INFO    ) : Cleaning up session 9f62a8b8-49b8-443b-a12c-5e484c4d1e1b
16.12.14 18:02:20.828  MAIN        ( INFO    ) : SessionClient(0x2416490): timed out!
16.12.14 18:02:20.828  MAIN        ( INFO    ) : SessionClient(0x2416490): There are open requests (141874873576725-1684,141874870544253-1682,141874870181093-1681,141874869742753-1680,141874870645821-1683,141874873577771-1685,141874873580320-1687,141874873578477-1686). Postpone session shutdown
Comment 4 Florian Best univentionstaff 2014-12-17 16:51:06 CET
The main cause for this bug was that module processes which were dead blocked responding to the client. The UMC-Webserver (cherrypy) uses at maximum 10 threads (the process sums to maximum 14 threads) to respond to HTTP request. HTTP has no channeling and requires therefor the messages to be sent in the same order as the ingoing requests. Having 10 unanswered requests blocks therefore every further incoming request.

When a module process dies the UMC-Server responds to the client (either the UMC-Web-Server or umc-command CLI tool) with the UMCP status code "510 Module Process Died". During responding an error occured. The bytes for the response message were not composed correctly so that a invalid UMCP message was sent to the client: 
'RESPONSE/141882066551207-0/15/application/json:  \n{"status": 510}'
This produced the Exception:
error (551, 'Unparsable message header')
The exception was swallowed up (sent as 'error' signal, but nowhere exists a signal handler for 'error').

The fix consists of:
* error handling of the 'error' signal in the UMC-webserver by invalidating every request
* error handling of the 'error' signal in the umc-client by simply exiting with status code 1.
* fixing of the syntactically broken UMCP message

The web-server still can hang due to other requests which aren't answered (e.g. long polling requests). I will create a new bug for this with in-practice examples, soon.

Fix: svn r56933, svn r56934, svn r56935, svn r56936, svn r56937, svn r56938
Packages: univention-management-console and univention-managment-console-frontend.
YAMLs: 
ucs-4.0-0/doc/errata/staging/2014-12-17-univention-management-console.yaml
ucs-4.0-0/doc/errata/staging/2014-11-24-univention-management-console-frontend.yaml
ucs-3.2-3/doc/errata/staging/2014-12-17-univention-management-console-frontend.yaml
ucs-3.2-4/doc/errata/staging/2014-12-17-univention-management-console-ucs-3.2-4/doc/errata/staging/2014-12-17-univention-management-console.yaml

The packages does not have any cross dependencies. When upgrading only the Umc-Web-server the error handling can be tested (because the invalid UMCP response message is still sent then).
Comment 5 Sönke Schwardt-Krummrich univentionstaff 2014-12-18 12:33:52 CET
As discussed the changes in univention-management-console-frontend have to be reverted:
If several requests are in the queue, the first parsing error will invalidate all requests → it should be sufficient to just drop the message and write a log entry. Otherwise it may do more harm than help.

→ REOPEN
Comment 6 Florian Best univentionstaff 2014-12-18 13:14:38 CET
(In reply to Sönke Schwardt-Krummrich from comment #5)
> As discussed the changes in univention-management-console-frontend have to
> be reverted:
> If several requests are in the queue, the first parsing error will
> invalidate all requests → it should be sufficient to just drop the message
> and write a log entry. Otherwise it may do more harm than help.
> 
> → REOPEN
yes, changes in web-server have been reverted. client.py created log entry on parsing errors.

YAML's have been adapted (removed, merged).
Comment 7 Sönke Schwardt-Krummrich univentionstaff 2014-12-18 13:30:24 CET
OK: code change
OK: functional test
OK: YAML

scope: ucs_4.0-0-errata4.0-0
src: univention-management-console
fix: 7.1.63-9.850.201412181202
Comment 8 Florian Best univentionstaff 2014-12-18 13:50:50 CET
*** Bug 27586 has been marked as a duplicate of this bug. ***
Comment 9 Janek Walkenhorst univentionstaff 2015-01-22 11:54:45 CET
<http://errata.univention.de/ucs/4.0/44.html>