Univention Bugzilla – Bug 37367
umc-webserver hangs
Last modified: 2015-01-22 11:54:45 CET
Created attachment 6541 [details] test_umc_webserver_queue.sh The UMC-Webserver hangs when module processes does not answer to many requests. Attached is a script which can provoke this. Just execute the script, wait 5 minutes (5 = the values provided in umc/http/session/timeout). After this the server does not respond to any request anymore.
After the timeout apache answers with: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>502 Proxy Error</title> </head><body> <h1>Proxy Error</h1> <p>The proxy server received an invalid response from an upstream server.<br /> The proxy server could not handle the request <em><a href="/umcp/command/ucr/query">POST /umcp/command/ucr/query</a></em>.<p> Reason: <strong>Error reading from remote server</strong></p></p> <hr> <address>Apache/2.2.16 (Univention) Server at 10.200.27.10 Port 80</address> </body></html>
Of course a lot of dead sockets + symlinks: ls -l "/proc/$(cat /var/run/umc-web-server.pid)/fd" lrwx------ 1 root root 64 16. Dez 18:07 20 -> socket:[171939] lrwx------ 1 root root 64 16. Dez 18:07 21 -> /tmp/tmpu5r0Xr (deleted) lrwx------ 1 root root 64 16. Dez 18:07 22 -> socket:[171960] lrwx------ 1 root root 64 16. Dez 18:07 23 -> /tmp/tmpuQCYg3 (deleted) lrwx------ 1 root root 64 16. Dez 18:07 24 -> socket:[172078] lrwx------ 1 root root 64 16. Dez 18:07 25 -> /tmp/tmpc107Ff (deleted) lrwx------ 1 root root 64 16. Dez 18:07 26 -> socket:[172082] lrwx------ 1 root root 64 16. Dez 18:07 27 -> /tmp/tmpGGb4QI (deleted) lrwx------ 1 root root 64 16. Dez 18:07 28 -> socket:[172086] lrwx------ 1 root root 64 16. Dez 18:07 29 -> /tmp/tmp8wFjXj (deleted)
The last lines of the logfile: 16.12.14 17:52:15.817 PROTOCOL ( INFO ) : Sending UMCP COMMAND REQUEST 141874873580320-1687 16.12.14 17:53:35.222 MAIN ( INFO ) : SessionClient(0x2439a90): timed out! 16.12.14 17:53:35.232 MAIN ( INFO ) : SessionClient(0x2439a90): There are open requests (141874821073182-1664,141874821077562-1665). Postpone session shutdown 16.12.14 17:55:55.249 MAIN ( INFO ) : SessionClient(0x242a610): timed out! 16.12.14 17:55:55.249 MAIN ( INFO ) : SessionClient(0x242a610): Closing connection to UMC server 16.12.14 17:55:55.249 MAIN ( INFO ) : Open sessions: 9f62a8b8-49b8-443b-a12c-5e484c4d1e1b,9f11bd94-27d9-495f-b178-8003e506f3b1,92245ca6-a84d-448a-a649-26d07d29110d 16.12.14 17:55:55.249 MAIN ( INFO ) : Cleaning up session 9f62a8b8-49b8-443b-a12c-5e484c4d1e1b 16.12.14 18:02:20.828 MAIN ( INFO ) : SessionClient(0x2416490): timed out! 16.12.14 18:02:20.828 MAIN ( INFO ) : SessionClient(0x2416490): There are open requests (141874873576725-1684,141874870544253-1682,141874870181093-1681,141874869742753-1680,141874870645821-1683,141874873577771-1685,141874873580320-1687,141874873578477-1686). Postpone session shutdown
The main cause for this bug was that module processes which were dead blocked responding to the client. The UMC-Webserver (cherrypy) uses at maximum 10 threads (the process sums to maximum 14 threads) to respond to HTTP request. HTTP has no channeling and requires therefor the messages to be sent in the same order as the ingoing requests. Having 10 unanswered requests blocks therefore every further incoming request. When a module process dies the UMC-Server responds to the client (either the UMC-Web-Server or umc-command CLI tool) with the UMCP status code "510 Module Process Died". During responding an error occured. The bytes for the response message were not composed correctly so that a invalid UMCP message was sent to the client: 'RESPONSE/141882066551207-0/15/application/json: \n{"status": 510}' This produced the Exception: error (551, 'Unparsable message header') The exception was swallowed up (sent as 'error' signal, but nowhere exists a signal handler for 'error'). The fix consists of: * error handling of the 'error' signal in the UMC-webserver by invalidating every request * error handling of the 'error' signal in the umc-client by simply exiting with status code 1. * fixing of the syntactically broken UMCP message The web-server still can hang due to other requests which aren't answered (e.g. long polling requests). I will create a new bug for this with in-practice examples, soon. Fix: svn r56933, svn r56934, svn r56935, svn r56936, svn r56937, svn r56938 Packages: univention-management-console and univention-managment-console-frontend. YAMLs: ucs-4.0-0/doc/errata/staging/2014-12-17-univention-management-console.yaml ucs-4.0-0/doc/errata/staging/2014-11-24-univention-management-console-frontend.yaml ucs-3.2-3/doc/errata/staging/2014-12-17-univention-management-console-frontend.yaml ucs-3.2-4/doc/errata/staging/2014-12-17-univention-management-console-ucs-3.2-4/doc/errata/staging/2014-12-17-univention-management-console.yaml The packages does not have any cross dependencies. When upgrading only the Umc-Web-server the error handling can be tested (because the invalid UMCP response message is still sent then).
As discussed the changes in univention-management-console-frontend have to be reverted: If several requests are in the queue, the first parsing error will invalidate all requests → it should be sufficient to just drop the message and write a log entry. Otherwise it may do more harm than help. → REOPEN
(In reply to Sönke Schwardt-Krummrich from comment #5) > As discussed the changes in univention-management-console-frontend have to > be reverted: > If several requests are in the queue, the first parsing error will > invalidate all requests → it should be sufficient to just drop the message > and write a log entry. Otherwise it may do more harm than help. > > → REOPEN yes, changes in web-server have been reverted. client.py created log entry on parsing errors. YAML's have been adapted (removed, merged).
OK: code change OK: functional test OK: YAML scope: ucs_4.0-0-errata4.0-0 src: univention-management-console fix: 7.1.63-9.850.201412181202
*** Bug 27586 has been marked as a duplicate of this bug. ***
<http://errata.univention.de/ucs/4.0/44.html>