Univention Bugzilla – Bug 56391
UMC-Server: never ending module process during app installation possible if socket connections were closed by client
Last modified: 2023-09-07 12:20:21 CEST
We fixed the below error in a way that there can be situations where module processes don't end after the module timeout because they think there is still an active connection. This is a very rare and theoretical case as most modules respond immediately or at least after some time which causes the module process to end. But we can fake a situation where the module process is never closed via the following steps: ucr set umc/module/timeout=30 umc/http/session-timeout=60 systemctl restart univention-management-console-server touch /var/run/univention-appcenter.lock curl http://Administrator:univention@localhost:8090/command/appcenter/keep_alive -X POST ^C # after some seconds just exit that request pgrep -fl 'univention-management-console-module.*appcenter' # process runs, seems to never stop.. +++ This bug was initially created as a clone of Bug #56336 +++ Version: 5.0-4 errata740 Error: Traceback (most recent call last): File "%PY3%/tornado/web.py", line 1595, in _execute result = yield result File "%PY3%/tornado/gen.py", line 1133, in run value = future.result() File "%PY3%/univention/management/console/resources.py", line 728, in get self.future = process.request(self.request.method, self.request.full_url(), body=self.request.body or None, headers=headers) File "%PY3%/univention/management/console/resources.py", line 263, in request response = self.do_request(method, uri, headers, body, self.socket) File "%PY3%/univention/management/console/resources.py", line 147, in do_request return self._wrap_future(self._client.fetch(request, raise_error=True)) File "%PY3%/tornado/httpclient.py", line 256, in fetch raise RuntimeError("fetch() called on closed AsyncHTTPClient") RuntimeError: fetch() called on closed AsyncHTTPClient
Probably the following exception occurs due to the current changes: 03.08.23 21:00:45.272 MAIN ( INFO ) : session 'b9f732eb-5327-49e0-9b8e-287431a7e96f' timed out 03.08.23 21:00:45.272 MAIN ( INFO ) : Cleaning up session 'b9f732eb-5327-49e0-9b8e-287431a7e96f' 03.08.23 21:00:45.272 MAIN ( WARN ) : Closing all sockets to currently opened module processes for this session. 03.08.23 21:00:45 ERROR ( 12681) : Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f5f33ff9d08>) Traceback (most recent call last): File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 758, in _run_callback ret = callback() File "/usr/lib/python3/dist-packages/tornado/stack_context.py", line 300, in null_wrapper return fn(*args, **kwargs) File "/usr/lib/python3/dist-packages/univention/management/console/session.py", line 255, in _session_timeout_timer self.processes.timed_out() File "/usr/lib/python3/dist-packages/univention/management/console/session.py", line 409, in timed_out proc.abort() File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 125, in abort self._client.close() File "/usr/lib/python3/dist-packages/tornado/curl_httpclient.py", line 68, in close self._force_timeout_callback.stop() AttributeError: 'NoneType' object has no attribute 'stop'
Happend again. The error occurs during access of Domaine --> DNS.
The customer asked for the status here. Feedback from the customer: The UMC has become almost unusable since this bug, especially because you always have to click away the traceback 3-4 times before you have the possibility to log out and log in again. And it seems to occur in the @School modules as well. I raised the pain!
(In reply to Christina Scheinig from comment #2) > Happend again. > The error occurs during access of Domaine --> DNS. hmm, what happened again? We need exact details if something happens in customer enviroments. The general problem was fixed by Bug #56336. (In reply to Christina Scheinig from comment #3) > The customer asked for the status here. I am in vacation until end of August. Nothing is prepared here. > Feedback from the customer: The UMC > has become almost unusable since this bug, especially because you always > have to click away the traceback 3-4 times before you have the possibility > to log out and log in again. And it seems to occur in the @School modules as > well. which traceback? why is a re-login necessary? why is UMC not usable anymore? how often does it happen? can you reliably reproduce it? which exact UCS@school modules cause this?
In the specific session, the customer was logged into the UMC for about 2 minutes before the error message appeared. As you can see, first of all, the message can only be removed with multiple "close" prompts and only logging out and logging back in helps to be able to work again.
A workaround should exist, i.e. downgrade to the old UMC. It is described here: https://help.univention.com/t/21661
(In reply to Erik Damrose from comment #7) > A workaround should exist, i.e. downgrade to the old UMC. It is described > here: https://help.univention.com/t/21661 This should not be necessary. Maybe it's easier to go back to the original UMC version from UCS 5.0-4 without errata updates.
MR: https://git.knut.univention.de/univention/ucs/-/merge_requests/873
Bug #56336 actually made it worse as it caused Bug #56336 comment 3 exceptions not only on aborted requests but instead of timed out sessions. So UMC could get unusable after an amount of time in certain conditions (same file descriptor number was assigned for the new socket). Instead of closing the socket connection (which causes a bug in tornado) in the UMC-server to the module process now, we don't close the connection anymore but sending a HTTP request alogn with a request-ID to the moduleprocess which has a new "/cancel" endpoint. The "/cancel" endpoint picks the original request and abort it there, so that it is responded with any response. We don't care about that response as the real client already closed the connection and will never receive the answer. Then the module processes aren't blocked anymore to shutdown if no other active requests (meanwhile) exists. Additionally we don't end the module process if there are still working threads invoked by univention.mangement.console.modules.decorators.SimpleThread. univention-management-console.yaml a062d8b339ae | chore(umc): adjust advisory univention-management-console-server.yaml a062d8b339ae | chore(umc): adjust advisory 9044299c48c2 | fix(UMC): properly cancel aborted requests univention-management-console (12.0.31-12) 9044299c48c2 | fix(UMC): properly cancel aborted requests
OK: code review OK: no connections from UMC-server to module processes are aborted manually anymore. Therefore the exceptions cannot happen anymore. OK: a cancel request is send to the module process when the connection breaks up, which then causes that the request is responded OK: active threads prevent module shutdown OK: advisory description
<https://errata.software-univention.de/#/?erratum=5.0x792>