Univention Bugzilla – Bug 56575
Quota module doesn't work with lot of users
Last modified: 2024-02-23 13:49:18 CET
I've created 3,000 users and activated quota for each user on /. Afterwards, I am unable to load the quota information for /. The request runs into a 10 minute timeout and an error message is shown.
Created attachment 11120 [details] management-console-server.log
The error was shown at 6:48.
Created attachment 11121 [details] management-console-module-quota.log
Created attachment 11122 [details] patch
The quota module now fetches the user quotas for a partition in a thread. Previously this was done via a non-blocking `tornado.Subprocess()`.

On systems with many users (e.g. 3000) the quota/users/query request wasn't answered within 10 minutes, which caused Apache to close the connection. These 10 minutes are not the umc/module/timeout (!): the problem also occurs if that module timeout is set below 10 minutes. It does not occur when requests are made directly against the UMC server instead of through the Apache gateway in front of it. We don't know where the 10 minute timeout in Apache comes from, because the proxy timeout we configured in Apache is only 311 seconds.

The subprocess definitely doesn't take 10 minutes to execute, but milliseconds:

$ time /usr/sbin/repquota -C -v /dev/mapper/vg_ucs-root
real 0m0,162s
user 0m0,018s
sys 0m0,033s

There is a bug in python-tornado: when the process output exceeds a certain buffer size, the subprocess hangs in a write() call. I created a bug report upstream: https://github.com/tornadoweb/tornado/issues/3323

Also fixed the handling of aborted requests in the UMC server:

Since git:9044299c48c2319f62584244fba3b50d4bea71a5 (Bug #56391) requests are aborted / canceled when the connection is closed. The module process terminates itself when there are no active requests or running threads (after the module timeout). Therefore the module could just shut down after the connection was aborted.

Since git:7e010571e85182a6154c050f329dc85c30674d2b (Bug #56198) the active requests inside the UMC server weren't detected as active anymore, because the request() methods have been changed from sync to async processing. As a result, the request IDs were removed from the active-requests list immediately instead of after the response was sent.
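For illustration, the general idea of running the command in a worker thread instead of a non-blocking subprocess could look roughly like the sketch below. This is not the code from the attached patch: the names `run_and_capture`, `fetch_output` and `demo` are hypothetical, it uses plain asyncio with a `ThreadPoolExecutor` rather than the UMC helpers, and a small Python child process stands in for repquota so that the example produces more output than a typical pipe buffer holds.

```python
import asyncio
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker pool; the real module may manage its threads differently.
_executor = ThreadPoolExecutor(max_workers=2)


def run_and_capture(cmd):
    """Run cmd to completion in the calling thread and return its stdout.

    subprocess.run() with capture_output=True reads the pipes while the
    child is running, so the child cannot block in write() on a full pipe.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


async def fetch_output(cmd):
    """Await the command output without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, run_and_capture, cmd)


def demo():
    # Stand-in for repquota: a child that writes 1,000,000 characters,
    # which is more than a typical pipe buffer can hold at once.
    cmd = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
    return asyncio.run(fetch_output(cmd))
```

With this shape, the blocking read happens in the worker thread, so the event loop stays responsive and large output is drained continuously instead of filling the pipe.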
univention-quota.yaml
252b8271c462 | fix(quota): fix aborted user query requests

univention-quota (14.0.5-3)
252b8271c462 | fix(quota): fix aborted user query requests

univention-management-console.yaml
7a364c1cb389 | fix(umc): fix detection of active requests

univention-management-console (12.0.31-14)
7a364c1cb389 | fix(umc): fix detection of active requests
QA:
- code review: OK
- changelog and YAML: OK
- package update: OK
- System with 2000 users without update, request should be aborted after 10 minutes: OK
- After update quota user overview loads instantly: OK
<https://errata.software-univention.de/#/?erratum=5.0x820>
<https://errata.software-univention.de/#/?erratum=5.0x821>