Bug 56575 - Quota module doesn't work with lot of users
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC - Quota
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-5-errata
Assigned To: Florian Best
QA Contact: Marius Meschter
URL: https://git.knut.univention.de/univen...
Depends on:
Blocks:
Reported: 2023-09-13 06:58 CEST by Stefan Gohmann
Modified: 2024-02-23 13:49 CET
CC List: 3 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.171
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2023082421000126
Bug group (optional): Regression
Max CVSS v3 score:


Attachments
management-console-server.log (477.19 KB, text/plain)
2023-09-13 06:59 CEST, Stefan Gohmann
management-console-module-quota.log (8.50 KB, text/plain)
2023-09-13 06:59 CEST, Stefan Gohmann
patch (3.43 KB, patch)
2023-09-13 09:58 CEST, Florian Best

Description Stefan Gohmann univentionstaff 2023-09-13 06:58:40 CEST
I created 3,000 users and activated a quota on / for each of them. Afterwards, I am unable to load the quota information for /: the request runs into a 10-minute timeout and an error message is shown.
Comment 1 Stefan Gohmann univentionstaff 2023-09-13 06:59:25 CEST
Created attachment 11120
management-console-server.log

The error was shown at 6:48.
Comment 2 Stefan Gohmann univentionstaff 2023-09-13 06:59:50 CEST
Created attachment 11121
management-console-module-quota.log
Comment 3 Florian Best univentionstaff 2023-09-13 09:58:23 CEST
Created attachment 11122
patch
Comment 4 Florian Best univentionstaff 2023-09-19 12:48:40 CEST
The quota module implementation has been changed to use a thread instead of a non-blocking subprocess.
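A minimal sketch of the thread-based pattern, assuming plain `asyncio` instead of the UMC server's actual Tornado code (Tornado's `IOLoop.run_in_executor()` offers the same idea): the blocking `subprocess.run()` call is handed to a worker thread via `loop.run_in_executor()`, so the event loop keeps serving other requests while the command runs. `fetch_quota_report()` and `query_users()` are hypothetical names, and a Python one-liner stands in for `repquota` so the example runs anywhere.

```python
import asyncio
import subprocess
import sys


def fetch_quota_report(cmd: list[str]) -> str:
    """Hypothetical blocking helper: run cmd and return its stdout.

    subprocess.run() drains the pipe via communicate() while waiting,
    so a large output cannot fill the pipe buffer and stall the child.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


async def query_users(cmd: list[str]) -> list[str]:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; the event loop
    # stays responsive while the worker thread waits for the child.
    output = await loop.run_in_executor(None, fetch_quota_report, cmd)
    return output.splitlines()


if __name__ == "__main__":
    # Stand-in for `repquota -C -v <device>`: 3000 lines of fake quota data.
    fake_repquota = [
        sys.executable,
        "-c",
        "for i in range(3000): print(f'user{i} -- 0 0 0 0 0 0')",
    ]
    lines = asyncio.run(query_users(fake_repquota))
    print(len(lines))  # 3000
```

Because `subprocess.run()` reads the child's output while waiting, the size of the report does not matter here; the thread only keeps that blocking wait off the event loop.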

    Fetching the user quotas for a partition was done via non-blocking `tornado.Subprocess()` instead of a thread.
    On systems with many users (e.g. 3,000), the quota/users/query request wasn't answered within 10 minutes, which caused Apache to close the connection.
    The 10 minutes are not the umc/module/timeout (!): it also happens if that module timeout is set below that value.
    Furthermore, it does not happen when requests are made directly against the UMC server instead of going through the Apache gateway in front of it.
    We don't know where the 10-minute timeout in Apache comes from, because the proxy timeout we configured in Apache is only 311 seconds.
    
    The subprocess definitely doesn't take 10 minutes to execute, only milliseconds:
    $ time /usr/sbin/repquota -C -v /dev/mapper/vg_ucs-root
    real    0m0,162s
    user    0m0,018s
    sys     0m0,033s

    There is a bug in python-tornado: when the process output exceeds a certain buffer size, the subprocess hangs in a write() call.
    I created a bug report upstream: https://github.com/tornadoweb/tornado/issues/3323
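The hang described above resembles the classic pipe-buffer stall: a child process blocks in `write()` once the pipe buffer is full and nobody reads the other end. The sketch below reproduces that general mechanism with the standard-library `subprocess` module; it is not the Tornado code path from the upstream issue. The 1 MiB output size is an arbitrary assumption chosen to exceed typical pipe buffers (64 KiB by default on Linux), and the function names are hypothetical.

```python
import subprocess
import sys

# Child writes 1 MiB to stdout, more than a default pipe buffer holds.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]


def wait_without_reading(timeout: float = 2.0) -> bool:
    """Return True if the child stalls because nobody drains its stdout."""
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    try:
        # wait() does not read the pipe: the child blocks in write()
        # once the buffer is full, so wait() runs into the timeout.
        proc.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        return True
    finally:
        proc.kill()  # no-op if the child has already exited
        proc.communicate()  # drain the pipe and reap the child


def read_while_waiting() -> int:
    """communicate() reads stdout while waiting, so the child never stalls."""
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    out, _ = proc.communicate(timeout=30)
    return len(out)


if __name__ == "__main__":
    print("stalls without reading:", wait_without_reading())
    print("bytes read with communicate():", read_while_waiting())
```

The Python documentation for `Popen.wait()` warns about exactly this deadlock with `stdout=PIPE` and recommends `communicate()`; a `repquota` report for thousands of users can plausibly exceed a pipe buffer, which would fit the observation that only large installations are affected.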


Also fixed the handling of aborted requests in the UMC server:

    Since git:9044299c48c2319f62584244fba3b50d4bea71a5 (Bug #56391), requests are aborted/canceled when the connection is closed.
    The module process terminates itself (after the module timeout) when there are no active requests or running threads.
    Therefore, the module could simply shut down after the connection was aborted.

    Since git:7e010571e85182a6154c050f329dc85c30674d2b (Bug #56198), active
    requests inside the UMC server were no longer detected as active,
    because the request() methods had been changed from synchronous to
    asynchronous processing. As a result, the request IDs were removed from
    the active-requests list immediately instead of after the response had
    been sent.
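A simplified `asyncio` sketch of both points above, assuming a hypothetical `RequestTracker` that records active request IDs and worker threads and decides whether the module may shut down. `handle_buggy()` removes the ID right after the handler is scheduled, so the request already looks finished while it still runs; `handle_fixed()` removes it only after the handler has completed. None of these names come from the UMC code base, and the idle check leaves out the module timeout.

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine

Handler = Callable[[], Coroutine[Any, Any, str]]


class RequestTracker:
    """Hypothetical bookkeeping of a module process (not the UMC code)."""

    def __init__(self) -> None:
        self.active: set[str] = set()
        self.threads: list[threading.Thread] = []

    def may_shut_down(self) -> bool:
        # Idle only if no request is active and no tracked worker thread is
        # still alive; a real check would also wait for the module timeout.
        return not self.active and not any(t.is_alive() for t in self.threads)

    async def handle_buggy(self, req_id: str, handler: Handler) -> asyncio.Task:
        self.active.add(req_id)
        # create_task() only schedules the handler; the discard() below runs
        # before the handler has done any work, so the request looks finished.
        task = asyncio.create_task(handler())
        self.active.discard(req_id)
        return task

    async def handle_fixed(self, req_id: str, handler: Handler) -> str:
        self.active.add(req_id)
        try:
            return await handler()  # wait until the response is available
        finally:
            self.active.discard(req_id)


async def demo() -> tuple[bool, bool]:
    tracker = RequestTracker()

    async def slow_query() -> str:
        await asyncio.sleep(0.1)  # stands in for the quota user query
        return "response"

    task = await tracker.handle_buggy("req-1", slow_query)
    buggy_idle = tracker.may_shut_down()  # req-1 is still running here
    await task

    fixed = asyncio.create_task(tracker.handle_fixed("req-2", slow_query))
    await asyncio.sleep(0.01)  # let handle_fixed() register req-2
    fixed_idle = tracker.may_shut_down()  # req-2 is still running here
    await fixed
    return buggy_idle, fixed_idle


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (True, False)
```

With the buggy bookkeeping the module reports itself idle while a request is still in progress, which, combined with the idle shutdown, lets the process exit mid-request.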



univention-quota.yaml
252b8271c462 | fix(quota): fix aborted user query requests

univention-quota (14.0.5-3)
252b8271c462 | fix(quota): fix aborted user query requests

univention-management-console.yaml
7a364c1cb389 | fix(umc): fix detection of active requests

univention-management-console (12.0.31-14)
7a364c1cb389 | fix(umc): fix detection of active requests
Comment 5 Marius Meschter univentionstaff 2023-09-19 16:19:47 CEST
QA:
- code review: OK
- changelog and YAML: OK
- package update: OK
- System with 2000 users without update, request should be aborted after 10 minutes: OK
- After update quota user overview loads instantly: OK