Bug 56828 - Global connection limit of ten between UMC-Server and all running module processes
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-5-errata
Assigned To: Florian Best
QA Contact: Jürn Brodersen
URL: https://git.knut.univention.de/univen...
Keywords:
Depends on:
Blocks:
Reported: 2023-11-16 09:41 CET by Jürn Brodersen
Modified: 2023-11-29 14:56 CET (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.286
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2023031621000374, 2023112321000265, 2023112321000211
Bug group (optional): Regression
Max CVSS v3 score:


Description Jürn Brodersen univentionstaff 2023-11-16 09:41:24 CET
Global connection limit of ten between umc server and all running module processes

`tornado.httpclient.AsyncHTTPClient` returns a global singleton by default (Tornado keeps one shared instance per IOLoop). Its `max_clients` defaults to ten, so any additional requests are queued and not started until a running one finishes.

https://www.tornadoweb.org/en/stable/httpclient.html#tornado.httpclient.AsyncHTTPClient

- Why can this be a problem?
As long as all modules answer requests reasonably fast and not many users are active, this is fine. But with more users and slower modules, these ten connections can quickly clog up, which prevents users from connecting to their module processes.

We noticed this in the computerroom module. Multiple users had the "Watch" overview page open, which creates a lot of slow requests for screenshots from student computers. That results in an unusably slow UMC for all users, because requests to other modules, like the password reset module, end up in the same queue.

- How to reproduce:
To make testing easier, I artificially slowed down a request, in my case the `rooms` request of the computerroom module, but it should work with any module.

Add the following to `def rooms` in /usr/lib/python3/dist-packages/univention/management/console/modules/computerroom/__init__.py:
```
import time
time.sleep(60)
```

I used the following snippet for testing:
```
import concurrent.futures

from univention.lib.umc import Client


def load_room():
    # Each call blocks for as long as the slowed-down "rooms" request takes.
    return c.umc_command("computerroom/rooms", options={"school": "school1"}).data


def pool_runner():
    # Keep ten requests in flight at a time until all 100 have completed.
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(load_room) for _ in range(100)]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())


c = Client()
c.authenticate("Administrator", "univention")
pool_runner()
```
If you now try to open a UMC module, you should notice that it has become significantly slower.

- Possible fix:
I propose to use `force_instance=True` in https://git.knut.univention.de/univention/ucs/-/blob/5.0-5/management/univention-management-console/src/univention/management/console/resources.py?ref_type=heads#L120 so each module process gets its own independent client instance. That would imitate the old behavior.
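
The effect of a shared limit versus independent limits can be illustrated with a small standard-library simulation. This is only a sketch and not the UMC or Tornado code: an `asyncio.Semaphore` stands in for the `max_clients` slots, and the names `LIMIT`, `request` and `fast_request_latency` are made up for this example. It measures how long one fast request takes while ten slow requests to another module are in flight.

```python
import asyncio
import time

LIMIT = 10  # stand-in for the default max_clients described above


async def request(pool: asyncio.Semaphore, duration: float) -> float:
    """Wait for a free slot in `pool`, 'work' for `duration` seconds, return the total latency."""
    start = time.monotonic()
    async with pool:
        await asyncio.sleep(duration)
    return time.monotonic() - start


async def fast_request_latency(shared: bool) -> float:
    """Latency of one fast request while LIMIT slow requests to another module are running."""
    slow_pool = asyncio.Semaphore(LIMIT)
    # shared=True:  both modules compete for the same slots (one global client).
    # shared=False: the fast module has its own slots (one client per module process).
    fast_pool = slow_pool if shared else asyncio.Semaphore(LIMIT)

    slow = [asyncio.create_task(request(slow_pool, 0.5)) for _ in range(LIMIT)]
    await asyncio.sleep(0.05)  # let the slow requests occupy all slots first
    latency = await request(fast_pool, 0.01)
    await asyncio.gather(*slow)
    return latency


if __name__ == "__main__":
    print(f"shared pool:     {asyncio.run(fast_request_latency(True)):.2f}s")
    print(f"per-module pool: {asyncio.run(fast_request_latency(False)):.2f}s")
```

With the shared pool, the fast request has to wait until one of the slow requests releases a slot. With its own pool it is served immediately, which is the behavior the proposed change is meant to restore.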
Comment 1 Florian Best univentionstaff 2023-11-16 09:48:40 CET
great, can you create a MR?
Comment 4 Florian Best univentionstaff 2023-11-21 20:36:40 CET
The suggested patch has been applied.

univention-management-console.yaml
773af91d1c2d | fix(umc): allow more than 10 parallel requests to module processes

univention-management-console (12.0.31-18)
773af91d1c2d | fix(umc): allow more than 10 parallel requests to module processes
…
    Note: Apache starts queuing requests at 150 open requests. See `ucr get apache2/maxclients`.
    If you want to test more extreme scenarios, you need to increase `apache2/max-request-workers` and `apache2/server-limit`.
    Note that `apache2/server-limit` needs to be at least as big as `apache2/max-request-workers`.
    Warning: each open connection equals one Apache process, which uses around 4 MB of RAM.
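
As a rough aid for picking values, the memory cost from the warning above can be estimated with a few lines of Python. This is only back-of-the-envelope arithmetic: the 4 MB per process is the approximate figure from the note, the helper name `apache_ram_mb` is made up for this example, and the worker counts other than 150 are arbitrary.

```python
MB_PER_APACHE_PROCESS = 4  # approximate figure taken from the note above


def apache_ram_mb(max_request_workers: int, mb_per_process: int = MB_PER_APACHE_PROCESS) -> int:
    """Estimate the RAM used when every Apache worker holds one open connection."""
    return max_request_workers * mb_per_process


if __name__ == "__main__":
    for workers in (150, 500, 1000):
        print(f"{workers} workers: about {apache_ram_mb(workers)} MB")
```

At the 150 open requests mentioned in the note, this comes to roughly 600 MB if all workers are busy.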

ucs-test (10.0.19-32)
c49452b58a29 | test(umc): add test for maximum connection and timeout
Comment 5 Jürn Brodersen univentionstaff 2023-11-28 00:36:04 CET
`joinscript()` only works on primaries and backups (missing credentials?) :(

```
root@slave:~# python3 -c "from univention.management.console.modules.ucstest import joinscript; joinscript()"
Object exists: cn=UMC,cn=univention,dc=ucs,dc=test
Object exists: cn=UMC,cn=policies,dc=ucs,dc=test
Object exists: cn=operations,cn=UMC,cn=univention,dc=ucs,dc=test
Object exists: cn=default-umc-all,cn=UMC,cn=policies,dc=ucs,dc=test
No modification: cn=Domain Admins,cn=groups,dc=ucs,dc=test
Object exists: cn=default-umc-users,cn=UMC,cn=policies,dc=ucs,dc=test
No modification: cn=Domain Users,cn=groups,dc=ucs,dc=test
Permission denied.
```

I adjusted the test to run only on a primary and increased the timeout a little to make it more stable.

Missed the test window for tonight...

[5.0-5 46e4e3b78d] Bug #56828: Run test case only on primary and increase timeout
Package: ucs-test
Version: 10.0.19-36
Branch: ucs_5.0-0
Scope: errata5.0-5
Comment 6 Jürn Brodersen univentionstaff 2023-11-29 09:21:56 CET
Tests look better now -> Verified