Univention Bugzilla – Bug 50050
[UDM HTTP API] server does not scale horizontally
Last modified: 2022-02-02 16:05:08 CET
The UDM HTTP API service uses only 1 process. Under medium load UDM is CPU-bound, and thus the UDM HTTP API cannot deliver speedups available in multi-cpu/core systems. In a 4-core system creating 1000 users sequentially, with 3 or 4 processes makes a big difference: parallelism=1: Seconds for creating 1000 Users: 127,57 Seconds for reading 1000 Users: 3,15 Seconds for modifying 1000 Users: 23,32 Seconds for deleting 1000 Users: 71,95 parallelism=3: Seconds for creating 1000 Users: 47,55 Seconds for reading 1000 Users: 1,25 Seconds for modifying 1000 Users: 17,93 Seconds for deleting 1000 Users: 42,69 parallelism=4: Seconds for creating 1000 Users: 47,53 Seconds for reading 1000 Users: 1,04 Seconds for modifying 1000 Users: 22,83 Seconds for deleting 1000 Users: 53,81 The UDM REST API service should spawn [num-cores]-1 processes to be able to deliver the maximum speed available on the system.
Using directory/manager/rest/cpus starts more processes, but still only one is used.
Created attachment 10184 [details] log the PID
Started with ucr set directory/manager/rest/cpus=3: ------------------------------------------------------------------------------------ CGroup: /system.slice/univention-directory-manager-rest.service ├─32644 /usr/bin/python2.7 -m univention.admin.rest.server ├─32647 /usr/bin/python2.7 -m univention.admin.rest -s ... -l de_DE run ├─32648 /usr/bin/python2.7 -m univention.admin.rest -s ... -l en_US run ├─32653 /usr/bin/python2.7 -m univention.admin.rest -s ... -l de_DE run ├─32654 /usr/bin/python2.7 -m univention.admin.rest -s ... -l de_DE run ├─32655 /usr/bin/python2.7 -m univention.admin.rest -s ... -l de_DE run ├─32656 /usr/bin/python2.7 -m univention.admin.rest -s ... -l en_US run ├─32657 /usr/bin/python2.7 -m univention.admin.rest -s ... -l en_US run └─32658 /usr/bin/python2.7 -m univention.admin.rest -s ... -l en_US run ------------------------------------------------------------------------------------ The client has 20 connections open to the server: ------------------------------------------------------------------------------------ root@m66:/sync/univention-directory-manager-rest# lsof -Pni | grep 10.200.3.68 apache2 301 www-data 14u IPv6 49940871 0t0 TCP 10.200.3.66:80->10.200.3.68:55996 (ESTABLISHED) apache2 302 www-data 14u IPv6 49940865 0t0 TCP 10.200.3.66:80->10.200.3.68:55986 (ESTABLISHED) [..] ------------------------------------------------------------------------------------ root@m66:/sync/univention-directory-manager-rest# lsof -Pni | grep 10.200.3.68 | wc -l 20 ------------------------------------------------------------------------------------ Bug only 1 process ever connectes to the LDAP server: ------------------------------------------------------------------------------------ root@m66:/sync/univention-directory-manager-rest# lsof -Pni | egrep '389|636' | grep py python2.7 32658 root 10u IPv4 49936509 0t0 TCP 10.200.3.66:45230->10.200.3.66:7389 (ESTABLISHED) python2.7 32658 root 11u IPv4 49937413 0t0 TCP 10.200.3.66:45232->10.200.3.66:7389 (ESTABLISHED) ------------------------------------------------------------------------------------ The patch logs the PID of the process handling a request - only 1 process is at work: ------------------------------------------------------------------------------------ ==> /var/log/univention/directory-manager-rest.log <== 16.09.19 17:23:54.891 MODULE ( WARN ) : [32647] Initialized. 16.09.19 17:23:54.965 MODULE ( WARN ) : [32648] Initialized. 16.09.19 17:24:07.892 MODULE ( WARN ) : [32658] Getting fresh LDAP connection. 16.09.19 17:24:08.193 ADMIN ( WARN ) : Unknown type for property 'school': ucsschoolSchools 16.09.19 17:24:08.195 ADMIN ( WARN ) : Unknown type for property 'mailHomeServer': MailHomeServer 16.09.19 17:24:08.199 ADMIN ( WARN ) : Unknown type for property 'school': ucsschoolSchools 16.09.19 17:24:08.200 ADMIN ( WARN ) : Unknown type for property 'mailHomeServer': MailHomeServer [I 190916 17:24:08 web:1971] 200 GET /udm/users/user/add (0.0.0.0) 427.46ms 16.09.19 17:24:08.331 MODULE ( WARN ) : [32658] Using cached LDAP connection. [I 190916 17:24:08 web:1971] 301 GET /udm/ldap/base/ (0.0.0.0) 2.04ms 16.09.19 17:24:08.336 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:08.417 MODULE ( ERROR ) : Identified modules ['policies/thinclient'] for cn=default-settings,cn=thinclient,cn=policies,dc=uni,dc=dtr (flavor=None) does not have a relating UDM module. 16.09.19 17:24:08.418 MODULE ( ERROR ) : Identified modules ['policies/admin_user'] for cn=default-users,cn=admin-settings,cn=users,cn=policies,dc=uni,dc=dtr (flavor=None) does not have a relating UDM module. [I 190916 17:24:08 web:1971] 200 GET /udm/container/dc/dc=uni,dc=dtr (0.0.0.0) 83.95ms 16.09.19 17:24:10.693 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:10.694 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:10.702 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:10.704 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:10.711 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:10.713 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:10.735 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:10.737 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:10.798 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:10.802 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:10.827 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:10.829 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:11.046 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:11.054 MODULE ( WARN ) : [32658] Create users/user... [I 190916 17:24:11 web:1971] 201 POST /udm/users/user/ (0.0.0.0) 950.53ms 16.09.19 17:24:11.650 MODULE ( WARN ) : [32658] Using cached LDAP connection. [I 190916 17:24:11 web:1971] 201 POST /udm/users/user/ (0.0.0.0) 1007.57ms [I 190916 17:24:11 web:1971] 201 POST /udm/users/user/ (0.0.0.0) 1017.71ms 16.09.19 17:24:11.726 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:11.771 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:11.815 MODULE ( WARN ) : [32658] Using cached LDAP connection. [I 190916 17:24:11 web:1971] 201 POST /udm/users/user/ (0.0.0.0) 1116.85ms 16.09.19 17:24:12.057 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:12.078 MODULE ( WARN ) : [32658] Create users/user... [I 190916 17:24:12 web:1971] 201 POST /udm/users/user/ (0.0.0.0) 1430.27ms 16.09.19 17:24:12.220 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:12.233 MODULE ( WARN ) : [32658] Create users/user... 16.09.19 17:24:12.301 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:12.463 ADMIN ( WARN ) : Unknown type for property 'school': ucsschoolSchools 16.09.19 17:24:12.464 ADMIN ( WARN ) : Unknown type for property 'mailHomeServer': MailHomeServer [I 190916 17:24:12 web:1971] 200 GET /udm/users/user/uid=testalxrfuxlvd,cn=lehrer,cn=users,ou=DEMOSCHOOL,dc=uni,dc=dtr (0.0.0.0) 821.35ms 16.09.19 17:24:12.527 ADMIN ( WARN ) : Unknown type for property 'school': ucsschoolSchools 16.09.19 17:24:12.529 ADMIN ( WARN ) : Unknown type for property 'mailHomeServer': MailHomeServer [I 190916 17:24:12 web:1971] 200 GET /udm/users/user/uid=testlkn56cxqvh,cn=lehrer,cn=users,ou=DEMOSCHOOL,dc=uni,dc=dtr (0.0.0.0) 237.47ms 16.09.19 17:24:12.579 MODULE ( WARN ) : [32658] Using cached LDAP connection. 16.09.19 17:24:12.629 MODULE ( WARN ) : [32658] Create users/user... [I 190916 17:24:12 web:1971] 201 POST /udm/users/user/ (0.0.0.0) 1826.55ms ------------------------------------------------------------------------------------ The one process uses all the COU time: ------------------------------------------------------------------------------------ USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND [..] root 32644 1.3 2.6 602788 108712 ? Ss 17:23 0:06 /usr/bin/python2.7 -m univention.admin.rest.server root 32647 0.2 2.7 601368 109992 ? S 17:23 0:01 /usr/bin/python2.7 -m univention.admin.rest -s /var/r root 32648 0.2 2.7 601188 109916 ? S 17:23 0:01 /usr/bin/python2.7 -m univention.admin.rest -s /var/r root 32653 0.0 2.1 601368 88676 ? S 17:23 0:00 /usr/bin/python2.7 -m univention.admin.rest -s /var/r root 32654 0.0 2.1 601368 88676 ? S 17:23 0:00 /usr/bin/python2.7 -m univention.admin.rest -s /var/r root 32655 0.0 2.1 601368 88676 ? S 17:23 0:00 /usr/bin/python2.7 -m univention.admin.rest -s /var/r root 32656 0.0 2.1 601188 88788 ? S 17:23 0:00 /usr/bin/python2.7 -m univention.admin.rest -s /var/r root 32657 0.0 2.1 601188 88788 ? S 17:23 0:00 /usr/bin/python2.7 -m univention.admin.rest -s /var/r root 32658 18.1 3.9 3005452 160368 ? Sl 17:23 1:30 /usr/bin/python2.7 -m univention.admin.rest -s /var/r
I think it's either the middleware component or there is a bug in starting the subprocesses: Starting the server without a middleware: # service univention-directory-manager-rest stop # python -m univention.admin.rest -p 9979 -l en_US -d2 -c 20 run → causes a lot of: Traceback (most recent call last): File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/usr/lib/python2.7/dist-packages/univention/admin/rest/__main__.py", line 151, in <module> Server.main() File "/usr/lib/python2.7/dist-packages/univention/admin/rest/__main__.py", line 147, in main args.func(args) File "/usr/lib/python2.7/dist-packages/univention/admin/rest/__main__.py", line 88, in run server.listen(args.port) File "/usr/lib/python2.7/dist-packages/tornado/tcpserver.py", line 126, in listen sockets = bind_sockets(port, address=address) File "/usr/lib/python2.7/dist-packages/tornado/netutil.py", line 193, in bind_sockets sock.bind(sockaddr) File "/usr/lib/python2.7/socket.py", line 228, in meth return getattr(self._sock,name)(*args) socket.error: [Errno 98] Address already in use
From what I understand in https://www.tornadoweb.org/en/stable/guide/running.html#processes-and-ports it should be server.bind() and not server.listen(). But I don't think the way subprocess are created atm is how tornado was designed to work. I suggest to follow the advise at the end of the linked section and fork multiple individual instances on separate ports/sockets. Then use apache/nginx to load-balance them. The routing-by-language can be done by those.
(In reply to Daniel Tröder from comment #5) > From what I understand in > https://www.tornadoweb.org/en/stable/guide/running.html#processes-and-ports > it should be server.bind() and not server.listen(). Oh yes! This was fixed in: univention-directory-manager-rest (9.0.15-8) 72c39668a741 | Bug #27816: Bug #50050: Fix TCP socket bind() Now it works for me: python -m univention.admin.rest -p 9979 -c 0 run forks X different processes and some random processes are used to respond. Alternative to attachment 10184 [details] / comment 2: I added a logging of the process ID in the access_log logging handler, there you can see the pid for each request. > But I don't think the way subprocess are created atm is how tornado was > designed to work. > I suggest to follow the advise at the end of the linked section and fork > multiple individual instances on separate ports/sockets. Then use > apache/nginx to load-balance them. The routing-by-language can be done by > those. I thought so in the beginnig, too and did some hours of research. We don't have nginx set up. We have apache. And apache sucks in parsing HTTP request headers. I didn't find a way to route by a valid differentiation about Accept-Language, except for writing an own apache module (/plugin for some module which provides such differentiation).
(In reply to Florian Best from comment #6) > (In reply to Daniel Tröder from comment #5) > Now it works for me: > python -m univention.admin.rest -p 9979 -c 0 run > forks X different processes and some random processes are used to respond. Doesn't work for me. All requests are handled by the same process (as can now be seen in the log file). I didn't start it with "python -m" though, but through the Debian service: $ ucr set directory/manager/rest/cpus=4 $ service univention-directory-manager-rest restart > Alternative to attachment 10184 [details] / comment 2: > I added a logging of the process ID in the access_log logging handler, there > you can see the pid for each request. Nice. The width can be reduced to 5 in the log format "(%(process)9d)". > > But I don't think the way subprocess are created atm is how tornado was > > designed to work. > > I suggest to follow the advise at the end of the linked section and fork > > multiple individual instances on separate ports/sockets. Then use > > apache/nginx to load-balance them. The routing-by-language can be done by > > those. > I thought so in the beginnig, too and did some hours of research. > We don't have nginx set up. We have apache. And apache sucks in parsing HTTP > request headers. I didn't find a way to route by a valid differentiation > about Accept-Language, except for writing an own apache module (/plugin for > some module which provides such differentiation). Huh - that's unfortunate - Apache… :(
I received a patch from Florian that makes the UDM REST API server use a configurable number of processes -> multi processing. On a 8 core machine, starting 4 API processes that lead to a 2,5 times speedup for creating an modifying users in parallel. That is significant! In UCS@school we are working on changing the creation of exam users from a UMC module to using the Kelvin API. The Kelvin API uses the UDM REST API. Creating the users sequentially will be slower through Kelvin->UDM-REST than through UDM/LDAP. But creating them in parallel can be must faster, if the UDM REST API can be executed in a multi-process model. Having this implemented is a hard requirement for our work to switch our backend to the Kelvin API.
(In reply to Daniel Tröder from comment #8) > I received a patch from Florian that makes the UDM REST API server use a > configurable number of processes -> multi processing. > On a 8 core machine, starting 4 API processes that lead to a 2,5 times > speedup for creating an modifying users in parallel. > That is significant! Can you please attach this patch to the bug?
Created attachment 10506 [details] Florians patch to enable multiple UDM REST API processes
Regarding the patch: In my test VM I am using '-c $(nproc)' instead of '-c 4', to always start as much processes as there are CPU cores. For the product I suggest add this suggestion to the UCRV description and to also mention it in the performance manual.
(In reply to Daniel Tröder from comment #13) > Regarding the patch: In my test VM I am using '-c $(nproc)' instead of '-c > 4', to always start as much processes as there are CPU cores. '-c $(nproc)' should be equivalent to "-c 0".
(In reply to Florian Best from comment #14) > '-c $(nproc)' should be equivalent to "-c 0". Confirmed.
We can't just apply the patch in the product because it removes the functionality for multi-languages and the internal state is not shared between the processes. internal state exists e.g. for asynchronous operations like moving objects. when you move an object you simply make a request which is accepted and responded immediately, in the background the move operation runs (in a thread) and the client has the possibility to request the current state. another example is, the not-officially-used pagination feature, which needs to store a LDAP cookie. Therefore we have to implement some shared memory. For Python 3.8 there are built in libraries. We need to backport one of them to Python 2.7.
(In reply to Florian Best from comment #16) > the internal state is not shared > between the processes. > internal state exists e.g. for asynchronous operations like moving objects. > when you move an object you simply make a request which is accepted and > responded immediately, in the background the move operation runs (in a > thread) and the client has the possibility to request the current state. I have an idea for a workaround for this problem: When a client starts a move, it gets redirected to a URL. If that URL has a parameter (like "pid=12345") that makes the main process forward it to a specific worker process, it would stick for that operation to the same process.
Won't work: The process where your request is processed is random.
(In reply to Florian Best from comment #18) > Won't work: The process where your request is processed is random. I mean it like this: 1. client send "move" request, in the reply is a redirect URL for retrieving the status and ultimately the result - including a "processor"-parameter. 2. the client connects to the URL and the UDM REST API main process forwards the request not to a random worker process, but instead to the one in the "processor"-parameter.
(In reply to Daniel Tröder from comment #19) > (In reply to Florian Best from comment #18) > > Won't work: The process where your request is processed is random. > I mean it like this: > 1. client send "move" request, in the reply is a redirect URL for retrieving > the status and ultimately the result - including a "processor"-parameter. That is already the case. > 2. the client connects to the URL and the UDM REST API main process forwards > the request not to a random worker process, but instead to the one in the > "processor"-parameter. the main process cannot choose a process - the linux kernel selects one randomly. I could in the random process make an extra request to a certain other process but this adds extra complexity because I need to create different/additional sockets for each process and be aware of the URLs of other processes then. That's probably more efford than sharing the memory.
redis is an alternative to classical shared memory: https://redis.io/ https://github.com/andymccurdy/redis-py (https://github.com/thefab/tornadis)
https://git.knut.univention.de/univention/ucs/-/merge_requests/153 Merge Request 153 adds multiprocessing to the UDM REST API by: * backporting multiprocessing.shared_memory * backporting shared_memory_dict * change from UNIX sockets to TCP sockets to allow SO_REUSEADDR * fork N processes * signal handling for child and parent processes This is stable enough to start the QA - I will probably still find enhancements.
What's the reason to backport multiprocessing.shared_memory from Python 3.8 including a bunch of Python and C modules that must be compiled and maintained? Why is it not possible to use the shared memory code that exists in Python 2.7 and 3.7 (e.g. Array, BaseManager, SyncManager etc)?
(In reply to Daniel Tröder from comment #25) > What's the reason to backport multiprocessing.shared_memory from Python 3.8 > including a bunch of Python and C modules that must be compiled and > maintained? > > Why is it not possible to use the shared memory code that exists in Python > 2.7 and 3.7 (e.g. Array, BaseManager, SyncManager etc)? I started with multiprocessing.Manager().dict() but got horrible exceptions - it works only with a asyncio main loop. They also don't use shared memory but TCP (or with some configuration Unix) sockets.
(In reply to Florian Best from comment #26) > (In reply to Daniel Tröder from comment #25) > > What's the reason to backport multiprocessing.shared_memory from Python 3.8 > > including a bunch of Python and C modules that must be compiled and > > maintained? > > > > Why is it not possible to use the shared memory code that exists in Python > > 2.7 and 3.7 (e.g. Array, BaseManager, SyncManager etc)? > > I started with multiprocessing.Manager().dict() but got horrible exceptions > - it works only with a asyncio main loop. Not what the Python documentation says and also not my personal experience. I have programmed a multi-process application using Python 2.7 (no async) and shared variables using the standard library. > They also don't use shared memory but TCP (or with some configuration Unix) > sockets. Yes, they implement synchronized shared variables (memory) using a separate process and network communication, not Linux-shared-memory. Why is that a problem?
Created attachment 10851 [details] `multiprocessing.Manager` crashes (In reply to Daniel Tröder from comment #27) > (In reply to Florian Best from comment #26) > > (In reply to Daniel Tröder from comment #25) > > > What's the reason to backport multiprocessing.shared_memory from Python 3.8 > > > including a bunch of Python and C modules that must be compiled and > > > maintained? > > > > > > Why is it not possible to use the shared memory code that exists in Python > > > 2.7 and 3.7 (e.g. Array, BaseManager, SyncManager etc)? > > > > I started with multiprocessing.Manager().dict() but got horrible exceptions > > - it works only with a asyncio main loop. > Not what the Python documentation says and also not my personal experience. > I have programmed a multi-process application using Python 2.7 (no async) > and shared variables using the standard library. Sorry, I confused: not `asyncio` but `multiprocessing.Process()`. Tornado uses a simple `os.fork()`, which can't be used with the multiprocessing tools. Attached is the logs of some of the exceptions occurring then. Details at: https://stackoverflow.com/questions/37692262/python-multiprocessing-assertionerror-can-only-join-a-child-process Using multiprocessing.Manager() was my starting point, too. Here is a patch on top of the merge request to use it instead - which doesn't work. diff --git management/univention-directory-manager-rest/src/univention/admin/rest/shared_memory.py management/univention-directory-manager-rest/src/univention/admin/rest/shared_memory.py index 14179203ca..98a0db49fb 100644 --- management/univention-directory-manager-rest/src/univention/admin/rest/shared_memory.py +++ management/univention-directory-manager-rest/src/univention/admin/rest/shared_memory.py @@ -175,3 +175,16 @@ def unlink_shared_memory(): except (EnvironmentError, RuntimeError): pass del __shared_memory[key] + + +import multiprocessing + +manager = multiprocessing.Manager() + + +def shared_memory_dict(name, size=None, parent=None): + return manager.dict() + + +def unlink_shared_memory(): + manager.shutdown()
You must create the shared object before forking. IMHO the internal working is that a process is spawn (the only one with "real" access to the variable) and it is talked to through sockets by the other processes. The server can then synchronize the access. For that to work, all process need to have access to the socket. As sockets are inherited when fork()ing, it works, when the variable (and thus the socket) is created _before_ forking. I just tried the following: In u.a.rest.__main__.Server.run() I added before server.start(args.cpus): -------------------------------------------------------------------------- from multiprocessing import Array arr = Array('i', range(6)) -------------------------------------------------------------------------- Then after the server.start(args.cpus) I added: --------------------------------------------------------------------------CORE.error('**** os.getpid()={!r} ****'.format(os.getpid())) CORE.error('**** arr={!r} ****'.format(arr)) for a in arr: CORE.error('**** a={!r}'.format(a)) -------------------------------------------------------------------------- In the log file I could see, that the Array could be accessed by all processes (they had different PIDs). When shutting down the REST API service using systemctl, there were no problems.
(In reply to Daniel Tröder from comment #29) > You must create the shared object before forking. Look at the patch - they are created during python import time before forking. > IMHO the internal working is that a process is spawn (the only one with > "real" access to the variable) and it is talked to through sockets by the > other processes. The server can then synchronize the access. For that to > work, all process need to have access to the socket. As sockets are > inherited when fork()ing, it works, when the variable (and thus the socket) > is created _before_ forking. yes > I just tried the following: … > In the log file I could see, that the Array could be accessed by all > processes (they had different PIDs). Did you also try to modify one of the values in a process? Is it synced to the others? For me it doesn't work for dicts in dicts - no matter if I use a Manager.dict() or a regular dict as the children dict. > When shutting down the REST API service using systemctl, there were no > problems. There is, look at /var/log/univention/directory-manager-rest.log
I have created two examples how this can be accomplished using only the standard Python library (Python 2.7). The simpler examples is using a MP-safe "Array" object. If the size of your data structure is known and fixed, that is the simplest and fastest solution using shared memory. The code is in the git branch dtroeder/50050_MP_Array. The more complex solution allows using arbitrary Python objects. In the example a dict is shared among all processes. The main process creates a process that "owns" the dict and stores its connection data in an MP-safe "Value" object. The worker processes then connect to it and use it. Tornado makes using Pythons own MP library very hard, because it uses fork(), instead of MP. That's why a workaround has to be used in the worker processes to unregister the MP._exit_function() from atexit. With this a pure Python 2.7 solution is available.
The "complex solution" for the shared dict is in the git branch "dtroeder/50050_MP_dict".
> The main process creates a process that "owns" the dict and stores its connection data in an MP-safe "Value" object. I just through that using an MP-safe "Value" object isn't necessary, as it will only be read. I had implemented it that way, because in an earlier attempt one of the workers started the dict-owning-process. But as the main process does it now, a simple string object can be used.
Just tested it, works: [dtroeder/50050_MP_dict b36087aea8] Bug #50050: access_manager_address is only read, no need for a synced variable
Turns out, that the existing DictAccessManager(SyncManager) class can be used in the worker processes as well. I think the example in the Python lib docs create a second class, because the code in those clients may be in a different module [on a different host]. But in our case we have just used fork() and have access to the class. [dtroeder/50050_MP_dict b36087aea8] Bug #50050: access_manager_address is only read, no need for a synced variable
I adjusted the Merge Request to your wish: use the multiprocessing standard library for sharing dictionaries via further processes, pickle and pipes instead of sharing common memory. Your examples are btw bloated: no subclassing or different instances of managers are required. They also did not work because the dict instances were created after the os.fork() so that each subprocess just had a own dictionary instance shared with noone except the manager process. It also created 2 manager processes because you first create and start a manager which holds a global dict and then start a manager where you registered methods to get that. Simply a `multiprocessing.Manager` - which is a started instance of `multiprocessing.SyncManager` - is required, which provides already registered dict() proxies. The disadvantage of all this is that all dictionaries to be shared must be created during process start before forking (e.g. at import time), otherwise they cannot be shared with sub-processes (because multiprocessing doesn't provide a way to get a specific instance). For creating those dictionary instances a SyncManager must be started, which causes that a manager process is started. This ofc also happens when just importing univention.admin.rest.* for e.g. mypy / sphinx-documentation / whatever. univention-directory-manager-rest.yaml c3570499249d | Bug #50050: debian/changelog + YAML univention-directory-manager-rest (10.0.2-2) dc94c1ed0c17 | Bug #50050: add missing translation 833fd3074634 | Bug #50050: remove daemonizing handled by systemd d0280dbfae10 | Bug #50050: use mulitprocessing.Manager() for shared memory d5729aa684ee | Bug #50050: remove shared_memory df2a103d7b82 | Bug #50050: unlink shared memory in main process d22b9b03737a | Bug #50050: implement shared memory using shared_memory_dict c85600e5b05f | Bug #50050: `raise NotFound` if wrong object type is given f9ebba3fd349 | Bug #50050: Add multiprocessing to UDM REST API cca140f63a36 | Bug #50050: protect secret shared memory against local exploitation 42a0ce8cb1a6 | Bug #50050: shared_memory_dict.py: Add Python 2 compatibility 4694edbffc35 | Bug #50050: Add shared_memory_dict.py 6e418764dba7 | Bug #50050: multiprocessing_shared_memory.py: re-add more Python 2 compatibility 775c735b9e22 | Bug #50050: multiprocessing_shared_memory.py: drop not required things 75cd98fd458c | Bug #50050: backport multiprocessing.shared_memory into new package
(In reply to Florian Best from comment #36) > I adjusted the Merge Request to your wish: > use the multiprocessing standard library for sharing dictionaries via > further processes, pickle and pipes instead of sharing common memory. > > Your examples are btw bloated: no subclassing or different instances of > managers are required. I wrote my example following the official Python documentation as closely as possible to make it easier to understand and adapt. I think the official Python documentations intention to subclass is a good one: they don't want to pollute the standard libraries class with the custom method (here "get_the_dict"). DictAccessManager.register() will only add the function to the subclass, so other parts of the software system - also using the BaseManager or a subclass of it - won't have it. > They also did not work because the dict instances > were created after the os.fork() so that each subprocess just had a own > dictionary instance shared with noone except the manager process. > It also created 2 manager processes because you first create and start a > manager which holds a global dict and then start a manager where you > registered methods to get that. The example did work. But it seems something in ucs5 changed or py3 changed it now it's not working anymore. Strange. Anyway, I updated the branch so that it works with 5.0. I also made the example a bit simpler so it can be clearly seen, that the dict is shared amongst all processes. I'll attach directory-manager-rest.log, where it is clearly visible, that a common dict is filled from all processes. In any case I am happy that my PoC worked, so we can stay with a supported version of Python 3 in UCS 5.0. > Simply a `multiprocessing.Manager` - which is a started instance of > `multiprocessing.SyncManager` - is required, which provides already > registered dict() proxies. > > The disadvantage of all this is that all dictionaries to be shared must be > created during process start before forking (e.g. at import time), otherwise > they cannot be shared with sub-processes (because multiprocessing doesn't > provide a way to get a specific instance). > For creating those dictionary instances a SyncManager must be started, which > causes that a manager process is started. > This ofc also happens when just importing univention.admin.rest.* for e.g. > mypy / sphinx-documentation / whatever. It is not necessary to create the SyncManager at import time. That is of cause very undesirable. I have modified my example in dtroeder/50050_MP_dict so that each process prints out the dict it sees, so that you can see, that the dict is a shared one, and that the SyncManager can be instanciated at runtime.
Created attachment 10884 [details] directory-manager-rest.log
Jenkins shows: Traceback (most recent call last): File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/usr/lib/python3/dist-packages/univention/admin/rest/__main__.py", line 221, in <module> Server.main() File "/usr/lib/python3/dist-packages/univention/admin/rest/__main__.py", line 217, in main server.run(args) File "/usr/lib/python3/dist-packages/univention/admin/rest/__main__.py", line 114, in run import univention.admin.rest.module # noqa: F401 File "/usr/lib/python3/dist-packages/univention/admin/rest/module.py", line 363, in <module> class ResourceBase(object): File "/usr/lib/python3/dist-packages/univention/admin/rest/module.py", line 368, in ResourceBase authenticated = shared_memory.dict() File "/usr/lib/python3.7/multiprocessing/managers.py", line 701, in temp token, exp = self._create(typeid, *args, **kwds) File "/usr/lib/python3.7/multiprocessing/managers.py", line 583, in _create assert self._state.value == State.STARTED, 'server not yet started' AssertionError: server not yet started
don't know if related: Traceback (most recent call last): File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/usr/lib/python3/dist-packages/univention/admin/rest/__main__.py", line 221, in <module> Server.main() File "/usr/lib/python3/dist-packages/univention/admin/rest/__main__.py", line 217, in main server.run(args) File "/usr/lib/python3/dist-packages/univention/admin/rest/__main__.py", line 114, in run import univention.admin.rest.module # noqa: F401 File "/usr/lib/python3/dist-packages/univention/admin/rest/module.py", line 81, in <module> from univention.management.console.modules.udm.udm_ldap import get_module, UDM_Module, ldap_dn2path, read_syntax_choices, _get_syntax, container_modules, UDM_Error File "/usr/lib/python3/dist-packages/univention/management/console/modules/__init__.py", line 36, in <module> import univention.management.console.protocol # noqa: F401 File "/usr/lib/python3/dist-packages/univention/management/console/protocol/__init__.py", line 212, in <module> from .session import * # noqa: F403,F401 File "/usr/lib/python3/dist-packages/univention/management/console/protocol/session.py", line 58, in <module> from .client import Client, NoSocketError File "/usr/lib/python3/dist-packages/univention/management/console/protocol/client.py", line 47, in <module> from OpenSSL import SSL File "/usr/lib/python3/dist-packages/OpenSSL/__init__.py", line 8, in <module> from OpenSSL import crypto, SSL File "/usr/lib/python3/dist-packages/OpenSSL/crypto.py", line 12, in <module> from cryptography import x509 File "/usr/lib/python3/dist-packages/cryptography/x509/__init__.py", line 8, in <module> from cryptography.x509.base import ( File "/usr/lib/python3/dist-packages/cryptography/x509/base.py", line 15, in <module> from cryptography.hazmat.primitives.asymmetric import dsa, ec, rsa File "/usr/lib/python3/dist-packages/cryptography/hazmat/primitives/asymmetric/rsa.py", line 18, in <module> from cryptography.hazmat.backends.interfaces import RSABackend File "<frozen importlib._bootstrap>", line 980, in _find_and_load File "<frozen importlib._bootstrap>", line 149, in __enter__ File "<frozen importlib._bootstrap>", line 101, in acquire KeyError: 140359116617536
Please note that running the UDM REST API with multiple processes causes that also the number of LDAP connections increases. I don't know if this has any relevant side effects.
SyncManager is not started during python import anymore. univention-directory-manager-rest (10.0.2-4) 1fd6c9c7cb0e | Bug #50050: don't create global dictionaries during python import Bug #50050: don't create global dictionaries during python import The main shared memory dictionary instances must be created after the python-import so that python importing the code doesn't need a started SyncManager. But the instances must be created before os.fork() so that every process share/reference the same instance.
univention-directory-manager-rest (10.0.2-7) ac176850ae27 | Bug #50050: add basic README for UDM REST API 59998efc969e | Bug #50050: fix small glitch in HTML form view 6ea71770c9fb | Bug #50050: document UCR variable and use UCR default layer
A section has been added to the performance documentation: f0514f248188 | Bug #50050: add UDM REST API scaling to performance doc
Verified: * Code review * Functional tests * Simple performance comparison: 1000 parallel curl users/user lookups from a remote machine * before (against VM with 8 cores): * ~75 seconds * after: * ~15 seconds with directory/manager/rest/processes=8 (or =0) * ~20 seconds with directory/manager/rest/processes=4 * Documentation (README.md & Performance Guide) * Advisory
<https://errata.software-univention.de/#/?erratum=5.0x202>