Univention Bugzilla – Bug 51182
ldap-group-to-file: huge performance loss by using ldap/server/addition
Last modified: 2020-05-06 14:40:02 CEST
ldap-group-to-file.py uses the UCR variable ldap/server/name as well as the LDAP servers from the UCR variable ldap/server/addition for an LDAP connection to query the group memberships. If the local LDAP server is used, this is usually not a problem and is done quickly. However, if e.g. a DC Master or DC Backup is used, which can be reached with a few miliseconds latency, the runtime of the script increases to several minutes. In the concrete customer case ldap-group-to-file was started on a UCS@school slave by the class work mode. There are 1900 groups readable in the domain or for the slave. If ldap-group-to-file uses the local LDAP server, the script finishes its work after only 6 seconds. However, ldap/server/addition contains 3 external systems (master + 2 backups), which are also randomly selected by ldap-group-to-file. The runtime of the script then increases to 4-5min. The start of the class work mode therefore does not take ~45 seconds but >5min. It should be checked if - too many LDAP queries are performed (# of query * latency = 5min?) - the LDAP ACL evaluation is too expensive on DC Master or DC Backup - ...
The intention of using ldap/server/addition for this script was to reduce the load on the DC Master initiated by memberserver instances. It makes absolutely sense to always try the local LDAP server first if it is ldap/server/name or in ldap/server/addition. So I suggest to change that script in the first place.
IMHO getMachineConnection() should be modified - not ldap-group-to-file.py. getMachineConnection() always uses [ldap/server/name + ldap/server/addition]. If called with random_server=True, it randomizes the list. It should be changed to always prefer the local LDAP server, and then append the (optionally randomized) ldap/server/addition list.
Created attachment 10342 [details] prefer local ldap server
(In reply to Daniel Tröder from comment #2) > IMHO getMachineConnection() should be modified - not ldap-group-to-file.py. > > getMachineConnection() always uses [ldap/server/name + ldap/server/addition]. > If called with random_server=True, it randomizes the list. > > It should be changed to always prefer the local LDAP server, and then append > the (optionally randomized) ldap/server/addition list. In which other usecases is getMachineConnection() used with random_server=true?
(In reply to Ingo Steuwer from comment #4) > In which other usecases is getMachineConnection() used with > random_server=true? Only in test/ucs-test/tests/10_ldap/24uldap.
(In reply to Daniel Tröder from comment #5) > (In reply to Ingo Steuwer from comment #4) > > In which other usecases is getMachineConnection() used with > > random_server=true? > Only in test/ucs-test/tests/10_ldap/24uldap. To be precise: the occurrence in 24uldap is the actual test of the random_server argument. I would still suggest, that ldap-group-to-file uses the local LDAP-server first (random_server=False) and if that fails, it retries with random_server=True.
(In reply to Sönke Schwardt-Krummrich from comment #6) > I would still suggest, that ldap-group-to-file uses the local LDAP-server > first (random_server=False) and if that fails, it retries with > random_server=True. That won't work... Or at least not how you expect it to. With random_server=False it will still try out *all* servers, incl. those from ldap/server/addition. Just in a predefined order.
(In reply to Daniel Tröder from comment #7) > That won't work... Or at least not how you expect it to. > With random_server=False it will still try out *all* servers, incl. those > from ldap/server/addition. Just in a predefined order. Yes. I've been doing some back and forth with Arvid and we decided on a variant that maintains the current state on member servers and shows improved behavior on domain controllers. For this purpose getMachineConnection() was adapted and not the script ldap-group-to-file.py: With random_server==false, the servers from ldap/server/name and ldap/server/addition are still concatenated and iterate over the resulting server list until a connection is successfully established with an LDAP server. If random_server==True, the system role is important. As is well known, no LDAP server runs on member servers, so the desired goal is to distribute the load by requests to all specified LDAP servers. Therefore, getMachineConnection first concatenates ldap/server/name and ldap/server/addition and then randomizes the order of the resulting list of hosts. On all other system roles, with random_server==True only the order of the servers from ldap/server/addition is randomized. The server from ldap/server/name (usually the local system) remains in first position. This should eliminate the performance regression introduced with bug 50191. [4.4-4] 89ab6ec640 Bug #51182: update advisory [4.4-4] ffbda6baa3 Bug #51182: Merge branch 'sschwardt/bug51182/4.4-4' into 4.4-4 [4.4-4] 05bf9aa62a Bug #51182: add advisory [4.4-4] 0c517a8197 Bug #51182: improve random_server in getMachineConnection Package: univention-python Version: 12.0.0-19A~4.4.0.202004292242 Branch: ucs_4.4-0 Scope: errata4.4-4 Waiting for jenkins results.
> Waiting for jenkins results. Results are good. [4.4-4] 224ae72feb Bug #51182: update unit test for getMachineConnection Package: ucs-test Version: 9.0.3-195A~4.4.0.202004301730 Branch: ucs_4.4-0 Scope: errata4.4-4
Verified: * Code Ok * Test-Case: Ok and green * Advisory Ok
<http://errata.software-univention.de/ucs/4.4/588.html>