Bug 50191 - Use random server for ldap-group-to-file.py on member servers
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.4-3-errata
Assigned To: Julia Bremer
QA Contact: Jürn Brodersen
Depends on:
Blocks:
Reported: 2019-09-13 15:52 CEST by Christian Völker
Modified: 2020-04-29 10:40 CEST

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 4: Will affect most installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.571
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019091221000781, 2019120221000346
Bug group (optional):
Max CVSS v3 score:


Description Christian Völker univentionstaff 2019-09-13 15:52:16 CEST
ldap-group-to-file runs very frequently by default (at least every 15 minutes).

By default it runs against the master server.

At a customer with a large number of objects (~50,000 users), the member servers hammered the master server with this operation while the backup servers sat idle.

The script should use a random server among the existing servers (master, backup, possibly slaves) so the load of this operation gets more balanced.
Comment 1 Christian Völker univentionstaff 2019-12-04 10:00:10 CET
This happened in a large environment and severely affected exam mode: the changes were not reflected in time, so exam mode failed.
Comment 2 Arvid Requate univentionstaff 2019-12-04 17:03:38 CET
> By default it runs against the master server.

Sure?

base/univention-pam/ldap-group-to-file.py does:

  lo = univention.uldap.getMachineConnection(ldap_master=False)

A look into ./base/univention-python/modules/uldap.py shows that this code evaluates the UCR variables ldap/server/name and ldap/server/addition in that case. On UCS DCs ldap/server/name should point to the local host by default.


git blame shows:

commit 053be8d9f9
Author: Stefan Gohmann <gohmann@univention.de>
Date:   Fri Nov 2 08:04:01 2012 +0000

    * ldap-group-to-file.py: don't use always the DC Master (Bug #29029)
Comment 3 Arvid Requate univentionstaff 2019-12-04 21:15:34 CET
Ok, I understand that ldap/server/name and ldap/server/addition are used in uldap.getMachineConnection.
On secondary UCS LDAP servers (Backup & Slave) ldap/server/name defaults to the FQDN of the local system,
but on a UCS member server this is not the case.
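
For illustration, the lookup discussed in comments 2 and 3 can be sketched roughly as below. This is not the code from uldap.py: the function name candidate_ldap_servers, the plain dict standing in for UCR, the example host names, and the assumption that ldap/server/addition holds a whitespace-separated host list are all mine.

```python
def candidate_ldap_servers(ucr):
    """Return the LDAP servers to try, in order, from a dict of UCR values."""
    servers = []
    name = ucr.get("ldap/server/name")
    if name:
        servers.append(name)
    # Assumption: ldap/server/addition is a whitespace-separated host list.
    for host in ucr.get("ldap/server/addition", "").split():
        if host not in servers:
            servers.append(host)
    return servers


# Hypothetical member server whose ldap/server/name points at the master.
member_ucr = {
    "ldap/server/name": "master.example.test",
    "ldap/server/addition": "backup1.example.test backup2.example.test",
}
print(candidate_ldap_servers(member_ucr))
```

With a fixed order like this, every member server configured this way tries the master first, which would match the load pattern described in the report.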
Comment 4 Julia Bremer univentionstaff 2020-01-15 09:38:07 CET
Successful build
Package: univention-management-console-module-diagnostic
Version: 5.0.1-33A~4.4.0.202001150905
Branch: ucs_4.4-0
Scope: errata4.4-3

9d643737bb Bug #50191: Merge branch 'jbremer/bug50191' into 4.4-3
97979cef5d Bug #50191: Add UMC-Check, add univention-config-registry-variable and add build-dep, translate


I created a diagnostic check that raises an error if the UCR variable ldap/server/name is set to the master on a member server and other DCs exist.
This check can be disabled with: ucr set diagnostic/check/disable/59_ldap_server_name=yes
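
The condition this check tests can be sketched as follows. This is only an illustration of the rule as described above, not the code of the diagnostic module: the function name, its parameters, and the host names are hypothetical, and "memberserver" is assumed to be the server/role value of a member server.

```python
def ldap_server_name_is_suboptimal(server_role, ldap_server_name, ldap_master, other_dcs):
    """Return True if a member server points at the master although other DCs exist."""
    # Assumption: only systems with the role "memberserver" are checked.
    if server_role != "memberserver":
        return False
    # Nothing to report if the system does not use the master by default.
    if ldap_server_name != ldap_master:
        return False
    # Only a problem if there is another DC the load could move to.
    return len(other_dcs) > 0


print(ldap_server_name_is_suboptimal(
    "memberserver",
    "master.example.test",
    "master.example.test",
    ["backup1.example.test"],
))
```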
Comment 5 Christian Völker univentionstaff 2020-01-15 10:22:15 CET
So we should now have an additional UCRV to check if another UCRV is set to master?

In the end it will be the same for customers. In the frequent case where they do NOT run the UMC diagnostics, the member servers will hammer the master with ldap-group-to-file and they will complain about the load on the master.

Then support will tell them to set the variable manually to a different server.

No change in behaviour!



In UCS for replication we have a mechanism for slave servers to decide randomly which of {master, backup1, backup2, ... } to be used for replication.

This mechanism could be used to distribute the load of slave servers.

Second, is the update of ldap-group-to-file so often needed? Couldn't we at least reduce the frequency of runs on member server (currently on EVERY change AND every 15 minutes)?
Comment 6 Julia Bremer univentionstaff 2020-01-15 10:43:40 CET
> In UCS for replication we have a mechanism for slave servers to decide
> randomly which of {master, backup1, backup2, ... } to be used for
> replication.
> 
> This mechanism could be used to distribute the load of slave servers.
> 
> Second, is the update of ldap-group-to-file so often needed? Couldn't we at
> least reduce the frequency of runs on member server (currently on EVERY
> change AND every 15 minutes)?

Both of these steps were discussed and are part of the resolution of this bug. 
The UMC check was just the first step to inform customers if there is potential for optimization. 
I am sorry, I should have documented that on this bug.
Comment 7 Julia Bremer univentionstaff 2020-01-22 13:03:20 CET
Successful build
Package: univention-management-console-module-diagnostic
Version: 5.0.1-43A~4.4.0.202001221256
Branch: ucs_4.4-0
Scope: errata4.4-3

* I adjusted the UMC check warning message

============================================
Successful build
Package: univention-python
Version: 12.0.0-18A~4.4.0.202001171809
Branch: ucs_4.4-0
Scope: errata4.4-3
User: jbremer

Successful build
Package: univention-pam
Version: 12.0.2-6A~4.4.0.202001171113
Branch: ucs_4.4-0
Scope: errata4.4-3
User: jbremer

* The LDAP server selection is now randomized for ldap-group-to-file.py
* getMachineConnection now has the flag random_server to activate this.
* The cron job now defaults to running once a day at 3 am with a jitter of 1800 seconds (half an hour)
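
A minimal sketch of what randomized server selection with fallback can look like. It does not reproduce the actual random_server implementation in uldap.py; connect_to_random_server, the connect callback, and the use of ConnectionError are assumptions made for the example.

```python
import random


def connect_to_random_server(servers, connect, rng=random):
    """Try the given servers in random order and return the first connection."""
    order = list(servers)
    rng.shuffle(order)  # spread the load across all candidates
    last_error = None
    for host in order:
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next host
    raise ConnectionError("no LDAP server reachable") from last_error


def fake_connect(host):
    """Stand-in for a real LDAP bind; 'down.example.test' is unreachable."""
    if host == "down.example.test":
        raise ConnectionError("server down")
    return "connection to " + host


print(connect_to_random_server(["down.example.test", "backup1.example.test"], fake_connect))
```

Shuffling the whole list rather than picking a single host keeps the fallback behaviour: if the first choice is down, the remaining servers are still tried.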


TODO: Write an SDB article that explains how to change the configuration of ldap/server/name and ldap/server/addition
Comment 8 Jürn Brodersen univentionstaff 2020-01-24 11:40:45 CET
- nss/group/cachefile/invalidate_interval is not adjusted in the postinst
- The check title seems wrong: "Check LDAP server role" -> "Check default LDAP server"
- jitter takes a time and a command -> the "&&" doesn't work.
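- "Diese Variable deaktiviert des Test 59_ldap_server_name im Diagnosemodul wenn sie auf true gesetzt wird." -> "Diese Variable deaktiviert den Test 59_ldap_server_name im Diagnosemodul wenn sie auf true gesetzt ist." (German UCR description, roughly: "This variable disables the test 59_ldap_server_name in the diagnostic module when it is set to true."; the fix corrects "des" to "den" and "gesetzt wird" to "gesetzt ist".)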
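
To illustrate the jitter point: assuming the helper works as "sleep for a random time up to the given number of seconds, then run the given command", the command has to be passed to it as an argument; a "&&" in the cron line is handled by the shell and never reaches the helper. The sketch below models that assumed behaviour in Python. run_with_jitter and its parameters are hypothetical names, not the UCS jitter script.

```python
import random
import subprocess
import sys
import time


def run_with_jitter(max_delay, argv, sleep=time.sleep, rng=random, run=subprocess.call):
    """Sleep a random time in [0, max_delay] seconds, then run argv."""
    if not argv:
        # Mirrors the review remark: a delay without a command is not valid.
        raise ValueError("a command is required after the delay")
    sleep(rng.uniform(0, max_delay))
    return run(argv)


# Demo with the sleep stubbed out so that it returns immediately.
exit_code = run_with_jitter(1800, [sys.executable, "-c", "pass"], sleep=lambda seconds: None)
print(exit_code)
```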
Comment 9 Jürn Brodersen univentionstaff 2020-01-24 12:45:44 CET
Please also remove DC slaves from the UMC check LDAP filter. With selective replication on school slaves, I think we should only recommend using the master or backup servers to be on the safe side :)
Comment 10 Julia Bremer univentionstaff 2020-01-24 15:21:37 CET
Successful build
Package: univention-management-console-module-diagnostic
Version: 5.0.1-45A~4.4.0.202001241312
Branch: ucs_4.4-0
Scope: errata4.4-3

Successful build
Package: univention-pam
Version: 12.0.2-7A~4.4.0.202001241209
Branch: ucs_4.4-0
Scope: errata4.4-3

The SDB article is here:
I will publish it after verification.

https://help.univention.com/t/changing-the-primary-ldap-server-to-redistribute-the-server-load/14138
Comment 11 Jürn Brodersen univentionstaff 2020-01-28 12:20:57 CET
What I tested:
Warning is shown on member if backup exists -> OK
Warning is not shown on member if no backup exists -> OK
No warning on master, backup, slave -> OK
Test can be deactivated -> OK
ldap-group-to-file uses a random ldap server, if multiple are available -> OK

yaml -> OK

Some small changes:
[4.4-3 23743d113d] Bug #50191: adjust text
[4.4-3 bc873ca155] Bug #50191: Test random ldap server connection
[4.4-3 73ec2f260f] Bug #50191: yaml
Comment 13 Florian Best univentionstaff 2020-03-25 16:29:14 CET
FYI: this introduced the following python issues:

base/univention-python/modules/uldap.py|179 col 44 error| missing whitespace after ',' [E231]
base/univention-python/modules/uldap.py|188 col 9 error| undefined name 'exc' [F821]
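
For readers unfamiliar with F821: a common way to get "undefined name 'exc'" is an except clause that never binds the exception to a name. The example below is a generic illustration of that pattern and its fix. It is not the code at line 188 of uldap.py, and both function names are made up.

```python
def read_secret_broken(path):
    try:
        with open(path) as fd:
            return fd.read().rstrip("\n")
    except IOError:
        # 'exc' is never bound here: pyflakes reports F821, and at runtime
        # this line raises NameError instead of the intended error.
        raise RuntimeError("cannot read %s: %s" % (path, exc))


def read_secret_fixed(path):
    try:
        with open(path) as fd:
            return fd.read().rstrip("\n")
    except IOError as exc:  # binding the exception makes 'exc' defined
        raise RuntimeError("cannot read %s: %s" % (path, exc))
```

The broken variant only fails when the error path is actually taken, which is why a static check tends to find it before any test does.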