Bug 51738 - Slow replication due to prometheus-targets.py
Status: NEW
Product: UCS
Classification: Unclassified
Component: UCS Dashboard
Version: UCS 4.4
Hardware: Other Windows NT
Importance: P5 normal with 1 vote
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on:
Blocks:
Reported: 2020-07-28 22:54 CEST by Michael Grandjean
Modified: 2020-12-28 11:04 CET
CC List: 1 user

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 2: Improvement: Would be a product improvement
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.046
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:



Description Michael Grandjean (univentionstaff) 2020-07-28 22:54:51 CEST
# univention-app info
UCS: 4.3-4 errata617
Installed: admin-dashboard=1.2 dhcp-server=12.0 pkgdb=11.0 prometheus=1.1 prometheus-node-exporter=1.1 ucsschool=4.3 v9
Upgradable:

I'm seeing strange behaviour on a UCS 4.3 system that we are trying to update to UCS 4.4. I know that UCS 4.3 is out of maintenance, but the code in question is the same in UCS 4.4, so I assume the problem exists there as well.

The domain contains about 1700 objects that count as "univentionHost" (servers, ipmanagedclients, and so on).
As far as I understand, prometheus-targets.py tries to update the list of Prometheus targets on every single transaction that affects a "univentionHost" object (objectClass=univentionHost). On this specific system, this seems to slow replication down to about 15-20 seconds per transaction.
All other UCS systems in the domain are as fast as always; only the one with the UCS Dashboard installed is this slow.
It also makes the log file unusable, since more than 1700 lines are written per transaction.
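A minimal, hypothetical sketch of the cost difference behind these numbers. It does not reproduce prometheus-targets.py. It assumes the module regenerates the whole target list and logs one line per host on each change, and contrasts that with an incremental update keyed by DN. The function names, the DN-to-FQDN mapping and the JSON output (shaped like Prometheus file-based service discovery) are assumptions for illustration only.

```python
"""Hypothetical sketch: rebuilding a Prometheus target list on every change
versus updating only the changed host. Not the real prometheus-targets.py."""
import json
from typing import Dict, List, Optional


def full_rewrite(hosts: Dict[str, str], log: List[str]) -> str:
    """Rebuild the complete target list from all known hosts (DN -> FQDN).

    Work and log output grow with the number of hosts, i.e. O(N) per change.
    """
    targets = []
    for dn, fqdn in sorted(hosts.items()):
        log.append("adding target %s (%s)" % (fqdn, dn))
        targets.append(fqdn)
    # Output shape modeled on Prometheus file-based service discovery.
    return json.dumps([{"targets": targets}])


def incremental_update(targets: Dict[str, str], dn: str,
                       new_fqdn: Optional[str], log: List[str]) -> str:
    """Apply one changed host to a cached DN -> FQDN mapping.

    new_fqdn=None means the object was deleted. Only the changed entry is
    processed and logged; serializing the result is still O(N) but cheap.
    """
    if new_fqdn is None:
        # Deleted object: drop it only if it was a known target.
        if targets.pop(dn, None) is not None:
            log.append("removing target for %s" % dn)
    else:
        targets[dn] = new_fqdn
        log.append("updating target %s (%s)" % (new_fqdn, dn))
    return json.dumps([{"targets": sorted(targets.values())}])


if __name__ == "__main__":
    hosts = {"cn=host%d,cn=computers,dc=example,dc=com" % i: "host%d.example.com" % i
             for i in range(1700)}
    full_log: List[str] = []
    full_rewrite(hosts, full_log)
    inc_log: List[str] = []
    incremental_update(dict(hosts), "cn=host5,cn=computers,dc=example,dc=com",
                       "host5.example.com", inc_log)
    print(len(full_log), len(inc_log))  # 1700 1
```

With 1700 hosts, the full rewrite processes and logs 1700 entries for a single change, while the incremental variant touches one.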

Maybe the LDAP lookup for Prometheus nodes could be optimized?
Additionally, some kind of checkbox to explicitly enable a host as a Prometheus target would be nice. The LDAP filter could then be something like '(&(objectClass=univentionHost)(univentionPrometheusTarget=1))'
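A sketch of the opt-in idea, assuming the proposed univentionPrometheusTarget attribute existed (it is only a suggestion in this report). is_prometheus_target() is a hypothetical client-side equivalent of the proposed filter, useful where a single changed object has to be classified. Entries are modeled as dicts mapping attribute names to lists of byte strings (the shape python-ldap search results use), and building the FQDN from cn plus associatedDomain is an assumption.

```python
from typing import Dict, Iterable, List

# Attribute name -> list of raw values, as in python-ldap search results.
Entry = Dict[str, List[bytes]]

# Filter proposed in this report; univentionPrometheusTarget is hypothetical.
PROMETHEUS_FILTER = "(&(objectClass=univentionHost)(univentionPrometheusTarget=1))"


def is_prometheus_target(entry: Entry) -> bool:
    """Client-side equivalent of PROMETHEUS_FILTER for one entry."""
    # objectClass values compare case-insensitively in LDAP.
    classes = {v.lower() for v in entry.get("objectClass", [])}
    flags = entry.get("univentionPrometheusTarget", [])
    return b"univentionhost" in classes and b"1" in flags


def select_targets(entries: Iterable[Entry]) -> List[str]:
    """Return the sorted host names of opted-in hosts only."""
    result = []
    for entry in entries:
        if not is_prometheus_target(entry):
            continue
        cn = entry["cn"][0].decode()
        domain = entry.get("associatedDomain", [b""])[0].decode()
        result.append("%s.%s" % (cn, domain) if domain else cn)
    return sorted(result)


if __name__ == "__main__":
    server = {"objectClass": [b"univentionHost"], "cn": [b"srv1"],
              "associatedDomain": [b"example.com"],
              "univentionPrometheusTarget": [b"1"]}
    client = {"objectClass": [b"univentionHost"], "cn": [b"pc042"],
              "associatedDomain": [b"example.com"]}
    print(select_targets([server, client]))  # ['srv1.example.com']
```

With such a flag, hosts that are not opted in never become targets, so changes to the bulk of the client objects would not need to touch the target list.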