Background: Modifying a group with around 36k users failed with an LDAP error:

_ldap_modify: Other (e.g., implementation specific) error (80)_

The Nagios check reported:

SLAPD MDB OK: Database /var/lib/univention-ldap/ldap operational (in fact 35.7%)

The system had the default UCR setting:

ldap/database/mdb/maxsize: 4295000000

After increasing it to

ldap/database/mdb/maxsize: 5295000000

the group could be modified.

The Nagios check itself was misleading. We should adjust the warning levels for the Nagios and Prometheus checks.

> A good rule of thumb is to multiply the total database size by five.

https://www.symas.com/post/openldap-lmdb-sizing-guide

The administrator has no way of knowing which group sizes (or other operations?) are "OK" for a given LDAP database size. Is there a way to estimate whether the remaining available space is enough to allow modifications of existing LDAP objects (at least for the largest object in LDAP)? If so, we should provide a monitoring script.
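To put the quoted rule of thumb next to the numbers from this report, here is a minimal sketch. It assumes that the 35.7% reported by the check roughly reflects the database size, which may understate it. The function names are hypothetical and not part of any UCS tooling.

```python
def estimated_used_bytes(maxsize_bytes: int, used_percent: float) -> int:
    """Estimate the database size from maxsize and a reported usage percentage."""
    return int(maxsize_bytes * used_percent / 100)


def recommended_maxsize(db_size_bytes: int, factor: int = 5) -> int:
    """Apply the 'multiply the total database size by five' rule of thumb."""
    return db_size_bytes * factor


if __name__ == "__main__":
    default_maxsize = 4295000000  # default ldap/database/mdb/maxsize from the report
    raised_maxsize = 5295000000   # value after which the modification succeeded

    used = estimated_used_bytes(default_maxsize, 35.7)
    recommended = recommended_maxsize(used)

    print(f"estimated database size: {used} bytes")
    print(f"rule-of-thumb maxsize:   {recommended} bytes")
    print(f"default is below recommendation: {default_maxsize < recommended}")
    print(f"raised value is below recommendation: {raised_maxsize < recommended}")
```

Under this assumption, both the default and the raised value stay below what the rule of thumb suggests, which fits the observation that an "OK" check result did not mean a large modification would succeed.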
I am increasing the pain rating, because I see a risk that this will happen in other large environments as well.
An adjustment of the metric has been merged and built:

```
univention-nagios.yaml
12ff3eea84e3 | feat(nagios): Improve check_univention_slapd_mdb_maxsize

univention-nagios (15.3.1)
12ff3eea84e3 | feat(nagios): Improve check_univention_slapd_mdb_maxsize

univention-monitoring-client.yaml
db2090357f7c | feat(monitoring): Improve alerts/check_univention_mdb_maxsize

univention-monitoring-client (3.3.1)
db2090357f7c | feat(monitoring): Improve alerts/check_univention_mdb_maxsize
```

The metric that calculates how much of the available space has been used (see `UNIVENTION_SLAPD_MDB_MAXSIZE`) has been adjusted: the pages from the LMDB freelist are now subtracted from the maximum number of pages.

As a result, when the `Number of pages used` value from `mdb_stat` approaches `Max pages`, the metric always approaches 100% and the alerts are triggered. Previously it was possible, for example, that with `Number of pages used` == `Max pages` and 40% free pages the metric was 60%, which is below the default alert levels.

So the metric now triggers alerts in more scenarios, without ignoring free pages completely. More details are in the GitLab issue.
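The change can be illustrated with a small sketch. This is only one reading of the description above, not the code of the actual check: it assumes the old metric was (used - free) / max and the new one is (used - free) / (max - free). The function and parameter names are hypothetical.

```python
def old_usage_percent(used_pages: int, free_pages: int, max_pages: int) -> float:
    """Assumed old metric: freelist pages reduce the numerator only."""
    return 100.0 * (used_pages - free_pages) / max_pages


def new_usage_percent(used_pages: int, free_pages: int, max_pages: int) -> float:
    """Assumed new metric: freelist pages are also subtracted from the maximum."""
    available = max_pages - free_pages
    if available <= 0:
        # Degenerate case: treat as full rather than dividing by zero.
        return 100.0
    return 100.0 * (used_pages - free_pages) / available


if __name__ == "__main__":
    # Scenario from the comment: pages used == max pages, 40% of them free.
    used, free, maximum = 1000, 400, 1000
    print(old_usage_percent(used, free, maximum))  # 60.0
    print(new_usage_percent(used, free, maximum))  # 100.0
```

With no free pages both variants give the same value, so the adjustment only changes the result when the freelist is non-empty.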
<https://errata.software-univention.de/#/?erratum=5.2x256>

<https://errata.software-univention.de/#/?erratum=5.2x257>