Univention Bugzilla – Bug 56286
UCS Dashboard alerts - Univention KPASSWDD issue
Last modified: 2024-02-27 18:00:56 CET
UCS: 5.0-3 errata718
Installed: admin-dashboard=2.1-ucs1 dhcp-server=12.0 prometheus-alertmanager=1.0 prometheus-node-exporter=2.0.1 samba4=4.16 ucsschool=5.0 v3 4.4/prometheus=2.35.0-5

UCS Dashboard/Grafana is installed and, by default, shows the following as firing alerts when Samba is installed on my test systems:

State: Firing
Labels: alertname=UNIVENTION_KPASSWDD instance=dc0.ucs5schoolhejne.intranet job=univention severity=critical
Created: 2023-07-04 09:36:01
Value: 0e+00
Description: heimdal kpasswdd process is running
title: check if kpasswdd process is present

State: Firing
Labels: alertname=UNIVENTION_KPASSWDD instance=hejneschool2.ucs5schoolhejne.intranet job=univention severity=critical
Created: 2023-07-05 03:11:01
Value: 0e+00
Description: heimdal kpasswdd process is running
title: check if kpasswdd process is present

But the kpasswd service is masked and inactive, because Samba is installed on the test systems:

root@dc0:~# systemctl status heimdal-kdc.service
● heimdal-kdc.service
   Loaded: masked (Reason: Unit heimdal-kdc.service is masked.)
   Active: inactive (dead)
root@dc0:~# ps aufx | grep kpasswd
root 10385 0.0 0.0 6416 828 pts/1 S+ 07:08 0:00 | \_ grep kpasswd

Here is the information from the documentation for that alert:

UNIVENTION_KPASSWDD
Tests the availability of the Kerberos password service (only available on Primary/Backup Directory Nodes). If fewer or more than one process is running, an alert is fired.

So this looks like a bug.
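The documented rule (exactly one kpasswdd process, otherwise an alert) can be illustrated with a small sketch. The function name `check_exactly_one`, its messages, and its return codes are hypothetical and chosen only for illustration; this is not the check script that UCS ships.

```shell
#!/bin/sh
# Hypothetical helper, NOT the shipped UCS check: classify a process count
# according to the documented rule "exactly one kpasswdd process".
check_exactly_one() {
    count="$1"
    if [ "$count" -eq 1 ]; then
        echo "OK: 1 kpasswdd process"
        return 0
    fi
    # Zero processes and more than one process are both treated as critical.
    echo "CRITICAL: $count kpasswdd processes, expected exactly 1"
    return 2
}
```

On a live system the count could be taken from something like `pgrep -c -x kpasswdd`. On a Samba/AD node no kpasswdd process exists, so a rule of this shape always ends up in the critical branch there, which matches the value 0e+00 shown in the alerts above.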
Created attachment 11085
Screenshot from UCS Dashboard/Grafana
Customer affected
Ticket#2023070521000253
UCS Version: 5.0-3-642
DNS-Backend: samba4
This check was somehow invented via Bug 54748; before that, with Nagios, it doesn't seem to have existed. Anyway, kpasswdd doesn't run on UCS nodes that serve Samba/AD.
(In reply to Arvid Requate from comment #3)
> This check was somehow invented via Bug 54748;
> before that, with Nagios, it doesn't seem to have
> existed. Anyway, kpasswdd doesn't run on UCS nodes
> that serve Samba/AD.

Uhm, no. This is the Nagios equivalent:

monitoring/univention-nagios/26univention-nagios-common.inst 127
univention-directory-manager nagios/service create "$@" --ignore_exists \
    --position="$NAGIOSCONTAINER" \
    --set name="UNIVENTION_KPASSWDD" \
    --set description="Default Service: check if kpasswdd process is present" \
    --set checkCommand="check_univention_procs_name" \
    --set checkArgs="1:1!1:!kpasswdd" \
    --set useNRPE="1" \
    --set normalCheckInterval="15" \
    --set retryCheckInterval="2" \
    --set maxCheckAttempts="2" \
    --set checkPeriod="24x7" \
    --set notificationOptionWarning="1" \
    --set notificationOptionCritical="1" \
    --set notificationOptionUnreachable="1" \
    --set notificationOptionRecovered="1" \
    --set notificationPeriod="24x7" || die
But if Samba is installed, there is no kpasswdd, because that exists only in Heimdal. We should somehow deactivate the check in that case.
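One way to picture such a deactivation is a guard that skips the check when the Heimdal KDC unit is masked, as it is on the Samba systems in the original report. This is only a sketch under that assumption: the function name `kpasswdd_check_applies` is hypothetical, and using the unit state as the signal is a guess, not the fix chosen for this bug.

```shell
#!/bin/sh
# Hypothetical guard, NOT part of UCS: decide from a systemd unit state string
# whether the kpasswdd check should run at all.
# Assumption: a masked heimdal-kdc.service means Samba/AD provides Kerberos.
kpasswdd_check_applies() {
    unit_state="$1"
    case "$unit_state" in
        masked) return 1 ;;   # Heimdal is masked: skip the check
        *)      return 0 ;;   # any other state: run the check
    esac
}
```

The state string could come from `systemctl is-enabled heimdal-kdc.service`, which prints `masked` for a masked unit.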
I am the customer who is affected. As the support ticket is already closed, I would like to point out that it is important that this gets fixed, as we need a monitoring system without false positives, please.
Another customer is affected by this:
Ticket#2023092621000344
UCS 5.0-3-642
Another customer is affected: 2024011021000341

UCS: 5.0-5 errata852
Installed: admin-dashboard=3.0 dhcp-server=12.0 pkgdb=11.0 prometheus-node-exporter=2.0.1 samba4=4.16 self-service=5.0 self-service-backend=5.0 4.4/prometheus=2.35.0-5
Upgradable:

UCS: 5.0-5 errata852
Installed: pkgdb=11.0 prometheus-alertmanager=1.0 prometheus-node-exporter=2.0.1 samba-memberserver=4.16 4.4/prometheus=2.35.0-5
Upgradable:
Workaround / Solution https://help.univention.com/t/problem-ucs-dashboard-alerts-univention-kpasswdd-fired/22531
> ucr set monitoring/plugin/check_univention_kpasswdd/disabled=true

Maybe we could set that in the univention-samba4 joinscript (or postinst).
>> ucr set monitoring/plugin/check_univention_kpasswdd/disabled=true
>
> Maybe we could set that in the univention-samba4 joinscript (or postinst).

I'm not sure that's the most viable solution, as it did not work on my test machine. Please test before implementing it.
What did work reliably was removing the alert object from the domain controller; please see the updated help article.
The UCR variable would disable the execution of the alert script. Then the metric is missing and a UNIVENTION_KPASSWD_MISSING alert will be thrown. Therefore the alert must probably be deactivated in UDM.
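The reasoning above can be written down as a small model: disabling the script only swaps one alert for another. The function `expected_alert` and its arguments are hypothetical; it only restates the behaviour described in this comment and is not derived from the actual alert rules.

```shell
#!/bin/sh
# Hypothetical model of the behaviour described above, NOT the real alert rules.
# $1: "true" if the check script is disabled via UCR, otherwise "false"
# $2: number of running kpasswdd processes
expected_alert() {
    script_disabled="$1"
    process_count="$2"
    if [ "$script_disabled" = "true" ]; then
        # No script run means no metric, so the "missing" alert fires instead.
        echo "UNIVENTION_KPASSWD_MISSING"
    elif [ "$process_count" -ne 1 ]; then
        echo "UNIVENTION_KPASSWDD"
    else
        echo "none"
    fi
}
```

For a Samba/AD node (zero kpasswdd processes) this model yields an alert whether or not the script is disabled, which is why deactivating the alert object itself looks like the remaining option.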