Bug 56286 - UCS Dashboard alerts - Univention KPASSWDD issue
Status: NEW
Product: UCS
Classification: Unclassified
Component: UCS Dashboard
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
Depends on: 54748
Blocks:
Reported: 2023-07-07 11:59 CEST by Mirac Erdemiroglu
Modified: 2024-02-27 18:00 CET (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 3: Simply Wrong: The implementation doesn't match the docu
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.154
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2023070521000253, 2023092621000344, 2024011021000341, 2024011121000367
Bug group (optional):
Max CVSS v3 score:


Attachments
Screenshot from UCS Dashboard/Grafana (67.91 KB, image/png)
2023-07-07 12:00 CEST, Mirac Erdemiroglu

Description Mirac Erdemiroglu univentionstaff 2023-07-07 11:59:29 CEST
UCS: 5.0-3 errata718
Installed: admin-dashboard=2.1-ucs1 dhcp-server=12.0 prometheus-alertmanager=1.0 prometheus-node-exporter=2.0.1 samba4=4.16 ucsschool=5.0 v3 4.4/prometheus=2.35.0-5

UCS Dashboard/Grafana is installed and by default shows the following firing alerts when Samba is installed on my test systems:

State: Firing
Labels:
    alertname=UNIVENTION_KPASSWDD
    instance=dc0.ucs5schoolhejne.intranet
    job=univention
    severity=critical
Created: 2023-07-04 09:36:01
Value: 0e+00
Description: heimdal kpasswdd process is running
title: check if kpasswdd process is present

State: Firing
Labels:
    alertname=UNIVENTION_KPASSWDD
    instance=hejneschool2.ucs5schoolhejne.intranet
    job=univention
    severity=critical
Created: 2023-07-05 03:11:01
Value: 0e+00
Description: heimdal kpasswdd process is running
title: check if kpasswdd process is present


But the kpasswdd service is masked and inactive because Samba is installed on the test systems:

root@dc0:~# systemctl status heimdal-kdc.service 
● heimdal-kdc.service
   Loaded: masked (Reason: Unit heimdal-kdc.service is masked.)
   Active: inactive (dead)


root@dc0:~# ps aufx | grep kpasswd
root     10385  0.0  0.0   6416   828 pts/1    S+   07:08   0:00  |   \_ grep kpasswd



Here is the description of that alert from the documentation:

UNIVENTION_KPASSWDD
	

Tests the availability of the Kerberos password service (only available on Primary/Backup Directory Nodes). If fewer or more than one process is running, an alert is fired.

So this looks like a bug.
Comment 1 Mirac Erdemiroglu univentionstaff 2023-07-07 12:00:26 CEST
Created attachment 11085 [details]
Screenshot from UCS Dashboard/Grafana
Comment 2 Mirac Erdemiroglu univentionstaff 2023-07-07 12:02:06 CEST
Customer affected Ticket#2023070521000253

UCS Version: 5.0-3-642
DNS-Backend: samba4
Comment 3 Arvid Requate univentionstaff 2023-07-07 13:17:25 CEST
This check was somehow invented via Bug 54748,
before that, with nagios, it doesn't seem to have
existed. Anyway, kpasswdd doesn't run on UCS nodes
that serve Samba/AD.
Comment 4 Florian Best univentionstaff 2023-07-07 14:36:53 CEST
(In reply to Arvid Requate from comment #3)
> This check was somehow invented via Bug 54748,
> before that, with nagios, it doesn't seem to have
> existed. Anyway, kpasswdd doesn't run on UCS nodes
> that serve Samba/AD.

uhm no.
This is the nagios equivalent of:
monitoring/univention-nagios/26univention-nagios-common.inst 

univention-directory-manager nagios/service create "$@" --ignore_exists --position="$NAGIOSCONTAINER" --set name="UNIVENTION_KPASSWDD" --set description="Default Service: check if kpasswdd process is present" --set checkCommand="check_univention_procs_name" --set checkArgs="1:1!1:!kpasswdd" --set useNRPE="1" --set normalCheckInterval="15" --set retryCheckInterval="2" --set maxCheckAttempts="2" --set checkPeriod="24x7" --set notificationOptionWarning="1" --set notificationOptionCritical="1" --set notificationOptionUnreachable="1" --set notificationOptionRecovered="1" --set notificationPeriod="24x7" || die
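
For readers unfamiliar with the checkArgs value "1:1!1:!kpasswdd", here is a minimal Python sketch of how such arguments could be evaluated. It assumes the three "!"-separated fields are a warning range, a critical range and a process name, and that the ranges follow the usual Nagios plugin threshold syntax; that mapping is an assumption and was not verified against check_univention_procs_name. The helper names parse_range, violates and check_state are hypothetical.

```python
from typing import Tuple


def parse_range(spec: str) -> Tuple[float, float, bool]:
    """Parse a Nagios-style threshold range into (low, high, alert_inside)."""
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        # A bare number N stands for the range 0..N.
        low_text, high_text = "0", spec
    low = float("-inf") if low_text == "~" else float(low_text or 0)
    high = float("inf") if high_text == "" else float(high_text)
    return low, high, alert_inside


def violates(value: float, spec: str) -> bool:
    """Return True if the value triggers the threshold described by spec."""
    low, high, alert_inside = parse_range(spec)
    inside = low <= value <= high
    return inside if alert_inside else not inside


def check_state(process_count: int, check_args: str) -> str:
    # Assumed layout of checkArgs: warning range, critical range, process name.
    warning, critical, _name = check_args.split("!")
    if violates(process_count, critical):
        return "CRITICAL"
    if violates(process_count, warning):
        return "WARNING"
    return "OK"
```

Under these assumptions, a node with no kpasswdd process (the situation on a Samba/AD node) ends up CRITICAL, exactly one process is OK, and two processes give a WARNING.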
Comment 5 Florian Best univentionstaff 2023-07-07 14:37:55 CEST
But if Samba is installed there is no kpasswdd, because that exists only in Heimdal.
We should somehow deactivate the check in that case.
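
A rough sketch of what "deactivate the check in that case" could look like as a decision function. The package name univention-samba4 is taken from the joinscript mentioned in this report; using it as the only indicator for a Samba/AD node is an assumption, and kpasswdd_check_applies is a hypothetical helper, not existing UCS code.

```python
from typing import Iterable

# Assumption: this package indicates that Samba/AD provides Kerberos on the
# node, so Heimdal's kpasswdd is not expected to run there.
SAMBA_AD_PACKAGES = frozenset({"univention-samba4"})


def kpasswdd_check_applies(installed_packages: Iterable[str]) -> bool:
    """Return True if the kpasswdd process check makes sense on this node."""
    return SAMBA_AD_PACKAGES.isdisjoint(installed_packages)
```

The monitoring setup would then only enable the alert where this returns True, instead of alerting on every Primary/Backup Directory Node.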
Comment 6 Thomas Stather 2023-08-16 11:14:16 CEST
I am the customer who is affected. As the support ticket is already closed, I would like to point out that it is important that this gets fixed, as we need a monitoring system without false positives, please.
Comment 7 Mirac Erdemiroglu univentionstaff 2023-09-26 16:40:29 CEST
Another customer is affected: Ticket#2023092621000344
UCS 5.0-3-642
Comment 8 Mirac Erdemiroglu univentionstaff 2024-01-11 11:30:29 CET
Another customer is affected: Ticket#2024011021000341

UCS: 5.0-5 errata852
Installed: admin-dashboard=3.0 dhcp-server=12.0 pkgdb=11.0 prometheus-node-exporter=2.0.1 samba4=4.16 self-service=5.0 self-service-backend=5.0 4.4/prometheus=2.35.0-5
Upgradable:

UCS: 5.0-5 errata852
Installed: pkgdb=11.0 prometheus-alertmanager=1.0 prometheus-node-exporter=2.0.1 samba-memberserver=4.16 4.4/prometheus=2.35.0-5
Upgradable:
Comment 9 Mirac Erdemiroglu univentionstaff 2024-02-08 14:44:45 CET
Workaround / Solution
https://help.univention.com/t/problem-ucs-dashboard-alerts-univention-kpasswdd-fired/22531
Comment 10 Arvid Requate univentionstaff 2024-02-08 17:35:23 CET
> ucr set monitoring/plugin/check_univention_kpasswdd/disabled=true

Maybe we could set that in the univention-samba4 joinscript (or postinst).
Comment 11 Finn David univentionstaff 2024-02-27 17:10:54 CET
>> ucr set monitoring/plugin/check_univention_kpasswdd/disabled=true
>
>Maybe we could set that in the univention-samba4 joinscript (or postinst).


I'm not sure if that's the most viable solution, as it did not work on my test machine. Please test before implementing it.
What did work reliably was removing the alert object from the domain controller; please see the updated help article.
Comment 12 Florian Best univentionstaff 2024-02-27 18:00:56 CET
The UCR variable would disable the execution of the alert script.
Then the metric is missing and a UNIVENTION_KPASSWD_MISSING alert will be thrown.

Therefore the alert must probably be deactivated in UDM.
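
The interaction described here can be modelled with a small Python sketch. It is only an illustration: the metric encoding (1 for exactly one process, 0 otherwise), the function names run_check and firing_alerts, and the assumption that alerts deactivated in UDM are no longer evaluated for the host are all hypothetical. Only the alert names and the UCR variable come from this report.

```python
from typing import List, Optional


def run_check(script_disabled: bool, kpasswdd_processes: int) -> Optional[int]:
    """Return the exported metric value, or None when no metric is written."""
    if script_disabled:
        # Models monitoring/plugin/check_univention_kpasswdd/disabled=true.
        return None
    # Hypothetical encoding: 1 = exactly one process found, 0 = anything else.
    return 1 if kpasswdd_processes == 1 else 0


def firing_alerts(metric: Optional[int], alerts_active: bool = True) -> List[str]:
    """Return the names of the alerts that would fire for this metric state."""
    if not alerts_active:
        # Assumption: alerts deactivated in UDM are not evaluated for the host.
        return []
    if metric is None:
        return ["UNIVENTION_KPASSWD_MISSING"]
    if metric == 0:
        return ["UNIVENTION_KPASSWDD"]
    return []
```

In this model, disabling the script on a Samba node only swaps one firing alert for another, while deactivating the alert itself is the only case that yields no alert.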