Bug 34570 - check_ntp integration doesn't work since UCS 3.2-1
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Monitoring (Prometheus or Nagios)
Version: UCS 3.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 3.2-2-errata
Assigned To: Erik Damrose
QA Contact: Philipp Hahn
 
Reported: 2014-04-16 14:46 CEST by Tim Petersen
Modified: 2022-06-27 17:48 CEST

Description Tim Petersen univentionstaff 2014-04-16 14:46:42 CEST
A customer reported this in ticket 2014041021000837.
On new systems that are installed and joined with 3.2-1, the UNIVENTION_NTP Nagios check leads to a "socket timed out" error in both nagstamon and the web frontend. Running "check_ntp" by hand works.

I reproduced this on the first attempt on a newly installed 3.2-1 DC backup.

Unfortunately I didn't see any difference in binary versions or the join scripts...
Comment 1 Janis Meybohm univentionstaff 2014-04-29 10:28:56 CEST
Reported again: 2014042521005841
Comment 2 Janis Meybohm univentionstaff 2014-04-30 10:43:00 CEST
Reported again 2014043021002879

Target milestone set to errata, as this is annoying for customers and, of course, results in support requests.
Comment 3 Erik Damrose univentionstaff 2014-06-06 11:20:34 CEST
In Bug 33834 we introduced the UCR variable ntp/noquery, which is set to true on new installations. This denies external requests to the time server, including the nagios check UNIVENTION_NTP. 

Note that for the UNIVENTION_NTP check the nagios server queries the target server's ntpd with the nagios plugin check_ntp.

Workaround: Set ntp/noquery to false or deactivate the check.

In Bug 33834 we decided to make ntp/noquery=true the default for new installations. Thus, we could change the check to use check_ntp_time via NRPE. Note that this would change the test slightly: currently, we query the target server's NTP daemon for the time and compare it to the local time of the Nagios server. With check_ntp_time via NRPE, we would instead compare the time on the checked server to an external NTP server.
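Both variants of the check come down to the same comparison; only the pair of clocks differs. The following is a minimal sketch of that comparison, not the plugin's actual code. The function name and the threshold values are illustrative assumptions; only the exit codes 0/1/2 follow the usual Nagios plugin convention.

```python
# Nagios plugin exit codes (standard convention).
OK, WARNING, CRITICAL = 0, 1, 2


def classify_offset(reference_time: float, checked_time: float,
                    warn: float = 60.0, crit: float = 120.0) -> int:
    """Map the absolute difference between two clocks to a Nagios status.

    reference_time / checked_time are Unix timestamps in seconds.
    warn / crit are illustrative thresholds in seconds (assumptions).
    """
    offset = abs(reference_time - checked_time)
    if offset >= crit:
        return CRITICAL
    if offset >= warn:
        return WARNING
    return OK


# Current check: reference = time reported by the target's ntpd,
#                checked   = local clock of the Nagios server.
# NRPE variant:  reference = time reported by an external NTP server,
#                checked   = local clock of the checked server.
```

In both cases the plugin reports a time difference, but a drifting clock on the Nagios server would only show up in the first variant.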
Comment 4 Erik Damrose univentionstaff 2014-06-06 11:42:11 CEST
Note that the 'check_ntp' check actually calls /usr/lib/nagios/plugins/check_ntp_peer and not /usr/lib/nagios/plugins/check_ntp.

Maybe it is a problem with the plugin after all, as check_ntp works fine but is marked as deprecated.
Comment 5 Erik Damrose univentionstaff 2014-06-10 12:50:32 CEST
Change UNIVENTION_NTP nagios check to compare ntp server time by using the check_time nagios plugin.

The previous check also asked for ntp server configuration options, which are not relevant for this check and, since UCS 3.2 errata 20, are not allowed to be queried from external sources.

r50951 univention-nagios 7.0.6-3.264.201406101246
r50953 2014-06-10-univention-nagios.yaml
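For background on why a plain time comparison still works while the configuration query does not: in an NTP packet the low three bits of the first byte carry the mode. Ordinary time requests use mode 3 (client), whereas ntpq-style status queries use mode 6 (control). Assuming ntp/noquery maps onto ntpd's noquery restriction, only the query modes are refused. The sketch below builds no network traffic; all function names are illustrative.

```python
MODE_CLIENT = 3    # ordinary time request
MODE_CONTROL = 6   # ntpq-style control/status query
MODE_PRIVATE = 7   # ntpdc-style private query


def first_byte(leap: int, version: int, mode: int) -> int:
    """Pack leap indicator (2 bits), version (3 bits) and mode (3 bits)."""
    return ((leap & 0x3) << 6) | ((version & 0x7) << 3) | (mode & 0x7)


def mode_of(packet: bytes) -> int:
    """Return the NTP mode stored in the first byte of a packet."""
    return packet[0] & 0x7


def refused_by_noquery(packet: bytes) -> bool:
    """True for query modes that a noquery restriction denies."""
    return mode_of(packet) in (MODE_CONTROL, MODE_PRIVATE)


def client_request(version: int = 4) -> bytes:
    """A minimal 48-byte client (mode 3) request with all other fields zero."""
    return bytes([first_byte(0, version, MODE_CLIENT)]) + bytes(47)
```

A mode 3 request such as `client_request()` is still answered with noquery in place, which matches the observation that time-only checks keep working.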
Comment 6 Philipp Hahn univentionstaff 2014-06-13 16:29:52 CEST
OK: r50951 UNIVENTION_NTP: check_ntp → check_univention_ntp

FAIL: The definition in ucs-3.2-2/nagios/univention-nagios/usr/share/nagios-plugins/templates-univention/univention.cfg uses the wrong plugin and now checks something completely different:

# find /usr/lib/nagios/plugins -name \*_time\* -o -name \*_ntp\*
/usr/lib/nagios/plugins/check_ntp
/usr/lib/nagios/plugins/check_ntp_time
/usr/lib/nagios/plugins/check_time
/usr/lib/nagios/plugins/check_ntp_peer

<http://nagios-plugins.org/doc/man/check_ntp.html>
  Deprecated
  Checks that the given server is a working NTP server and that the difference between it and the local system is in the given range.
  It also checks the server status like stratum, ..., which is disabled for security and DoS reasons.

<http://nagios-plugins.org/doc/man/check_ntp_time.html>:
  port 123 = "ntp"
  Only checks that the difference between the given remote NTP and local system is in the given range.

<http://nagios-plugins.org/doc/man/check_time.html>:
  port 37 = "time"
  Uses a completely different protocol (minimum resolution: 1s)

<http://nagios-plugins.org/doc/man/check_ntp_peer.html>:
  Checks that the given server is a responding NTP server.
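To illustrate the resolution difference between the two protocols: the time protocol on port 37 returns a single 32-bit count of seconds since 1900-01-01, while an NTP timestamp carries an additional 32-bit fraction. This is a sketch of decoding both wire formats into Unix time; the function names are illustrative and no network access is involved.

```python
import struct

# Seconds between 1900-01-01 (NTP / time protocol epoch) and 1970-01-01 (Unix epoch).
NTP_UNIX_DELTA = 2208988800


def decode_time_protocol(payload: bytes) -> int:
    """Decode a 4-byte time protocol (port 37) reply: whole seconds only."""
    (seconds,) = struct.unpack("!I", payload[:4])
    return seconds - NTP_UNIX_DELTA


def decode_ntp_timestamp(field: bytes) -> float:
    """Decode an 8-byte NTP timestamp: 32-bit seconds plus 32-bit fraction."""
    seconds, fraction = struct.unpack("!II", field[:8])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32
```

Because the port 37 reply has no fractional part, a check based on it cannot resolve offsets below one second.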

before: UNIVENTION_NTP
CRITICAL	2014-06-13 15:06:55	 0d 0h 0m 35s	1/10	CRITICAL - Socket timeout after 10 seconds


OK: aptitude install '?source-package(univention-nagios)?installed'
OK: udm nagios/service list --filter name=UNIVENTION_NTP
OK: announce_errata -V ucs-3.2-2/doc/errata/staging/2014-06-10-univention-nagios.yaml
Comment 7 Erik Damrose univentionstaff 2014-06-16 10:00:59 CEST
I fixed the Nagios check; it now uses check_ntp_time.

r51073 univention-nagios 7.0.6-4.265.201406160957
r51074 yaml
Comment 8 Philipp Hahn univentionstaff 2014-06-17 14:01:19 CEST
OK: r51073 r51074
OK: aptitude install '?source-package(univention-nagios)?installed'
OK: announce_errata -V ucs-3.2-2/doc/errata/staging/2014-06-10-univention-nagios.yaml
OK: /usr/lib/nagios/plugins/check_ntp -H 10.200.17.18 -j ... # fails as expected
OK: /usr/lib/nagios/plugins/check_ntp_peer -H 10.200.17.18   # fails as expected
OK: /usr/lib/nagios/plugins/check_ntp_time -H 10.200.17.18   # succeeds

FYI: My test DC slave fails to synchronize to the DC master, as the LOCAL clock is still preferred even with the higher stratum:
# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 qa18.phahn.qa   LOCAL(0)         6 u   50   64  377    0.486  2810960   7.332
*LOCAL(0)        .LOCL.           9 l   51   64  377    0.000    0.000   0.000
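For reference, the peer table above can be read mechanically: the first character of each row is the tally code (`*` marks the currently selected system peer), and the offset column is in milliseconds. The following is a rough sketch of a parser for this layout, with the two rows above embedded as sample data; the class and function names are illustrative.

```python
from typing import List, NamedTuple, Optional


class Peer(NamedTuple):
    tally: str        # first column character, '*' = selected system peer
    remote: str
    refid: str
    stratum: int
    offset_ms: float  # ntpq reports the offset in milliseconds


def parse_peers(text: str) -> List[Peer]:
    """Parse the rows that follow the '====' separator of `ntpq -c peers`."""
    peers = []
    in_body = False
    for line in text.splitlines():
        if line.startswith("==="):
            in_body = True
            continue
        if not in_body or not line.strip():
            continue
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # skip rows that do not match the expected layout
        peers.append(Peer(line[0], fields[0], fields[1],
                          int(fields[2]), float(fields[8])))
    return peers


def system_peer(peers: List[Peer]) -> Optional[Peer]:
    """Return the peer marked with '*', or None if nothing is selected."""
    for peer in peers:
        if peer.tally == "*":
            return peer
    return None


SAMPLE = "\n".join([
    "     remote           refid      st t when poll reach   delay   offset  jitter",
    "==============================================================================",
    " qa18.phahn.qa   LOCAL(0)         6 u   50   64  377    0.486  2810960   7.332",
    "*LOCAL(0)        .LOCL.           9 l   51   64  377    0.000    0.000   0.000",
])
```

Applied to the sample, `system_peer(parse_peers(SAMPLE))` yields the LOCAL(0) row, and the qa18.phahn.qa row shows an offset of 2810960 ms, i.e. roughly 47 minutes.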
Comment 10 Moritz Muehlenhoff univentionstaff 2014-07-02 11:28:05 CEST
http://errata.univention.de/ucs/3.2/129.html