Bug 56341 - all *_METRICS_MISSING alerts fired due to timestamp in prom files
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Monitoring (Prometheus or Nagios)
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-4-errata
Assigned To: Florian Best
QA Contact: Christian Castens
Depends on: 55367
Blocks:
Reported: 2023-07-19 17:53 CEST by Florian Best
Modified: 2023-07-19 18:36 CEST (History)

See Also:
What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---


Description Florian Best univentionstaff 2023-07-19 17:53:47 CEST
The change from Bug #55367 (writing timestamps into the .prom files) is incompatible with the installed prometheus-node-exporter version.

https://github.com/prometheus/node_exporter/issues/1284

prometheus-node-exporter rejects the .prom files because it does not accept timestamps in them:

e.g. journalctl -u prometheus-node-exporter:
Jul 19 17:47:07 primary prometheus-node-exporter[730]: time="2023-07-19T17:47:07+02:00" level=error msg="Textfile \"/var/lib/prometheus/node-exporter/check_univention_winbind.prom\" contains unsupported client-side timestamps, skipping entire file" source="textfile.go:216"

Because the entire file is skipped, none of its metrics are exported, which causes all *_METRICS_MISSING alerts to fire.
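To check which samples in a .prom file would trigger this rejection, something like the following could be used. It is only a rough sketch: the function name is made up for illustration, and the parsing is a simplification of the text exposition format, not the node exporter's own logic.

```python
def find_timestamped_samples(text):
    """Return all sample lines that carry a trailing timestamp.

    Simplified parsing (assumption): everything after the closing "}" of the
    label set, or after the metric name if there are no labels, is split on
    whitespace. Two tokens mean "value timestamp".
    """
    hits = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" / comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            rest = line[line.rfind("}") + 1:]
        else:
            parts = line.split(None, 1)
            rest = parts[1] if len(parts) > 1 else ""
        if len(rest.split()) == 2:
            hits.append(line)
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python3 check_prom.py /var/lib/prometheus/node-exporter/*.prom
    for path in sys.argv[1:]:
        with open(path) as fd:
            for hit in find_timestamped_samples(fd.read()):
                print(f"{path}: {hit}")
```

Any line printed for a file means the node exporter would skip that whole file, as in the log message above.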

+++ This bug was initially created as a clone of Bug #55367 +++

The node collector periodically collects all metrics from the .prom files.
This is slightly inaccurate because we don't record the time at which the metrics were actually created. This can easily be added, as the prom format allows an optional timestamp per sample:

https://prometheus.io/docs/instrumenting/exposition_formats/

metric_name [
  "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
] value [ timestamp ]

The timestamp is an int64 (milliseconds since epoch, i.e. 1970-01-01 00:00:00 UTC, excluding leap seconds), represented as required by Go's ParseInt() function.
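As an illustration of the grammar above, a sample line with an optional timestamp could be built like this. This is a minimal sketch, not the code used in univention-monitoring-client; `format_sample` and the metric names in the usage are hypothetical. Note that the node exporter's textfile collector rejects files containing timestamps, so `timestamp_ms` would have to stay unset for files written to its directory.

```python
import time


def _escape_label_value(value):
    # The text format requires escaping backslash, double quote and newline.
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, value, labels=None, timestamp_ms=None):
    """Build one line: metric_name [ "{" labels "}" ] value [ timestamp ]."""
    label_part = ""
    if labels:
        pairs = ",".join(
            f'{key}="{_escape_label_value(val)}"' for key, val in labels.items()
        )
        label_part = "{" + pairs + "}"
    line = f"{name}{label_part} {value}"
    if timestamp_ms is not None:
        # int64 milliseconds since the epoch, written as a plain decimal integer.
        line += f" {int(timestamp_ms)}"
    return line


if __name__ == "__main__":
    now_ms = int(time.time() * 1000)
    print(format_sample("example_metric", 0))
    print(format_sample("example_metric", 1.5, {"host": "primary"}, now_ms))
```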

We currently use a self-written library, but Debian provides python3-prometheus-client, which does the protocol handling. We should use that instead.
Comment 1 Florian Best univentionstaff 2023-07-19 18:01:14 CEST
Writing timestamps into the .prom files has been reverted:

univention-monitoring-client.yaml
67e118d482b7 | Revert "feat(monitoring): add timestamp to metrics to be more accurate"

univention-monitoring-client (1.0.2-5)
67e118d482b7 | Revert "feat(monitoring): add timestamp to metrics to be more accurate"
Comment 2 Christian Castens univentionstaff 2023-07-19 18:15:53 CEST
QA:
  changes from Bug #55367 have been reverted: OK
  advisories: OK