Bug 54754 - 50prometheus.inst fails during database migration
Status: NEW
Product: UCS
Classification: Unclassified
Component: UCS Dashboard
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Assigned To: UCS maintainers
Reported: 2022-05-13 10:23 CEST by Ingo Steuwer
Modified: 2022-05-16 14:25 CEST

Description Ingo Steuwer univentionstaff 2022-05-13 10:23:47 CEST
I upgraded the Prometheus App on UCS 4.4 to the latest version.

The database migration fails, and so does the join script for Prometheus. The script has been run several times: most runs happened while upgrading Prometheus and the other UCS Dashboard apps, and at least one was a manual attempt by me.

join.log:

-----------

RUNNING 50prometheus.inst
2022-05-11 17:49:24.548625005+02:00 (in joinscript_init)
Compatibility rules not yet installed
EXITCODE=1
df78dc1c-f6c5-4245-97ae-9a39b3a20c36

-----------

RUNNING 50prometheus.inst
2022-05-11 17:49:49.464148850+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 15:49 UTC" end="11 May 22 12:49 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-stat
level=info backfiller="processing rule" id=0 name=node_boot_time_seconds
level=info backfiller="processing rule" id=1 name=node_time_seconds
Could not migrate database
EXITCODE=137
e8810c82-53c2-4dad-b8a8-99853905082e

-----------

RUNNING 50prometheus.inst
2022-05-11 17:58:11.308640137+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 15:58 UTC" end="11 May 22 12:58 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-interrupts
level=info backfiller="processing rule" id=0 name=node_interrupts_total
Could not migrate database
EXITCODE=137
b4e5dba2-4396-459a-9b36-34dbee16592e

-----------

RUNNING 50prometheus.inst
2022-05-11 18:38:56.070291151+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 16:38 UTC" end="11 May 22 13:38 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-cpu
level=info backfiller="processing rule" id=0 name=node_cpu
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-diskstats
level=info backfiller="processing rule" id=0 name=node_disk_read_bytes_total
level=info backfiller="processing rule" id=1 name=node_disk_written_bytes_total
level=info backfiller="processing rule" id=2 name=node_disk_io_time_seconds_total
Could not migrate database
EXITCODE=137
8762e2cc-9b6c-4194-8f57-7303f8665ece

-----------

RUNNING 50prometheus.inst
2022-05-11 18:52:45.942854126+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 16:52 UTC" end="11 May 22 13:52 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-filesystem
level=info backfiller="processing rule" id=0 name=node_filesystem_free_bytes
level=info backfiller="processing rule" id=1 name=node_filesystem_avail_bytes
level=info backfiller="processing rule" id=2 name=node_filesystem_size_bytes
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-infiniband
level=info backfiller="processing rule" id=0 name=node_infiniband_port_data_received_bytes_total
level=info backfiller="processing rule" id=1 name=node_infiniband_port_data_transmitted_bytes_total
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-interrupts
level=info backfiller="processing rule" id=0 name=node_interrupts_total
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-memory
level=info backfiller="processing rule" id=0 name=node_memory_Active_bytes
level=info backfiller="processing rule" id=1 name=node_memory_Active_anon_bytes
level=info backfiller="processing rule" id=2 name=node_memory_Active_file_bytes
level=info backfiller="processing rule" id=3 name=node_memory_AnonHugePages_bytes
level=info backfiller="processing rule" id=4 name=node_memory_AnonPages_bytes
level=info backfiller="processing rule" id=5 name=node_memory_Bounce_bytes
Could not migrate database
EXITCODE=137
665f1329-4278-4745-9d13-7e44609d0900

-----------

RUNNING 50prometheus.inst
2022-05-12 11:45:01.887201736+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="10 Apr 22 09:45 UTC" end="12 May 22 06:45 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-buddyinfo
level=info backfiller="processing rule" id=0 name=node_buddyinfo_blocks
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-stat
level=info backfiller="processing rule" id=0 name=node_boot_time_seconds
level=info backfiller="processing rule" id=1 name=node_time_seconds
Could not migrate database
EXITCODE=137
22712a71-bf89-4d0d-ad2d-520d6b32f82d

-----------

The UCS Dashboard works fine but shows only data collected after the App upgrade.
Comment 1 Ingo Steuwer univentionstaff 2022-05-13 10:27:24 CEST
App Versions installed now:

prometheus
  Name: UCS Dashboard Database
  Latest version: 2.35.0

prometheus-node-exporter
  Name: UCS Dashboard Client
  Latest version: 1.2

admin-dashboard
  Name: UCS Dashboard
  Latest version: 2.0
Comment 2 Ingo Steuwer univentionstaff 2022-05-13 14:09:33 CEST
Potential cause: not enough memory (2 GB RAM + 2 GB swap were almost fully used while the join script was running).
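
The EXITCODE=137 lines in join.log fit this suspicion. Assuming the join log records a shell-style exit status, values above 128 conventionally mean "terminated by signal (status - 128)", and 137 - 128 = 9 is SIGKILL on Linux, the signal the kernel OOM killer sends. A minimal sketch of that decoding follows; the helper name `describe_exit_code` is made up for illustration:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper)."""
    if code > 128:
        # Shell convention: 128 + N means the process was killed by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
print(describe_exit_code(1))
```

SIGKILL alone does not prove an out-of-memory kill; the kernel log (for example via `dmesg`) would have to confirm it.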
Comment 3 Florian Best univentionstaff 2022-05-16 14:25:25 CEST
I think we cannot fix this easily.

Idea:
* Split the timespan into chunks and run the migration for each chunk separately, e.g. one day at a time.

The README already mentions the high RAM consumption.
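
The chunking idea could be sketched roughly as follows. This is only an illustration, not the join script's actual code. It assumes the migration boils down to one `promtool tsdb create-blocks-from rules` call per time window; the "new rule importer" lines in join.log suggest this, but the report does not confirm it. The function names, the Prometheus URL, and the one-day chunk size are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RULES_FILE = "/etc/prometheus/16-compatibility-rules.yml"  # path as shown in join.log
PROMETHEUS_URL = "http://localhost:9090"  # assumption: placeholder URL


def day_chunks(start, end, step=timedelta(days=1)):
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


def build_commands(start, end):
    """Build one promtool argv list per chunk (nothing is executed here)."""
    return [
        ["promtool", "tsdb", "create-blocks-from", "rules",
         "--start", str(int(s.timestamp())),
         "--end", str(int(e.timestamp())),
         "--url", PROMETHEUS_URL,
         RULES_FILE]
        for s, e in day_chunks(start, end)
    ]


# Time range taken from the second run in join.log.
start = datetime(2022, 4, 9, 15, 49, tzinfo=timezone.utc)
end = datetime(2022, 5, 11, 12, 49, tzinfo=timezone.utc)
commands = build_commands(start, end)
```

Each argv list could then be run one after another (for example with `subprocess.run`), so that only one day of samples is processed at a time. Whether this keeps memory use within 2 GB RAM, and how the backfill behaves at chunk boundaries, is untested.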