Univention Bugzilla – Bug 54754
50prometheus.inst fails during database migration
Last modified: 2022-05-16 14:25:25 CEST
I upgraded the Prometheus App on UCS 4.4 to the latest version. The database migration fails, and so does the join script for Prometheus. The script has been run several times, mostly while upgrading Prometheus and the other UCS Dashboard Apps, and at least once manually by me.

join.log:
-----------
RUNNING 50prometheus.inst
2022-05-11 17:49:24.548625005+02:00 (in joinscript_init)
Compatibility rules not yet installed
EXITCODE=1
df78dc1c-f6c5-4245-97ae-9a39b3a20c36
-----------
RUNNING 50prometheus.inst
2022-05-11 17:49:49.464148850+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 15:49 UTC" end="11 May 22 12:49 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-stat
level=info backfiller="processing rule" id=0 name=node_boot_time_seconds
level=info backfiller="processing rule" id=1 name=node_time_seconds
Could not migrate database
EXITCODE=137
e8810c82-53c2-4dad-b8a8-99853905082e
-----------
RUNNING 50prometheus.inst
2022-05-11 17:58:11.308640137+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 15:58 UTC" end="11 May 22 12:58 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-interrupts
level=info backfiller="processing rule" id=0 name=node_interrupts_total
Could not migrate database
EXITCODE=137
b4e5dba2-4396-459a-9b36-34dbee16592e
-----------
RUNNING 50prometheus.inst
2022-05-11 18:38:56.070291151+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 16:38 UTC" end="11 May 22 13:38 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-cpu
level=info backfiller="processing rule" id=0 name=node_cpu
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-diskstats
level=info backfiller="processing rule" id=0 name=node_disk_read_bytes_total
level=info backfiller="processing rule" id=1 name=node_disk_written_bytes_total
level=info backfiller="processing rule" id=2 name=node_disk_io_time_seconds_total
Could not migrate database
EXITCODE=137
8762e2cc-9b6c-4194-8f57-7303f8665ece
-----------
RUNNING 50prometheus.inst
2022-05-11 18:52:45.942854126+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="09 Apr 22 16:52 UTC" end="11 May 22 13:52 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-filesystem
level=info backfiller="processing rule" id=0 name=node_filesystem_free_bytes
level=info backfiller="processing rule" id=1 name=node_filesystem_avail_bytes
level=info backfiller="processing rule" id=2 name=node_filesystem_size_bytes
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-infiniband
level=info backfiller="processing rule" id=0 name=node_infiniband_port_data_received_bytes_total
level=info backfiller="processing rule" id=1 name=node_infiniband_port_data_transmitted_bytes_total
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-interrupts
level=info backfiller="processing rule" id=0 name=node_interrupts_total
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-memory
level=info backfiller="processing rule" id=0 name=node_memory_Active_bytes
level=info backfiller="processing rule" id=1 name=node_memory_Active_anon_bytes
level=info backfiller="processing rule" id=2 name=node_memory_Active_file_bytes
level=info backfiller="processing rule" id=3 name=node_memory_AnonHugePages_bytes
level=info backfiller="processing rule" id=4 name=node_memory_AnonPages_bytes
level=info backfiller="processing rule" id=5 name=node_memory_Bounce_bytes
Could not migrate database
EXITCODE=137
665f1329-4278-4745-9d13-7e44609d0900
-----------
RUNNING 50prometheus.inst
2022-05-12 11:45:01.887201736+02:00 (in joinscript_init)
/etc/prometheus/16-compatibility-rules.yml
level=info backfiller="new rule importer" start="10 Apr 22 09:45 UTC" end="12 May 22 06:45 UTC"
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-buddyinfo
level=info backfiller="processing rule" id=0 name=node_buddyinfo_blocks
level=info backfiller="processing group" name=/etc/prometheus/16-compatibility-rules.yml;node_exporter-16-stat
level=info backfiller="processing rule" id=0 name=node_boot_time_seconds
level=info backfiller="processing rule" id=1 name=node_time_seconds
Could not migrate database
EXITCODE=137
22712a71-bf89-4d0d-ad2d-520d6b32f82d
-----------

The UCS Dashboard works fine but shows only data collected after the App upgrade.
App versions installed now:
prometheus
  Name: UCS Dashboard Database
  Latest version: 2.35.0
prometheus-node-exporter
  Name: UCS Dashboard Client
  Latest version: 1.2
admin-dashboard
  Name: UCS Dashboard
  Latest version: 2.0
Potential cause: not enough memory (2 GB RAM + 2 GB swap were mostly utilized while running the join script).
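The EXITCODE=137 entries in join.log would fit this explanation: by shell convention, an exit status above 128 means the process was terminated by signal (status - 128), and 137 - 128 = 9 is SIGKILL, the signal the Linux OOM killer sends. A minimal sketch for decoding such a status follows; the helper name decode_exit_code is made up for illustration, and a SIGKILL by itself does not prove that the OOM killer was the sender.

```python
import signal


def decode_exit_code(code):
    """Return the signal name for a shell-style exit status, or None.

    Shells report "killed by signal N" as exit status 128 + N.
    Hypothetical helper, not part of the join script.
    """
    if code > 128:
        return signal.Signals(code - 128).name
    return None


# The failing runs in join.log report EXITCODE=137; the first run reports 1.
print(decode_exit_code(137))
print(decode_exit_code(1))
```

Checking the kernel log (dmesg or the journal) for an "Out of memory" entry at the matching timestamps would confirm or rule out the OOM killer.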
I think we cannot fix this easily.
Idea:
* split the timespan into chunks and run the migration for each chunk separately, e.g. one day at a time
The README already mentions the high RAM consumption.
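A minimal sketch of the chunking idea, not the join script's actual implementation. It assumes the backfill is done with promtool's "tsdb create-blocks-from rules" subcommand (the "new rule importer" log lines suggest this, but the report does not confirm it) and that Prometheus is reachable at http://localhost:9090; the function names and the URL are illustrative. The code only builds the command lines, one per day, so that each run has to hold a single day of samples in memory.

```python
from datetime import datetime, timedelta, timezone


def daily_chunks(start, end, step=timedelta(days=1)):
    """Yield (chunk_start, chunk_end) pairs that cover [start, end) without gaps."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = start
    while current < end:
        nxt = min(current + step, end)
        yield current, nxt
        current = nxt


def backfill_commands(start, end, rules_file, url="http://localhost:9090"):
    """Build one promtool invocation per chunk (assumed CLI, see lead-in)."""
    commands = []
    for chunk_start, chunk_end in daily_chunks(start, end):
        commands.append([
            "promtool", "tsdb", "create-blocks-from", "rules",
            "--start", str(int(chunk_start.timestamp())),
            "--end", str(int(chunk_end.timestamp())),
            "--url", url,
            rules_file,
        ])
    return commands


# Time range taken from the second run in join.log.
start = datetime(2022, 4, 9, 15, 49, tzinfo=timezone.utc)
end = datetime(2022, 5, 11, 12, 49, tzinfo=timezone.utc)
commands = backfill_commands(start, end, "/etc/prometheus/16-compatibility-rules.yml")
print(len(commands), "chunks")
print(" ".join(commands[0]))
```

Each command could then be run in sequence, for example with subprocess.run(cmd, check=True), stopping at the first failure so that a rerun can resume from the failed day.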