Univention Bugzilla – Bug 49063
The container for 4.3/prometheus=1.1 could not be started!
Last modified: 2019-09-26 10:13:39 CEST
Version: 4.4-0 errata5 (Blumenthal)

The container for 4.3/prometheus=1.1 could not be started!

docker logs 63b135ad418a23227006092002117a789ede87f217de2cb857111e577e136403:
level=info ts=2019-03-13T20:53:40.906068074Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.3, branch=HEAD, revision=167a4b4e73a8eca8df648d2d2043e21bdb9a7449)"
level=info ts=2019-03-13T20:53:40.906228469Z caller=main.go:239 build_context="(go=go1.11.1, user=root@1e42b46043e9, date=20181004-08:42:02)"
level=info ts=2019-03-13T20:53:40.906364528Z caller=main.go:240 host_details="(Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 prome-01132129 (none))"
level=info ts=2019-03-13T20:53:40.90646956Z caller=main.go:241 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-03-13T20:53:40.906564076Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-03-13T20:53:40.907441968Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2019-03-13T20:53:40.907614082Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2019-03-13T20:53:40.907653105Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2019-03-13T20:53:40.907673873Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2019-03-13T20:53:40.907689248Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2019-03-13T20:53:40.907730666Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2019-03-13T20:53:40.907749387Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2019-03-13T20:53:40.907797202Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-03-13T20:53:40.907820024Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-03-13T20:53:40.907849442Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2019-03-13T20:53:40.907917859Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2019-03-13T20:53:40.908007055Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-03-13T20:53:40.908676951Z caller=web.go:440 component=web msg="router prefix" prefix=/metrics-prometheus
level=error ts=2019-03-13T20:53:40.908836667Z caller=main.go:617 err="opening storage failed: mkdir data/: disk quota exceeded"

dockerd logs:
-- Logs begin at Tue 2017-12-05 01:02:34 CET, end at Wed 2019-03-13 21:53:49 CET. --
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.501874726+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.502184772+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.511318340+01:00" level=info msg="Loading containers: start."
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.582756860+01:00" level=warning msg="libcontainerd: client is out of sync, restore was called on a fully synced container (536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c)."
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.583399690+01:00" level=warning msg="libcontainerd: failed to retrieve container 536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c state: rpc error: code = 2 desc = containerd: container not found"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.583480511+01:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c/shm: invalid argument"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.888857324+01:00" level=warning msg="libcontainerd: client is out of sync, restore was called on a fully synced container (acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d)."
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.889673907+01:00" level=warning msg="libcontainerd: failed to retrieve container acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d state: rpc error: code = 2 desc = containerd: container not found"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.890128554+01:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d/shm: invalid argument"
Mär 13 21:38:38 ucs dockerd[1398]: time="2019-03-13T21:38:38.496981803+01:00" level=info msg="Firewalld running: false"
Mär 13 21:38:38 ucs dockerd[1398]: time="2019-03-13T21:38:38.822067469+01:00" level=info msg="Removing stale sandbox de88cf73683422562551df227c837c3cfba4caf19649103703d4724154252d1b (acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d)"
Mär 13 21:38:39 ucs dockerd[1398]: time="2019-03-13T21:38:39.513990276+01:00" level=info msg="Removing stale sandbox fa8352c6e442a65b64fadcc215302b0336e8baf98b0ef60d5087588e8ab783d5 (536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c)"
Mär 13 21:38:43 ucs dockerd[1398]: time="2019-03-13T21:38:43+01:00" level=info msg="Firewalld running: false"
Mär 13 21:38:43 ucs dockerd[1398]: time="2019-03-13T21:38:43+01:00" level=info msg="Firewalld running: false"
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.097957253+01:00" level=info msg="Loading containers: done."
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.191108885+01:00" level=warning msg="failed to retrieve docker-init version: unknown output format: tini version 0.13.0\n"
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.227558407+01:00" level=info msg="Daemon has completed initialization"
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.228232759+01:00" level=info msg="Docker daemon" commit=092cba3 graphdriver=overlay version=1.13.1
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.238840138+01:00" level=info msg="API listen on /var/run/docker.sock"
Mär 13 21:48:41 ucs dockerd[1398]: time="2019-03-13T21:48:41.877437618+01:00" level=warning msg="failed to retrieve docker-init version: unknown output format: tini version 0.13.0\n"

docker inspect:
{u'Status': u'exited', u'Pid': 0, u'OOMKilled': False, u'Dead': False, u'Paused': False, u'Running': False, u'FinishedAt': u'2019-03-13T20:53:41.534674269Z', u'Restarting': False, u'Error': u'', u'StartedAt': u'2019-03-13T20:53:40.877539979Z', u'ExitCode': 1}
{u'Data': {u'MergedDir': u'/var/lib/docker/overlay/33d6a3233d8e191a62eab065e60a8c43fe6271e38ccc2b11cc7d14b337255a0b/merged', u'WorkDir': u'/var/lib/docker/overlay/33d6a3233d8e191a62eab065e60a8c43fe6271e38ccc2b11cc7d14b337255a0b/work', u'LowerDir': u'/var/lib/docker/overlay/42deeb2fae0e430ab0ad30e0f13880bada1dfe95af3bad23d4ce189db68b572a/root', u'UpperDir': u'/var/lib/docker/overlay/33d6a3233d8e191a62eab065e60a8c43fe6271e38ccc2b11cc7d14b337255a0b/upper'}, u'Name': u'overlay'}

Role: domaincontroller_master
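The only level=error entry in the Prometheus output above is the last one ("opening storage failed: mkdir data/: disk quota exceeded"), and it is easy to miss among the info lines. As a minimal sketch for triaging such output, the following Python pulls the first error out of logfmt-style lines. The function names `parse_logfmt` and `first_error` are made up for this illustration, and the parsing leans on `shlex` rather than a full logfmt implementation.

```python
import shlex


def parse_logfmt(line):
    """Split one logfmt-style line (key=value pairs, values optionally quoted) into a dict."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def first_error(lines):
    """Return the err (or msg) text of the first level=error entry, or None if there is none."""
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            return fields.get("err") or fields.get("msg")
    return None
```

Feeding it the lines printed by `docker logs` for the failed container would surface the storage error directly.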
I can reproduce this on every UCS 4.4-2 instance I start in EC2. I am raising the User Pain because this leaves an invalid Apache configuration after installing Prometheus: since the Docker container cannot be started, the configuration step fails to retrieve some information from the container and writes a broken Apache configuration. Apache will not (re)start afterwards.

# systemctl status apache2.service
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
   Active: active (running) (Result: exit-code) since Wed 2019-09-25 17:31:59 CEST; 2h 54min ago
  Process: 25045 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=1/FAILURE)
 Main PID: 14476 (apache2)
    Tasks: 8 (limit: 4915)
   Memory: 28.7M
      CPU: 2.414s
   CGroup: /system.slice/apache2.service
           ├─ 5903 /usr/sbin/apache2 -k start
           ├─ 5904 /usr/sbin/apache2 -k start
           ├─ 5905 /usr/sbin/apache2 -k start
           ├─ 5906 /usr/sbin/apache2 -k start
           ├─ 5907 /usr/sbin/apache2 -k start
           ├─ 6055 /usr/sbin/apache2 -k start
           ├─14476 /usr/sbin/apache2 -k start
           └─26720 /usr/sbin/apache2 -k start

Sep 25 20:20:31 dcm apachectl[23626]: The Apache error log may have more information.
Sep 25 20:20:31 dcm systemd[1]: apache2.service: Control process exited, code=exited status=1
Sep 25 20:20:31 dcm systemd[1]: Reload failed for The Apache HTTP Server.
Sep 25 20:22:01 dcm systemd[1]: Reloading The Apache HTTP Server.
Sep 25 20:22:01 dcm apachectl[25045]: AH00526: Syntax error on line 8 of /etc/apache2/ucs-sites.conf.d/prometheus.conf:
Sep 25 20:22:01 dcm apachectl[25045]: Bad LDAP URL while parsing.
Sep 25 20:22:01 dcm apachectl[25045]: Action 'graceful' failed.
Sep 25 20:22:01 dcm apachectl[25045]: The Apache error log may have more information.
Sep 25 20:22:01 dcm systemd[1]: apache2.service: Control process exited, code=exited status=1
Sep 25 20:22:01 dcm systemd[1]: Reload failed for The Apache HTTP Server.
# less /etc/apache2/ucs-sites.conf.d/prometheus.conf
LDAPTrustedMode TLS
<Location "/metrics-prometheus/">
    AuthName "Prometheus Access"
    AuthType Basic
    require valid-user
    <IfModule mod_authnz_ldap.c>
        AuthBasicProvider ldap
        AuthLDAPUrl "ldap://:/?uid?sub?(objectClass=*)"
        AuthLDAPBindDN ""
        AuthLDAPBindPassword "exec:/bin/cat /etc/prometheus_ldap.secret"
    </IfModule>
    ProxyPass http://127.0.0.1:9090/metrics-prometheus/ retry=0
    ProxyPassReverse http://127.0.0.1:9090/metrics-prometheus/
</Location>
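The AuthLDAPUrl in the generated file has neither host nor port ("ldap://:/..."), which is what Apache rejects as "Bad LDAP URL while parsing". As a minimal sketch of a guard against this, the Python below scans configuration text for AuthLDAPUrl directives with an empty host. The function name `find_empty_ldap_hosts` and the regular expression are illustrative assumptions, not part of the app's join scripts, and the check is far narrower than Apache's own LDAP URL parsing.

```python
import re

# Matches an AuthLDAPUrl directive at the start of a line and captures the
# (possibly empty) host and port of its ldap:// or ldaps:// URL.
_LDAP_URL = re.compile(
    r'^[ \t]*AuthLDAPUrl\s+"?ldaps?://(?P<host>[^:/?"\s]*)(?::(?P<port>\d*))?',
    re.IGNORECASE | re.MULTILINE,
)


def find_empty_ldap_hosts(conf_text):
    """Return the 1-based line numbers of AuthLDAPUrl directives whose host is empty."""
    bad_lines = []
    for match in _LDAP_URL.finditer(conf_text):
        if not match.group("host"):
            # Count the newlines before the match to get its line number.
            bad_lines.append(conf_text.count("\n", 0, match.start()) + 1)
    return bad_lines
```

Running a check like this on /etc/apache2/ucs-sites.conf.d/prometheus.conf before `apachectl graceful` would flag the broken directive without touching the running server.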
Fixed by changing the group ownership of certain volumes (more precisely, directories that are essentially volumes). The fix is already live; the app should now be installable.
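The comment does not name the affected directories or the group. As a minimal sketch, assuming a POSIX host, a check along these lines could confirm that a given directory is owned by the expected group and carries the group write and search bits. The function name `group_can_write` is hypothetical, the path and GID must be supplied by the caller, and the check looks only at ownership and mode bits, not at ACLs or quotas.

```python
import os
import stat


def group_can_write(path, gid):
    """True if `path` belongs to group `gid` and has the group write and search bits set."""
    st = os.stat(path)
    needed = stat.S_IWGRP | stat.S_IXGRP
    return st.st_gid == gid and (st.st_mode & needed) == needed
```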