Bug 49063 - The container for 4.3/prometheus=1.1 could not be started!
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: App Center
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.4-2
Assigned To: Dirk Wiesenthal
QA Contact: App Center maintainers
Reported: 2019-03-22 13:36 CET by Johannes Keiser
Modified: 2019-09-26 10:13 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 4: A User would return the product
User Pain: 0.343
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019031321001707
Bug group (optional): External feedback
Max CVSS v3 score:


Description Johannes Keiser univentionstaff 2019-03-22 13:36:12 CET
Version: 4.4-0 errata5 (Blumenthal)

The container for 4.3/prometheus=1.1 could not be started!

docker logs 63b135ad418a23227006092002117a789ede87f217de2cb857111e577e136403:
level=info ts=2019-03-13T20:53:40.906068074Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.3, branch=HEAD, revision=167a4b4e73a8eca8df648d2d2043e21bdb9a7449)"
level=info ts=2019-03-13T20:53:40.906228469Z caller=main.go:239 build_context="(go=go1.11.1, user=root@1e42b46043e9, date=20181004-08:42:02)"
level=info ts=2019-03-13T20:53:40.906364528Z caller=main.go:240 host_details="(Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 prome-01132129 (none))"
level=info ts=2019-03-13T20:53:40.90646956Z caller=main.go:241 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-03-13T20:53:40.906564076Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-03-13T20:53:40.907441968Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2019-03-13T20:53:40.907614082Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2019-03-13T20:53:40.907653105Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2019-03-13T20:53:40.907673873Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2019-03-13T20:53:40.907689248Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2019-03-13T20:53:40.907730666Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2019-03-13T20:53:40.907749387Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2019-03-13T20:53:40.907797202Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-03-13T20:53:40.907820024Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-03-13T20:53:40.907849442Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2019-03-13T20:53:40.907917859Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2019-03-13T20:53:40.908007055Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-03-13T20:53:40.908676951Z caller=web.go:440 component=web msg="router prefix" prefix=/metrics-prometheus
level=error ts=2019-03-13T20:53:40.908836667Z caller=main.go:617 err="opening storage failed: mkdir data/: disk quota exceeded"
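
The only line that matters in the log above is the last one, logged at level=error. A minimal Python sketch for pulling such lines out of Prometheus' logfmt output follows. The function names are illustrative and not part of any Univention or Prometheus tooling, and shlex is used as a simple stand-in for a real logfmt parser.

```python
import shlex

def parse_logfmt(line):
    """Split one logfmt line (key=value, values optionally double-quoted) into a dict."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

def error_messages(log_text):
    """Return the err (or msg) text of every line logged at level=error."""
    errors = []
    for line in log_text.splitlines():
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            errors.append(fields.get("err") or fields.get("msg", ""))
    return errors

# Two lines copied from the docker logs in this report.
sample = '''level=info ts=2019-03-13T20:53:40.907441968Z caller=main.go:554 msg="Starting TSDB ..."
level=error ts=2019-03-13T20:53:40.908836667Z caller=main.go:617 err="opening storage failed: mkdir data/: disk quota exceeded"'''

print(error_messages(sample))
```

Run against the full log, this would reduce the output to the single storage error, which makes it easier to see that Prometheus shuts down because it cannot create its data directory.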


dockerd logs:
-- Logs begin at Tue 2017-12-05 01:02:34 CET, end at Wed 2019-03-13 21:53:49 CET. --
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.501874726+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.502184772+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.511318340+01:00" level=info msg="Loading containers: start."
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.582756860+01:00" level=warning msg="libcontainerd: client is out of sync, restore was called on a fully synced container (536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c)."
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.583399690+01:00" level=warning msg="libcontainerd: failed to retrieve container 536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c state: rpc error: code = 2 desc = containerd: container not found"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.583480511+01:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c/shm: invalid argument"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.888857324+01:00" level=warning msg="libcontainerd: client is out of sync, restore was called on a fully synced container (acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d)."
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.889673907+01:00" level=warning msg="libcontainerd: failed to retrieve container acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d state: rpc error: code = 2 desc = containerd: container not found"
Mär 13 21:38:37 ucs dockerd[1398]: time="2019-03-13T21:38:37.890128554+01:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d/shm: invalid argument"
Mär 13 21:38:38 ucs dockerd[1398]: time="2019-03-13T21:38:38.496981803+01:00" level=info msg="Firewalld running: false"
Mär 13 21:38:38 ucs dockerd[1398]: time="2019-03-13T21:38:38.822067469+01:00" level=info msg="Removing stale sandbox de88cf73683422562551df227c837c3cfba4caf19649103703d4724154252d1b (acfce2862950e44cc380bc00613c4a4d9ce32b80c96d455d5cc05c06b0c9852d)"
Mär 13 21:38:39 ucs dockerd[1398]: time="2019-03-13T21:38:39.513990276+01:00" level=info msg="Removing stale sandbox fa8352c6e442a65b64fadcc215302b0336e8baf98b0ef60d5087588e8ab783d5 (536af2c00cd7d275b11f3b6fa63b077fa82786c78f4bee33e31d7d228d84f54c)"
Mär 13 21:38:43 ucs dockerd[1398]: time="2019-03-13T21:38:43+01:00" level=info msg="Firewalld running: false"
Mär 13 21:38:43 ucs dockerd[1398]: time="2019-03-13T21:38:43+01:00" level=info msg="Firewalld running: false"
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.097957253+01:00" level=info msg="Loading containers: done."
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.191108885+01:00" level=warning msg="failed to retrieve docker-init version: unknown output format: tini version 0.13.0\n"
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.227558407+01:00" level=info msg="Daemon has completed initialization"
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.228232759+01:00" level=info msg="Docker daemon" commit=092cba3 graphdriver=overlay version=1.13.1
Mär 13 21:38:45 ucs dockerd[1398]: time="2019-03-13T21:38:45.238840138+01:00" level=info msg="API listen on /var/run/docker.sock"
Mär 13 21:48:41 ucs dockerd[1398]: time="2019-03-13T21:48:41.877437618+01:00" level=warning msg="failed to retrieve docker-init version: unknown output format: tini version 0.13.0\n"


docker inspect:
{u'Status': u'exited', u'Pid': 0, u'OOMKilled': False, u'Dead': False, u'Paused': False, u'Running': False, u'FinishedAt': u'2019-03-13T20:53:41.534674269Z', u'Restarting': False, u'Error': u'', u'StartedAt': u'2019-03-13T20:53:40.877539979Z', u'ExitCode': 1}
{u'Data': {u'MergedDir': u'/var/lib/docker/overlay/33d6a3233d8e191a62eab065e60a8c43fe6271e38ccc2b11cc7d14b337255a0b/merged', u'WorkDir': u'/var/lib/docker/overlay/33d6a3233d8e191a62eab065e60a8c43fe6271e38ccc2b11cc7d14b337255a0b/work', u'LowerDir': u'/var/lib/docker/overlay/42deeb2fae0e430ab0ad30e0f13880bada1dfe95af3bad23d4ce189db68b572a/root', u'UpperDir': u'/var/lib/docker/overlay/33d6a3233d8e191a62eab065e60a8c43fe6271e38ccc2b11cc7d14b337255a0b/upper'}, u'Name': u'overlay'}
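
The docker inspect output above is a Python 2 dict repr (u'' prefixes), not JSON. As a rough sketch, assuming the dict has been joined onto one line, it can be read back with ast.literal_eval to check how the container ended. The helper name and the shortened sample are illustrative only.

```python
import ast

def container_state(repr_text):
    """Parse a Python-repr dict (u'' prefixes, True/False literals) without eval()."""
    state = ast.literal_eval(repr_text)
    if not isinstance(state, dict):
        raise ValueError("expected a dict literal")
    return state

# Shortened copy of the State dict from this report, joined onto one line.
sample = ("{u'Status': u'exited', u'Pid': 0, u'OOMKilled': False, "
          "u'Running': False, u'Error': u'', u'ExitCode': 1}")

state = container_state(sample)
failed = state["Status"] == "exited" and state["ExitCode"] != 0
print(failed, state["OOMKilled"])
```

For this report the state is "exited" with exit code 1 and OOMKilled is False, which fits the picture of Prometheus exiting on its own after the storage error rather than being killed for memory.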

Role: domaincontroller_master
Comment 1 Ingo Steuwer univentionstaff 2019-09-25 20:29:59 CEST
I can reproduce this with all UCS 4.4-2 instances I start in EC2.

I am raising the User Pain because this leads to an invalid Apache configuration after installing Prometheus: since the Docker container can't be started, the configuration step fails to retrieve some information from the container and writes a corrupt Apache configuration. Apache will not (re)start afterwards.



# systemctl status apache2.service
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
   Active: active (running) (Result: exit-code) since Wed 2019-09-25 17:31:59 CEST; 2h 54min ago
  Process: 25045 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=1/FAILURE)
 Main PID: 14476 (apache2)
    Tasks: 8 (limit: 4915)
   Memory: 28.7M
      CPU: 2.414s
   CGroup: /system.slice/apache2.service
           ├─ 5903 /usr/sbin/apache2 -k start
           ├─ 5904 /usr/sbin/apache2 -k start
           ├─ 5905 /usr/sbin/apache2 -k start
           ├─ 5906 /usr/sbin/apache2 -k start
           ├─ 5907 /usr/sbin/apache2 -k start
           ├─ 6055 /usr/sbin/apache2 -k start
           ├─14476 /usr/sbin/apache2 -k start
           └─26720 /usr/sbin/apache2 -k start

Sep 25 20:20:31 dcm apachectl[23626]: The Apache error log may have more information.
Sep 25 20:20:31 dcm systemd[1]: apache2.service: Control process exited, code=exited status=1
Sep 25 20:20:31 dcm systemd[1]: Reload failed for The Apache HTTP Server.
Sep 25 20:22:01 dcm systemd[1]: Reloading The Apache HTTP Server.
Sep 25 20:22:01 dcm apachectl[25045]: AH00526: Syntax error on line 8 of /etc/apache2/ucs-sites.conf.d/prometheus.conf:
Sep 25 20:22:01 dcm apachectl[25045]: Bad LDAP URL while parsing.
Sep 25 20:22:01 dcm apachectl[25045]: Action 'graceful' failed.
Sep 25 20:22:01 dcm apachectl[25045]: The Apache error log may have more information.
Sep 25 20:22:01 dcm systemd[1]: apache2.service: Control process exited, code=exited status=1
Sep 25 20:22:01 dcm systemd[1]: Reload failed for The Apache HTTP Server.



# less /etc/apache2/ucs-sites.conf.d/prometheus.conf
LDAPTrustedMode TLS
<Location "/metrics-prometheus/">
        AuthName "Prometheus Access"
        AuthType Basic
        require valid-user
        <IfModule mod_authnz_ldap.c>
                AuthBasicProvider ldap
                AuthLDAPUrl "ldap://:/?uid?sub?(objectClass=*)"
                AuthLDAPBindDN ""
                AuthLDAPBindPassword "exec:/bin/cat /etc/prometheus_ldap.secret"
        </IfModule>
        ProxyPass http://127.0.0.1:9090/metrics-prometheus/ retry=0
        ProxyPassReverse http://127.0.0.1:9090/metrics-prometheus/
</Location>
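
The broken directive is AuthLDAPUrl "ldap://:/?uid?sub?(objectClass=*)", where host and port are empty. A minimal Python sketch of a check that could catch this before the file is written is shown here. This is not how the App Center generates the file, the function name is made up, and the "good" URL uses a hypothetical host and port.

```python
from urllib.parse import urlsplit

def ldap_url_has_host(url):
    """Return True if an ldap:// or ldaps:// URL names a non-empty host."""
    parts = urlsplit(url)
    if parts.scheme not in ("ldap", "ldaps"):
        return False
    # hostname is None when the authority part is empty or just ":".
    return bool(parts.hostname)

broken = "ldap://:/?uid?sub?(objectClass=*)"  # value from prometheus.conf above
# Hypothetical host and port, for illustration only.
plausible = "ldap://ldap.example.org:7389/?uid?sub?(objectClass=*)"

print(ldap_url_has_host(broken))
print(ldap_url_has_host(plausible))
```

A check like this only covers the empty-host case seen here. It does not validate the base DN, which is also missing in the generated URL, or the empty AuthLDAPBindDN.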
Comment 2 Dirk Wiesenthal univentionstaff 2019-09-26 10:13:09 CEST
Fixed by changing the group ownership of certain directories that are essentially volumes.

Already live. The app should be installable.
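
For anyone who wants to verify the ownership on their own system, here is a small Python sketch that lists directories whose owning group differs from an expected GID. Which directories and which group the fix actually touched is not stated in this report, so the paths and GID are placeholders the reader has to supply. The demo below only uses a temporary directory.

```python
import os
import tempfile

def group_mismatches(paths, expected_gid):
    """Return the paths whose owning group differs from expected_gid."""
    return [p for p in paths if os.stat(p).st_gid != expected_gid]

# Demo on a throwaway directory; real volume paths are not known from this report.
with tempfile.TemporaryDirectory() as volume_dir:
    current_gid = os.stat(volume_dir).st_gid
    print(group_mismatches([volume_dir], current_gid))      # matches its own group
    print(group_mismatches([volume_dir], current_gid + 1))  # reported as a mismatch
```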