Bug 48267 - appcenter traceback due to unaccessible /etc directory in docker compose generated app image
appcenter traceback due to unaccessible /etc directory in docker compose gene...
Product: UCS
Classification: Unclassified
Component: App Center
UCS 4.3
Other Linux
: P5 normal (vote)
: ---
Assigned To: App Center maintainers
App Center maintainers
Depends on:
  Show dependency treegraph
Reported: 2018-12-04 18:55 CET by Arvid Requate
Modified: 2018-12-04 19:41 CET (History)
1 user (show)

See Also:
What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted after Product Owner Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:

appcenter.log (3.77 MB, text/x-log)
2018-12-04 18:55 CET, Arvid Requate

Note You need to log in before you can comment on or make changes to this bug.
Description Arvid Requate univentionstaff 2018-12-04 18:55:50 CET
Created attachment 9767 [details]

A colleague today asked for advice because his freshly constructed app image caused the following appcenter-traceback when it was installed:

  4521 actions.upgrade-search           18-12-04 17:02:42 [   DEBUG]: Checking guacamole=2.1.0-99
  4521 actions.upgrade-search           18-12-04 17:02:43 [    INFO]: Executing interface update_available for guacamole
  4521 actions.upgrade-search           18-12-04 17:02:43 [    INFO]: No interface defined
  4521 actions.upgrade-search.progress  18-12-04 17:02:43 [   DEBUG]: 100
  4521 actions.install                  18-12-04 17:02:43 [   ERROR]: [Errno 2] Datei oder Verzeichnis nicht gefunden: u'/var/lib/docker/overlay/ca5f4ff9450e1d8f55948898811466
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/__init__.py", line 226, in call_with_namespace
    result = self.main(namespace)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/install.py", line 73, in main
    return self.do_it(args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/install_base.py", line 109, in do_it
    self._do_it(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/docker_install.py", line 63, in _do_it
    ret = super(Install, self)._do_it(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/install.py", line 90, in _do_it
    if self._install_app(app, args):
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/docker_install.py", line 57, in _install_app
    self._start_docker_image(app, hostdn, password, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/docker_base.py", line 247, in _start_docker_image
    with open(f_name, 'w+b') as f:
IOError: [Errno 2] Datei oder Verzeichnis nicht gefunden: u'/var/lib/docker/overlay/ca5f4ff9450e1d8f55948898811466092be0ba5a467d63f14342ee56c9cd9347/merged/etc/machine.secret'
 11480 cache                            18-12-04 17:40:19 [   DEBUG]: Cache outdated. Need to rebuild
 11480 cache                            18-12-04 17:40:19 [   DEBUG]: Loaded 161 apps from cache
 11480 cache                            18-12-04 17:40:19 [   DEBUG]: Cache outdated. Need to rebuild
 11480 cache                            18-12-04 17:40:19 [   DEBUG]: Loaded 265 apps from cache
 11480 cache                            18-12-04 17:40:19 [   DEBUG]: Cache outdated. Need to rebuild
 11480 cache                            18-12-04 17:40:20 [   DEBUG]: Loaded 427 apps from cache
 11480 actions.remove                   18-12-04 17:40:20 [   DEBUG]: Calling remove
 11480 actions.remove.progress          18-12-04 17:40:20 [   DEBUG]: 0
 11480 actions.remove                   18-12-04 17:40:20 [    INFO]: Going to remove Guacamole (2.1.0-99)

After debugging this a bit, I think it may be cause by the following to points:

1. The colleague created is new image by docker compose. His Dockerfile
   uses as From line that references an external Docker image and additionally
   copies a file into it. When the image is built, this generates a new
   layer, which probably doesn't have an /etc directory by itself.

2. The appcenter code (the last line in the Python traceback above) tries
   to create a file docker.path("/etc/machine.secret"), which refers to
   a "merged" directory below /var/lib/docker/overlay.

When I look into that directory, there is no directory /etc in the "merged" directory. When use univention-app install with the option --do-not-revert and then run docker inspect on the (restarting) container, I can see the GraphDriver/overlay section. When I manually mount the given overlay fs stack to some temporary directory (e.g. /mnt) via the command below, I can see the /etc directory below the mount point:

root@ucs-4662:~# mount -t overlay overlay -o lowerdir=/var/lib/docker/overlay/d70c80212db488e3d7391801b4333a0cd87570e965873b77c4166c223964c6f7/root,upperdir=/var/lib/docker/overlay/8cc34a5eee9db38a0dd12bed84b1f230de5a55a741d2f8ea687e6ed3a8ad70c7/upper,workdir=/var/lib/docker/overlay/8cc34a5eee9db38a0dd12bed84b1f230de5a55a741d2f8ea687e6ed3a8ad70c7/work /mnt

I would assume that docker does the same, apart from the fact that it uses a different mount target ( /var/lib/docker/overlay/8cc34a5eee9db38a0dd12bed84b1f230de5a55a741d2f8ea687e6ed3a8ad70c7/merged ), but apparently this does not work as expected.
Comment 1 Arvid Requate univentionstaff 2018-12-04 19:32:09 CET
Ok, more debugging showed that the missing etc directory in the "merged" mountpoint is just an artifact: The appcenter started the container, but the program specified as entrypoint in the container image aborted (due to set -e) and then the docker container was in state "Restarting" and the appcenter just accessed "merged" directory in unmounted state.

So, closing as invalid.
Comment 2 Arvid Requate univentionstaff 2018-12-04 19:41:55 CET
Reopening because I think the error handling should be improved here:

* The appcenter should improve the docker.is_running() check before trying to write /etc/machine.secret

* Since we start the containers with RestartPolicy "always" it may well be that this check in unstable