Bug 52456 - App upgrade failed after second docker.pull/verify - leaves app in limbo
Summary: App upgrade failed after second docker.pull/verify - leaves app in limbo
Status: CLOSED FIXED
Alias: None
Product: UCS
Classification: Unclassified
Component: App Center
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.4-7-errata
Assignee: Felix Botner
QA Contact: Julia Bremer
URL: https://git.knut.univention.de/univen...
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-03 15:20 CET by Felix Botner
Modified: 2020-12-09 13:11 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.286
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Customer ID:
Max CVSS v3 score:


Attachments

Description Felix Botner univentionstaff 2020-12-03 15:20:56 CET
Seen on a customer server (4.4-7)

The Nextcloud app was being upgraded. Here are some lines from the appcenter.log:



Going to upgrade Nextcloud Hub (20.0.2-0)
Upgrading image (u'docker.software-univention.de/nextcloud:20.0.2-0')
...
Verifying Docker registry manifest for app image docker.software-univention.de/nextcloud:20.0.2-0
Pulling Docker image docker.software-univention.de/nextcloud:20.0.2-0
Downloading app image docker.software-univention.de/nextcloud:20.0.2-0
Running command: docker pull docker.software-univention.de/nextcloud:20.0.2-0 
20.0.2-0: Pulling from nextcloud
...
Status: Downloaded newer image for docker.software-univention.de/nextcloud:20.0.2-0

OK so far: the image has been downloaded.

Saving data from old container (nextcloud=19.0.4-0)
...
Removing old container
Setting up new container (nextcloud=20.0.2-0)
Creating data directories for nextcloud...

Still OK, but at this point there is no going back: the old container is gone.

Database and User already exist
nextcloud=20.0.2-0 already has its database
Registering schema /usr/share/univention-appcenter/apps/nextcloud/nextcloud.schema
Already found cn=nextc-95950727,cn=memberserver,cn=computers,dc=scs,dc=sovereignit,dc=de as a host for nextcloud. Trying to retrieve machine secret.
Verifying Docker registry manifest for app image docker.software-univention.de/nextcloud:20.0.2-0
Downloading app image docker.software-univention.de/nextcloud:20.0.2-0
Running command: docker pull docker.software-univention.de/nextcloud:20.0.2-0
Error response from daemon: Get https://docker.software-univention.de/v2/nextcloud/manifests/20.0.2-0: dial tcp 176.9.114.147:443: i/o timeout
Command docker pull docker.software-univention.de/nextcloud:20.0.2-0 failed with: Error response from daemon: Get https://docker.software-univention.de/v2/nextcloud/manifests/20.0.2-0: dial tcp 176.9.114.147:443: i/o timeout (1)
Releasing LOCK
Downloading Docker image docker.software-univention.de/nextcloud:20.0.2-0 failed: Error response from daemon: Get https://docker.software-univention.de/v2/nextcloud/manifests/20.0.2-0: dial tcp 176.9.114.147:443: i/o timeout

Aborting...

So the upgrade failed during the unnecessary second docker.pull/verify. At this point the old container is gone and the new one has not been created yet, which leaves the app in limbo.


We have to ensure that every resource/setup step that requires a network connection is done before we remove the old container, so that we can safely abort the upgrade and leave the app in its old state. We already have code for that in _upgrade_image() (pull first, remove old, create new).

But during "create new" - _start_docker_image() - we pull the image a second time.
This can be disabled with args.pull_image, which we forgot to set to False in _upgrade_image() (after we pulled the image the first time).
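
For illustration, the intended ordering (pull first, remove old, create new without a second pull) could be sketched roughly as follows. This is not the App Center code: FakeDocker, start_image() and upgrade_image() are hypothetical stand-ins, and only the pull_image flag is taken from this report.

```python
# Illustrative sketch only: none of these names are the real App Center API.
from types import SimpleNamespace


class FakeDocker:
    """Hypothetical stand-in for Docker; records calls, can simulate an outage."""

    def __init__(self):
        self.calls = []
        self.network_up = True
        self.containers = {"old"}

    def pull(self, image):
        self.calls.append(("pull", image))
        if not self.network_up:
            raise RuntimeError("i/o timeout")

    def remove(self, name):
        self.calls.append(("remove", name))
        self.containers.discard(name)

    def create(self, name):
        self.calls.append(("create", name))
        self.containers.add(name)


def start_image(docker, image, args):
    # Plays the role of _start_docker_image(): pulls unless told not to.
    if args.pull_image:
        docker.pull(image)
    docker.create("new")


def upgrade_image(docker, image, args):
    # 1. Everything that needs the network happens first. A failure here
    #    aborts the upgrade while the old container still exists.
    docker.pull(image)
    # 2. Point of no return: the old container is removed.
    docker.remove("old")
    # 3. The image is already local, so the second pull is skipped.
    #    Restoring the flag afterwards is a choice made in this sketch.
    old_pull_image = args.pull_image
    args.pull_image = False
    try:
        start_image(docker, image, args)
    finally:
        args.pull_image = old_pull_image
```

With the network down, upgrade_image() raises during step 1 and the "old" container is untouched; with the network up, exactly one pull is recorded.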

A possible patch:

diff --git a/management/univention-appcenter/python/appcenter-docker/actions/docker_upgrade.py b/management/univention-appcenter/python/appcenter-docker/actions/docker_upgrade.py
index cdaf665fc0..d63515acfd 100644
--- a/management/univention-appcenter/python/appcenter-docker/actions/docker_upgrade.py
+++ b/management/univention-appcenter/python/appcenter-docker/actions/docker_upgrade.py
@@ -181,6 +181,7 @@ class Upgrade(Upgrade, Install, DockerActionMixin):
                ucr_save({app.ucr_image_key: None})
                old_configure = args.configure
                args.configure = False
+               args.pull_image = False
                self._install_new_app(app, args)
                self._update_converter_service(app)
                args.configure = old_configure
Comment 1 Felix Botner univentionstaff 2020-12-07 10:19:38 CET
univention-appcenter
0e3432bcc737813bd5c008a60dd582c6e8f297d4 - set args.pull_image = False before installing new app
3e9e4bd17285fccd8c128095b22c884f8e0e822d - removed deprecated verify (already removed in 5.0-0, so dropped from merge request)
ce7dbaf21c8bd3bce94a083e757a6f97652a685b - yaml

QA: install an old version of an app, then upgrade it:
-> univention-app install nextcloud=19.0.3-0
-> univention-app upgrade nextcloud

You should see two docker pull calls in the appcenter.log in total, and only one of them for the new version.
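
If counting by eye is tedious, a small helper along these lines could tally the pulls per image. It is only a sketch: count_pulls() is a hypothetical name, and the pattern assumes the "Running command: docker pull <image>" log format quoted in the description.

```python
import re
from collections import Counter

# Matches the "Running command: docker pull <image>" lines quoted in this report.
# The "Command docker pull ... failed with" error lines are deliberately not matched.
PULL_RE = re.compile(r"Running command: docker pull (\S+)")


def count_pulls(log_text):
    """Return a Counter mapping each image reference to its number of pulls."""
    return Counter(PULL_RE.findall(log_text))
```

Feed it the contents of the appcenter.log; after the fix, the image reference of the new version should have a count of 1.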
Comment 2 Julia Bremer univentionstaff 2020-12-08 10:32:00 CET
Docker pull is only executed once now: OK
Pull request: OK
yaml: OK
Verified