Bug 39429 - Handle a restart of docker gracefully
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Docker
Version: UNSTABLE
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.1
Assigned To: Daniel Tröder
QA Contact: Felix Botner
Keywords: interim-2
Reported: 2015-09-29 14:24 CEST by Dirk Wiesenthal
Modified: 2015-11-17 12:12 CET

Description Dirk Wiesenthal univentionstaff 2015-09-29 14:24:03 CEST
When docker.io is restarted
  invoke-rc.d docker restart
it hard kills all containers.

Those containers do not come back again unless manually started.

We may want to patch the init script of docker to send stop to the containers (and wait for them to exit) when docker is stopped. We may also want to save which containers were running before the stop and start those containers again afterwards.

Also check how docker and the apps behave after a reboot of the host. The apps should be started automatically depending on $appid/autostart.

Does the init script of the apps wait until docker is actually up and running?
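
One way an app init script could wait for the daemon is sketched below. This is only an illustration: the function name wait_for_docker and the timeout handling are hypothetical, and it assumes that "docker info" exits non-zero as long as the daemon is not reachable.

```shell
# Hypothetical helper (not taken from the UCS init scripts):
# block until the Docker daemon answers, or give up after N attempts.
wait_for_docker() {
    timeout="${1:-30}"   # maximum number of one-second attempts
    i=0
    while [ "$i" -lt "$timeout" ]; do
        # Assumption: "docker info" fails while the daemon is not reachable.
        if docker info >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

An app init script could then call "wait_for_docker 60 || exit 1" before starting its container.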
Comment 1 Daniel Tröder univentionstaff 2015-10-06 18:27:21 CEST
Commit 64262: fix univention-firewall if Docker has not yet run
Commit 64270: make docker, app-container and non-app-container {start,stop,restart} in correct order

START
1. docker.io is started
2. if containers were stopped on shutdown of docker.io they are started now
3. app-containers are started

STOP
1. app-containers are stopped
2. (docker.io init script) all running (non-app) containers are stopped, their IDs saved for START.2
3. docker.io is stopped

The init order is hard-coded; grep for 40/15 and 41/14 in r64270.
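
Steps STOP.2 and START.2 can be sketched roughly as follows. This is not the code from r64270: the function names and the default path for $CONT_ID_FILE are assumptions made for illustration. It relies on "docker ps -q" listing only running containers, which at this point are the non-app containers because the app containers were already stopped in STOP.1.

```shell
# Hypothetical default; this report does not show the real value of $CONT_ID_FILE.
CONT_ID_FILE="${CONT_ID_FILE:-/var/run/docker-previous-containers}"

# STOP.2: record the IDs of all running containers, then stop them.
# "docker stop" asks the container to terminate and waits before killing it.
save_and_stop_containers() {
    docker ps -q --no-trunc > "$CONT_ID_FILE"
    while read -r id; do
        [ -n "$id" ] && docker stop "$id" >/dev/null </dev/null
    done < "$CONT_ID_FILE"
    return 0
}

# START.2: start every container that the previous stop recorded.
start_previous_containers() {
    [ -s "$CONT_ID_FILE" ] || return 0   # nothing was recorded
    while read -r id; do
        [ -n "$id" ] && docker start "$id" >/dev/null </dev/null
    done < "$CONT_ID_FILE"
    return 0
}
```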
Comment 2 Stefan Gohmann univentionstaff 2015-10-07 06:23:41 CEST
(In reply to Daniel Tröder from comment #1)
> Commit 64270: make docker, app-container and non-app-container
> {start,stop,restart} in correct order

This seems to break the App tests. Simply try this:
 ucr set repository/online/unmaintained='yes'
 univention-install ucs-test-docker
 /usr/share/ucs-test/80_docker/55_app_modproxy -f

From the command line output:

Going to remove np1gw082xh (9.1.7)
Stopping np1gw082xh Container 2f896c594db7e0cf20b452c7efb2aedd4375e5995d07177568f127e9220c71eb ....
2f896c594db7
2f896c594db7e0cf20b452c7efb2aedd4375e5995d07177568f127e9220c71eb
File: /etc/univention/service.info/services/univention-appcenter.cfg
Multifile: /etc/apache2/sites-available/default-ssl
Multifile: /etc/apache2/sites-available/default
Registering UCR for np1gw082xh
Removing localhost from LDAP object
Removing LDAP object
Setting overview variables
File: /var/www/ucs-overview/entries.json
Reloading web server config: apache2.
update-rc.d -f docker-app-np1gw082xh remove
 Removing any system startup links for /etc/init.d/docker-app-np1gw082xh ...
   /etc/rc0.d/K14docker-app-np1gw082xh
   /etc/rc1.d/K14docker-app-np1gw082xh
   /etc/rc2.d/S41docker-app-np1gw082xh
   /etc/rc3.d/S41docker-app-np1gw082xh
   /etc/rc4.d/S41docker-app-np1gw082xh
   /etc/rc5.d/S41docker-app-np1gw082xh
   /etc/rc6.d/K14docker-app-np1gw082xh

Removing /etc/init.d/docker-app-np1gw082xh
File: /usr/share/univention-management-console/modules/apps.xml

File: /usr/share/univention-management-console/i18n/de/apps.mo

File: /etc/apt/apt.conf.d/55user_agent

Downloading "https://master441.deadlock44.intranet/meta-inf/categories.ini"...
Downloading "https://master441.deadlock44.intranet/meta-inf/4.1/index.json.gz"...
0 file(s) are new
'module' object has no attribute 'rm'
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/__init__.py", line 182, in call_with_namespace
    result = self.main(namespace)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/remove.py", line 59, in main
    self.do_it(args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/install_base.py", line 106, in do_it
    self._do_it(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/docker_remove.py", line 54, in _do_it
    super(Remove, self)._do_it(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/remove.py", line 67, in _do_it
    self._unregister_app(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/docker_remove.py", line 84, in _unregister_app
    return super(Remove, self)._unregister_app(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/remove.py", line 95, in _unregister_app
    os.rm(init_script)
AttributeError: 'module' object has no attribute 'rm'
Traceback (most recent call last):
  File "/usr/bin/univention-app", line 84, in <module>
    main()
  File "/usr/bin/univention-app", line 74, in main
    ret = args.func(args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/__init__.py", line 182, in call_with_namespace
    result = self.main(namespace)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/remove.py", line 59, in main
    self.do_it(args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/install_base.py", line 106, in do_it
    self._do_it(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/docker_remove.py", line 54, in _do_it
    super(Remove, self)._do_it(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/remove.py", line 67, in _do_it
    self._unregister_app(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/docker_remove.py", line 84, in _unregister_app
    return super(Remove, self)._unregister_app(app, args)
  File "/usr/lib/pymodules/python2.7/univention/appcenter/actions/remove.py", line 95, in _unregister_app
    os.rm(init_script)
AttributeError: 'module' object has no attribute 'rm'
Cleanup after exception: <class 'dockertest.UCSTest_DockerApp_RemoveFailed'>
Comment 3 Daniel Tröder univentionstaff 2015-10-07 09:38:23 CEST
Ah great - I didn't know how to test the code.

Actually, I found out that the symlink is already removed by other code, so I removed the offending code.

Fixed in 64296.
Comment 4 Felix Botner univentionstaff 2015-10-16 13:15:30 CEST
Seems to work, but is this correct?

-> started docker container
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER (0 references)
target     prot opt source               destination         
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.20          tcp dpt:23
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.20          tcp dpt:21


-> univention-firewall stop

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (0 references)
target     prot opt source               destination 


-> univention-firewall start
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.20          tcp dpt:21
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.20          tcp dpt:23

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER (0 references)
target     prot opt source               destination  

-> docker container restarted

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.20          tcp dpt:21
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.20          tcp dpt:23

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER (0 references)
target     prot opt source               destination         
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.21          tcp dpt:21
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.21          tcp dpt:23
Comment 5 Daniel Tröder univentionstaff 2015-10-20 15:32:03 CEST
(IMO this kind of belongs to #38307, but I fixed it referencing this bug.)

I moved the Docker container rules to the DOCKER chain, so the Docker engine removes them when it shuts down containers.

Commit: 64626
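
The listings in comment 4 show why this matters: after the container restart, FORWARD still held the rules for 172.17.0.20 while the DOCKER chain listed 172.17.0.21. A rule placed in the DOCKER chain could look like the sketch below. The helper name allow_container_port is hypothetical, and the actual rule syntax used in commit 64626 is not shown in this report.

```shell
# Hypothetical helper: accept traffic to one container port via the DOCKER
# chain instead of FORWARD.
# $1 = container IP, $2 = protocol, $3 = destination port
allow_container_port() {
    iptables -A DOCKER -p "$2" -d "$1" --dport "$3" -j ACCEPT
}
```

For the container from comment 4 this would be called as "allow_container_port 172.17.0.21 tcp 21".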
Comment 6 Felix Botner univentionstaff 2015-10-28 17:36:39 CET
(In reply to Daniel Tröder from comment #5)
> (IMO this kind of belongs to #38307, but I fixed it referencing this bug.)
> 
> I moved the Docker container rules to the DOCKER chain, so the Docker engine
> removes them when it shuts down containers.
> 
> Commit: 64626

OK, reopened #38307.

After stopping containers via /etc/init.d/docker stop, the id is written to $CONT_ID_FILE.
/etc/init.d/docker start starts all containers in $CONT_ID_FILE.

But as soon as I use "/etc/init.d/docker stop" once, the ID exists in $CONT_ID_FILE and the container is always started during docker start, regardless of whether the container was stopped by docker or by the app init script. Who is responsible for cleaning up $CONT_ID_FILE?
Comment 7 Stefan Gohmann univentionstaff 2015-10-29 14:27:26 CET
(In reply to Felix Botner from comment #6)
> But as soon as i use "/etc/init.d/docker stop" once, the id exists in
> $CONT_ID_FILE and the container is always started during docker start,
> regardless if the container was stopped by docker or the app init script.
> Who is responsible for cleaning up $CONT_ID_FILE?

The file is now cleaned after starting the container: r64976
Comment 8 Felix Botner univentionstaff 2015-10-29 14:29:49 CET
previous_containers_list_clean() {
    ehco -n > "$CONT_ID_FILE"
}

=> echo
Comment 9 Stefan Gohmann univentionstaff 2015-10-29 15:12:51 CET
(In reply to Felix Botner from comment #8)
> previous_containers_list_clean() {
>     ehco -n > "$CONT_ID_FILE"
> }
> 
> => echo

Yes: r64977
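
For reference, a working variant of the clean-up function might look like this. It is a sketch, not the content of r64977: it truncates the file with ":" rather than "echo -n", which has the same effect without depending on how the shell's echo treats -n. The default path is an assumption for illustration.

```shell
# Hypothetical default; this report does not show the real value of $CONT_ID_FILE.
CONT_ID_FILE="${CONT_ID_FILE:-/var/run/docker-previous-containers}"

# Empty the list so that a later "docker start" does not start containers
# that were stopped by other means in the meantime.
previous_containers_list_clean() {
    : > "$CONT_ID_FILE"
}
```

According to comment 7, the list is cleaned after the recorded containers have been started.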
Comment 10 Felix Botner univentionstaff 2015-10-29 15:39:13 CET
OK, works fine
Comment 11 Stefan Gohmann univentionstaff 2015-11-17 12:12:35 CET
UCS 4.1 has been released:
 https://docs.software-univention.de/release-notes-4.1-0-en.html
 https://docs.software-univention.de/release-notes-4.1-0-de.html

If this error occurs again, please use "Clone This Bug".