Univention Bugzilla – Bug 39482
atd hangs using 100% CPU in docker
Last modified: 2015-11-30 19:43:26 CET
Whenever a docker.software-univention.de/ucs-appbox-amd64:4.1-0 is started, the atd process inside the container hangs using a full core.
I cannot reproduce this on any host other than 10.200.3.26.

atd's man page says:

    WARNING
    atd won't work if its spool directory is mounted via NFS even if no_root_squash is set.

Of course it is not, but maybe something about the storage on that host is different in a way that is not apparent to me.

[..]

I used "strace -f /usr/sbin/atd" to see what was going on. After some initialization, this repeats indefinitely:

    stat(".", {st_mode=S_IFDIR|S_ISVTX|0770, st_size=4096, ...}) = 0
    open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
    getdents(4, /* 4 entries */, 32768) = 112
    stat("a00001016f9298", {st_mode=S_IFREG|0700, st_size=767, ...}) = 0
    unlink("=00001016f9298") = -1 ENOENT (No such file or directory)
    getdents(4, /* 0 entries */, 32768) = 0
    close(4) = 0

The file "a00001016f9298" exists in /var/spool/cron/atjobs/ and is the create-dh-parameter-files.sh job scheduled for one day later ("in 1d"), created by univention-mail-postfix. Why atd is (unsuccessfully) trying to unlink "=00001016f9298" is not clear.

There seemed to be an error in the base image. After deleting all images and installing a docker-app from scratch, the problem disappeared.
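For reference, the spool file name seen in the strace output can be decoded. This is a minimal sketch and assumes the usual at(1) naming scheme of one queue letter, five hex digits for the job number, and eight hex digits for the run time in minutes since the epoch. The function name decode_at_job is made up for illustration.

```shell
#!/bin/sh
# Decode an at spool file name.
# Assumed layout (not stated in this report): 1 queue character,
# 5 hex digits job number, 8 hex digits run time in minutes since epoch.
decode_at_job() {
    name=$1
    queue=$(printf '%s' "$name" | cut -c1)
    job_hex=$(printf '%s' "$name" | cut -c2-6)
    min_hex=$(printf '%s' "$name" | cut -c7-14)
    job=$(( 0x$job_hex ))              # job number as decimal
    epoch=$(( 0x$min_hex * 60 ))       # minutes -> seconds since epoch
    printf 'queue=%s job=%s epoch=%s\n' "$queue" "$job" "$epoch"
}

decode_at_job a00001016f9298
# prints: queue=a job=1 epoch=1445354400
```

Under that assumption, the job is number 1 in queue "a" with a run time of 2015-10-20 15:20 UTC. The "=" name atd tries to unlink is the same name with only the queue letter replaced.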
top output (columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND):

    21885 daemon 20 0 16676 144 0 R 96,7 0,0 45:47.95 atd

UCS 4.1, dudle-docker app installed
Maybe overlayfs is the problem?
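One way to probe the overlayfs suspicion is to look at the hard-link counts the filesystem reports for the spool files. The sketch below rests on an assumption that is not confirmed anywhere in this report: that atd treats a job file with more than one hard link as "currently running" and then tries to clear the matching "=" lock file. The helper name multi_linked_jobs is hypothetical.

```shell
#!/bin/sh
# List regular files directly inside a spool directory whose reported
# hard-link count is greater than 1.
# Assumption (unverified here): atd uses the link count of a job file to
# decide whether the job is already running.
multi_linked_jobs() {
    find "$1" -maxdepth 1 -type f -links +1 | LC_ALL=C sort
}

# Usage, as root inside the affected container:
#   multi_linked_jobs /var/spool/cron/atjobs
```

If a job file shows up here although no "=" file exists next to it, the storage driver would be reporting a link count that does not match the directory contents.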
The scheduling of the Postfix DH parameters regeneration for the day after the Docker _image_ generation didn't make much sense. It was moved to one day after the creation of the _container_.

Commit: 65106
Packages: univention-mail-postfix, univention-appcenter

Please rebuild the images that install univention-container-role-server-common, as that depends on univention-mail-postfix. The scripts installing those are:
* /usr/share/univention-docker-dev/scripts/create-docker-ucs-appbox-image.sh
* /usr/share/univention-docker-dev/scripts/create-docker-ucs-role-image.sh

This leaves open the question of the same problem for images for other virtualization platforms. IMHO we could completely remove the recreation of the 2048-bit DH parameters file. It is the one from the RFC and is considered safe. The immediate recreation of the 512-bit file could be done at package post-installation time, as it costs very little time/resources.
(In reply to Daniel Tröder from comment #4)
> The scheduling of the Postfix DH parameters regeneration for the day after
> the Docker _image_ generation didn't make much sense. It was moved to one
> day after the creation of the _container_.

The following line does not work for detecting whether the postinst is run within a container. The component ucs-container will be removed (soon).

> if ! is_ucr_true repository/online/component/ucs-container; then

The following might help to detect whether we are running within a container. Maybe the at job itself has to test, after 1 day, whether it is running within a Docker container and not on bare metal.

    container_uuid=$(grep -oe "docker.*" /proc/self/cgroup | head -1 | \
        sed -r 's|docker.*([a-z0-9]{64}).*|\1|')
    if [ -n "$container_uuid" ]; then
        echo "RUNNING IN DOCKER CONTAINER"
    else
        echo "RUNNING ON 'BARE' METAL"
    fi

→ REOPEN
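A slightly more defensive variant of the cgroup check above could look like the following sketch. It assumes a cgroup v1 style /proc/self/cgroup in which the 64-character hex container ID follows the word "docker"; on other layouts it simply prints nothing. The function name docker_container_id and the optional file argument (added so the check can be tried against a sample file) are not part of the original suggestion.

```shell
#!/bin/sh
# Print the Docker container ID found in a cgroup file, or nothing.
# Assumption: cgroup v1 style lines such as "4:memory:/docker/<64 hex chars>".
docker_container_id() {
    cgroup_file=${1:-/proc/self/cgroup}
    # -n with the trailing "p" prints only lines where the substitution matched
    sed -n -E 's|.*docker[^0-9a-f]*([0-9a-f]{64}).*|\1|p' "$cgroup_file" 2>/dev/null \
        | head -n 1
}

if [ -n "$(docker_container_id)" ]; then
    echo "RUNNING IN DOCKER CONTAINER"
else
    echo "RUNNING ON 'BARE' METAL"
fi
```

Unlike the original pipeline, a line that mentions "docker" without a 64-character ID yields an empty result instead of the unmodified match.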
Alternatively the UCR variable docker/container/uuid should be set automatically during startup of the container. So if the variable is present, it should be a container.
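As a sketch of that check, assuming the variable is empty or unset everywhere except in a started container (the helper name is_started_container is hypothetical):

```shell
#!/bin/sh
# Succeeds if the UCR variable docker/container/uuid has a non-empty value.
# Assumption: the variable is only set during startup of a container.
is_started_container() {
    [ -n "$(ucr get docker/container/uuid 2>/dev/null)" ]
}

# Usage:
#   if is_started_container; then echo "container"; else echo "not a container"; fi
```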
The atd in the postinst can safely run inside a started container. It should just not run when creating the image. That is done in a chroot weeks or months ahead.

Detection with docker/container/uuid and /proc/self/cgroup does not work in the chroot, as it is not a started Docker container. The UCRV repository/online/component/ucs-container exists at that time in the chroot (at least that is what I gather from reading create-docker-ucs-appbox-image.sh).
Regarding Comment 6: This is already done in /sbin/init but doesn't help here, since the at job is currently created in a postinst script at the build time of a container.

I support the proposal of Comment 4: Removal of the scheduled automatic creation of individualized DH parameters. If people need that, they can run the creation script once manually. An SDB article should be created to explain the details.

Proposal for a corresponding UCS 4.1-0 changelog entry:

"The automatic generation of individual DH parameters has been disabled to avoid issues with pre-installed images like Docker containers and appliances. Individual DH parameters can be created manually at any time by running /usr/share/univention-mail-postfix/create-dh-parameter-files.sh once and restarting the affected services (e.g. postfix). A forthcoming SDB article will explain the details."

If this is not acceptable to anybody, my next proposal would be to move the creation of the at job from the postinst to a joinscript. This would fix the issue for all Docker containers that are not Master. For Master appliances we may want to disable running joinscripts at build time anyway and postpone it to the time of instantiation.
(In reply to Arvid Requate from comment #8)
> Regarding Comment 6: This is already done in /sbin/init but doesn't help
> here, since the at job is currently created in a postinst script at the
> build time of a container.

I think the at job should be set up in any case. But the at job itself should check whether it is executed in a container environment and may behave differently in such a case.

> I support the proposal of Comment 4: Removal of the scheduled automatic
> creation of individualized DH parameters. If people need that, they can run
> the creation script once manually. An SDB article should be created to
> explain the details.

Creating one's own DH2048 file avoids/impedes brute-force attacks with precalculated rainbow tables. In the past the main problem was that everyone used the hardcoded, compiled-in DH file. I think the DH file should be calculated automatically in the background after the instance has been instantiated.

Does the installation of haveged on the host / container help with creating the DH file? If not, I would suggest that we precalculate a bunch of dh2048 files, ship them with univention-mail-postfix and select one randomly during postinst. The precalculated one is replaced by the at job later on.

> If this is not acceptable to anybody, my next proposal would be to move the
> creation of the at job from the postinst to a joinscript. This would fix the
> issue for all Docker containers that are not Master. For Master appliances we
> may want to disable running joinscripts at build time anyway and postpone it
> to the time of instantiation.

I would second that. Aren't the join scripts skipped during DC master installation by the scripts themselves (→ master not joined during join script execution)?
See Bug 38685#c4 and Bug 38685#c7 for the rationale for using group 14 of RFC 3526.

For additional reference, https://weakdh.org/sysadmin.html says: "It is fine to leave diffie-hellman-group14-sha1, which uses a 2048-bit prime."

Remember also that currently only keys smaller than 1024 bits have been shown to be precalculatable with *nation state* scale budgets.

But, at the end of the day, I agree: it is always safer to create individual DH parameter groups per server. This should be done once for each server, not for each service individually. That is planned via Bug 39158.

And yes, haveged helps, but that is not the issue of this bug. As far as I understood Daniel, *atd* is having a problem, regardless of what the script does.

If we want to stick with automatic scheduling of DH parameter creation, then we should move the creation of the at job to the joinscript. I guess everybody would be happy with that?
(In reply to Arvid Requate from comment #10)
> If we want to stick with automatic scheduling of DH parameter creation, then
> we should move the creation of the at job to the joinscript. I guess
> everybody would be happy with that?

Yes, that would be fine.
* The scheduling was removed from the appcenter code.
* The DH parameter generation is now scheduled in the join script.
* 512-bit DH parameters are created in postinst, as they are the only ones that are possibly weak and can be generated in about 1 second.

Commit: 65166
4.1 changelog entry: 65168 + 65169
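For illustration only, scheduling the regeneration from a join script could look roughly like the sketch below. This is not the code from commit 65166. The script path is the one named earlier in this bug; the helper name schedule_dh_regeneration and the "now + 1 day" time specification are assumptions.

```shell
#!/bin/sh
# Hypothetical join script helper: queue the DH parameter regeneration
# to run one day from now via at(1).
DH_SCRIPT=/usr/share/univention-mail-postfix/create-dh-parameter-files.sh

schedule_dh_regeneration() {
    # at reads the commands to execute from standard input
    echo "$DH_SCRIPT" | at now + 1 day
}

# Usage, from within the join script:
#   schedule_dh_regeneration
```

Because join scripts run when the system is joined rather than when the image is built, the job would be created relative to that point in time and not weeks ahead in the build chroot.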
OK - atd in docker
OK - DH parameters in univention-mail-postfix
OK - changelog entry
UCS 4.1 has been released:
https://docs.software-univention.de/release-notes-4.1-0-en.html
https://docs.software-univention.de/release-notes-4.1-0-de.html

If this error occurs again, please use "Clone This Bug".