At the end of USS `dhclient` is no longer running: the VM loses its network connection when the DHCP lease expires.

- During the setup, `/usr/lib/univention-system-setup/scripts/30_net/11primary` restarts the interface by setting the UCR variable `interfaces/primary` via `ucr.set`.
- This triggers `/etc/univention/templates/modules/interfaces.py` to `ifdown` / `ifup` the interface, which kills and restarts `dhclient`.
- As this is called directly from the UMC process, the newly forked `dhclient` process is then associated with the cgroup of UMC:
> 29353
> 10:pids:/system.slice/univention-management-console-server.service,
> 9:memory:/system.slice/univention-management-console-server.service,
> 7:devices:/system.slice/univention-management-console-server.service,
> 6:blkio:/system.slice/univention-management-console-server.service,
> 3:cpu,cpuacct:/system.slice/univention-management-console-server.service,
> 1:name=systemd:/system.slice/univention-management-console-server.service,
> 0::/system.slice/univention-management-console-server.service
- At the end of USS, UMC is restarted and all its (pending) child processes are killed:
> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 1130 (python3) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 29353 (dhclient) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 3048 (univention-cli-) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=univention-management-console-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

# systemctl cat univention-management-console-server.service
...
[Service]
KillMode=mixed

Previous workaround: All files `test/scenarios/**/*.cfg` did a `reboot` after USS, which started `dhclient` from `networking.service` during boot.

Nevertheless this is only a "workaround" and hides the real bug:
- either USS should strongly suggest rebooting the host again after the initial setup
- or the code needs to be changed so that the `dhclient` process is not associated with USS itself.

Additional data collected from previous runs:

lastcomm:
dhclient SF X root __ 0.00 secs Sun Sep 24 18:13
dhclient root __ 0.00 secs Sun Sep 24 18:13
dhclient S root __ 0.00 secs Sun Sep 24 18:13
dhclient SF X root __ 0.01 secs Sun Sep 24 17:03

strace:
18:13:49.802045 +++ killed by SIGTERM +++
18:20:02.270587 +++ killed by SIGKILL +++

/var/log/univention/setup.log
=== 30_net/11primary (2023-09-24 18:13:48) ===
__NAME__:30_net/11primary Setting primary network interface
__MSG__:Setting primary interface to: enp1s0
Killed old client process
DHCPRELEASE of 192.168.122.83 on enp1s0 to 192.168.122.1 port 67
...
=== done (2023-09-24 18:20:04) ===
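The cgroup listing quoted above has the format of `/proc/<pid>/cgroup`. To check on a running system whether `dhclient` still sits in the UMC service cgroup, something like the following sketch might help. The function names `parse_cgroup` and `cgroup_paths` are made up for illustration and are not part of UCS; the sample data is shortened from the listing in this report.

```python
from pathlib import Path


def parse_cgroup(text):
    """Parse the content of /proc/<pid>/cgroup into {controller-list: path}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form: hierarchy-ID:controller-list:cgroup-path
        _hierarchy, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def cgroup_paths(pid):
    """Return the set of cgroup paths the given process is a member of."""
    text = Path("/proc/%d/cgroup" % pid).read_text()
    return set(parse_cgroup(text).values())


# Shortened sample taken from the dhclient process in this report
SAMPLE = """\
10:pids:/system.slice/univention-management-console-server.service
1:name=systemd:/system.slice/univention-management-console-server.service
0::/system.slice/univention-management-console-server.service
"""

if __name__ == "__main__":
    for controllers, path in parse_cgroup(SAMPLE).items():
        print(repr(controllers), path)
```

If `cgroup_paths(<pid of dhclient>)` contains the path of `univention-management-console-server.service`, the process will be killed together with UMC on the next restart.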
This is still an issue, especially within our OpenStack environment. See helpdesk ticket https://helpdesk.knut.univention.de/#ticket/zoom/7097 for further information.
This problem also happens if the IP network configuration is altered in the UMC network settings module and DHCP is in use. As long as the UMC module is open, the `dhclient` process exists as well. When the UMC module is closed in the web frontend, or the module terminates itself after the idle timeout, the `dhclient` process is killed as well.

The automatic interface configuration performed in UCR should make sure that the commands run in a separate control group (cgroup). So all the "ifup"/"ifdown" handling is affected and maybe more!

Example for cgroup assignment:
# apt install cgroup-tools → package provides the following commands
# cgcreate -g memory:/run-by-UCR → creates a new cgroup "run-by-UCR"
# cgexec -g memory:/run-by-UCR ifup -a → runs the command "ifup -a" in the new cgroup

Hint: "ifup -a" will start a dhclient process if DHCP is configured for an interface.
Example:

--- base/univention-base-files/conffiles/interfaces.py
+++ base/univention-base-files/conffiles/interfaces.py
@@ -33,20 +33,21 @@ def _common(ucr, changes, command):
     for key in changes.keys():
         if key in SKIP:
             continue
         match = RE_IFACE.match(key)
         if not match:
             continue
         iface, _subkey, _ipv6_name = match.groups()
         interfaces.add(iface.replace('_', ':'))
     # Shutdown changed interfaces
+    call(('cgcreate', '-g', 'memory:/run-by-UCR'))
     for iface in interfaces:
-        call((command, iface))
+        call(('cgexec', '-g', 'memory:/run-by-UCR', command, iface))
 
 
 def preinst(ucr, changes):
     """Pre run handler to shutdown changed interfaces."""
     _common(ucr, changes, 'ifdown')
 
 
 def postinst(ucr, changes):
     """Post run handler to start changed interfaces."""
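The patch above relies on `cgexec` from `cgroup-tools` being installed. A rough sketch of how the command line could be built so that the handler falls back to the plain call when `cgexec` is missing is shown below. The helper `wrap_in_cgroup` and the fallback behaviour are assumptions for illustration only, not existing UCS code. Whether moving the process in the `memory` hierarchy alone keeps systemd from killing it has not been verified here.

```python
import shutil

# cgroup specification taken from the patch in this comment thread
CGROUP = "memory:/run-by-UCR"


def wrap_in_cgroup(command, iface, cgexec=None):
    """Build the argv for `command iface`, prefixed with cgexec if available.

    cgexec=None looks the binary up in PATH; an empty string forces the
    plain fallback (hypothetical convention, useful for testing).
    """
    if cgexec is None:
        cgexec = shutil.which("cgexec")
    if not cgexec:
        # cgroup-tools not installed: behave like the unpatched handler
        return (command, iface)
    return (cgexec, "-g", CGROUP, command, iface)


if __name__ == "__main__":
    print(wrap_in_cgroup("ifdown", "enp1s0", cgexec="/usr/bin/cgexec"))
    print(wrap_in_cgroup("ifup", "enp1s0", cgexec=""))
```

In `_common()` the result would be passed to `call()` in place of `(command, iface)`.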
This might affect test customers who have freshly set up their test environment with DHCP: when the lease from their DHCP server expires, the environment is left without a working network. They are affected only once, though; after a reboot the problem is more or less gone for good. It just doesn't leave a good impression if the environment is no longer accessible just 2 hours[*] after installation.

[*] Depends on the lease time of the DHCP server (Fritzbox, Cisco, ...)
I was curious what the Debian + systemd way would be :)

```
# Restart all devices that are marked "auto ...", e.g. "auto eth0" in /etc/network/interfaces
systemctl restart networking

# Restart a single device; it must be marked as "allow-hotplug ...", e.g. "allow-hotplug eth0" in /etc/network/interfaces. Missing in UCS!
systemctl stop ifup@eth0 && systemctl start ifup@eth0
```

Warning: `systemctl restart ifup@eth0` only works if the interface is managed exclusively through the systemd service. Otherwise systemd might not call `ifdown` on a restart, because it assumes the service is already stopped :/
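To see which of the two systemd paths would apply to an interface, one could check how it is marked in `/etc/network/interfaces`. The following is only a minimal sketch: `interface_marks` is a hypothetical helper, it looks at `auto` and `allow-hotplug` lines only, and it does not follow `source` / `source-directory` includes.

```python
def interface_marks(text):
    """Map interface name -> set of marks ('auto', 'allow-hotplug')."""
    marks = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Only whole-line comments are skipped here
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] in ("auto", "allow-hotplug"):
            # A single line may list several interfaces
            for iface in words[1:]:
                marks.setdefault(iface, set()).add(words[0])
    return marks


# Made-up sample content for illustration
SAMPLE = """\
auto lo eth0
iface lo inet loopback
# allow-hotplug eth1
allow-hotplug eth0
iface eth0 inet dhcp
"""

if __name__ == "__main__":
    with open("/etc/network/interfaces") as fd:
        for iface, found in sorted(interface_marks(fd.read()).items()):
            print(iface, sorted(found))
```

An interface that only shows `auto` would be covered by `systemctl restart networking`, while `ifup@<iface>` needs the `allow-hotplug` mark as described above.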