Bug 56646 - dhclient killed by UMC restart at end of USS - network connection lost
Summary: dhclient killed by UMC restart at end of USS - network connection lost
Status: NEW
Alias: None
Product: UCS
Classification: Unclassified
Component: System setup
Version: UCS 5.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assignee: UCS maintainers
QA Contact: UCS maintainers
URL:
Keywords:
Depends on: 53885
Blocks:
 
Reported: 2023-09-25 06:40 CEST by Philipp Hahn
Modified: 2025-06-25 14:43 CEST
CC List: 3 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 6: Setup Problem: Issue for the setup process
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.171
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Customer ID:
Max CVSS v3 score:


Description Philipp Hahn univentionstaff 2023-09-25 06:40:08 CEST
At the end of USS (Univention System Setup), `dhclient` is no longer running: the VM loses its network connection as soon as the DHCP lease expires.

- During the setup, /usr/lib/univention-system-setup/scripts/30_net/11primary restarts the interface by setting the UCR variable `interfaces/primary` via `ucr.set`.

- This triggers `/etc/univention/templates/modules/interfaces.py` to `ifdown` / `ifup` the interface, which kills and restarts `dhclient`.

- Because this is called directly from the UMC process, the newly forked `dhclient` process (PID 29353 below) inherits the cgroup of UMC:

> 29353
> 10:pids:/system.slice/univention-management-console-server.service,
> 9:memory:/system.slice/univention-management-console-server.service,
> 7:devices:/system.slice/univention-management-console-server.service,
> 6:blkio:/system.slice/univention-management-console-server.service,
> 3:cpu,cpuacct:/system.slice/univention-management-console-server.service,
> 1:name=systemd:/system.slice/univention-management-console-server.service,
> 0::/system.slice/univention-management-console-server.service

- At the end of USS, UMC is restarted, and systemd kills all processes still left in its cgroup, including `dhclient`:

> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 1130 (python3) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 29353 (dhclient) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 3048 (univention-cli-) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=univention-management-console-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'


# systemctl cat univention-management-console-server.service
...
[Service]
KillMode=mixed
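Per systemd.kill(5), `KillMode=mixed` sends SIGTERM only to the main process, while the subsequent SIGKILL goes to all remaining processes in the unit's control group. That matches the journal lines above, where `dhclient` (29353) is killed together with the other leftover processes. A small helper like the one below (`killed_processes` is a hypothetical name, not part of UCS) can pull such entries out of `journalctl` output, assuming systemd's message wording exactly as quoted above:

```python
import re
from typing import List, NamedTuple

# Matches systemd's "<unit>: Killing process <pid> (<comm>) with signal <SIG>." message,
# in the wording quoted from the journal in this report (assumed stable).
_KILL = re.compile(
    r"(?P<unit>[\w@.\-]+\.service): Killing process (?P<pid>\d+) "
    r"\((?P<comm>[^)]*)\) with signal (?P<sig>SIG\w+)\."
)


class Kill(NamedTuple):
    unit: str    # systemd unit whose cgroup the process was in
    pid: int     # process ID that was killed
    comm: str    # command name (truncated to 15 chars by the kernel)
    signal: str  # e.g. "SIGKILL"


def killed_processes(journal: str) -> List[Kill]:
    """Return every process systemd reports killing, in log order."""
    return [
        Kill(m["unit"], int(m["pid"]), m["comm"], m["sig"])
        for m in _KILL.finditer(journal)
    ]
```

For example, `[k for k in killed_processes(text) if k.comm == "dhclient"]` on the output of `journalctl -u univention-management-console-server.service` would list the DHCP clients that died with UMC.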


Previous workaround: all files `test/scenarios/**/*.cfg` performed a `reboot` after USS, so `dhclient` was started during bootup by `networking.service`.

Nevertheless, this is only a workaround and hides the real bug:
- either USS should strongly suggest rebooting the host again after the initial setup,
- or the code needs to be changed so that the `dhclient` process is no longer tied to the cgroup of UMC (and thereby USS).
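Whichever fix is chosen, it can be checked by reading `/proc/<pid>/cgroup` of the new `dhclient`, as was done for PID 29353 above. The sketch below (hypothetical helpers `parse_proc_cgroup` and `belongs_to_unit`) assumes that systemd tracks unit membership through the `name=systemd` line (cgroup v1/hybrid) or the unified `0::` line, not through per-controller lines such as `memory`. If that assumption holds, moving a process only in the `memory` hierarchy (as `cgexec -g memory:...` does) may not take it out of the UMC unit on a hybrid layout like the one shown, which is worth verifying:

```python
import re
from typing import Dict

# One line of /proc/<pid>/cgroup: "<hierarchy-id>:<controller-list>:<cgroup-path>"
_CGROUP_LINE = re.compile(r"^(\d+):([^:]*):(.*)$")


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller list ("pids", "name=systemd", "" for v2) to its path."""
    paths: Dict[str, str] = {}
    for raw in text.splitlines():
        # Tolerate the trailing commas seen in the quoted output in this report.
        m = _CGROUP_LINE.match(raw.strip().rstrip(","))
        if m:
            _hierarchy_id, controllers, path = m.groups()
            paths[controllers] = path
    return paths


def belongs_to_unit(text: str, unit: str) -> bool:
    """True if the process still sits inside `unit` in a hierarchy systemd tracks.

    Assumption: systemd tracks unit membership via the "name=systemd" (v1/hybrid)
    or the unified "0::" hierarchy; controller-only moves (e.g. memory) do not count.
    """
    paths = parse_proc_cgroup(text)
    needle = "/" + unit
    for key in ("name=systemd", ""):
        path = paths.get(key)
        if path is not None and (path.endswith(needle) or needle + "/" in path):
            return True
    return False
```

Usage: `belongs_to_unit(open(f"/proc/{pid}/cgroup").read(), "univention-management-console-server.service")` should return `False` for a `dhclient` that survives a UMC restart.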


Additional data collected from previous runs:

lastcomm:
dhclient         SF  X root     __         0.00 secs Sun Sep 24 18:13
dhclient               root     __         0.00 secs Sun Sep 24 18:13
dhclient         S     root     __         0.00 secs Sun Sep 24 18:13
dhclient         SF  X root     __         0.01 secs Sun Sep 24 17:03


strace:
18:13:49.802045 +++ killed by SIGTERM +++
18:20:02.270587 +++ killed by SIGKILL +++


/var/log/univention/setup.log
=== 30_net/11primary (2023-09-24 18:13:48) ===
__NAME__:30_net/11primary Setting primary network interface
__MSG__:Setting primary interface to: enp1s0
Killed old client process
DHCPRELEASE of 192.168.122.83 on enp1s0 to 192.168.122.1 port 67
...
=== done (2023-09-24 18:20:04) ===
Comment 1 Nico Stöckigt univentionstaff 2025-06-19 15:33:31 CEST
This is still an issue, especially within our OpenStack environment.

See Helpdesk-Ticket https://helpdesk.knut.univention.de/#ticket/zoom/7097 for further info.
Comment 2 Sönke Schwardt-Krummrich univentionstaff 2025-06-25 12:11:32 CEST
This problem also occurs if the IP network configuration is changed in the UMC network settings module while DHCP is in use. As long as the UMC module is open, the dhclient process keeps running. When the UMC module is closed in the web frontend or terminates itself after its idle timeout, the dhclient process is killed as well.

The automatic interface configuration performed by UCR should make sure that these commands run in a separate control group (cgroup). All "ifup"/"ifdown" handling is affected, and maybe more!

Example for cgroup assignment:

# apt install cgroup-tools  →  package provides the following commands
# cgcreate -g memory:/run-by-UCR  →  creates a new cgroup "run-by-UCR"
# cgexec -g memory:/run-by-UCR ifup -a  →  runs the command "ifup -a" in the new cgroup

Hint: "ifup -a" will start a dhclient process if DHCP is configured for an interface.
Comment 3 Sönke Schwardt-Krummrich univentionstaff 2025-06-25 12:16:51 CEST
Example:

--- base/univention-base-files/conffiles/interfaces.py
+++ base/univention-base-files/conffiles/interfaces.py
@@ -33,20 +33,21 @@ def _common(ucr, changes, command):
         for key in changes.keys():
             if key in SKIP:
                 continue
             match = RE_IFACE.match(key)
             if not match:
                 continue
             iface, _subkey, _ipv6_name = match.groups()
             interfaces.add(iface.replace('_', ':'))
     # Shutdown changed interfaces
+    call(('cgcreate', '-g', 'memory:/run-by-UCR'))
     for iface in interfaces:
-        call((command, iface))
+        call(('cgexec', '-g', 'memory:/run-by-UCR', command, iface))
 
 
 def preinst(ucr, changes):
     """Pre run handler to shutdown changed interfaces."""
     _common(ucr, changes, 'ifdown')
 
 
 def postinst(ucr, changes):
     """Post run handler to start changed interfaces."""
Comment 4 Sönke Schwardt-Krummrich univentionstaff 2025-06-25 14:00:30 CEST
This might affect test customers who have freshly set up their test environment with DHCP: once the lease from their DHCP server expires, the environment has no working network. They are affected only once, and after a reboot the problem is effectively gone for good. It just doesn't leave a good impression if the environment becomes inaccessible just 2 hours[*] after installation.

[*] Depends on the lease time of the DHCP server (Fritzbox, Cisco, ...)
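To make the footnote concrete: under the default timers from RFC 2131 (section 4.4.5), a running client starts renewing at T1 = 0.5 × lease and falls back to rebinding at T2 = 0.875 × lease. With `dhclient` killed, neither step happens, so the outage lines up with lease expiry, counted from when the killed `dhclient` last obtained the lease during USS. A minimal sketch, assuming the server sends no explicit T1/T2 options (`default_timers` is a hypothetical helper):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseTimers:
    renew_t1: float   # seconds until a live client first tries to renew (RENEWING)
    rebind_t2: float  # seconds until it falls back to any server (REBINDING)
    expiry: float     # seconds until the lease ends; without dhclient, the outage


def default_timers(lease_seconds: float) -> LeaseTimers:
    """RFC 2131 default T1/T2 for a lease of the given length."""
    return LeaseTimers(
        renew_t1=0.5 * lease_seconds,
        rebind_t2=0.875 * lease_seconds,
        expiry=float(lease_seconds),
    )
```

For a 2-hour lease, `default_timers(7200)` yields renewal at 3600 s and rebinding at 6300 s for a healthy client, while the environment described above goes offline at 7200 s.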
Comment 5 Jürn Brodersen univentionstaff 2025-06-25 14:43:30 CEST
I was curious what the Debian + systemd way would be :)

```
# Restart all devices that are marked "auto ..." e.g. "auto eth0" in /etc/network/interfaces
systemctl restart networking

# Restart a single device; it must be marked as "allow-hotplug ..." e.g. "allow-hotplug eth0" in /etc/network/interfaces. Missing in UCS!
systemctl stop ifup@eth0 && systemctl start ifup@eth0
```

Warning: "systemctl restart ifup@eth0" only works if the interface is managed exclusively through the systemd service. Otherwise, systemd might not call ifdown on a restart, because it assumes the service is already stopped :/
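As a sketch only, the stop-then-start sequence from the snippet above could be driven from Python, for example from a UCR handler module. `restart_ifup_unit` is a hypothetical name, not existing UCS code, and it assumes the interface is declared with "allow-hotplug", which the comment notes is missing in UCS. Because systemd itself then runs ifup, the resulting `dhclient` would be expected to live in the cgroup of `ifup@<iface>.service` instead of UMC's:

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(argv: Sequence[str]) -> None:
    # check=True raises CalledProcessError on failure, like the shell's "&&" short-circuit.
    subprocess.run(list(argv), check=True)


def restart_ifup_unit(iface: str, run: Runner = _run) -> List[List[str]]:
    """Stop, then start, ifup@<iface>, mirroring "systemctl stop ... && systemctl start ...".

    Assumption: <iface> is marked "allow-hotplug" in /etc/network/interfaces.
    Returns the commands that were issued, in order.
    """
    unit = f"ifup@{iface}"
    steps = [["systemctl", "stop", unit], ["systemctl", "start", unit]]
    for argv in steps:
        run(argv)  # an exception from "stop" aborts before "start" is attempted
    return steps
```

The `run` parameter is only there so the command sequence can be exercised without touching the network, e.g. `restart_ifup_unit("eth0", run=print)`.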