Bug 56646 – dhclient killed by UMC restart at end of USS - network connection lost


Summary:	dhclient killed by UMC restart at end of USS - network connection lost

Status:	NEW

Product:	UCS
Classification:	Unclassified
Component:	System setup
Version:	UCS 5.0
Hardware:	Other Linux

Importance:	P5 normal (vote)
Target Milestone:	---
Assigned To:	UCS maintainers
QA Contact:	UCS maintainers

URL:
Keywords:

Depends on:	53885
Blocks:

Reported:	2023-09-25 06:40 CEST by Philipp Hahn
Modified:	2023-09-25 10:00 CEST (History)
CC List:	1 user (show)

See Also:
What kind of report is it?:	Bug Report
What type of bug is this?:	6: Setup Problem: Issue for the setup process
Who will be affected by this bug?:	1: Will affect a very few installed domains
How will those affected feel about the bug?:	2: A Pain – users won’t like this once they notice it
User Pain:	0.069
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Philipp Hahn

2023-09-25 06:40:08 CEST

At the end of USS, dhclient is no longer running: the VM loses its network connection when the DHCP lease expires.

- during setup, `/usr/lib/univention-system-setup/scripts/30_net/11primary` restarts the interface by setting the UCR variable `interfaces/primary` via `ucr.set`.

- this triggers `/etc/univention/templates/modules/interfaces.py` to `ifdown` / `ifup` the interface, which kills and restarts `dhclient`.

- As this is directly called from the UMC process, the newly forked process "dhclient" is then associated with the cgroup of UMC:

> 29353
> 10:pids:/system.slice/univention-management-console-server.service,
> 9:memory:/system.slice/univention-management-console-server.service,
> 7:devices:/system.slice/univention-management-console-server.service,
> 6:blkio:/system.slice/univention-management-console-server.service,
> 3:cpu,cpuacct:/system.slice/univention-management-console-server.service,
> 1:name=systemd:/system.slice/univention-management-console-server.service,
> 0::/system.slice/univention-management-console-server.service
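A minimal sketch for checking which unit a process belongs to, assuming the standard `/proc/<pid>/cgroup` line format `hierarchy-ID:controller-list:path`. The function names `parse_cgroup` and `owning_unit` are hypothetical helpers for illustration and are not part of UCS:

```python
def parse_cgroup(text):
    """Map each controller list to its cgroup path.

    Every line of /proc/<pid>/cgroup has the form
    "hierarchy-ID:controller-list:cgroup-path".
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice: the path itself may contain colons.
        _hierarchy, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def owning_unit(pid):
    """Return the last path component of the systemd cgroup of `pid`."""
    with open(f"/proc/{pid}/cgroup") as fd:
        groups = parse_cgroup(fd.read())
    # cgroup v1 uses the "name=systemd" hierarchy; the unified (v2)
    # hierarchy has an empty controller list.
    path = groups.get("name=systemd") or groups.get("", "")
    return path.rsplit("/", 1)[-1]
```

For the process above, `owning_unit(29353)` would be expected to name the UMC server service instead of `networking.service`.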

- at the end of USS, UMC is restarted and systemd kills all processes remaining in the service's cgroup, including `dhclient`:

> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 1130 (python3) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 29353 (dhclient) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 systemd[1]: univention-management-console-server.service: Killing process 3048 (univention-cli-) with signal SIGKILL.
> Sep 25 06:17:34 ucs-1671 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=univention-management-console-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
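To spot this condition in a test run, the journal lines can be scanned for killed processes. This is a rough sketch that assumes the message wording shown in the log above; `KILL_RE` and `killed_processes` are hypothetical names:

```python
import re

# Matches systemd messages such as
# "Killing process 29353 (dhclient) with signal SIGKILL."
KILL_RE = re.compile(
    r"Killing process (?P<pid>\d+) \((?P<comm>[^)]*)\) with signal (?P<signal>SIG\w+)"
)


def killed_processes(lines):
    """Return (pid, command, signal) for every kill message found."""
    found = []
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            found.append(
                (int(match.group("pid")), match.group("comm"), match.group("signal"))
            )
    return found
```

A check could then fail the test if any entry has the command name `dhclient`.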


# systemctl cat univention-management-console-server.service
...
[Service]
KillMode=mixed


Previous workaround: all `test/scenarios/**/*.cfg` files did a `reboot` after USS, so `dhclient` was started again during bootup by `networking.service`.

Nevertheless, this is only a workaround and hides the real bug:
- either USS should strongly suggest rebooting the host after the initial setup,
- or the code needs to be changed so that the `dhclient` process is not associated with USS itself.
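One possible direction for the second option, sketched under assumptions: wrap the `ifdown` / `ifup` call in `systemd-run --scope`, so that the command and anything it forks run in a transient scope unit instead of the UMC service's cgroup. Whether this is acceptable for UCS has not been verified here; `scoped_command`, `run_outside_service` and the unit name in the usage example are hypothetical:

```python
import subprocess


def scoped_command(command, unit=None):
    """Build an argv that runs `command` in a transient systemd scope."""
    argv = ["systemd-run", "--scope", "--quiet"]
    if unit:
        # Optional explicit name for the transient unit.
        argv.append(f"--unit={unit}")
    # "--" ends option parsing so the wrapped command keeps its own flags.
    argv.append("--")
    argv.extend(command)
    return argv


def run_outside_service(command, unit=None):
    """Run `command` via systemd-run and return its exit status."""
    return subprocess.call(scoped_command(command, unit))
```

For example, `run_outside_service(["ifup", "enp1s0"], unit="uss-ifup-enp1s0")` would start `ifup` in its own scope, so a later restart of the UMC service should no longer affect the `dhclient` it forks.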


Additional data collected from previous runs:

lastcomm:
dhclient         SF  X root     __         0.00 secs Sun Sep 24 18:13
dhclient               root     __         0.00 secs Sun Sep 24 18:13
dhclient         S     root     __         0.00 secs Sun Sep 24 18:13
dhclient         SF  X root     __         0.01 secs Sun Sep 24 17:03


strace:
18:13:49.802045 +++ killed by SIGTERM +++
18:20:02.270587 +++ killed by SIGKILL +++


/var/log/univention/setup.log
=== 30_net/11primary (2023-09-24 18:13:48) ===
__NAME__:30_net/11primary Setting primary network interface
__MSG__:Setting primary interface to: enp1s0
Killed old client process
DHCPRELEASE of 192.168.122.83 on enp1s0 to 192.168.122.1 port 67
...
=== done (2023-09-24 18:20:04) ===