Bug 37491 - Installing KVM during system setup breaks external DNS resolution
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: System setup
Version: UCS 4.3
Hardware: Other Linux
Importance: P3 normal
Assigned To: UCS maintainers
Depends on: 36085
Reported: 2015-01-08 22:12 CET by Michael Grandjean
Modified: 2021-05-14 16:34 CEST
What kind of report is it?: Bug Report
What type of bug is this?: 6: Setup Problem: Issue for the setup process
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.137
Enterprise Customer affected?: Yes


Description Michael Grandjean univentionstaff 2015-01-08 22:12:06 CET
I just installed a fresh UCS 4.0-0 from DVD/ISO and specified '8.8.4.4' as nameserver in the installer (d-i). I then configured the system to become a DC Master. After the installation finished, I ended up with these DNS settings:

> dns/forwarder1: 8.8.4.4
> nameserver1: 10.200.30.22
> nameserver2: 8.8.4.4

I didn't expect that 8.8.4.4 would also be added as nameserver2, since there is little chance 8.8.4.4 will be able to resolve anything in my domain.
Comment 1 Philipp Hahn univentionstaff 2017-09-18 11:38:02 CEST
Task #6732 UCS Technical Training

I experienced this on all 6 setups:
- I installed UCS-4.2-1 on Wednesday.
- I upgraded successfully to UCS-4.2-2 on Thursday morning using updates.software-univention.de, so DNS was working then.
- Later I tried to set up a Windows VM, which complained about a missing "Internet Connection". This was caused by the DC Master not doing external DNS resolution, as 172.16.0.1 was configured as UCRV "nameserver2", not as UCRV "dns/forwarder1".
- After running /usr/share/univention-server/univention-fix-ucr-dns manually, the name server configuration was fixed by moving the external DNS server to the forwarder.

I was able to reproduce the bug after the training by setting up a new UCS-4.2-1 system:
- After a fresh install the external DNS server is configured as UCRV "nameserver2", not as UCRV "dns/forwarder1".
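
The manual fix mentioned above amounts to moving every name server that is not a domain DNS server from the "nameserverN" UCR variables to "dns/forwarderN". A minimal Python sketch of that idea follows. It works on a plain dict instead of UCR and is not the code of univention-fix-ucr-dns; the function name and the "domain_dns" parameter are assumptions made for illustration only.

```python
def move_external_to_forwarders(ucr, domain_dns):
    """Return a new dict in which non-domain name servers become forwarders.

    ucr        -- dict of UCR-like variables (hypothetical stand-in for UCR)
    domain_dns -- set of IP addresses known to serve the local domain (assumption)
    """
    slots = (1, 2, 3)
    ns_keys = ["nameserver%d" % i for i in slots]
    fw_keys = ["dns/forwarder%d" % i for i in slots]

    # Collect the configured values, skipping unset or empty entries.
    nameservers = [ucr[k] for k in ns_keys if ucr.get(k)]
    forwarders = [ucr[k] for k in fw_keys if ucr.get(k)]

    # Domain servers stay name servers; everything else becomes a forwarder.
    keep = [ns for ns in nameservers if ns in domain_dns]
    for ns in nameservers:
        if ns not in domain_dns and ns not in forwarders:
            forwarders.append(ns)

    # Rebuild the managed variables; entries beyond three slots are dropped.
    managed = set(ns_keys) | set(fw_keys)
    result = {k: v for k, v in ucr.items() if k not in managed}
    for key, value in zip(ns_keys, keep):
        result[key] = value
    for key, value in zip(fw_keys, forwarders):
        result[key] = value
    return result


# Values taken from the setup.log excerpt in this comment.
before = {"nameserver1": "172.16.1.50", "nameserver2": "172.16.0.1"}
after = move_external_to_forwarders(before, domain_dns={"172.16.1.50"})
print(after)
```

The real script additionally checks whether the servers are reachable, as the log below shows; the sketch leaves that out on purpose.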

/var/log/univention/setup.log has this:
>=== 30_net/16forwarder (2017-09-15 18:11:56) ===
>__NAME__:30_net/16forwarder Setting external name servers
>Restarting bind9 Domain Name Server (DNS): Unknown DNS backend  failed!
>run-parts: executing /usr/lib/univention-system-setup/scripts/30_net/18proxy --network-only --appliance-mode
...
>Configure /usr/lib/univention-install/90univention-bind-post.inst
>2017-09-15 18:16:05.564326527+02:00 (in joinscript_init)
>Create dns/backend
>2017-09-15 18:16:06,090 INFO    __main__.ucr/ns   Found server 172.16.0.1 from UCRV nameserver1
>2017-09-15 18:16:36,106 WARNING __main__.val      Connection check to 172.16.0.1 (Timeout) failed, maybe down?!
>2017-09-15 18:16:36,106 INFO    __main__.val      Leaving it configured as nameserver anyway
>2017-09-15 18:16:36,106 INFO    __main__.xor      Skip removing nameservers from forwarders
>2017-09-15 18:16:36,110 INFO    __main__.ucr/self Default IP address configured in UCR: 172.16.1.50
>2017-09-15 18:16:36,110 INFO    __main__.ns       Skip adding NS
>2017-09-15 18:16:36,110 INFO    __main__.ldap     Skip adding master
>2017-09-15 18:16:36,111 INFO    __main__.ucr      Updating 'nameserver1': '172.16.0.1' -> '172.16.1.50'
>2017-09-15 18:16:36,111 INFO    __main__.ucr      Updating 'nameserver2': None -> '172.16.0.1'
>2017-09-15 18:16:36,333 INFO    __main__.ucr      Reloading BIND
>File: /etc/resolv.conf
>Restarting bind9 Domain Name Server (DNS): samba4 ldap proxy failed!
>invoke-rc.d: initscript bind9, action "restart" failed.
>Wait for bind9: .Restarting bind9 Domain Name Server (DNS): samba4 ldap proxy.
>done
>done
>Object modified: cn=default-settings,cn=dns,cn=dhcp,cn=policies,dc=schulung5-ucs,dc=intranet
>Object exists: cn=services,cn=univention,dc=schulung5-ucs,dc=intranet
>Object created: cn=DNS,cn=services,cn=univention,dc=schulung5-ucs,dc=intranet
>Object modified: cn=dc0,cn=dc,cn=computers,dc=schulung5-ucs,dc=intranet
>2017-09-15 18:16:51.729830310+02:00 (in joinscript_save_current_version)
...
>=== 90_postjoin/20upgrade (2017-09-15 18:17:21) ===
>__NAME__:90_postjoin/20upgrade Upgrading the system
>Setting repository/online
>File: /etc/apt/mirror.list
>File: /etc/apt/sources.list.d/15_ucs-online-version.list
>File: /etc/apt/sources.list.d/20_ucs-online-component.list
>__MSG__:This might take a while depending on the number of pending updates.
>Running upgrade on DC Master: univention-upgrade --noninteractive --updateto 4.2-99
>
>Starting univention-upgrade. Current UCS version is 4.2-1 errata52
>
>Checking for local repository:                          none
>The connection to the repository server failed: Configuration error: host is unresolvable. Please check the repository configuration and the network connection.
..
>=== DONE (2017-09-15 18:17:29) ===
...
>=== done (2017-09-15 18:17:38) ===

This only happens when "KVM" is selected during system setup: the network bridge is then configured while still inside the chroot environment, which breaks networking:
> $ ip r
> default via 172.16.1.1 dev eth0
> 172.16.1.0/24 dev eth0  proto kernel  scope link  src 172.16.1.50
> 172.16.1.0/24 dev br0  proto kernel  scope link  src 172.16.1.50

> $ ip a
> 2: eth0:
>    inet 172.16.1.50/24 ...
> 3: br0:
>    inet 172.16.1.50/24 ...

Pinging 172.16.0.1 no longer works.
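
The routing table above shows the symptom directly: the same connected prefix is present on both eth0 and br0. A small Python sketch that detects such duplicates in the text output of "ip r" follows. The function name is made up for illustration, and it only handles the simple line format shown in this comment.

```python
from collections import defaultdict


def duplicate_prefixes(ip_route_output):
    """Map each route prefix to its devices if more than one device carries it."""
    devices = defaultdict(list)
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Skip empty lines, the default route and lines without a device.
        if not fields or fields[0] == "default" or "dev" not in fields:
            continue
        position = fields.index("dev") + 1
        if position < len(fields):
            devices[fields[0]].append(fields[position])
    return {prefix: devs for prefix, devs in devices.items() if len(devs) > 1}


# Routing table as quoted in this comment.
SAMPLE = """\
default via 172.16.1.1 dev eth0
172.16.1.0/24 dev eth0  proto kernel  scope link  src 172.16.1.50
172.16.1.0/24 dev br0  proto kernel  scope link  src 172.16.1.50
"""
print(duplicate_prefixes(SAMPLE))
```

An empty result would mean that each prefix is bound to exactly one interface, which is the expected state once the address has been moved to the bridge.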

Looking in /var/log/univention/config-registry.replog shows this:
~2017-09-17 09:45:49  interfaces/eth0/* is configured
~2017-09-17 10:07:30  ucs-kvm-setup-bridge transferred the settings from eth0 to br0

Bug #36085 comment 3 (ucs-4.0-0@55526) moved the code for unsetting "interfaces/restart/auto" earlier, so the code now gets executed while still in the chroot environment.
It works when I set interfaces/restart/auto=no manually on the text console as soon as USS is started.

Short-term we should prevent "ucs-kvm-setup-bridge" from updating the interface until the next reboot.
Long-term we should make "interfaces/restart/auto=no" the default.

This also explains why none of our tests detected this, as we don't test nested virtualization in EC2!
Comment 2 Philipp Hahn univentionstaff 2018-01-30 17:46:22 CET
Task #9985 UCS Technical Training (again)
Comment 3 Philipp Hahn univentionstaff 2018-04-20 18:26:52 CEST
Task #10198 UCS Technical Training (again)
Comment 4 Michael Grandjean univentionstaff 2018-06-22 15:43:55 CEST
Task #10200 UCS Technical Training (again):

> dns/forwarder1: <empty>
> nameserver1: 172.16.1.10    <- UCS Master
> nameserver2: 172.16.0.1     <- External DNS server, should be dns/forwarder1

Philipp pointed out the underlying reason for this in Comment 1.
The tl;dr is: 
Installing KVM during system setup breaks external DNS resolution.
Comment 5 Stefan Gohmann univentionstaff 2019-01-15 15:06:20 CET
Does this issue still happen?
Comment 6 Philipp Hahn univentionstaff 2019-01-17 18:09:39 CET
(In reply to Stefan Gohmann from comment #5)
> Does this issue still happen?

Yes: I just tried UCS-4.3-2

(In reply to Philipp Hahn from comment #1)
> > $ ip r
> > default via 172.16.1.1 dev eth0
                               ^^^^
> > 172.16.1.0/24 dev eth0  proto kernel  scope link  src 172.16.1.50
> > 172.16.1.0/24 dev br0  proto kernel  scope link  src 172.16.1.50

If I do "ip addr flush eth0" inside the chroot, the default route is removed, but after that I can ping the gateway fine.

"ifup -v br0" fails as that address is already configured:
> /bin/ip addr add 10.200.17.6/25.255.255.0 broadcast 10.200.17.255 dev br0 label br0
> RTNETLINK answers: File exists
and likewise for the default route:
> /bin/ip route add default via 10.200.17.1 dev br0 onlink
> RTNETLINK answers: File exists

Test VM "pmhahn_bug37491" @ "utby" with snapshots available for further testing.
Comment 7 Erik Damrose univentionstaff 2019-01-18 09:50:46 CET
(In reply to Philipp Hahn from comment #6)
> (In reply to Stefan Gohmann from comment #5)
> > Does this issue still happen?
> 
> Yes: I just tried UCS-4.3-2

...


> If I do "ip addr flush eth0" inside the chroot, the default route is
> removed, but after that I can ping the gateway fine.

Is that not the issue that was fixed with 4.3-2e305, bug 47767? I am asking because you used a 4.3-2 install medium, for which the issue was identified. IMHO we need to re-check this with 4.3-3
Comment 8 Philipp Hahn univentionstaff 2019-01-18 12:02:19 CET
(In reply to Erik Damrose from comment #7)
> (In reply to Philipp Hahn from comment #6)
> > (In reply to Stefan Gohmann from comment #5)
> > > Does this issue still happen?
> > 
> > Yes: I just tried UCS-4.3-2
...
> Is that not the issue that was fixed with 4.3-2e305, bug 47767? I am asking
> because you used a 4.3-2 install medium, for which the issue was identified.
> IMHO we need to re-check this with 4.3-3

I already repeated the test with UCS-4.3-3 as well, and it still failed.
(I forgot to update the version number in comment #6)
Comment 9 Philipp Hahn univentionstaff 2019-08-07 09:44:55 CEST
Again at 4 customer environments: As soon as uvmm-node-kvm gets installed during the PXE installation, the network is broken afterwards. This especially breaks un-setting the "reinstall" option on the computer account at the end of the PXE installation.

Also there is no option to really disable the bridge creation:
- If ucrv:uvmm/kvm/bridge/autostart is set to 'yes', debian/univention-virtual-machine-manager-node-kvm.init creates 'eth0' as a bridge with the original 'eth0' being renamed to 'peth0'.
- If ucrv:uvmm/kvm/bridge/autostart is set to 'no' or 'manually', debian/univention-virtual-machine-manager-node-kvm.postinst creates the 'br0' bridge with 'eth0' enslaved through UCRVs.
- There is no option to disable both mechanisms, which is required at the customer site as they have several bridges to set up and the interface names are not stable (8-12 interfaces to different networks).
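
To make the gap explicit, the cases above can be written as a lookup. This is only a restatement of the behaviour described in this comment, not code from the package. The function name is hypothetical, and values other than the three named here are not covered by this report.

```python
def bridge_mechanism(autostart):
    """Return which script sets up the bridge for a given UCR value (per this comment)."""
    if autostart == "yes":
        return "init script: 'eth0' becomes the bridge, original 'eth0' renamed to 'peth0'"
    if autostart in ("no", "manually"):
        return "postinst: 'br0' bridge with 'eth0' enslaved through UCR variables"
    # Behaviour for any other value is not described in this report.
    return None


for value in ("yes", "no", "manually"):
    print(value, "->", bridge_mechanism(value))
```

None of the three documented values leaves the interfaces untouched, which is why the script had to be diverted.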

I had to divert the script to get the work done:
  dpkg-divert --local --rename --divert /usr/lib/univention-virtual-machine-manager-node-kvm/ucs-kvm-setup-bridge.XXX --add /usr/lib/univention-virtual-machine-manager-node-kvm/ucs-kvm-setup-bridge

That was further complicated by the fact that loading the "bridge" Linux kernel module failed due to Bug #48123.
Comment 10 Ingo Steuwer univentionstaff 2021-05-14 15:42:48 CEST
This issue has been filed against UCS 4.3.

UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.