Bug 37491 - Installing KVM during system setup breaks external DNS resolution
Status: NEW
Product: UCS
Classification: Unclassified
Component: System setup
Version: UCS 4.3
Hardware: Other Linux
Importance: P3 normal
Target Milestone: ---
Assigned To: UCS maintainers
Depends on: 36085
Blocks:
Reported: 2015-01-08 22:12 CET by Michael Grandjean
Modified: 2019-01-18 12:02 CET
CC List: 5 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 6: Setup Problem: Issue for the setup process
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.137
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Michael Grandjean univentionstaff 2015-01-08 22:12:06 CET
I just installed a fresh UCS 4.0-0 from DVD/ISO and specified '8.8.4.4' as nameserver in the installer (d-i). I then configured the system to become a DC Master. After the installation finished, I ended up with these DNS settings:

> dns/forwarder1: 8.8.4.4
> nameserver1: 10.200.30.22
> nameserver2: 8.8.4.4

I didn't expect that 8.8.4.4 would also be added as nameserver2, since there is little chance 8.8.4.4 will be able to resolve anything in my domain.
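The expected layout can be stated as a simple check: every nameserverN entry should point at a DNS server of the UCS domain, and anything external belongs in dns/forwarderN. Below is a minimal Python sketch, assuming the UCR values are available as a plain dict and that 10.200.30.22 is the new DC Master itself. `misplaced_nameservers` is a hypothetical helper, not part of UCS.

```python
import re
from typing import Dict, List, Set

# Numbered nameserver keys such as "nameserver1", "nameserver2", ...
NAMESERVER_KEY = re.compile(r"^nameserver\d+$")

def misplaced_nameservers(ucr: Dict[str, str], domain_dns: Set[str]) -> List[str]:
    """Return the nameserverN keys whose value is not a DNS server of the domain.

    Hypothetical check: such servers most likely belong in dns/forwarderN.
    """
    return sorted(
        key for key, value in ucr.items()
        if NAMESERVER_KEY.match(key) and value and value not in domain_dns
    )

# Settings from the description; 10.200.30.22 is assumed to be the DC Master.
settings = {
    "dns/forwarder1": "8.8.4.4",
    "nameserver1": "10.200.30.22",
    "nameserver2": "8.8.4.4",
}
print(misplaced_nameservers(settings, {"10.200.30.22"}))  # ['nameserver2']
```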
Comment 1 Philipp Hahn univentionstaff 2017-09-18 11:38:02 CEST
Task #6732 UCS Technical Training

I experienced this on all 6 setups:
- I installed UCS-4.2-1 on Wednesday.
- I successfully upgraded to UCS-4.2-2 on Thursday morning using updates.software-univention.de, so DNS was working then.
- Later I tried to set up a Windows VM, which complained about a missing "Internet Connection". This was caused by the DC Master not resolving external names, as 172.16.0.1 was configured as UCRV "nameserver2" instead of UCRV "dns/forwarder1".
- After I manually ran /usr/share/univention-server/univention-fix-ucr-dns, name resolution was fixed: the external DNS server was moved to the forwarder.
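The effect described for univention-fix-ucr-dns (the DC Master becomes the nameserver, the external server becomes a forwarder) can be sketched as a pure transformation of the UCR values. This is an illustration under assumptions, not the script's actual code: `move_external_to_forwarders` is a hypothetical name, the own IP address is passed in explicitly, and at most three entries of each kind are considered.

```python
import re
from typing import Dict, List

# Only these numbered keys are rewritten; all other UCR variables are kept.
DNS_KEY = re.compile(r"^(nameserver|dns/forwarder)\d+$")

def move_external_to_forwarders(ucr: Dict[str, str], own_ip: str) -> Dict[str, str]:
    """Make own_ip the only nameserver and turn every other server into a forwarder."""
    externals: List[str] = []
    # Forwarders are collected first, so existing forwarders keep their position.
    for prefix in ("dns/forwarder", "nameserver"):
        for index in range(1, 4):
            server = ucr.get(f"{prefix}{index}")
            if server and server != own_ip and server not in externals:
                externals.append(server)
    result = {key: value for key, value in ucr.items() if not DNS_KEY.match(key)}
    result["nameserver1"] = own_ip
    for index, server in enumerate(externals[:3], start=1):
        result[f"dns/forwarder{index}"] = server
    return result

# State from the training setup: 172.16.1.50 is the DC Master.
broken = {"nameserver1": "172.16.1.50", "nameserver2": "172.16.0.1"}
print(move_external_to_forwarders(broken, "172.16.1.50"))
# {'nameserver1': '172.16.1.50', 'dns/forwarder1': '172.16.0.1'}
```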

I was able to reproduce the bug after the training by setting up a new UCS-4.2-1 system:
- After a fresh installation, the external DNS server is configured as UCRV "nameserver2" instead of UCRV "dns/forwarder1".

/var/log/univention/setup.log has this:
>=== 30_net/16forwarder (2017-09-15 18:11:56) ===
>__NAME__:30_net/16forwarder Setting external name servers
>Restarting bind9 Domain Name Server (DNS): Unknown DNS backend  failed!
>run-parts: executing /usr/lib/univention-system-setup/scripts/30_net/18proxy --network-only --appliance-mode
...
>Configure /usr/lib/univention-install/90univention-bind-post.inst
>2017-09-15 18:16:05.564326527+02:00 (in joinscript_init)
>Create dns/backend
>2017-09-15 18:16:06,090 INFO    __main__.ucr/ns   Found server 172.16.0.1 from UCRV nameserver1
>2017-09-15 18:16:36,106 WARNING __main__.val      Connection check to 172.16.0.1 (Timeout) failed, maybe down?!
>2017-09-15 18:16:36,106 INFO    __main__.val      Leaving it configured as nameserver anyway
>2017-09-15 18:16:36,106 INFO    __main__.xor      Skip removing nameservers from forwarders
>2017-09-15 18:16:36,110 INFO    __main__.ucr/self Default IP address configured in UCR: 172.16.1.50
>2017-09-15 18:16:36,110 INFO    __main__.ns       Skip adding NS
>2017-09-15 18:16:36,110 INFO    __main__.ldap     Skip adding master
>2017-09-15 18:16:36,111 INFO    __main__.ucr      Updating 'nameserver1': '172.16.0.1' -> '172.16.1.50'
>2017-09-15 18:16:36,111 INFO    __main__.ucr      Updating 'nameserver2': None -> '172.16.0.1'
>2017-09-15 18:16:36,333 INFO    __main__.ucr      Reloading BIND
>File: /etc/resolv.conf
>Restarting bind9 Domain Name Server (DNS): samba4 ldap proxy failed!
>invoke-rc.d: initscript bind9, action "restart" failed.
>Wait for bind9: .Restarting bind9 Domain Name Server (DNS): samba4 ldap proxy.
>done
>done
>Object modified: cn=default-settings,cn=dns,cn=dhcp,cn=policies,dc=schulung5-ucs,dc=intranet
>Object exists: cn=services,cn=univention,dc=schulung5-ucs,dc=intranet
>Object created: cn=DNS,cn=services,cn=univention,dc=schulung5-ucs,dc=intranet
>Object modified: cn=dc0,cn=dc,cn=computers,dc=schulung5-ucs,dc=intranet
>2017-09-15 18:16:51.729830310+02:00 (in joinscript_save_current_version)
...
>=== 90_postjoin/20upgrade (2017-09-15 18:17:21) ===
>__NAME__:90_postjoin/20upgrade Upgrading the system
>Setting repository/online
>File: /etc/apt/mirror.list
>File: /etc/apt/sources.list.d/15_ucs-online-version.list
>File: /etc/apt/sources.list.d/20_ucs-online-component.list
>__MSG__:This might take a while depending on the number of pending updates.
>Running upgrade on DC Master: univention-upgrade --noninteractive --updateto 4.2-99
>
>Starting univention-upgrade. Current UCS version is 4.2-1 errata52
>
>Checking for local repository:                          none
>The connection to the repository server failed: Configuration error: host is unresolvable. Please check the repository configuration and the network connection.
..
>=== DONE (2017-09-15 18:17:29) ===
...
>=== done (2017-09-15 18:17:38) ===
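The `__main__.val` and `__main__.ucr` lines in the log suggest the following decision: the previously configured server is probed, and because the probe times out it is kept as a nameserver (shifted to nameserver2) instead of becoming a forwarder. The sketch below models that outcome as inferred from the log only; it is not the real implementation. The probe (a TCP connect to port 53) is an assumption, and `probe` and `plan_dns` are hypothetical names.

```python
import socket
from typing import Dict, Optional

def probe(server: str, timeout: float = 3.0) -> bool:
    """Assumed reachability check: can a TCP connection to port 53 be opened?"""
    try:
        with socket.create_connection((server, 53), timeout=timeout):
            return True
    except OSError:
        return False

def plan_dns(old_ns: str, own_ip: str, reachable: bool) -> Dict[str, Optional[str]]:
    """Model of the UCR outcome seen in setup.log for a freshly joined DC Master."""
    if reachable:
        # Network works: the old server is only useful as a forwarder.
        return {"nameserver1": own_ip, "nameserver2": None, "dns/forwarder1": old_ns}
    # Probe failed ("maybe down?!"): the server is left configured as nameserver.
    return {"nameserver1": own_ip, "nameserver2": old_ns, "dns/forwarder1": None}

# With the broken bridge the probe times out, which reproduces the logged result.
print(plan_dns("172.16.0.1", "172.16.1.50", reachable=False))
```

The pure function is kept separate from the network probe so the decision can be checked without touching the network.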

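This only happens when "KVM" is selected during system setup: the network bridge is then configured while still inside the chroot environment, which breaks networking. Afterwards the same address and subnet route exist on both eth0 and br0, while the default route still uses eth0: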
> $ ip r
> default via 172.16.1.1 dev eth0
> 172.16.1.0/24 dev eth0  proto kernel  scope link  src 172.16.1.50
> 172.16.1.0/24 dev br0  proto kernel  scope link  src 172.16.1.50

> $ ip a
> 2: eth0:
>    inet 172.16.1.50/24 ...
> 3: br0:
>    inet 172.16.1.50/24 ...

Pinging 172.16.0.1 no longer works.
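The broken state can be detected mechanically: the same connected subnet shows up on more than one device. Below is a minimal Python sketch, assuming the plain-text output of `ip r` (as quoted above) as input. `subnets_on_multiple_devices` is a hypothetical helper, not a UCS tool.

```python
from collections import defaultdict
from typing import Dict, List

def subnets_on_multiple_devices(ip_route_output: str) -> Dict[str, List[str]]:
    """Map each destination prefix to its devices, keeping only prefixes on >1 device."""
    devices: Dict[str, List[str]] = defaultdict(list)
    for line in ip_route_output.splitlines():
        fields = line.split()
        if "dev" not in fields:
            continue
        position = fields.index("dev") + 1
        if position >= len(fields):
            continue  # malformed line without a device name
        dev = fields[position]
        if dev not in devices[fields[0]]:
            devices[fields[0]].append(dev)
    return {prefix: devs for prefix, devs in devices.items() if len(devs) > 1}

# Output of "ip r" from the broken system above.
ROUTES = """\
default via 172.16.1.1 dev eth0
172.16.1.0/24 dev eth0  proto kernel  scope link  src 172.16.1.50
172.16.1.0/24 dev br0  proto kernel  scope link  src 172.16.1.50
"""
print(subnets_on_multiple_devices(ROUTES))  # {'172.16.1.0/24': ['eth0', 'br0']}
```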

Looking in /var/log/univention/config-registry.replog shows this:
~2017-09-17 09:45:49  interfaces/eth0/* is configured
~2017-09-17 10:07:30  ucs-kvm-setup-bridge transferred the settings from eth0 to br0

Bug #36085 comment 3 (ucs-4.0-0@55526) moved the code for unsetting "interfaces/restart/auto" earlier, so the code now gets executed while still in the chroot environment.
The problem does not occur when I manually set interfaces/restart/auto=no on the text console as soon as USS (Univention System Setup) is started.

Short-term we should prevent "ucs-kvm-setup-bridge" from updating the interface until the next reboot.
Long-term we should make "interfaces/restart/auto=no" the default.
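The manual workaround corresponds to `ucr set interfaces/restart/auto=no`. The short-term fix could look roughly like the guard below, sketched in Python under assumptions. Chroot detection compares the root directory with the root of PID 1, similar to the check done by Debian's `ischroot`. `running_in_chroot` and `should_restart_interfaces` are hypothetical helpers, not code from ucs-kvm-setup-bridge, and treating an unset variable as "yes" is also an assumption.

```python
import os
from typing import Dict, Optional

def running_in_chroot() -> Optional[bool]:
    """Compare / with PID 1's root; None if /proc is unavailable or unreadable."""
    try:
        own_root = os.stat("/")
        init_root = os.stat("/proc/1/root")
    except OSError:
        return None
    return (own_root.st_dev, own_root.st_ino) != (init_root.st_dev, init_root.st_ino)

def should_restart_interfaces(ucr: Dict[str, str], in_chroot: Optional[bool]) -> bool:
    """Only touch live interfaces when allowed by UCR and not inside a chroot."""
    # Assumed default: an unset interfaces/restart/auto behaves like "yes".
    if ucr.get("interfaces/restart/auto", "yes").lower() in ("no", "false", "0"):
        return False
    # An unknown environment (None) is treated like a chroot: defer to the next reboot.
    return in_chroot is False

# During system setup the bridge would only be written to the configuration:
print(should_restart_interfaces({}, running_in_chroot()))
```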

This also explains why none of our tests detected this: we don't test nested virtualization in EC2!
Comment 2 Philipp Hahn univentionstaff 2018-01-30 17:46:22 CET
Task #9985 UCS Technical Training (again)
Comment 3 Philipp Hahn univentionstaff 2018-04-20 18:26:52 CEST
Task #10198 UCS Technical Training (again)
Comment 4 Michael Grandjean univentionstaff 2018-06-22 15:43:55 CEST
Task #10200 UCS Technical Training (again):

> dns/forwarder1: <empty>
> nameserver1: 172.16.1.10    <- UCS Master
> nameserver2: 172.16.0.1     <- External DNS server, should be dns/forwarder1

Philipp pointed out the underlying reason for this in Comment 1.
The tl;dr is: 
Installing KVM during system setup breaks external DNS resolution.
Comment 5 Stefan Gohmann univentionstaff 2019-01-15 15:06:20 CET
Does this issue still happen?
Comment 6 Philipp Hahn univentionstaff 2019-01-17 18:09:39 CET
(In reply to Stefan Gohmann from comment #5)
> Does this issue still happen?

Yes: I just tried UCS-4.3-2

(In reply to Philipp Hahn from comment #1)
> > $ ip r
> > default via 172.16.1.1 dev eth0
                               ^^^^
> > 172.16.1.0/24 dev eth0  proto kernel  scope link  src 172.16.1.50
> > 172.16.1.0/24 dev br0  proto kernel  scope link  src 172.16.1.50

If I run "ip addr flush eth0" inside the chroot, the default route is removed, but afterwards I can ping the gateway fine.

"ifup -v br0" fails because the address is already configured:
> /bin/ip addr add 10.200.17.6/25.255.255.0 broadcast 10.200.17.255 dev br0 label br0
> RTNETLINK answers: File exists
and likewise for the default route:
> /bin/ip route add default via 10.200.17.1 dev br0 onlink
> RTNETLINK answers: File exists
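Both "File exists" errors follow from state already left behind by the setup in the chroot. The sketch below is a simplified model, assuming IPv4, the main routing table and no explicit metrics: `ip addr add` is rejected when the same device already carries that address, and `ip route add` is rejected when the prefix already has a route with the same metric, regardless of the device. It illustrates the errors only and does not describe ifupdown itself. All names are hypothetical, and the address state assumes the same duplication as in comment 1, using the address from the ifup output.

```python
from typing import Set, Tuple

Address = Tuple[str, str]  # (device, "address/prefixlen")
Route = Tuple[str, int]    # (destination prefix, metric); the device is deliberately ignored

def addr_add_conflicts(existing: Set[Address], dev: str, cidr: str) -> bool:
    """Model: 'ip addr add' fails with EEXIST if the device already has that address."""
    return (dev, cidr) in existing

def route_add_conflicts(existing: Set[Route], prefix: str, metric: int = 0) -> bool:
    """Model: 'ip route add' fails with EEXIST if the prefix already has a route with that metric."""
    return (prefix, metric) in existing

# Assumed state inside the chroot: address on eth0 and br0, default route via eth0.
addresses = {("eth0", "10.200.17.6/24"), ("br0", "10.200.17.6/24")}
routes = {("default", 0)}

print(addr_add_conflicts(addresses, "br0", "10.200.17.6/24"))  # True
print(route_add_conflicts(routes, "default"))                  # True
```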

Test VM "pmhahn_bug37491" on "utby", with snapshots, is available for further testing.
Comment 7 Erik Damrose univentionstaff 2019-01-18 09:50:46 CET
(In reply to Philipp Hahn from comment #6)
> (In reply to Stefan Gohmann from comment #5)
> > Does this issue still happen?
> 
> Yes: I just tried UCS-4.3-2

...


> If I do "ip addr flush eth0" inside the chroot, the default route is
> removed, but after that I can ping the gateway fine.

Isn't that the issue that was fixed with 4.3-2e305 (bug 47767)? I am asking because you used a 4.3-2 installation medium, for which the issue was identified. IMHO we need to re-check this with 4.3-3.
Comment 8 Philipp Hahn univentionstaff 2019-01-18 12:02:19 CET
(In reply to Erik Damrose from comment #7)
> (In reply to Philipp Hahn from comment #6)
> > (In reply to Stefan Gohmann from comment #5)
> > > Does this issue still happen?
> > 
> > Yes: I just tried UCS-4.3-2
...
> Is that not the issue that was fixed with 4.3-2e305, bug 47767? I am asking
> because you used a 4.3-2 install medium, for which the issue was identified.
> IMHO we need to re-check this with 4.3-3

I had already repeated the test with UCS-4.3-3 as well, and it still failed.
(I forgot to update the version number in comment #6)