Bug 47767 - UCS Appliance (vmware, virtualbox) loses network connection during the setup
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: System setup
Version: UCS 4.3
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.3-2-errata
Assigned To: Arvid Requate
QA Contact: Felix Botner
Duplicates: 47995
Depends on: 26338 28670 42022
Blocks: 47943
Reported: 2018-09-07 16:08 CEST by Felix Botner
Modified: 2018-11-07 14:33 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 6: Setup Problem: Issue for the setup process
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.309

Description Felix Botner univentionstaff 2018-09-07 16:08:42 CEST
During the UMC setup, the network interfaces are restarted twice: (a) after the network settings and (b) after starting the configuration.

After the second restart, all interfaces are down and the machine loses its network connection (and the join after the setup fails).

interface restart, see -> umc/python/setup/netconf/modules/RestartAllInterfaces.py
down: ifdown --all --exclude lo
up: ifup --all



ip addr before starting the configuration:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 08:00:27:80:00:99 brd ff:ff:ff:ff:ff:ff
    inet 169.254.220.217/16 brd 169.254.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.200.7.192/24 brd 10.200.7.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe80:99/64 scope link 
       valid_lft forever preferred_lft forever

setup.log:
INFO:uss.network.phase.ResolvConv:Committing /etc/resolv.conf...
INFO:uss.network.plug:Calling RestartAllInterfaces.post() at 50...
File: /etc/dhcp/dhclient.conf
RTNETLINK answers: File exists
ifup: failed to bring up eth0
File: /etc/dhcp/dhclient.conf
RTNETLINK answers: File exists
ifup: failed to bring up eth0:0


On my test machine (setup aborted after the failed join):

-> ifup --all
RTNETLINK answers: File exists
ifup: failed to bring up eth0

-> ip addr
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 08:00:27:80:00:99 brd ff:ff:ff:ff:ff:ff
    inet 169.254.220.217/16 brd 169.254.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.200.7.192/24 brd 10.200.7.255 scope global eth0
       valid_lft forever preferred_lft forever

-> ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.42.1  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:db:f7:91:af  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Lokale Schleife)
        RX packets 3722  bytes 3616931 (3.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3722  bytes 3616931 (3.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
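
The `ip addr` output above shows eth0 without the UP flag but still holding both IPv4 addresses, which seems to be the state in which `ifup` reports "File exists". As a rough diagnostic sketch (not part of univention-system-setup; the function name and the regular expressions are assumptions made for illustration), this state can be detected by parsing the `ip addr` text:

```python
import re

# "2: eth0: <BROADCAST,MULTICAST> mtu ..." -> interface name and flag list
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>")
# "    inet 10.200.7.192/24 brd ..." -> IPv4 address (does not match "inet6")
INET = re.compile(r"^\s+inet\s+(\S+)")


def stale_addresses(ip_addr_output):
    """Map each interface without the UP flag to the IPv4 addresses it still holds."""
    stale = {}
    current, is_up = None, True
    for line in ip_addr_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            is_up = "UP" in header.group(2).split(",")
            continue
        inet = INET.match(line)
        if inet and current is not None and not is_up:
            stale.setdefault(current, []).append(inet.group(1))
    return stale


# Output copied from this report (state after the failed restart).
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 08:00:27:80:00:99 brd ff:ff:ff:ff:ff:ff
    inet 169.254.220.217/16 brd 169.254.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.200.7.192/24 brd 10.200.7.255 scope global eth0
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    print(stale_addresses(SAMPLE))
```

For the sample above this reports eth0 together with its two leftover addresses; an interface that still carries the UP flag is not reported.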

What seems to help is

-> ip addr flush dev eth0

After that ifup works.

see https://unix.stackexchange.com/questions/100588/using-ip-addr-instead-of-ifconfig-reports-rtnetlink-answers-file-exists-on-de
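
The workaround can be written down as a restart sequence that flushes leftover addresses between `ifdown` and `ifup`. This is only a sketch under assumptions: the function names and the `runner` parameter are hypothetical and are not the code in RestartAllInterfaces.py; the three commands are the ones quoted in this report.

```python
import subprocess


def restart_commands(interfaces):
    """Build the command sequence: ifdown, flush leftover addresses, ifup."""
    commands = [["ifdown", "--all", "--exclude", "lo"]]
    for name in interfaces:
        # Remove addresses that are still configured on the device; in this
        # report, ifup only succeeded after such a flush.
        commands.append(["ip", "addr", "flush", "dev", name])
    commands.append(["ifup", "--all"])
    return commands


def restart_all(interfaces, runner=subprocess.call):
    """Run each command in order and return the list of exit codes."""
    return [runner(cmd) for cmd in restart_commands(interfaces)]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them (no root needed).
    for cmd in restart_commands(["eth0"]):
        print(" ".join(cmd))
```

Passing a different `runner` keeps the sequence testable without touching the real network configuration.
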
Comment 1 Erik Damrose univentionstaff 2018-09-10 17:30:51 CEST
In my test it worked after downgrading to the previous kernel (4.9.0-6): univention-kernel-image=11.0.1-7A~4.3.0.201803021350

As a first step, we could rebuild the appliances with the older kernel.
Comment 2 Erik Damrose univentionstaff 2018-09-11 10:44:46 CEST
My network setup with the old kernel was wrong. It does _not_ work with the 4.3-2 appliance, downgraded to use the 4.9.0-6 kernel!
Comment 3 Philipp Hahn univentionstaff 2018-09-11 11:40:17 CEST
Probably related to Bug #42153: the old "virtual alias" syntax has been deprecated since 2009 and nobody should use it anymore, so it probably receives little to no testing.
<https://wiki.debian.org/NetworkConfiguration#Legacy_method>

Also see Bug #36532 comment 20.

Maybe fix the USS hack to do it like documented in 'dhclient.conf -> "alias"'?
Comment 4 Hendrik Peter univentionstaff 2018-10-25 16:20:31 CEST
*** Bug 47995 has been marked as a duplicate of this bug. ***
Comment 5 Arvid Requate univentionstaff 2018-10-29 16:32:28 CET
Quoting the original Bug Description:

>    inet 169.254.220.217/16 brd 169.254.255.255 scope global eth0
>       valid_lft forever preferred_lft forever
>    inet 10.200.7.192/24 brd 10.200.7.255 scope global eth0
>       valid_lft forever preferred_lft forever


The "RTNETLINK answers: File exists" error happens because "eth0" is still configured with the old address when it should actually be re-configured as "eth0:0". There is an "ip addr flush" in FlushOldAddresses, but it was explicitly skipped in appliance mode. No clue why this skip was added during the conversion from shell to Python (Bug #28670).
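
To illustrate the skip, here is a simplified stand-in for the phase. It is an assumption-laden sketch, not the real netconf module: the constructor arguments, the `skip_in_appliance_mode` switch and the `runner` hook are hypothetical and exist only to contrast the behaviour before and after the change.

```python
import subprocess


class FlushOldAddresses:
    """Hypothetical stand-in for the netconf phase of the same name."""

    def __init__(self, interfaces, appliance_mode,
                 skip_in_appliance_mode=False, runner=subprocess.call):
        self.interfaces = list(interfaces)
        self.appliance_mode = appliance_mode
        # True models the old behaviour, False the behaviour after the fix.
        self.skip_in_appliance_mode = skip_in_appliance_mode
        self.runner = runner

    def pre(self):
        """Flush the old addresses and return the names that were flushed."""
        if self.appliance_mode and self.skip_in_appliance_mode:
            return []  # old behaviour: addresses stay on the device
        flushed = []
        for name in self.interfaces:
            self.runner(["ip", "addr", "flush", "dev", name])
            flushed.append(name)
        return flushed


if __name__ == "__main__":
    # Print instead of executing, so the example runs without root.
    FlushOldAddresses(["eth0"], appliance_mode=True, runner=print).pre()
```

With the skip enabled in appliance mode, `pre()` does nothing and the old address stays on eth0; with it disabled, the flush runs regardless of the mode.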


But this only hid a more severe regression:
The join check introduced by Bug #42022 set server/role (invisible in config_replog). This must only be done temporarily; otherwise 05_role/10role concludes that no packages need to be installed.
This was especially nasty to debug because the code used the ucr save() method from univention.config_registry.backend, which doesn't log anything in config_registry.replog.
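
The "only temporarily" requirement can be sketched as a context manager that restores the previous value on exit. This is an illustration under assumptions: `registry` is a plain dict standing in for UCR (the real univention.config_registry API is not used here), `temporary_setting` is a hypothetical helper, and the role value is just an example.

```python
from contextlib import contextmanager

_MISSING = object()  # marks "key was not set before"


@contextmanager
def temporary_setting(registry, key, value):
    """Set registry[key] for the duration of the block, then restore it."""
    previous = registry.get(key, _MISSING)
    registry[key] = value
    try:
        yield registry
    finally:
        # Restore the old state even if the block raised an exception.
        if previous is _MISSING:
            registry.pop(key, None)
        else:
            registry[key] = previous


if __name__ == "__main__":
    ucr = {}  # stand-in for the config registry; server/role is unset
    with temporary_setting(ucr, "server/role", "domaincontroller_master"):
        print("during check:", ucr.get("server/role"))
    print("after check:", ucr.get("server/role"))
```

Because the restore happens in `finally`, a failing join check would not leave server/role set for the later 05_role/10role step.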


7c2057fed5 | Revert temporary adjustment of server/role,
             must not be set before 05_role/10role
31d208400e | Run FlushOldAddresses pre() also in appliance mode
4e0e96ab05 | Advisory
Comment 6 Arvid Requate univentionstaff 2018-10-29 16:33:52 CET
Once this bug has passed QA successfully, we need to rebuild the appliances. Maybe we need a new bug for that?
Comment 7 Felix Botner univentionstaff 2018-11-01 10:36:52 CET
(In reply to Arvid Requate from comment #6)
> Once this bug has passed QA successfully, we need to rebuild the appliances.
> Maybe we need a new bug for that?

We will update the appliances with Bug #47943.
Comment 8 Felix Botner univentionstaff 2018-11-01 15:28:42 CET
OK - UCRV server/role
OK - network setup

OK - yaml
Comment 9 Arvid Requate univentionstaff 2018-11-07 14:33:44 CET
<http://errata.software-univention.de/ucs/4.3/305.html>