Bug 47585 - univention-register-network-address may block UCS boot for some minutes when using DHCP
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: Network
Version: UCS 4.3
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on:
Blocks:
Reported: 2018-08-13 21:12 CEST by Erik Damrose
Modified: 2022-08-01 19:06 CEST
CC List: 2 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.114
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): Appliance
Max CVSS v3 score:

Description Erik Damrose univentionstaff 2018-08-13 21:12:41 CEST
While analyzing a very slow UCS boot, I noticed that univention-register-network-address blocks the boot for a very long time, in my case almost 3 minutes.

I was using a UCS with network devices configured for DHCP in system setup. DHCP was available for the system, so it is not a DHCP timeout issue.

Symptoms: The UCS welcome screen is shown, but no console is available. Canceling the welcome screen shows "Loading, please wait...". No tty, ssh, or UMC login is possible. This is a bad user experience: the welcome screen makes it look like UCS has finished booting.

What is taking so long?
# systemd-analyze blame var-lib-docker-overlay.mount
    2min 45.767s networking.service
          7.817s univention-network-common.service

Why does networking.service need almost 3 minutes?
journalctl -u networking.service
Aug 13 20:07:31 ucs-5392 dhclient[560]: DHCPACK of 10.0.2.15 from 10.0.2.2
Aug 13 20:07:33 ucs-5392 ifup[506]: File: /etc/resolv.conf
Aug 13 20:07:33 ucs-5392 ifup[506]: bound to 10.0.2.15 -- renewal in 38190 seconds.
Aug 13 20:07:34 ucs-5392 ifup[506]: File: /etc/resolv.conf
Aug 13 20:09:43 ucs-5392 ifup[506]: ERROR: IP registration for enp0s3 failed with code 1
Aug 13 20:09:43 ucs-5392 ifup[506]: ERROR: IP registration for enp0s8 failed with code 1

The ERROR: message comes from base/univention-network-manager/univention-register-network-address. It tries to register the IP addresses in LDAP, to keep them up to date, by issuing a UMC command. univention-register-network-address is called by base/univention-network-manager/etc/network/if-up.d/90_dns_update.

Problem: slapd is not running at this point during boot, so the LDAP is not updated at all. The long wait time is due to the http timeout.
Comment 1 Florian Best univentionstaff 2020-07-06 12:58:49 CEST
(In reply to Erik Damrose from comment #0)
> The long wait time is due to the http timeout.
There is no HTTP involved in univention-register-network-address because it connected directly to the UMC-Server via port 6670.

This changed in Bug #42128 so that HTTP is used. The timeouts there might be shorter than socket.getdefaulttimeout().
Comment 2 Philipp Hahn univentionstaff 2021-03-25 19:35:35 CET
(In reply to Florian Best from comment #1)
> (In reply to Erik Damrose from comment #0)
> > The long wait time is due to the http timeout.
> There is no HTTP involved in univention-register-network-address because it
> connected directly to the UMC-Server via port 6670.
> 
> This changed in Bug #42128 so that HTTP is used. The timeouts there might be
> shorter than socket.getdefaulttimeout().

I stumbled over this behavior while working on Bug #52959, which has a fix for this: It moves "univention-network-common.service" **after** "apache2.service". This is a MUST for the "Primary node". For all other roles it is not required, but systemd has no easy way to express this; it would require a dynamically generated override file, which inserts that "After=apache2.service" only for the Primary.

As registering the IP is mostly optional, delaying it a little bit should not hurt.
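
The dynamically generated override file mentioned above could look roughly like the following. This is only a sketch, not what Bug #52959 ships: the function name, the drop-in file name "10-after-apache2.conf" and the role string "domaincontroller_master" are assumptions made for illustration, as is reading the role from a UCR variable named server/role.

```shell
#!/bin/sh
# Sketch: create an ordering drop-in for univention-network-common.service
# only on the Primary node, and remove it on every other role.
# Assumptions (not taken from the bug report): the role string
# "domaincontroller_master" and the file name "10-after-apache2.conf".
write_ordering_dropin () {
	role="$1"        # e.g. the output of: ucr get server/role (assumed variable)
	dropin_dir="$2"  # e.g. /etc/systemd/system/univention-network-common.service.d
	conf="$dropin_dir/10-after-apache2.conf"
	if [ "$role" = "domaincontroller_master" ]; then
		# Primary: order the unit after Apache.
		mkdir -p "$dropin_dir" &&
			printf '[Unit]\nAfter=apache2.service\n' >"$conf"
	else
		# Any other role: make sure no stale drop-in is left behind.
		rm -f "$conf"
	fi
}
```

After calling the function with the real drop-in directory, "systemctl daemon-reload" would be needed for systemd to pick up the change.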
Comment 3 Philipp Hahn univentionstaff 2021-03-26 06:40:58 CET
(In reply to Florian Best from comment #1)
> (In reply to Erik Damrose from comment #0)
> > The long wait time is due to the http timeout.
> There is no HTTP involved in univention-register-network-address because it
> connected directly to the UMC-Server via port 6670.
> 
> This changed in Bug #42128 so that HTTP is used. The timeouts there might be
> shorter than socket.getdefaulttimeout().

The change to HTTP was not followed in base/univention-network-manager/etc/network/if-up.d/90_dns_update, which still contained this code:
-if /bin/netcat -q0 -w1 "$ldap_master" 6670 </dev/null; then

That check only verified that the UMC server was reachable on port 6670, but the new code also needs Apache to be running because it uses https://.

My new implementation proposes this:
+save_to_umc () {
+       [ -s /etc/machine.secret ] &&
+               have univention-register-network-address &&
+               netcat -q0 -w1 "$(ucr get ldap/master)" 443 </dev/null &&
+               timeout 10 univention-register-network-address --interface "$interface"
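
The guard chain in the patch above (non-empty secret file, command installed, bounded run time) can be tried in isolation with a generic sketch like the one below. The helper "have" is not shown in the quoted diff, so its definition here is an assumption, and "guarded_run" with its parameters is a hypothetical name. The netcat port probe is left out so the sketch does not depend on the network.

```shell
#!/bin/sh
# Assumed definition: the quoted patch uses `have` but does not show it.
have () {
	command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: run "$@" only if the secret file is non-empty and the
# command exists, and never let it run longer than $limit seconds.
# The exit status is that of the first failing check, or of `timeout`
# (124 if the time limit was hit).
guarded_run () {
	limit="$1"
	secret="$2"
	shift 2
	[ -s "$secret" ] &&
		have "$1" &&
		timeout "$limit" "$@"
}
```

With this shape, a call such as guarded_run 10 /etc/machine.secret univention-register-network-address --interface "$interface" would give up after 10 seconds instead of blocking ifup for minutes.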
Comment 4 Philipp Hahn univentionstaff 2021-03-27 07:27:00 CET
(In reply to Philipp Hahn from comment #2)
> As registering the IP is mostly optional, delaying it a little bit should
> not hurt.

Actually it is not optional for DRS replication, which requires up-to-date DNS SRV RRs.
Comment 5 Ingo Steuwer univentionstaff 2021-05-14 16:47:16 CEST
This issue has been filed against UCS 4.3.

UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.