Univention Bugzilla – Bug 30005
endless loop in univention-join if no host certificate available
Last modified: 2018-11-28 12:10:40 CET
Seen at Ticket #2013010921000847.

univention-join executes the following code when fetching the new host certificate:

"
[...]
download_host_certificate
if [ ! -d "/etc/univention/ssl/$hostname" ] && [ ! -d "/etc/univention/ssl/$hostname.$domainname" ]; then
	echo "failed to get host certificate"
	failed_message "failed to get host certificate"
fi
[...]
"

The check is never reached when the certificate is not available on the remote host, because the scp runs in an endless loop:

"
download_host_certificate () {
	echo -n "Download host certificate "
	local HOSTPWD="/etc/machine.secret"
	local HOSTACCOUNT="$hostname\$"
	while true
	do
		univention-scp "$HOSTPWD" -q -r \
			"$HOSTACCOUNT@$DCNAME:/etc/univention/ssl/$hostname" \
			"$HOSTACCOUNT@$DCNAME:/etc/univention/ssl/$hostname.$domainname" \
			/etc/univention/ssl/ >>/var/log/univention/join.log 2>&1
		[ -d "/etc/univention/ssl/$hostname" ] && [ -d "/etc/univention/ssl/$hostname.$domainname" ] && break
		echo -n "."
		sleep 20
	done
	echo -e "\033[60Gdone"
}
"

Because of that, you don't get an appropriate failure message at the shell; it just keeps printing dots...
We have the same problem on several computers. It seems to happen when we re-join a system after changing its IP address. Manually copying the certificate for that specific system from the DC was a workaround for us.
Happened to me again in UCS 4.0 when joining a slave. The UMC System setup then hangs forever without giving any information. The join should fail after some time.
Still there when joining a UCS4 Memberserver.
(In reply to Sebastian from comment #1)
> On different computers, we have the same problem, too. It seems that it
> happens when we re-join a system after changing its IP address.

This may be caused by Bug #31926, which triggers when a computer is removed and re-joined in short order.

This also happens with an older version of the univention-directory-listener (or a system upgraded from UCS-3.x), where the old cached data is not removed if a listener module fails. This then causes gencertificate.py:114 to take the "else" path instead of re-creating the missing certificate (Bug #35261).
Happened again in a Jenkins-UCS@school test, where
- the Listener was killed,
- it did not terminate because of Bug #27895,
- after the force-kill the cache already contained the new computer,
- the certificate was not created,
- the joining slave has been hanging for 2 days.

Time to debug this: 1h

gencertificate.py should be changed to *always* create any missing certificate instead of doing that only for *old_cache=None*.
Created attachment 6749 [details] Always create missing SSL certificates.
hit me again!
(In reply to Florian Best from comment #7)
> hit me again!

Version?
(In reply to Stefan Gohmann from comment #8)
> (In reply to Florian Best from comment #7)
> > hit me again!
>
> Version?

UCS 4.1-3
One reason why the ssh login fails is the NSCD uid cache, if the computer object was removed shortly before (Bug #31926).
Our UCS 4.2-3 Jenkins jobs are hanging in this loop today.

root      3697  0.0  0.1  93676  7196 ?  Ss  Nov29  0:00 sshd: root@notty
root      3699  0.0  0.0  15268  3568 ?  Ss  Nov29  0:00  \_ bash -c . utils.sh; run_setup_join_on_non_master "XXXXX"
root      3704  0.0  0.1  15520  3876 ?  S   Nov29  0:00      \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention
root     11697  0.0  0.0  15520  2716 ?  S   Nov29  0:00          \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention
root     11701  0.0  0.1  15908  4296 ?  S   Nov29  0:01          |   \_ /bin/bash /usr/share/univention-join/univention-join -dcaccount Administrator -dcpwd /tmp/tmp.Ctura0G3V5
root     21070  0.0  0.0   7272   652 ?  S   09:27  0:00          |       \_ sleep 20
Could not chdir to home directory /dev/null: Not a directory
scp: /etc/univention/ssl/slave095: No such file or directory
I added logic to univention-join that will try to download the host certificate for 10 minutes. If that does not succeed, the join is aborted with an error message.

ad722fbf Stop trying to download host certificate during join after 10 minutes have passed, then mark the join as failed
a1d148bc yaml

Package: univention-join
Version: 10.0.0-25A~4.3.0.201811271547
OK: ad722fbf
OK: a1d148bc
OK: errata-announce -V --only univention-join.yaml
OK: univention-join.yaml
OK: univention-join -dcaccount Administrator -dcpwd <(exec printf univention)
> Download host certificate: ..............................failed to get host certificate
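For illustration only, a time-limited retry of the kind described in the previous comment could be sketched as follows. This is not the code from commit ad722fbf. The helper name retry_with_timeout and its arguments are hypothetical, and the command to retry is passed in by the caller.

```shell
#!/bin/bash
# Hypothetical helper (not part of univention-join): run a command repeatedly
# until it succeeds or a deadline has passed.
# usage: retry_with_timeout TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
retry_with_timeout () {
	local timeout="$1"
	local interval="$2"
	shift 2
	# Absolute deadline in seconds since the epoch.
	local deadline=$(( $(date +%s) + timeout ))
	until "$@"
	do
		# Give up once the deadline has been reached.
		[ "$(date +%s)" -ge "$deadline" ] && return 1
		# Progress indicator, one dot per failed attempt.
		printf '.'
		sleep "$interval"
	done
	return 0
}
```

With a caller-supplied function that performs one download attempt and checks for the certificate directories (here called fetch_certificate, also hypothetical), a call such as `retry_with_timeout 600 20 fetch_certificate || failed_message "failed to get host certificate"` would stop after roughly 10 minutes instead of printing dots forever.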
The endless loop appears again when installing OpenProject 7.3.1 via:

univention-app install openproject

I think this app is based on a UCS 4.1 docker image, isn't it?
<http://errata.software-univention.de/ucs/4.3/340.html>