Univention Bugzilla – Bug 51776
Hanging Join process during certificate retrieval
Last modified: 2021-09-20 08:57:23 CEST
One of our Jenkins test instances has been hanging for 1 day:

root 12539 0.0 0.0 7304 3628 ? Ss Aug05 0:00 \_ bash -c . utils.sh && run_setup_join_on_non_master
root 12545 0.0 0.0 7324 3888 ? S Aug05 0:00 \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention
root 8270 0.0 0.0 7324 2448 ? S Aug05 0:00 \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention
root 8274 0.0 0.1 7684 4208 ? S Aug05 0:00 | \_ /bin/bash /usr/share/univention-join/univention-join -dcaccount Administrator -dcpwd /tmp/tmp.gkqnTREeI5
root 10031 0.0 0.0 2388 1708 ? S Aug05 0:04 | \_ /bin/sh /usr/sbin/univention-fetch-certificate slave098 master098.autotest098.local
root 31664 1.0 0.1 15720 7804 ? S 09:56 0:00 | \_ /usr/bin/python2.7 /usr/sbin/univention-scp /etc/machine.secret -r slave098$@master098.autotest098.local:/etc/univention/ssl/slave098 slave098$@master098.autotest098.local:/etc/univention/ssl/slave098.autotest098.local /etc/univention/ssl/
root 31665 0.0 0.0 5356 760 ? Ss 09:56 0:00 | \_ scp -o StrictHostKeyChecking=no -o ControlPath=none -r slave098$@master098.autotest098.local:/etc/univention/ssl/slave098 slave098$@master098.autotest098.local:/etc/univention/ssl/slave098.autotest098.local /etc/univention/ssl/
root 31666 1.0 0.0 0 0 ? Z 09:56 0:00 | \_ [ssh] <defunct>
root 31668 0.0 0.1 15816 7448 ? S 09:56 0:00 | \_ /usr/bin/ssh -x -oForwardAgent=no -oPermitLocalCommand=no -oClearAllForwardings=yes -oRemoteCommand=none -oRequestTTY=no -o StrictHostKeyChecking=no -o ControlPath=none -l slave098$ -- master098.autotest098.local scp -r -d -f /etc/univention/ssl/slave098.autotest098.local
root 8271 0.0 0.0 7324 2368 ? S Aug05 0:00 \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention
This happened twice again in our Jenkins environment, again on a DC Slave.
I had a similar problem caused by this Bug #51804 (comment 9), could this be the problem here as well?
(In reply to Felix Botner from comment #2)
> I had a similar problem caused by this Bug #51804 (comment 9), could this be
> the problem here as well?

Sounds like, yes.
Maybe we can use Bug #51804 to fix the issue and this bug to fix the error
handling, so that the join aborts after 1 hour when the certificates aren't
created.
(In reply to Florian Best from comment #3)
> (In reply to Felix Botner from comment #2)
> > I had a similar problem caused by this Bug #51804 (comment 9), could this be
> > the problem here as well?
>
> Sounds like, yes.
> Maybe we can use Bug #51804 to fix the issue and this bug to fix the error
> handling, so that the join aborts after 1 hour when the certificates aren't
> created.

Yep, that makes sense.
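The error handling proposed here (retry, but abort once a deadline has passed) can be sketched in plain sh. This is only an illustration under assumptions: `retry_until_deadline`, its argument order and the example values are hypothetical and are not the code in univention-fetch-certificate.

```shell
#!/bin/sh
# Sketch only: retry a command until it succeeds or a deadline passes.
# retry_until_deadline and its parameters are hypothetical names,
# not taken from univention-fetch-certificate.
retry_until_deadline () {
    timeout_s="$1"    # give up once this many seconds were spent sleeping
    interval_s="$2"   # pause between two attempts, in seconds
    shift 2
    elapsed=0
    while ! "$@"
    do
        if [ "$elapsed" -ge "$timeout_s" ]
        then
            echo "giving up after ${elapsed}s: $*" >&2
            return 1
        fi
        printf '.'    # progress marker, one per failed attempt
        sleep "$interval_s"
        # only the sleep time is counted, not the run time of the command
        elapsed=$((elapsed + interval_s))
    done
    return 0
}
```

A call such as `retry_until_deadline 3600 20 some_fetch_command` would then return non-zero after roughly one hour instead of looping forever (`some_fetch_command` is a placeholder). The sketch does not bound a single attempt that itself blocks, like the scp in the process listing of the first report; for that, each attempt would additionally need its own limit, for example via coreutils `timeout`.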
Jenkins → UCSschool-4.4 → Install U@S 4.4 Multiserver Large Env has been stalled for 11 days:

@slave300-s3
> 06:32:11 [slave300-s3] . utils.sh; run_setup_join_on_non_master

# ps axfu
root 4040 0.0 0.1 144192 11012 ? Ss Nov15 0:05 sshd: root@notty
root 4051 0.0 0.0 13552 3716 ? Ss Nov15 0:00 \_ bash -c . utils.sh; run_setup_join_on_non_master
root 4059 0.0 0.0 13692 3808 ? S Nov15 0:00 \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention
root 5555 0.0 0.0 13692 2828 ? S Nov15 0:00 \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention
root 5559 0.0 0.0 13908 4296 ? S Nov15 0:00 | \_ /bin/bash /usr/share/univention-join/univention-join -dcaccount Administrator -dcpwd /tmp/tmp.5xlPTsybgu
root 7838 0.0 0.0 4276 1596 ? S Nov15 0:19 | \_ /bin/sh /usr/sbin/univention-fetch-certificate slave300-s3 master300.autotest300.local
root 8301 0.0 0.0 7364 668 ? S 08:06 0:00 | \_ sleep 20
root 5556 0.0 0.0 13692 2704 ? S Nov15 0:00 \_ /bin/bash /usr/lib/univention-system-setup/scripts/setup-join.sh --dcaccount Administrator --password_file /tmp/univention

# find /etc/univention/ssl -ls
2878206 4 drwxr-xr-x 3 root root 4096 Nov 15 06:37 /etc/univention/ssl
2878207 4 drwxr-xr-x 2 root root 4096 Nov 15 06:37 /etc/univention/ssl/ucsCA
2878246 4 -rw-r--r-- 1 root root 1948 Nov 15 06:37 /etc/univention/ssl/ucsCA/CAcert.pem

@master300:

# less /home/Administrator/.univention-server-join.log
10.11.20 05:52:24.077 DEBUG_INIT
univention-server-join called
Parameter: -bindpwfile /tmp/tmp.CIjCnemboE -binddn uid=Administrator,cn=users,dc=autotest300,dc=local -ip 10.207.229.64 -netmask 255.255.0.0 -mac 52:54:00:e0:d8:f4 -role domaincontroller_slave -hostname slave300-s3 -domainname autotest300.local
Calculated subnet = 10.207
forwardZone zoneName=autotest300.local,cn=dns,dc=autotest300,dc=local
reverseZone zoneName=207.10.in-addr.arpa,cn=dns,dc=autotest300,dc=local
dhcpEntry
Join DC Slave
Create new DC Slave
15.11.20 06:37:00.591 DEBUG_INIT

# find /etc/univention/ssl -name slave300\* -ls
2878533 4 drwxr-x--- 2 slave300-s1$ DC Backup Hosts 4096 Nov 15 06:25 /etc/univention/ssl/slave300-s1.autotest300.local
2878560 0 lrwxrwxrwx 1 root nogroup 29 Nov 15 06:25 /etc/univention/ssl/slave300-s1 -> slave300-s1.autotest300.local

# grep cn=slave300-s3 /var/log/univention/listener.log
<EMPTY>

# cat /var/lib/univention-ldap/listener/listener
1335 cn=2012,cn=gidNumber,cn=temporary,cn=univention,dc=autotest300,dc=local a
1336 cn=2012,cn=gidNumber,cn=temporary,cn=univention,dc=autotest300,dc=local d
1337 cn=52:54:00:e0:d8:f4,cn=mac,cn=temporary,cn=univention,dc=autotest300,dc=local a
1338 cn=10.207.229.64,cn=aRecord,cn=temporary,cn=univention,dc=autotest300,dc=local a
1339 cn=uidNumber,cn=temporary,cn=univention,dc=autotest300,dc=local m
1340 cn=2012,cn=uidNumber,cn=temporary,cn=univention,dc=autotest300,dc=local d
1341 cn=slave300-s3$,cn=uid,cn=temporary,cn=univention,dc=autotest300,dc=local a
1342 cn=slave300-s3,cn=dc,cn=computers,dc=autotest300,dc=local a
1343 cn=slave300-s3$,cn=uid,cn=temporary,cn=univention,dc=autotest300,dc=local d
1344 cn=slave300-s3,cn=dc,cn=computers,dc=autotest300,dc=local m
1345 cn=slave300-s3,cn=dc,cn=computers,dc=autotest300,dc=local m
1346 relativeDomainName=slave300-s3,zoneName=autotest300.local,cn=dns,dc=autotest300,dc=local a
1347 zoneName=autotest300.local,cn=dns,dc=autotest300,dc=local m
1348 relativeDomainName=64.229,zoneName=207.10.in-addr.arpa,cn=dns,dc=autotest300,dc=local a
1349 zoneName=207.10.in-addr.arpa,cn=dns,dc=autotest300,dc=local m
1350 cn=10.207.229.64,cn=aRecord,cn=temporary,cn=univention,dc=autotest300,dc=local d
1351 cn=52:54:00:e0:d8:f4,cn=mac,cn=temporary,cn=univention,dc=autotest300,dc=local d
1352 cn=DC Slave Hosts,cn=groups,dc=autotest300,dc=local m

# cat /var/log/univention/notifier.log
15.11.20 06:25:11.908 DEBUG_INIT
15.11.20 06:37:00.646 TRANSFILE ( ERROR ) : ldap_sasl_interactive_bind_s(): Can't contact LDAP server
15.11.20 06:37:06.030 DEBUG_INIT

(gdb) bt full
#0 0x00007fb81ce135e3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1 0x000055a3afcbffd4 in network_client_main_loop () at network.c:315
fd = 1024
testfds = {fds_bits = {656, 0 <repeats 15 times>}}
#2 0x000055a3afcc297f in main (argc=6, argv=0x7ffde6c44288) at univention-directory-notifier.c:237
foreground = 1
debug = 1

So Bug #51804 again
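The manual check from this comment (is the joining host still waiting in the notifier's pending transaction file?) could be scripted. A minimal sketch, assuming the "ID DN changetype" line layout seen in the listing; `pending_for_host` is a hypothetical helper and not part of UCS.

```shell
#!/bin/sh
# Sketch only: print pending transactions whose DN names the given host.
# pending_for_host is a hypothetical helper; the default path is the file
# shown in the comment above.
pending_for_host () {
    host="$1"
    file="${2:-/var/lib/univention-ldap/listener/listener}"
    # DNs may contain spaces (e.g. "cn=DC Slave Hosts"), so match against the
    # whole line. Both "=host," and the machine account form "=host$," count.
    awk -v h="$host" '
        index($0, "=" h ",") || index($0, "=" h "$,") { n++; print }
        END { exit (n > 0 ? 0 : 1) }
    ' "$file"
}
```

`pending_for_host slave300-s3` prints the matching lines and returns 0 while transactions for that host are still pending. It returns non-zero once none are left, which covers the empty file seen after a notifier restart.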
(In reply to Philipp Hahn from comment #5)

# /etc/init.d/univention-directory-notifier restart
# wc -l /var/lib/univention-ldap/listener/listener
0 /var/lib/univention-ldap/listener/listener
Stalled again, this time for 3 days: <https://jenkins.knut.univention.de:8181/job/UCSschool-4.4/job/Install%20Multiserver/Config=s4,TestGroup=base1,UCSRelease=testing/913/console>
https://git.knut.univention.de/univention/ucs/-/merge_requests/115
[phahn:~/REPOS/ucs/base/univention-ssl] 5.0-0+* 141 ± git cl -4

[5.0-0] e92d0a11bc style[ssl-download] Check also for machine.secret
 base/univention-ssl/univention-fetch-certificate | 3 +++
 1 file changed, 3 insertions(+)

[5.0-0] 4549820633 fix[ssl-download] univention-scp detection
 base/univention-ssl/debian/ucslint.overrides | 2 ++
 base/univention-ssl/univention-fetch-certificate | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

[5.0-0] 700a44a507 style[ssl-download] shellcheck issues
 base/univention-ssl/univention-fetch-certificate | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

[5.0-0] 429be491fc fix[ssl-download] Abort after timeout
 base/univention-ssl/debian/changelog | 6 ++++++
 base/univention-ssl/univention-fetch-certificate | 14 +++++---------
 doc/errata/staging/univention-ssl.yaml | 10 ++++++++++
 3 files changed, 21 insertions(+), 9 deletions(-)

Package: univention-ssl
Version: 14.0.2-2A~5.0.0.202107221328
Branch: ucs_5.0-0
Scope: errata5.0-0

[5.0-0] b11b3a2d1c Bug #51776: ssl, Bug #53339: udm
 doc/errata/staging/univention-directory-manager-modules.yaml | 2 +-
 doc/errata/staging/univention-ssl.yaml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
OK: timeout after 10 minutes

# time univention-fetch-certificate msater.school.dev master.school.dev
Download host certificate for msater.school.dev:...............................univention-fetch-certificate: failed to get host certificate

real 10m20,468s
user 0m0,413s
sys 0m0,039s

OK: code review
~OK: YAML
<https://errata.software-univention.de/#/?erratum=5.0x58>