Bug 54311 - In case of automated joins, the join fails when more than one node joins in parallel.
Status: NEW
Product: UCS
Classification: Unclassified
Component: SSL
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on:
Blocks: 39824
Reported: 2022-01-07 13:54 CET by Gino Harlos
Modified: 2023-05-01 14:28 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.086
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): bitesize
Max CVSS v3 score:


Attachments
Patch suggestion (2.23 KB, patch)
2022-01-07 13:54 CET, Gino Harlos

Description Gino Harlos 2022-01-07 13:54:57 CET
Created attachment 10896
Patch suggestion

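During automated joins, many certificate requests arrive within a short period of time. Each request tries to take the exclusive lock without waiting, so all but one fail with "Failed to get exclusive lock" and the affected joins fail.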

root@pdc:~# echo "server/role=$(ucr get server/role)" && cat /var/log/univention/listener.log
server/role=domaincontroller_master
...
01.01.22 07:13:13.212  LISTENER    ( PROCESS ) : updating 'cn=mdc,cn=memberserver,cn=computers,dc=ucs,dc=example' command a
...
: Failed to get exclusive lock
...

Patch included ( git diff > univention-certificate.patch )

Environment:
 - inside univention-directory-listener.service during a join process, or manually in a Bash shell
 - systemd-detect-virt => docker and/or kvm ( not tested on physical hardware! )
 - the reproduce commands below are run in a Bash shell as the root user ( inside a container or a KVM guest )

Reproduce ORIGINAL:
root@pdc:~# rsync -av /etc/univention/ssl/ /etc/univention/ssl.orig
root@pdc:~# echo "ssl/default/bits=$(ucr get ssl/default/bits)" && for cert in {a..z}; do /bin/bash -c "/bin/bash /usr/sbin/univention-certificate.orig new -name ${cert}.$(ucr get domainname) 2>&1 | egrep -- '^(Creating certificate|\:)' &"; done
ssl/default/bits=2048
Creating certificate: a.ucs.example Fr 7. Jan 13:11:35 CET 2022
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
Creating certificate: t.ucs.example Fr 7. Jan 13:11:36 CET 2022
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock
: Failed to get exclusive lock

Reproduce PATCHED:
root@pdc:~# rm -rf /etc/univention/ssl && rsync -av /etc/univention/ssl.orig/ /etc/univention/ssl
root@pdc:~# echo "ssl/default/bits=$(ucr get ssl/default/bits)" && for cert in {a..z}; do /bin/bash -c "/bin/bash /usr/sbin/univention-certificate new -name ${cert}.$(ucr get domainname) 2>&1 | egrep -- '^(Creating certificate|\:)' &"; done
ssl/default/bits=2048
Creating certificate: a.ucs.example Fr 7. Jan 13:13:31 CET 2022
Creating certificate: t.ucs.example Fr 7. Jan 13:13:32 CET 2022
Creating certificate: w.ucs.example Fr 7. Jan 13:13:33 CET 2022
Creating certificate: y.ucs.example Fr 7. Jan 13:13:35 CET 2022
Creating certificate: l.ucs.example Fr 7. Jan 13:13:36 CET 2022
Creating certificate: b.ucs.example Fr 7. Jan 13:13:37 CET 2022
Creating certificate: u.ucs.example Fr 7. Jan 13:13:38 CET 2022
Creating certificate: i.ucs.example Fr 7. Jan 13:13:40 CET 2022
Creating certificate: m.ucs.example Fr 7. Jan 13:13:41 CET 2022
Creating certificate: r.ucs.example Fr 7. Jan 13:13:42 CET 2022
Creating certificate: h.ucs.example Fr 7. Jan 13:13:44 CET 2022
Creating certificate: n.ucs.example Fr 7. Jan 13:13:45 CET 2022
Creating certificate: q.ucs.example Fr 7. Jan 13:13:47 CET 2022
Creating certificate: v.ucs.example Fr 7. Jan 13:13:48 CET 2022
Creating certificate: c.ucs.example Fr 7. Jan 13:13:50 CET 2022
Creating certificate: g.ucs.example Fr 7. Jan 13:13:51 CET 2022
Creating certificate: p.ucs.example Fr 7. Jan 13:13:54 CET 2022
Creating certificate: d.ucs.example Fr 7. Jan 13:13:55 CET 2022
Creating certificate: x.ucs.example Fr 7. Jan 13:13:58 CET 2022
Creating certificate: f.ucs.example Fr 7. Jan 13:13:59 CET 2022
Creating certificate: z.ucs.example Fr 7. Jan 13:14:01 CET 2022
Creating certificate: s.ucs.example Fr 7. Jan 13:14:02 CET 2022
Creating certificate: e.ucs.example Fr 7. Jan 13:14:03 CET 2022
Creating certificate: k.ucs.example Fr 7. Jan 13:14:05 CET 2022
Creating certificate: o.ucs.example Fr 7. Jan 13:14:09 CET 2022
Creating certificate: j.ucs.example Fr 7. Jan 13:14:13 CET 2022

Patch suggestion:
root@pdc:~# diff --unified /usr/sbin/univention-certificate.orig /usr/sbin/univention-certificate
...
-	local role="$1" mode="$2"
+	local role="$1" mode="${2:-unlock}"
 	case "$role/$(ucr get server/role)" in
...
-	exec 3<"$SSLBASE"
-	flock -n --"$mode" 3 ||
-		die "Failed to get $mode lock"
+	[ 0 -eq ${#FD} ] &&
+		exec {FD}<${SSLBASE}
+	for i in {1..99}; do jitter ${i} flock -n --${mode} ${FD} && return || continue; done
+	flock -n --${mode} ${FD} ||
+		die "Failed to get ${mode} lock"
 }
...

Patch dependency:
 - The patch uses jitter instead of sleep for a small performance boost ( it saves time )
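
For illustration only, the retry idea from the patch suggestion can be sketched in isolation. This is not the attached patch: the function name lock_with_retry, the variable LOCK_FD and the parameter order are hypothetical, and the random delay is written out with sleep and $RANDOM instead of calling the jitter helper, on the assumption that the helper waits a random time of up to the given number of seconds.

```shell
#!/bin/bash
# Sketch only: lock_with_retry and LOCK_FD are hypothetical names,
# not part of univention-certificate.
# Usage: lock_with_retry <lock-path> [exclusive|shared] [max-tries]
lock_with_retry () {
	local lockpath="$1" mode="${2:-exclusive}" tries="${3:-99}" fd i
	# Open the lock path read-only on a file descriptor chosen by bash (4.1+).
	exec {fd}<"$lockpath" || return 1
	for ((i = 1; i <= tries; i++)); do
		# Non-blocking attempt; on success hand the descriptor to the caller.
		if flock -n --"$mode" "$fd"; then
			LOCK_FD="$fd"
			return 0
		fi
		# Assumed jitter behaviour: random delay below i seconds, so
		# parallel callers do not all retry at the same moment.
		sleep "$((RANDOM % i))"
	done
	# Give up: close the descriptor and report failure to the caller.
	exec {fd}<&-
	return 1
}

# Example call (path taken from the reproduce steps above):
#   lock_with_retry /etc/univention/ssl exclusive 99 || echo "Failed to get exclusive lock"
```

The lock is released when the descriptor in LOCK_FD is closed or when flock --unlock is called on it.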

Patch restrictions:
 - I'm not really sure about the retry limit of 99 ( that is around half an hour, or a maximum of 4950 seconds ), but we must not forget ssl/default/bits either: larger keys take longer to generate, so the lock is held longer per certificate
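
The maximum of 4950 seconds is the sum 1 + 2 + ... + 99. The short check below reproduces that figure, assuming the worst case in which attempt i waits the full i seconds before the next try; the actual wait depends on how jitter picks its delay.

```shell
#!/bin/bash
# Upper bound for the cumulative wait of the 99 retries,
# assuming retry i waits the full i seconds (worst case).
total=0
for i in {1..99}; do
	total=$((total + i))
done
echo "worst case: ${total} seconds (~$((total / 60)) minutes)"
# prints: worst case: 4950 seconds (~82 minutes)
```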

Patch evaluations:
for bits in 2048 4096 8192 16384; do printf "%s\t" "ssl/default/bits=${bits}" && ls -1 *${bits}.txt && head -3 *${bits}.txt && echo "..." && tail -n 3 *${bits}.txt && echo; done

ssl/default/bits=2048	timeout-univention-certificate-ssl.default.bits-2048.txt
root@pdc:~# echo "ssl/default/bits=$(ucr get ssl/default/bits)" && for cert in {9000..9999}; do /bin/bash -c "/bin/bash /usr/sbin/univention-certificate new -name ${cert}.$(ucr get domainname) 2>&1 | egrep -- '^(Creating certificate|\:)' &"; done
ssl/default/bits=2048
Creating certificate: 9000.ucs.example Do 6. Jan 09:29:28 CET 2022
...
Creating certificate: 9431.ucs.example Do 6. Jan 09:51:18 CET 2022
Creating certificate: 9241.ucs.example Do 6. Jan 09:51:21 CET 2022
Creating certificate: 9957.ucs.example Do 6. Jan 09:51:41 CET 2022

ssl/default/bits=4096	timeout-univention-certificate-ssl.default.bits-4096.txt
root@pdc:~# echo "ssl/default/bits=$(ucr get ssl/default/bits)" && for cert in {9000..9999}; do /bin/bash -c "/bin/bash /usr/sbin/univention-certificate new -name ${cert}.$(ucr get domainname) 2>&1 | egrep -- '^(Creating certificate|\:)' &"; done
ssl/default/bits=4096
Creating certificate: 9000.ucs.example Do 6. Jan 09:57:45 CET 2022
...
Creating certificate: 9416.ucs.example Do 6. Jan 10:27:31 CET 2022
Creating certificate: 9655.ucs.example Do 6. Jan 10:27:34 CET 2022
Creating certificate: 9115.ucs.example Do 6. Jan 10:28:23 CET 2022

ssl/default/bits=8192	timeout-univention-certificate-ssl.default.bits-8192.txt
root@pdc:~# echo "ssl/default/bits=$(ucr get ssl/default/bits)" && for cert in {9000..9999}; do /bin/bash -c "/bin/bash /usr/sbin/univention-certificate new -name ${cert}.$(ucr get domainname) 2>&1 | egrep -- '^(Creating certificate|\:)' &"; done
ssl/default/bits=8192
Creating certificate: 9000.ucs.example Do 6. Jan 10:30:14 CET 2022
...
Creating certificate: 9740.ucs.example Do 6. Jan 11:17:58 CET 2022
Creating certificate: 9284.ucs.example Do 6. Jan 11:18:03 CET 2022
Creating certificate: 9173.ucs.example Do 6. Jan 11:18:46 CET 2022

ssl/default/bits=16384	timeout-univention-certificate-ssl.default.bits-16384.txt
root@pdc:~# echo "ssl/default/bits=$(ucr get ssl/default/bits)" && for cert in {9000..9999}; do /bin/bash -c "/bin/bash /usr/sbin/univention-certificate new -name ${cert}.$(ucr get domainname) 2>&1 | egrep -- '^(Creating certificate|\:)' &"; done
ssl/default/bits=16384
Creating certificate: 9000.ucs.example Do 6. Jan 11:19:54 CET 2022
...
: Failed to get exclusive lock
Creating certificate: 9220.ucs.example Do 6. Jan 12:08:34 CET 2022
: Failed to get exclusive lock

for bits in 2048 4096 8192 16384; do printf "%s\t" "ssl/default/bits=${bits}" && ls -1 *${bits}.txt && egrep -- "^\: Failed" *${bits}.txt | wc -l | awk '{ print ": Failed  ... " $1 }' && egrep -- "^Creating" *${bits}.txt | wc -l | awk '{ print ": Created ... " $1 }'; done

ssl/default/bits=2048	timeout-univention-certificate-ssl.default.bits-2048.txt
: Failed  ... 0
: Created ... 1000
ssl/default/bits=4096	timeout-univention-certificate-ssl.default.bits-4096.txt
: Failed  ... 0
: Created ... 1000
ssl/default/bits=8192	timeout-univention-certificate-ssl.default.bits-8192.txt
: Failed  ... 571
: Created ... 429
ssl/default/bits=16384	timeout-univention-certificate-ssl.default.bits-16384.txt
: Failed  ... 962
: Created ... 38
Comment 1 Gino Harlos 2023-05-01 14:28:04 CEST
With pull request #10 ( https://github.com/univention/ucs-appliance-container/pull/10 ), this patch proposal is no longer needed, and from the perspective of ucs-appliance-container this bug could be closed.