Bug 54237 - univention-ldapsearch performance
univention-ldapsearch performance
Status: NEW
Product: UCS
Classification: Unclassified
Component: LDAP
UCS 4.4
Other Linux
: P5 normal (vote)
: ---
Assigned To: UCS maintainers
UCS maintainers
https://git.knut.univention.de/univen...
:
Depends on: 31257
Blocks:
  Show dependency treegraph
 
Reported: 2021-12-10 14:44 CET by Christina Scheinig
Modified: 2022-01-12 17:48 CET (History)
6 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 2: Improvement: Would be a product improvement
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.046
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2021120821000312
Bug group (optional): bitesize, UCS Performance
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christina Scheinig univentionstaff 2021-12-10 14:44:19 CET
If you have to call univention-ldapsearch very often in a script loop (e.g. because you have a long list of usernames and have to find out a certain attribute of each user), this slows down the execution quite a bit, because univention-ldapsearch executes 'eval "$(/usr/sbin/univention-config-registry
shell)"' with each call. 


Here some time examples
root@master:~# time for I in {1..50}; do echo -n "$I "; univention-ldapsearch uid=administrator dn >/dev/null; done
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
real    0m13,778s
user    0m12,120s
sys     0m0,740s
 
root@master:~# time for I in {1..50}; do echo -n "$I "; ldapsearch -ZZ -D cn=admin,dc=schule,dc=schein,dc=ig -w `cat /etc/ldap.secret` uid=administrator dn >/dev/null; done
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
real    0m0,919s
user    0m0,500s
sys     0m0,144s
Comment 1 Philipp Hahn univentionstaff 2021-12-10 16:27:58 CET
"Then don't do it": `univention-search` is a "shell-script", which fork()s the Python program "ucr shell" before forking() the "C" implementation "ldapsearch". This will always be slow because you have so many fork()s and especially the Python interpreter, which has a huge startup time. 
If you have to process many entries use *one* language to save on fork()s, e.g. all Python or Golang.

If you have to perform many queries you can improve this even more by using "ldapsearch %s"; see my recent <https://stackoverflow.com/questions/11752516/how-to-check-uid-exists-using-ldapsearch/69989391#69989391>

#DEV will probably spend time on this, so don't expect any (generic) improvements here; probably this bug will expire in X years and closed as WONTFIX.

PS: s/-w `cat /etc/ldap.secret`/-y /etc/ldap.secret/
Comment 2 Daniel Tröder univentionstaff 2021-12-12 09:44:34 CET
I think there can be a significant performance improvement by changing 1 line in the shell script.

1. UCR seems to be slow to generate "ucr shell" output.
2. UCR seems to not be faster when generating the "ucr shell" output for only one var.
3. The univention-ldapsearch script only needs 1 UCRV but eval's all of UCR.

root@m40:~# time for I in {1..50}; do ucr get ldap/client/retry/count >/dev/null; done

real	0m5,223s
user	0m4,240s
sys	0m0,945s

root@m40:~# time for I in {1..50}; do ucr shell ldap/client/retry/count >/dev/null; done

real	0m12,861s
user	0m11,504s
sys	0m1,262s

root@m40:~# time for I in {1..50}; do ucr dump >/dev/null; done

real	0m5,023s
user	0m4,187s
sys	0m0,794s


So if univention-ldapsearch would just do "ucr get ldap/client/retry/count" (|| default), if would take half the time for its startup. That is 1 line to change.
Comment 3 Arvid Requate univentionstaff 2021-12-13 17:41:39 CET
Can't we make ucr shell faster? That would improve a lot more scripts.
Comment 4 Arvid Requate univentionstaff 2021-12-13 17:43:11 CET
Yet I agree to replace the full eval "$(/usr/sbin/univention-config-registry shell)" by something more specific.
Comment 5 Philipp Hahn univentionstaff 2021-12-15 07:20:08 CET
(In reply to Daniel Tröder from comment #2)
> I think there can be a significant performance improvement by changing 1 line in the shell script.
> 
> 1. UCR seems to be slow to generate "ucr shell" output.
> 2. UCR seems to not be faster when generating the "ucr shell" output for only one var.
> 3. The univention-ldapsearch script only needs 1 UCRV but eval's all of UCR.> So if univention-ldapsearch would just do "ucr get ldap/client/retry/count"
> (|| default), if would take half the time for its startup. That is 1 line to
> change.

No: "ldap/binddn" and "ldap/hosndn" may also needed when no BIND-DN is given. So after halfing the time you tripple it by doing 3 ucr calls.

> # time for I in {1..50}; do ucr get ldap/client/retry/count; done >/dev/null

1. load all /etc/univention/base*.conf into dict()s
2. O(~1) for access

> real	0m5,223s
> user	0m4,240s
> sys	0m0,945s
> 
> # time for I in {1..50}; do ucr shell ldap/client/retry/count; done >/dev/null

1. load all /etc/univention/base*.conf into dict()s
2. load and parse all variable descriptions from /etc/univention/registry.info/variables/* (!)
3. O(n) to iterate over all vars and compare them by RegExp

> real	0m12,861s
> user	0m11,504s
> sys	0m1,262s
> 
> # time for I in {1..50}; do ucr dump; done >/dev/null

1. load all /etc/univention/base*.conf into dict()s
2. O(n) to iterate over all vars and just print them

> real	0m5,023s
> user	0m4,187s
> sys	0m0,794s


For comparison how costly Python is:
> # time for I in {1..50}; do sed -rne 's,^ldap/client/retry/count: ,,p;T;q' /etc/univention/base{-forced,-schedule,-ldap,,-defaults}.conf; done >/dev/null 
> real    0m0,053s
> user    0m0,038s
> sys     0m0,014s

That gives you 2 magnitudes: 5s -> 50ms


"ucr shell" is internally implemented as "ucr --shell search". To quote from `ucr --help`:
> shell [key]:
>    Convert key/value pair into shell compatible format, e.g.
>    `version/version: 1.0` => `version_version="1.0"`
>    (deprecated: use --shell dump instead)

(I have ignored that deprecation myself for years)


In all cases 'eval "$(ucr --shell dump)"' is more efficient, but it differs subtile from 'eval "$(ucr shell)"' as the later also returns variables defined via '.info' files only - see Bug #22798:
> diff <(ucr --shell dump) <(ucr shell)

commit d63d7d53c72554422a9dacb56166f3162176fa5d (HEAD -> phahn/54208-ucr-py3)
Author: Philipp Hahn <hahn@univention.de>
Date:   Wed Dec 15 05:57:14 2021 +0100

    perf[UCR]: Improve univention-ldapsearch
    
    by using `ucr --shell dump` instead of `ucr shell`
    
    Bug #54237

diff --git base/univention-config-registry/scripts/univention-ldapsearch base/univention-config-registry/scripts/univention-ldapsearch
index d6408f53ae..d89eab4944 100755
--- base/univention-config-registry/scripts/univention-ldapsearch
+++ base/univention-config-registry/scripts/univention-ldapsearch
@@ -30,7 +30,7 @@
 # /usr/share/common-licenses/AGPL-3; if not, see
 # <https://www.gnu.org/licenses/>.
 
-eval "$(/usr/sbin/univention-config-registry shell)"
+eval "$(/usr/sbin/univention-config-registry --shell dump)"
 
 ## check for option -D to avoid "ldapsearch: -D previously specified"
 ## check for option -w to avoid "ldapsearch: -y incompatible with -w"


PS: As most of the UCRVs used in `univention-ldapsearch` are static you could even convert it into a UCR template registered for `ldap/hostdn` and `ldap/client/retry/count` only: Then you no longer have to call `ucr` at runtime at all but only when they change.

PPS: As the command line parsing in `univention-ldapsearch` is "wrong" (getopt() in `ldapsearch` allows to merge single-letter options into one argument which the shell code does not handle at all) you could even patch `ldapsearch` to have the functionality of `univention-search` to use default credentials itself, which would save you from using /bin/bash at all AND have correct argument parsing.
> openldap/clients/tools/ldapsearch.c:365 const char options[] = "a:Ab:cE:F:l:Ls:S:tT:uz:Cd:D:e:f:h:H:IMnNO:o:p:P:QR:U:vVw:WxX:y:Y:Z";

See git:phahn/54208-ucr-py3
Comment 6 Philipp Hahn univentionstaff 2021-12-15 07:28:48 CET
(In reply to Philipp Hahn from comment #5)
> PS: As most of the UCRVs used in `univention-ldapsearch` are static you
> could even convert it into a UCR template registered for `ldap/hostdn` and
> `ldap/client/retry/count` only: Then you no longer have to call `ucr` at
> runtime at all but only when they change.
... and the undocumented but used "ldap/binddn":
> # git grep -n '[^/]\<ldap.binddn\>'
> base/univention-config-registry/scripts/univention-ldapsearch:56:       : "${binddn:=${ldap_binddn:-}}"
> base/univention-lib/shell/ldap.sh:109:# optional environment: ldap_binddn and ldap_bindpw
> base/univention-system-setup/usr/lib/univention-system-setup/scripts/10_basis/18root_password:65:       binddn=`ldap_binddn`