Univention Bugzilla – Bug 54237
univention-ldapsearch performance
Last modified: 2022-01-12 17:48:32 CET
If you have to call univention-ldapsearch very often in a script loop (e.g. because you have a long list of usernames and have to find out a certain attribute of each user), this slows down the execution quite a bit, because univention-ldapsearch executes 'eval "$(/usr/sbin/univention-config-registry shell)"' with each call. Here some time examples root@master:~# time for I in {1..50}; do echo -n "$I "; univention-ldapsearch uid=administrator dn >/dev/null; done 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 real 0m13,778s user 0m12,120s sys 0m0,740s root@master:~# time for I in {1..50}; do echo -n "$I "; ldapsearch -ZZ -D cn=admin,dc=schule,dc=schein,dc=ig -w `cat /etc/ldap.secret` uid=administrator dn >/dev/null; done 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 real 0m0,919s user 0m0,500s sys 0m0,144s
"Then don't do it": `univention-search` is a "shell-script", which fork()s the Python program "ucr shell" before forking() the "C" implementation "ldapsearch". This will always be slow because you have so many fork()s and especially the Python interpreter, which has a huge startup time. If you have to process many entries use *one* language to save on fork()s, e.g. all Python or Golang. If you have to perform many queries you can improve this even more by using "ldapsearch %s"; see my recent <https://stackoverflow.com/questions/11752516/how-to-check-uid-exists-using-ldapsearch/69989391#69989391> #DEV will probably spend time on this, so don't expect any (generic) improvements here; probably this bug will expire in X years and closed as WONTFIX. PS: s/-w `cat /etc/ldap.secret`/-y /etc/ldap.secret/
I think there can be a significant performance improvement by changing 1 line in the shell script. 1. UCR seems to be slow to generate "ucr shell" output. 2. UCR seems to not be faster when generating the "ucr shell" output for only one var. 3. The univention-ldapsearch script only needs 1 UCRV but eval's all of UCR. root@m40:~# time for I in {1..50}; do ucr get ldap/client/retry/count >/dev/null; done real 0m5,223s user 0m4,240s sys 0m0,945s root@m40:~# time for I in {1..50}; do ucr shell ldap/client/retry/count >/dev/null; done real 0m12,861s user 0m11,504s sys 0m1,262s root@m40:~# time for I in {1..50}; do ucr dump >/dev/null; done real 0m5,023s user 0m4,187s sys 0m0,794s So if univention-ldapsearch would just do "ucr get ldap/client/retry/count" (|| default), if would take half the time for its startup. That is 1 line to change.
Can't we make ucr shell faster? That would improve a lot more scripts.
Yet I agree to replace the full eval "$(/usr/sbin/univention-config-registry shell)" by something more specific.
(In reply to Daniel Tröder from comment #2) > I think there can be a significant performance improvement by changing 1 line in the shell script. > > 1. UCR seems to be slow to generate "ucr shell" output. > 2. UCR seems to not be faster when generating the "ucr shell" output for only one var. > 3. The univention-ldapsearch script only needs 1 UCRV but eval's all of UCR. … > So if univention-ldapsearch would just do "ucr get ldap/client/retry/count" > (|| default), if would take half the time for its startup. That is 1 line to > change. No: "ldap/binddn" and "ldap/hosndn" may also needed when no BIND-DN is given. So after halfing the time you tripple it by doing 3 ucr calls. > # time for I in {1..50}; do ucr get ldap/client/retry/count; done >/dev/null 1. load all /etc/univention/base*.conf into dict()s 2. O(~1) for access > real 0m5,223s > user 0m4,240s > sys 0m0,945s > > # time for I in {1..50}; do ucr shell ldap/client/retry/count; done >/dev/null 1. load all /etc/univention/base*.conf into dict()s 2. load and parse all variable descriptions from /etc/univention/registry.info/variables/* (!) 3. O(n) to iterate over all vars and compare them by RegExp > real 0m12,861s > user 0m11,504s > sys 0m1,262s > > # time for I in {1..50}; do ucr dump; done >/dev/null 1. load all /etc/univention/base*.conf into dict()s 2. O(n) to iterate over all vars and just print them > real 0m5,023s > user 0m4,187s > sys 0m0,794s For comparison how costly Python is: > # time for I in {1..50}; do sed -rne 's,^ldap/client/retry/count: ,,p;T;q' /etc/univention/base{-forced,-schedule,-ldap,,-defaults}.conf; done >/dev/null > real 0m0,053s > user 0m0,038s > sys 0m0,014s That gives you 2 magnitudes: 5s -> 50ms "ucr shell" is internally implemented as "ucr --shell search". To quote from `ucr --help`: > shell [key]: > Convert key/value pair into shell compatible format, e.g. > `version/version: 1.0` => `version_version="1.0"` > (deprecated: use --shell dump instead) (I have ignored that deprecation myself for years) In all cases 'eval "$(ucr --shell dump)"' is more efficient, but it differs subtile from 'eval "$(ucr shell)"' as the later also returns variables defined via '.info' files only - see Bug #22798: > diff <(ucr --shell dump) <(ucr shell) commit d63d7d53c72554422a9dacb56166f3162176fa5d (HEAD -> phahn/54208-ucr-py3) Author: Philipp Hahn <hahn@univention.de> Date: Wed Dec 15 05:57:14 2021 +0100 perf[UCR]: Improve univention-ldapsearch by using `ucr --shell dump` instead of `ucr shell` Bug #54237 diff --git base/univention-config-registry/scripts/univention-ldapsearch base/univention-config-registry/scripts/univention-ldapsearch index d6408f53ae..d89eab4944 100755 --- base/univention-config-registry/scripts/univention-ldapsearch +++ base/univention-config-registry/scripts/univention-ldapsearch @@ -30,7 +30,7 @@ # /usr/share/common-licenses/AGPL-3; if not, see # <https://www.gnu.org/licenses/>. -eval "$(/usr/sbin/univention-config-registry shell)" +eval "$(/usr/sbin/univention-config-registry --shell dump)" ## check for option -D to avoid "ldapsearch: -D previously specified" ## check for option -w to avoid "ldapsearch: -y incompatible with -w" PS: As most of the UCRVs used in `univention-ldapsearch` are static you could even convert it into a UCR template registered for `ldap/hostdn` and `ldap/client/retry/count` only: Then you no longer have to call `ucr` at runtime at all but only when they change. PPS: As the command line parsing in `univention-ldapsearch` is "wrong" (getopt() in `ldapsearch` allows to merge single-letter options into one argument which the shell code does not handle at all) you could even patch `ldapsearch` to have the functionality of `univention-search` to use default credentials itself, which would save you from using /bin/bash at all AND have correct argument parsing. > openldap/clients/tools/ldapsearch.c:365 const char options[] = "a:Ab:cE:F:l:Ls:S:tT:uz:Cd:D:e:f:h:H:IMnNO:o:p:P:QR:U:vVw:WxX:y:Y:Z"; See git:phahn/54208-ucr-py3
(In reply to Philipp Hahn from comment #5) > PS: As most of the UCRVs used in `univention-ldapsearch` are static you > could even convert it into a UCR template registered for `ldap/hostdn` and > `ldap/client/retry/count` only: Then you no longer have to call `ucr` at > runtime at all but only when they change. ... and the undocumented but used "ldap/binddn": > # git grep -n '[^/]\<ldap.binddn\>' > base/univention-config-registry/scripts/univention-ldapsearch:56: : "${binddn:=${ldap_binddn:-}}" > base/univention-lib/shell/ldap.sh:109:# optional environment: ldap_binddn and ldap_bindpw > base/univention-system-setup/usr/lib/univention-system-setup/scripts/10_basis/18root_password:65: binddn=`ldap_binddn`