Univention Bugzilla – Bug 56355
Low performance of UDM --append operations for attributes with many values
Last modified: 2023-08-04 12:46:14 CEST
Created attachment 11099 [details] raw data from timing measurement for MOD_REPLACE and MOD_ADD During work on Bug https://forge.univention.org/bugzilla/show_bug.cgi?id=56348 I noticed a similar performance issue, this time relevant for all append operations on multi-valued attributes. When using ``` udm groups/group modify --dn="$DN" --append "$attribute"="$newvalue" ``` The `diff` method in `univention/admin/handlers/__init__.py` calculates a change which results in `(attribute, oldvalues, newvalues` where `newvalues` is `oldvalues` extended by the `newvalue`. As in Bug #56348, this results in a MOD_REPLACE operation. Which is inefficient for adding a single value to a multi-value attribute. Is there any reason why it is done this way? An example scenario with measured times: Successively adding N users to a group. Currently, this would result in N appends which would result in N MOD_REPLACE where the length of the involved `oldvalues` and `newvalues` rise from 1 to N over all operations. This takes a lot of time. A quick measurement shows a slowdown of up to factor 30 for MOD_REPLACE compared to MOD_ADD, see the raw data in the attachment. I have neither tested or measured it, but something like this could be an improvement: diff --git a/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py b/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py index a58b548bfb..98fe4b6f5f 100644 --- a/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py +++ b/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py @@ -283,7 +283,16 @@ class simpleLdap(object): changes.append((key, self.oldinfo[key], null)) continue if (self.oldinfo.get(key) or self.info.get(key)) and self.oldinfo.get(key, null) != self.info.get(key, null): - changes.append((key, self.oldinfo.get(key, null), self.info.get(key, null))) + + old_value = self.oldinfo.get(key, null) + new_value = self.info.get(key, null) + if prop.multivalue and 1 < len(old_value) <= len(new_value): + old_set = set(old_value) # type: Set + new_set = set(new_value) + if old_set.issubset(new_set): + changes.append((key, null, list(new_set - old_set))) + continue + changes.append((key, old_value, new_value)) return changes
The place to fix this is not correct: The patch operates on self.info which contains decoded data and might contain list of lists, etc. Transforming this to sets is false. It has to be done after the values have been map()ed back to LDAP data (list of bytestrings).
So the place where the fix can be done is: management/univention-directory-manager-modules/modules/univention/admin/mapping.py 690 def mapDiff(mapping, diff): … 716 if k and ov != nv: 717 ml.append((k, ov, nv)) which could be e.g.: diff --git management/univention-directory-manager-modules/modules/univention/admin/mapping.py management/univention-directory-manager-modules/modules/univention/admin/mapping.py index c57c92963d..22ba74ce85 100644 --- management/univention-directory-manager-modules/modules/univention/admin/mapping.py +++ management/univention-directory-manager-modules/modules/univention/admin/mapping.py @@ -714,6 +714,16 @@ def mapDiff(mapping, diff): except KeyError: continue if k and ov != nv: + if isinstance(ov, list) and isinstance(nv, list): + old_set = set(ov) + new_set = set(nv) + if old_set.issubset(new_set): + ml.append((k, [], list(new_set - old_set))) + continue + if new_set.issubset(old_set): + ml.append((k, list(old_set - new_set), [])) + continue + ml.append((k, ov, nv)) return ml I guess this works and would be great. But this changes the way the modlist is filled. We might have third party _ldap_modlist hooks which iterate over the whole modlist and "interprete" and change it. I am unsure if they all do it "correct" or if they just rely on this current way of expecting (key, old_value, new_value).
> The place to fix this is not correct: The patch operates on self.info which contains decoded data and might contain list of lists, etc. OK, I guess the type hint for `self.info` is wrong? ``` _Properties = Dict[Text, Union[Text, List[Text]]] ``` > I guess this works and would be great. But this changes the way the modlist is filled. We might have third party _ldap_modlist hooks which iterate over the whole modlist and "interprete" and change it. I am unsure if they all do it "correct" or if they just rely on this current way of expecting (key, old_value, new_value). Couldn't we go one layer below and change it in `modify` in uldap.py? --- a/base/univention-python/modules/uldap.py +++ b/base/univention-python/modules/uldap.py @@ -733,6 +733,22 @@ class access(object): ml = [] for key, oldvalue, newvalue in changes: if oldvalue and newvalue: {+if isinstance(oldvalue, list) and isinstance(newvalue, list):+} {+ ov_set = set(oldvalue)+} {+ nv_set = set(newvalue)+} {+ if len(ov_set.intersection(nv_set)) == 0:+} {+ ml.append((ldap.MOD_REPLACE, key, newvalue))+} {+ continue+} {+ values_to_add = list(nv_set - ov_set)+} {+ values_to_delete = list(ov_set - nv_set)+} {+ if values_to_add:+} {+ ml.append((ldap.MOD_ADD, key, values_to_add))+} {+ if values_to_delete:+} {+ ml.append((ldap.MOD_DELETE, key, values_to_delete))+} {+ continue+} if oldvalue == newvalue or (not isinstance(oldvalue, (bytes, six.text_type)) and not isinstance(newvalue, (bytes, six.text_type)) and set(oldvalue) == set(newvalue)): continue # equal values op = ldap.MOD_REPLACE Disregarding the question how performant this is in reality for the moment, has this any other potential problems?
(In reply to Johannes Königer from comment #3) > > The place to fix this is not correct: The patch operates on self.info which contains decoded data and might contain list of lists, etc. > > OK, I guess the type hint for `self.info` is wrong? > > ``` > _Properties = Dict[Text, Union[Text, List[Text]]] > ``` Yes, this should be: _Properties = Dict[Text, Union[Text, Any]] > > I guess this works and would be great. But this changes the way the modlist is filled. We might have third party _ldap_modlist hooks which iterate over the whole modlist and "interprete" and change it. I am unsure if they all do it "correct" or if they just rely on this current way of expecting (key, old_value, new_value). > > Couldn't we go one layer below and change it in `modify` in uldap.py? Yes, this would be possible. > Disregarding the question how performant this is in reality for the moment, > has this any other potential problems? We have to be careful because some LDAP attributes don't have matching rules (I think e.g. jpegPhoto) which allow something like ADD or DELETE and always needs to use REPLACE. But I would have to try out. It would be good if you could run a Jenkins branch test with those changes (AutotestJoin and S4Connector).
Opened a draft merge request in the ucs repository: https://git.knut.univention.de/univention/ucs/-/merge_requests/861