Bug 56355 - Low performance of UDM --append operations for attributes with many values
Low performance of UDM --append operations for attributes with many values
Status: NEW
Product: UCS
Classification: Unclassified
Component: UDM (Generic)
UCS 5.0
Other Linux
: P5 normal (vote)
: ---
Assigned To: UMC maintainers
UMC maintainers
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2023-07-27 11:45 CEST by Johannes Königer
Modified: 2023-08-04 12:46 CEST (History)
1 user (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): UCS Performance
Max CVSS v3 score:


Attachments
raw data from timing measurement for MOD_REPLACE and MOD_ADD (1.85 KB, text/plain)
2023-07-27 11:45 CEST, Johannes Königer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Johannes Königer univentionstaff 2023-07-27 11:45:24 CEST
Created attachment 11099 [details]
raw data from timing measurement for MOD_REPLACE and MOD_ADD

During work on Bug https://forge.univention.org/bugzilla/show_bug.cgi?id=56348 I noticed a similar performance issue, this time relevant for all append operations on multi-valued attributes.

When using 

```
udm groups/group modify --dn="$DN" --append "$attribute"="$newvalue"
```

The `diff` method in `univention/admin/handlers/__init__.py` calculates a change which results in `(attribute, oldvalues, newvalues` where `newvalues` is `oldvalues` extended by the `newvalue`. As in Bug #56348, this results in a MOD_REPLACE operation. Which is inefficient for adding a single value to a multi-value attribute. Is there any reason why it is done this way?

An example scenario with measured times: Successively adding N users to a group. 

Currently, this would result in N appends which would result in N MOD_REPLACE where the length of the involved `oldvalues` and `newvalues` rise from 1 to N over all operations. 
This takes a lot of time. A quick measurement shows a slowdown of up to factor 30 for MOD_REPLACE compared to MOD_ADD, see the raw data in the attachment.

I have neither tested or measured it, but something like this could be an improvement:

diff --git a/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py b/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py
index a58b548bfb..98fe4b6f5f 100644
--- a/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py
+++ b/management/univention-directory-manager-modules/modules/univention/admin/handlers/__init__.py
@@ -283,7 +283,16 @@ class simpleLdap(object):
                     changes.append((key, self.oldinfo[key], null))
                 continue
             if (self.oldinfo.get(key) or self.info.get(key)) and self.oldinfo.get(key, null) != self.info.get(key, null):
-                changes.append((key, self.oldinfo.get(key, null), self.info.get(key, null)))
+
+                old_value = self.oldinfo.get(key, null)
+                new_value = self.info.get(key, null)
+                if prop.multivalue and 1 < len(old_value) <= len(new_value):
+                    old_set = set(old_value) # type: Set
+                    new_set = set(new_value)
+                    if old_set.issubset(new_set):
+                        changes.append((key, null, list(new_set - old_set)))
+                        continue
+                changes.append((key, old_value, new_value))
 
         return changes
Comment 1 Florian Best univentionstaff 2023-07-27 12:04:56 CEST
The place to fix this is not correct: The patch operates on self.info which contains decoded data and might contain list of lists, etc.
Transforming this to sets is false.
It has to be done after the values have been map()ed back to LDAP data (list of bytestrings).
Comment 2 Florian Best univentionstaff 2023-07-27 12:12:02 CEST
So the place where the fix can be done is:
management/univention-directory-manager-modules/modules/univention/admin/mapping.py
690 def mapDiff(mapping, diff):
…
716             if k and ov != nv:
717                 ml.append((k, ov, nv))

which could be e.g.:
diff --git management/univention-directory-manager-modules/modules/univention/admin/mapping.py management/univention-directory-manager-modules/modules/univention/admin/mapping.py
index c57c92963d..22ba74ce85 100644
--- management/univention-directory-manager-modules/modules/univention/admin/mapping.py
+++ management/univention-directory-manager-modules/modules/univention/admin/mapping.py
@@ -714,6 +714,16 @@ def mapDiff(mapping, diff):
             except KeyError:
                 continue
             if k and ov != nv:
+                if isinstance(ov, list) and isinstance(nv, list):
+                    old_set = set(ov)
+                    new_set = set(nv)
+                    if old_set.issubset(new_set):
+                        ml.append((k, [], list(new_set - old_set)))
+                        continue
+                    if new_set.issubset(old_set):
+                        ml.append((k, list(old_set - new_set), []))
+                        continue
+
                 ml.append((k, ov, nv))
     return ml
 
I guess this works and would be great. But this changes the way the modlist is filled. We might have third party _ldap_modlist hooks which iterate over the whole modlist and "interprete" and change it. I am unsure if they all do it "correct" or if they just rely on this current way of expecting (key, old_value, new_value).
Comment 3 Johannes Königer univentionstaff 2023-07-31 10:47:08 CEST
> The place to fix this is not correct: The patch operates on self.info which contains decoded data and might contain list of lists, etc.

OK, I guess the type hint for `self.info` is wrong?

```
_Properties = Dict[Text, Union[Text, List[Text]]]
```

> I guess this works and would be great. But this changes the way the modlist is filled. We might have third party _ldap_modlist hooks which iterate over the whole modlist and "interprete" and change it. I am unsure if they all do it "correct" or if they just rely on this current way of expecting (key, old_value, new_value).

Couldn't we go one layer below and change it in `modify` in uldap.py?

--- a/base/univention-python/modules/uldap.py
+++ b/base/univention-python/modules/uldap.py
@@ -733,6 +733,22 @@ class access(object):
        ml = []
        for key, oldvalue, newvalue in changes:
            if oldvalue and newvalue:
                {+if isinstance(oldvalue, list) and isinstance(newvalue, list):+}
{+                    ov_set = set(oldvalue)+}
{+                    nv_set = set(newvalue)+}

{+                    if len(ov_set.intersection(nv_set)) == 0:+}
{+                        ml.append((ldap.MOD_REPLACE, key, newvalue))+}
{+                        continue+}

{+                    values_to_add = list(nv_set - ov_set)+}
{+                    values_to_delete = list(ov_set - nv_set)+}

{+                    if values_to_add:+}
{+                        ml.append((ldap.MOD_ADD, key, values_to_add))+}
{+                    if values_to_delete:+}
{+                        ml.append((ldap.MOD_DELETE, key, values_to_delete))+}
{+                    continue+}
                if oldvalue == newvalue or (not isinstance(oldvalue, (bytes, six.text_type)) and not isinstance(newvalue, (bytes, six.text_type)) and set(oldvalue) == set(newvalue)):
                    continue  # equal values
                op = ldap.MOD_REPLACE

Disregarding the question how performant this is in reality for the moment, has this any other potential problems?
Comment 4 Florian Best univentionstaff 2023-07-31 11:09:56 CEST
(In reply to Johannes Königer from comment #3)
> > The place to fix this is not correct: The patch operates on self.info which contains decoded data and might contain list of lists, etc.
> 
> OK, I guess the type hint for `self.info` is wrong?
> 
> ```
> _Properties = Dict[Text, Union[Text, List[Text]]]
> ```
Yes, this should be:

_Properties = Dict[Text, Union[Text, Any]]

 
> > I guess this works and would be great. But this changes the way the modlist is filled. We might have third party _ldap_modlist hooks which iterate over the whole modlist and "interprete" and change it. I am unsure if they all do it "correct" or if they just rely on this current way of expecting (key, old_value, new_value).
> 
> Couldn't we go one layer below and change it in `modify` in uldap.py?
Yes, this would be possible.

> Disregarding the question how performant this is in reality for the moment,
> has this any other potential problems?
We have to be careful because some LDAP attributes don't have matching rules (I think e.g. jpegPhoto) which allow something like ADD or DELETE and always needs to use REPLACE. But I would have to try out.

It would be good if you could run a Jenkins branch test with those changes (AutotestJoin and S4Connector).
Comment 5 Johannes Königer univentionstaff 2023-08-04 12:46:14 CEST
Opened a draft merge request in the ucs repository: https://git.knut.univention.de/univention/ucs/-/merge_requests/861