Bug 51902 - UMC module Admin Diary still consumes too much RAM, leaving UCS Server unresponsive
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Admin Diary
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-2-errata
Assigned To: Nikola Radovanovic
QA Contact: Florian Best
https://git.knut.univention.de/univen...
Depends on:
Blocks:
Reported: 2020-08-24 21:02 CEST by Erik Damrose
Modified: 2023-06-26 11:14 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.200
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): Large environments, UCS Performance
Max CVSS v3 score:


Description Erik Damrose univentionstaff 2020-08-24 21:02:16 CEST
Cloned from Bug #50531
UCS: 4.4-5 errata703

The Admin Diary UMC module still consumes too much memory in large environments; the first fix was not sufficient. In Bug #50531 a limit was implemented, but only for the objects handed from the backend to the frontend. In the backend, we still fetch all relevant admin diary entries (default: every entry from the last week), load them into Python, extract the relevant IDs from them, query those IDs again via SQL, and load the results into the backend again.

When opening the backend, three main queries for the entries can be observed on the SQL server:

SELECT entries.id AS entries_id, ...
FROM entries 
WHERE entries.timestamp >= '2020-08-18' (timestamp is today - 7 days)

SELECT entries.id AS entries_id, ...
FROM entries 
WHERE entries.timestamp < '2020-08-25' (all entries older than tomorrow)

^*** this basically loads the entire table

And then we apply the limit to the third query:

SELECT entries.id AS entries_id, ...
FROM entries 
WHERE entries.id IN (1, 2, 3, 4, 5, 6, 7, 8) AND entries.event_id IS NOT NULL 
 LIMIT 1000
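
The three-step pattern above can be reproduced on a small scale. The following is only a sketch, assuming a simplified `entries` table in an in-memory SQLite database (the real backend uses its own schema and ORM); the helper `ids_where`, the sample data, and the added `ORDER BY` are illustrative assumptions. It contrasts loading all IDs per criterion into Python with a hypothetical single query that lets the database filter and limit. It is not the fix that was applied for this bug.

```python
import sqlite3

# Assumption: a minimal stand-in for the entries table, 30 rows, one per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, timestamp TEXT, event_id INTEGER)")
conn.executemany(
    "INSERT INTO entries (id, timestamp, event_id) VALUES (?, ?, ?)",
    [(i, "2020-08-%02d" % i, i) for i in range(1, 31)],
)


def ids_where(clause, params):
    """Hypothetical helper: load every id matching one criterion into Python."""
    return {row[0] for row in conn.execute("SELECT id FROM entries WHERE " + clause, params)}


since, until, limit = "2020-08-18", "2020-08-25", 5

# Pattern from the report: two unlimited queries, a set intersection in Python,
# then a third query that is the only one carrying the LIMIT.
newer = ids_where("timestamp >= ?", (since,))
older = ids_where("timestamp < ?", (until,))  # grows with the whole table
wanted = sorted(newer & older)
placeholders = ",".join("?" * len(wanted))
three_step = conn.execute(
    "SELECT id FROM entries WHERE id IN (%s) AND event_id IS NOT NULL "
    "ORDER BY id LIMIT ?" % placeholders,
    wanted + [limit],
).fetchall()

# Hypothetical alternative: one query, so only `limit` rows ever reach Python.
single = conn.execute(
    "SELECT id FROM entries WHERE timestamp >= ? AND timestamp < ? "
    "AND event_id IS NOT NULL ORDER BY id LIMIT ?",
    (since, until, limit),
).fetchall()

print(len(older), three_step == single)
```

In this toy data set the second query alone already returns 24 of 30 rows, although only 5 rows are finally shown; with millions of entries the same ratio explains the memory footprint.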

The customer installation has quite a few admindiary entries:
SELECT count(*) from entries;
13,098,404

The problem shows up as in the original bug: the admindiary UMC module process consumes more than 7 GiB of memory, and both the module and the server are unusable.

Before simply applying the limit to all queries, we should check how that affects the default query and what is displayed in UMC if the logic is not adapted: we calculate a set intersection in Python to decide which entries are queried and shown in UMC.
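
Why the set intersection matters can be shown with plain Python sets. This is a minimal sketch, not the backend code: `one_query` and `search` are hypothetical stand-ins, each criterion is modelled as the set of matching IDs, and rows are assumed to come back in ascending ID order (SQL gives no such guarantee without `ORDER BY`).

```python
def one_query(ids, matching, limit=None):
    """Intersect the running id set with the ids matching one criterion.

    Assumption: the database returns rows in ascending id order, so a
    limit keeps the lowest ids.
    """
    rows = sorted(matching)
    if limit is not None:
        rows = rows[:limit]
    if ids is None:  # first criterion: nothing to intersect with yet
        return set(rows)
    return ids & set(rows)


def search(criteria, limit=None):
    """Apply every criterion in turn, as a chain of intersections."""
    ids = None
    for matching in criteria:
        ids = one_query(ids, matching, limit)
    return ids


# Toy data: 20 entries, the last 6 of them are from the past week.
newer_than_last_week = set(range(15, 21))
older_than_tomorrow = set(range(1, 21))
criteria = [newer_than_last_week, older_than_tomorrow]

unlimited = search(criteria)
limited = search(criteria, limit=5)
print(sorted(unlimited), sorted(limited))
```

With a limit of 5 on every sub-query, the two limited ID sets no longer overlap and the result is empty, although six entries match both criteria. Under these assumptions, limiting each query independently can therefore hide entries unless the intersection logic is adapted as well.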
Comment 1 Florian Best univentionstaff 2020-08-24 21:20:08 CEST
(In reply to Erik Damrose from comment #0)
> SELECT entries.id AS entries_id, ...
> FROM entries 
> WHERE entries.timestamp < '2020-08-25' (all entries older than tomorrow)
> 
> ^*** this basically loads the entire table
There is a lot of code that does a select without a limit. This is what implements the search query feature. But since we are already limiting the result set, the queries for the search filter criteria can be limited as well:

diff --git a/services/univention-admin-diary/python/admindiary/backend.py b/services/univention-admin-diary/python/admindiary/backend.py
index d9b2972008..1869033ba5 100644
--- a/services/univention-admin-diary/python/admindiary/backend.py
+++ b/services/univention-admin-diary/python/admindiary/backend.py
@@ -253,6 +253,9 @@ class Client(object):
                get_logger().info('Successfully added %s (%s)' % (diary_entry.context_id, diary_entry.event_name))

        def _one_query(self, ids, result):
+               limit = get_query_limit()
+               if limit:
+                       result = result.limit(limit)
                if ids is not None and not ids:
                        return set()
                new_ids = set()
Comment 3 Nico Gulden univentionstaff 2020-12-01 13:06:07 CET
There has not been any recent activity on this bug. Has the problem been seen somewhere else as well in the meantime or has its assessment changed?
Comment 11 Nikola Radovanovic univentionstaff 2022-08-31 15:51:25 CEST
96a25b7c9b | UMC module Admin Diary still consumes too much RAM, leaving UCS Server unresponsive
b38f3f2f5a | Advisory update
Comment 12 Florian Best univentionstaff 2022-09-01 14:19:29 CEST
OK: the RAM consumption for large queries stays constant
OK: YAML
Comment 14 Philipp Hahn univentionstaff 2023-06-25 16:32:54 CEST
@fbest updated ruff to 0.0.272 with <https://git.knut.univention.de/univention/ucs/-/commit/9bbb7fa565711669ea6cd5ca825688e6b8da8203>, which now finds the following issue in services/univention-admin-diary/python/admindiary/backend.py, introduced by <https://git.knut.univention.de/univention/ucs/-/commit/96a25b7c9b9d4bb04ac541767469504b21d2cecf>:

> services/univention-admin-diary/python/admindiary/backend.py:185:50: PLR0124 Name compared with itself, consider replacing `context_id == context_id`

The code is this:
> comments = relationship('Entry', primaryjoin=context_id == context_id, foreign_keys=context_id, remote_side=context_id)
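
What the flagged expression evaluates to can be illustrated with a toy class. The `Column` below is a hypothetical stand-in, not SQLAlchemy's class; it only assumes that `==` is overloaded to build a join condition instead of returning a bool, which is the general idea behind ORM column comparisons.

```python
class Column:
    """Toy stand-in for an ORM column (NOT SQLAlchemy's Column)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Describe the comparison instead of evaluating it to True/False.
        return ("eq", self.name, getattr(other, "name", other))


context_id = Column("context_id")

# Same shape as the flagged code: both operands are the same name.
condition = context_id == context_id
print(condition)
```

In this sketch the comparison yields a condition whose two sides are the same column. The linter only sees the same name on both sides of `==`, which is what PLR0124 reports; whether the expression is intended here is for the maintainers to decide.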