Bug 49862 - The initialization of the listener module portal_groups takes too long
Status: NEW
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-0-errata
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on:
Blocks:
Reported: 2019-07-17 11:04 CEST by Christina Scheinig
Modified: 2021-11-29 16:43 CET
CC List: 9 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.286
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review: Yes
Ticket number: 2019071221000426
Bug group (optional): Large environments
Max CVSS v3 score:


Description Christina Scheinig univentionstaff 2019-07-17 11:04:42 CEST
The initialization of the listener module portal_groups takes up to 12 hours in large environments (36000 groups). 
The listener builds its cache group by group the first time. The cache is written 36,000 times instead of calculating everything once at the end. 

We have to optimize that.

In this particular environment there is a workaround: the module can be deactivated by setting the UCR variable listener/module/<name>/deactivate

This might not work on other environments.
Comment 1 Christina Scheinig univentionstaff 2019-07-24 14:21:10 CEST
A small update: the customer is concerned that the listener module will cause a lot of trouble during the next import of new user data at the end of the school year. Tens of thousands of groups may then be modified hundreds of times, so the listener module could also block other modules (e.g. the S4 Connector).

That would be a big problem. If these concerns are justified, the customer would therefore like a short-term performance optimization for the module if possible.

Maybe it's enough to deactivate the module with the ucr variable again?
Comment 2 Sönke Schwardt-Krummrich univentionstaff 2019-10-25 16:44:21 CEST
Is there a reason why the cache is rebuilt with every group change and not just once in postrun()?
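
The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration, not the actual portal_groups code: the class names, the in-memory `groups` dict and the JSON serialization standing in for the on-disk cache are all assumptions.

```python
import json
from typing import Dict, List


class EagerGroupCache:
    """Hypothetical module that rewrites the whole cache on every change."""

    def __init__(self) -> None:
        self.groups: Dict[str, List[str]] = {}
        self.serialized = ""
        self.writes = 0

    def _write_cache(self) -> None:
        # Stand-in for writing the complete cache file to disk.
        self.serialized = json.dumps(self.groups, sort_keys=True)
        self.writes += 1

    def handler(self, dn: str, members: List[str]) -> None:
        self.groups[dn] = members
        self._write_cache()  # one full write per group object


class DeferredGroupCache(EagerGroupCache):
    """Hypothetical variant that only marks the cache dirty in handler()."""

    def __init__(self) -> None:
        super().__init__()
        self.dirty = False

    def handler(self, dn: str, members: List[str]) -> None:
        self.groups[dn] = members
        self.dirty = True  # no write here

    def postrun(self) -> None:
        # Called once after a burst of changes: a single full write.
        if self.dirty:
            self._write_cache()
            self.dirty = False
```

With N group objects the eager variant performs N full serializations of a steadily growing cache, while the deferred variant performs one in `postrun()`; both end up with the same cache content.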
Comment 3 Michael Grandjean univentionstaff 2019-10-25 16:48:07 CEST
Even in environments that are not that large, this is a problem, especially since it takes place during an upgrade and massively extends the downtime and maintenance window.

6500 groups on an 8-core, 32 GB RAM UCS Backup:

25.10.19 15:55:25.273  LISTENER    ( WARN    ) : initializing module portal_groups
25.10.19 16:37:56.854  LISTENER    ( WARN    ) : finished initializing module portal_groups with rv=0
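
For reference, the elapsed time between the two log lines above can be computed with a short Python sketch. The helper names are made up for illustration, and the timestamp format is inferred from the log lines shown here rather than from any listener documentation.

```python
from datetime import datetime, timedelta

# Format inferred from the log excerpt: DD.MM.YY HH:MM:SS.mmm
LOG_TIME_FORMAT = "%d.%m.%y %H:%M:%S.%f"


def parse_listener_timestamp(line: str) -> datetime:
    """Parse the leading date and time fields of a listener log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TIME_FORMAT)


def init_duration(start_line: str, end_line: str) -> timedelta:
    """Time between the 'initializing' and 'finished initializing' lines."""
    return parse_listener_timestamp(end_line) - parse_listener_timestamp(start_line)


START = "25.10.19 15:55:25.273  LISTENER    ( WARN    ) : initializing module portal_groups"
END = "25.10.19 16:37:56.854  LISTENER    ( WARN    ) : finished initializing module portal_groups with rv=0"

elapsed = init_duration(START, END)
seconds_per_group = elapsed.total_seconds() / 6500
```

For this run `elapsed` is 42 minutes and 31.581 seconds, i.e. roughly 0.4 seconds per group.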
Comment 5 Florian Best univentionstaff 2019-11-28 17:48:34 CET
Untested patch in git:fbest/49862-remove-portal-group-cache:
The patch removes the listener module and replaces its logic with raw LDAP calls that use the memberOf attribute of groups.
The patch could be fine-tuned to make it even faster by iterating only over the groups.
TODO: make the LDAP DN comparison case-insensitive.
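
A minimal sketch of what a case-insensitive DN comparison could look like, using only the standard library. This is not the code from the patch; `dn_key` and `same_dn` are hypothetical helpers, and the normalization is deliberately simplified: it assumes every attribute value matches case-insensitively and does not handle multi-valued RDNs, hex-encoded values or an escaped backslash directly before a comma. A real implementation would probably rely on a proper DN parser.

```python
import re
from typing import Tuple

# Split on commas that are not escaped with a backslash (simplification).
_UNESCAPED_COMMA = re.compile(r"(?<!\\),")


def dn_key(dn: str) -> Tuple[Tuple[str, str], ...]:
    """Return a normalized, hashable key for a DN (hypothetical helper)."""
    key = []
    for rdn in _UNESCAPED_COMMA.split(dn):
        attr, _, value = rdn.partition("=")
        # Ignore surrounding whitespace and letter case in type and value.
        key.append((attr.strip().lower(), value.strip().lower()))
    return tuple(key)


def same_dn(first: str, second: str) -> bool:
    """Compare two DNs while ignoring case and spacing around separators."""
    return dn_key(first) == dn_key(second)
```

The same key can be used for set membership, for example `dn_key(group_dn) in {dn_key(dn) for dn in member_of}`, so that values from memberOf match regardless of how the DN was capitalized when it was written.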
Comment 7 Erik Damrose univentionstaff 2020-08-21 16:44:39 CEST
While joining a DC Backup to a domain with 52000 users and 11400 groups it took more than an hour to initialize the portal_groups module

21.08.20 15:32:36.250  LISTENER    ( WARN    ) : initializing module portal_groups
21.08.20 16:37:21.806  LISTENER    ( WARN    ) : finished initializing module portal_groups with rv=0
Comment 8 Ingo Steuwer univentionstaff 2021-04-28 10:16:41 CEST
might be fixed with the new portal, needs check
Comment 9 Florian Best univentionstaff 2021-11-12 13:54:14 CET
(In reply to Ingo Steuwer from comment #8)
> might be fixed with the new portal, needs check

The latest code does:

class PortalGroups(ListenerModuleHandler):
    def post_run(self):
        with self.as_root():
            subprocess.call(['/usr/sbin/univention-portal', 'update', '--reason', 'ldap:group'])

and nothing else anymore.
So it is still a blocking call, but only in postrun.
→ Much faster.
Comment 10 Sönke Schwardt-Krummrich univentionstaff 2021-11-29 14:26:42 CET
(In reply to Erik Damrose from comment #7)
> While joining a DC Backup to a domain with 52000 users and 11400 groups it
> took more than an hour to initialize the portal_groups module
> 
> 21.08.20 15:32:36.250  LISTENER    ( WARN    ) : initializing module
> portal_groups
> 21.08.20 16:37:21.806  LISTENER    ( WARN    ) : finished initializing
> module portal_groups with rv=0

29.11.21 11:26:41.977  LISTENER    ( WARN    ) : initializing module portal_groups
29.11.21 14:16:15.668  LISTENER    ( WARN    ) : finished initializing module portal_groups with rv=0

91,000 users + 19,800 groups

(In reply to Florian Best from comment #9)
> (In reply to Ingo Steuwer from comment #8)
> > might be fixed with the new portal, needs check
> 
> The latest code does:
> 
> class PortalGroups(ListenerModuleHandler):
>     def post_run(self):
>         with self.as_root():
>             subprocess.call(['/usr/sbin/univention-portal', 'update', '--reason', 'ldap:group'])
> 
> and nothing  else anymore.
> So still a blocking call but only in postrun.
> → Way more faster.

Btw: what happens if the Listener is shut down/restarted before the module's postrun has been executed?
Comment 11 Sönke Schwardt-Krummrich univentionstaff 2021-11-29 14:34:09 CET
(In reply to Sönke Schwardt-Krummrich from comment #10)
> 29.11.21 11:26:41.977  LISTENER    ( WARN    ) : initializing module
> portal_groups
> 29.11.21 14:16:15.668  LISTENER    ( WARN    ) : finished initializing
> module portal_groups with rv=0
> 
> 91,000 users + 19,800 groups

This was a DC slave with UCS@school 4.4 / UCS 4.4-8
Comment 12 Philipp Hahn univentionstaff 2021-11-29 16:43:15 CET
(In reply to Sönke Schwardt-Krummrich from comment #10)
> Btw: what happens if the Listener is shut down/restarted before the module's
> postrun has been executed?

UDL implements a "poor man's locking mechanism": it disables UNIX signals while modules are run, so SIGTERM et al. will be ignored until signals are re-enabled.
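
The general technique can be illustrated in Python. This is an analogy under stated assumptions, not UDL's actual implementation: the `signals_deferred` context manager is a made-up name, and it uses `signal.pthread_sigmask` to block the signals so that they stay pending and are delivered once the mask is restored.

```python
import signal
from contextlib import contextmanager


@contextmanager
def signals_deferred(signums=(signal.SIGTERM, signal.SIGINT)):
    """Block the given signals for the duration of the with-block.

    Signals sent in the meantime stay pending and are delivered once
    the previous mask is restored (hypothetical helper, Unix only).
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, set(signums))
    try:
        yield
    finally:
        # Restoring the mask delivers any signal that arrived meanwhile.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

A module run wrapped in `with signals_deferred(): ...` cannot be interrupted by SIGTERM halfway through an object. SIGKILL cannot be blocked this way.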

If you kill UDL anyway, for example via SIGKILL, the UDL cache will be in a half-initialized state, where *some objects* have already been handled by *some of their modules* and others have not. As such, the module will not be flagged as "fully initialized" in the "/var/lib/univention-directory-listener/handlers/$name" file, and UDL will simply call it again for all objects. What happens then depends on the module implementation: whether it is idempotent and can handle processing the objects a second time without "clean()" and/or "initialize()" being called explicitly before the re-run.
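
To make the idempotency point concrete, here is a small Python sketch with two hypothetical modules (neither is real listener code; the `handler(dn, new)` signature and the `memberUid` attribute are only used for illustration). One appends to its state on every call, the other overwrites the entry for the DN, so replaying all objects a second time leaves its state unchanged.

```python
from typing import Dict, List, Tuple

LdapObject = Tuple[str, Dict[str, List[str]]]


class AppendingModule:
    """Not idempotent: a replay duplicates every entry."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, Tuple[str, ...]]] = []

    def handler(self, dn: str, new: Dict[str, List[str]]) -> None:
        self.entries.append((dn, tuple(new.get("memberUid", []))))


class IdempotentModule:
    """Idempotent: each call replaces the state for that DN."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[str, ...]] = {}

    def handler(self, dn: str, new: Dict[str, List[str]]) -> None:
        self.entries[dn] = tuple(sorted(new.get("memberUid", [])))


def replay(module, objects: List[LdapObject], runs: int) -> None:
    """Feed all objects to the module `runs` times, without any clean()."""
    for _ in range(runs):
        for dn, attrs in objects:
            module.handler(dn, attrs)
```

After an interrupted initialization followed by a full re-run, `IdempotentModule` ends up in the same state as after a single clean run, whereas `AppendingModule` would need `clean()` to be called first.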