Bug 50123 - Separate LDAP sync from Join Process
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: Join (univention-join)
Version: UCS 4.3
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
URL: https://help.univention.com/t/how-to-...
Depends on:
Blocks:
Reported: 2019-09-06 11:15 CEST by Christian Völker
Modified: 2021-05-14 16:38 CEST (History)

See Also:
What kind of report is it?: Feature Request
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019082721000611
Bug group (optional):
Max CVSS v3 score:


Attachments

Description Christian Völker univentionstaff 2019-09-06 11:15:03 CEST
In support we very frequently have customers complaining about very long join times. Joins taking longer than a day do happen.

But even "shorter" times of five to six hours are very common and usually still too long!

There is an article on speeding up the join (see URL). At some locations it reduced join times from more than 24 hours to 5 hours. But again: this is still too long!

The request here is to separate the LDAP synchronization from the join process. The join could then be much faster, and the system would be (somehow) operational while the sync happens in the background.

This could be solved by forwarding all local requests to the slapd with a packet filter forwarding rule. Or by whatever other idea might be possible.

Just please reduce the join time, especially in large environments (which are getting more and more common).
Comment 1 Felix Botner univentionstaff 2019-09-09 09:57:32 CEST
Join without LDAP sync makes no sense in my opinion, but we can definitely speed up the process, e.g. remove the slapindex stuff on non-master systems (see also Bug #49822 and #49821)
Comment 2 Christian Völker univentionstaff 2019-09-11 11:43:40 CEST
I cannot fully assess the suggested changes with slapindex.

I would suggest a target join time of below an hour, even in larger environments.

If this helps, it is fine.

Otherwise there should be further improvements.

Mine was a suggestion, knowing that the sync of the LDAP data is a very common slow-downer.
Comment 3 Felix Botner univentionstaff 2019-09-11 11:52:07 CEST
(In reply to Christian Völker from comment #2)
> I cannot fully assess the suggested changes with slapindex.
> 
> I would suggest a target join time of below an hour, even in larger
> environments.
> 
> If this helps, it is fine.
> 
> Otherwise there should be further improvements.
> 
> Mine was a suggestion, knowing that the sync of the LDAP data is a very
> common slow-downer.

The join process IS the LDAP sync (to some extent), and the time depends on the size of the LDAP database, so "a target join time" is hard (impossible) to achieve.
Comment 4 Christian Völker univentionstaff 2019-09-11 12:04:01 CEST
Well, and this is the reason I suggested separating these two processes.

Whatever performance tweaks we try: in large environments (which are getting more and more common!) it will take hours due to the LDAP sync.

Joining involves a couple of steps which might fail for whatever reason. The customer then has to retry, which may fail again... and so on. And on every try he has to wait for the LDAP sync.

Separating these two, we could easily and very quickly identify and troubleshoot the issues in joining.

And afterwards start syncing.

So failing joins would not slow things down as extremely as they do currently.
Comment 5 Arvid Requate univentionstaff 2020-06-10 19:21:48 CEST
I guess Felix is referring to the execution of the listener modules.

* E.g. we have the ucs_registerLDAPExtension calls in some joinscripts
  and they wait for the registered UDM module to become available
  to use on the joining system (on the master). That's how it's
  done currently, but I guess we could change that to not need to
  wait for LDAP replication to the local system.

* We do UDM and LDAP-searches in the joinscripts. I guess they
  use ldap/server/name etc. for reading. I guess we could change
  that to use ldap/master until the end of the join.

* If we moved the LDAP replication into the background to make
  univention-join finish earlier, then some of the configuration
  settings that are performed by listener modules may not be
  available yet when univention-join is "finished" running the
  joinscripts. In that case I could imagine telling the admin
  that the joinscripts were processed fine and replication is
  still proceeding, while univention-join shows a nice progress
  bar and the admin can go home or enjoy it.
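
The second point above (reading from the master until the join is done) could be sketched roughly like this. It is an illustration only: `pick_read_server` and the `join_finished` flag are hypothetical names, the host names are placeholders, and UCR is represented by a plain dict rather than any real UCR API. Only the variable names ldap/master and ldap/server/name come from the discussion.

```python
def pick_read_server(ucr, join_finished):
    """Return the host that joinscript reads should go to.

    ucr: plain dict standing in for UCR (assumption, not the real API).
    join_finished: hypothetical flag, True once the join has completed.
    """
    if join_finished and ucr.get("ldap/server/name"):
        # The local replica is complete, so reads can stay local.
        return ucr["ldap/server/name"]
    # During the join the local copy may be incomplete: read from the master.
    return ucr["ldap/master"]


# Placeholder host names, for illustration only.
ucr = {
    "ldap/master": "master.example.test",
    "ldap/server/name": "replica.example.test",
}
print(pick_read_server(ucr, join_finished=False))
print(pick_read_server(ucr, join_finished=True))
```

The point of the sketch is that the switch depends on a single "join finished" state, so the joinscripts themselves would not need to know which server they are talking to.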
Comment 6 Sönke Schwardt-Krummrich univentionstaff 2020-06-10 22:18:00 CEST
(In reply to Christian Völker from comment #4)
> Separating these two, we could easily and very quickly identify and
> troubleshoot the issues in joining.

The 2 subtasks of the join process are "establishing trust context" and "executing the join scripts".
When the second task runs, LDAP is initialized first. If the join process should break off during the execution of the join scripts, it should actually be possible (starting from a certain join script) to continue the execution of the join scripts with univention-run-join-scripts without having to restart the join process completely.
We should check from which joinscript this is the case.
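
To illustrate what "continuing" would mean, here is a rough sketch that works out which join scripts would still have to run after an aborted join. It rests on assumptions: finished scripts are taken to be recorded one per line as `<name> v<version> successful`, and script files are taken to be named `<NN><name>.inst`. `pending_join_scripts` and the sample names are made up for illustration, and script versions are ignored.

```python
import re


def pending_join_scripts(script_files, status_text):
    """Return the script files not yet marked successful, in run order."""
    done = set()
    for line in status_text.splitlines():
        parts = line.split()
        # Assumed line format: "<name> v<version> successful"
        if len(parts) == 3 and parts[2] == "successful":
            done.add(parts[0])
    pending = []
    for filename in sorted(script_files):
        # Assumed file naming: numeric order prefix, name, ".inst" suffix
        match = re.fullmatch(r"\d+(.+)\.inst", filename)
        if match and match.group(1) not in done:
            pending.append(filename)
    return pending


# Made-up sample data: two scripts finished, one did not.
scripts = ["10example-base.inst", "03example-ldap.inst", "20example-app.inst"]
status = "example-ldap v1 successful\nexample-base v2 successful\n"
print(pending_join_scripts(scripts, status))
```

With a check like this, a retry would only need to run the remaining scripts instead of repeating the LDAP initialization.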

There are also some other bugs that suggest performing the initial LDAP replication without a safety net (no sync to hard disk after each LDAP change or during slapindex). This should shorten the initial LDAP replication massively on systems with poor I/O performance.
Comment 7 Felix Botner univentionstaff 2020-06-11 10:39:49 CEST
I think the motivation for this bug here is the long running time of the join process, and yes, I can now see why a join process with several phases can be helpful, but

See also Bug #49822 and Bug #49821

Currently we
 * replicate the LDAP database
 * call slapindex again in a join script
 * call slapindex a second time

I think in terms of running time, this is the showstopper (a big LDAP database takes a long time to index, and we double that).

Currently we have to do this, because the slapd index configuration is done/changed in the join scripts, but that can be changed ;-) (cn=config in LDAP, get the UCR index configuration from the master before the join, ... whatever), and I think an extra slapindex (even one) is not necessary.

So in my opinion, before anything else we should consider changing this slapindex stuff.
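
As a back-of-the-envelope illustration of why the extra slapindex runs matter, here is a small sketch. The function name and all durations are hypothetical placeholders, not measurements; the only thing taken from the list above is the structure of one replication plus two indexing passes.

```python
def join_ldap_minutes(replicate_min, index_min, index_runs):
    """Rough LDAP-related share of the join time, in minutes."""
    return replicate_min + index_runs * index_min


# Placeholder numbers, chosen only to show the shape of the calculation.
current = join_ldap_minutes(replicate_min=120, index_min=90, index_runs=2)
without_extra = join_ldap_minutes(replicate_min=120, index_min=90, index_runs=0)
print(current, without_extra)  # 300 120
```

Since the indexing time grows with the database, the share spent in slapindex would grow along with it in large environments.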
Comment 8 Christian Völker univentionstaff 2020-06-11 13:40:42 CEST
(In reply to Sönke Schwardt-Krummrich from comment #6)
> If the join
> process should break off during the execution of the join scripts, it should
> actually be possible (starting from a certain join script) to continue the
> execution of the join scripts with univention-run-join-scripts without
> having to restart the join process completely.
> We should check from which joinscript this is the case.

WE could do, indeed. But most customers would not! They will start the join again and again after having fixed some other things. And wait every time for the initial LDAP sync.

For a broader view just a thought:
How about removing the need to assign a role before joining? If we would always join as a member there is no need for LDAP sync. And once LDAP sync (after join) is done the member server could dynamically switch to a slave or backup server....
Comment 9 Arvid Requate univentionstaff 2020-06-11 14:24:23 CEST
> WE could do, indeed. But most customers would not! They will start the join again and again after having fixed some other things. And wait every time for the initial LDAP sync.

Ok, so this could be addressed by:
1) creating a knowledge base article recommending this
2) adjusting univention-join to recommend running univention-run-join-scripts in case of an error (as the last lines of the output)
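
Point 2 might look roughly like this. It is a sketch only: `join_result_message` is a hypothetical helper and the wording of the hint is invented; the only real command name used is univention-run-join-scripts.

```python
def join_result_message(failed_script=None):
    """Build the last lines of output for a finished or failed join."""
    if failed_script is None:
        return "Join finished successfully."
    return (
        f"Join script {failed_script} failed.\n"
        "After fixing the problem, the join can be continued with:\n"
        "  univention-run-join-scripts\n"
        "(a complete re-join may not be necessary)"
    )


# Hypothetical script name, for illustration only.
print(join_result_message("20example-app.inst"))
```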


> For a broader view just a thought:
> How about removing the need to assign a role before joining? If we would always join as a member there is no need for LDAP sync. And once LDAP sync (after join) is done the member server could dynamically switch to a slave or backup server....

I think we need a separate bug for this (nice) proposal of a conceptual change.
Or we use this bug for conceptual brainstorming if you like and split off the concrete action items into separate bugs, otherwise we cannot handle it properly in development.

On the topic of your question:

* That proposal touches the vision of making roles more flexibly assignable, like services (IIRC Ingo has this in his vision document).
* It would basically separate, as Sönke said, the initial step of machine enrollment from the join of individual services (like LDAP).
* If we are still talking in terms of roles (Memberserver/Backup) then we would need to establish support for switching system roles,
  that's something UDM currently doesn't support (AFAIK) and the whole usage flow would need to be defined and implemented.
* The backup2master script may give an idea of what may be involved technically.
* A member2other may be simpler, but I don't have time to think that through right now.
* We would probably still want to improve the timing of the univention-ldap-server joinscript (or run-join-scripts in general)
Comment 10 Philipp Hahn univentionstaff 2020-06-11 14:46:37 CEST
Please read Bug #486270 which lists many more issues with `univention-join`.
Comment 11 Sönke Schwardt-Krummrich univentionstaff 2020-06-14 16:49:26 CEST
(In reply to Philipp Hahn from comment #10)
> Please read Bug #486270 which lists many more issues with `univention-join`.

Small typo: Bug #48627 was meant
Comment 13 Ingo Steuwer univentionstaff 2020-07-27 16:26:42 CEST
notes from a short discussion:

- the main issue here is that an LDAP initialization takes very long in big environments
- a sort of "split" between LDAP initialization and the other join scripts is already possible with the "univention-run-join-scripts" command. In case a script fails, the join can be continued for debugging using this tool
Comment 14 Ingo Steuwer univentionstaff 2021-05-14 16:38:01 CEST
This issue has been filed against UCS 4.3.

UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.