Bug 49817 - Check whether a resync is already in progress before starting another one
Status: NEW
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on:
Blocks:
Reported: 2019-07-05 13:57 CEST by Valentin Heidelberger
Modified: 2022-11-15 11:01 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 2: Improvement: Would be a product improvement
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.023
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:

Description Valentin Heidelberger univentionstaff 2019-07-05 13:57:49 CEST
univention-directory-listener-ctrl allows users to start multiple concurrent resyncs, which can cause problems. One problem that I can reproduce regularly is that the second resync runs forever, but phahn hinted that there could be even worse problems if the second resync cancels the first at a bad moment.

Imho univention-directory-listener-ctrl should check whether a resync is already in progress when it's invoked to start one. I've built very simple logic doing that in the past to work around this problem. In that case I just set a UCR var when the resync was started and unset it when it was done. There are certainly more elegant ways to accomplish that.
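
One possible shape for such a check, as a minimal sketch only: instead of a UCR variable, take an exclusive non-blocking lock on a file. The lock path and the names `resync_guard` and `ResyncInProgress` are hypothetical and not part of univention-directory-listener-ctrl. The guard only covers the code that runs while the lock is held, not whatever listener modules do afterwards.

```python
import fcntl
import os
from contextlib import contextmanager

# Hypothetical lock path, chosen for illustration only.
LOCK_PATH = "/var/lock/univention-directory-listener-resync.lock"


class ResyncInProgress(RuntimeError):
    """Raised when another resync already holds the lock."""


@contextmanager
def resync_guard(lock_path=LOCK_PATH):
    """Hold an exclusive, non-blocking lock for the duration of a resync."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise ResyncInProgress("a resync is already in progress") from None
        yield
    finally:
        # Closing the descriptor also releases the lock if we held it.
        os.close(fd)
```

Compared with a UCR variable, the kernel drops the lock when the holding process exits, so a killed resync does not leave a stale "in progress" marker behind.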

The documentation also doesn't tell the user that concurrent resyncs are not a good idea/not supported by the listener.
https://docs.software-univention.de/manual-4.4.html#domain:listenernotifier:erroranalysis:reinit

I'll open a separate bug for that now but if it's decided to "just" fix this behaviour that bug can be closed.
Comment 2 Ingo Steuwer univentionstaff 2019-07-05 16:02:23 CEST
Some thoughts:

* this behavior has been in UCS since the first release. I can't remember customers complaining about it.

* this is an expert tool for special "problem solving" use cases, and I think it's obvious that it should be "handled with care"

* I doubt that there is a simple solution: the resync process is an initialization of the listener plugin, which is individually implemented as part of the plugin. There are plugins that just write information to APIs / files which are processed "later". You can't fully control this.

I'd like to close this bug and keep the documentation.
Comment 3 Valentin Heidelberger univentionstaff 2019-07-05 16:09:24 CEST
(In reply to Ingo Steuwer from comment #2)
> * I doubt that there is a simple solution: the resync process is an
> initialization of the listener plugin, which is individually implemented as
> part of the plugin. There are plugins that just write information to APIs /
> files which are processed "later". You can't fully control this.
> 
> I'd like to close this bug and keep the documentation.

I see your point regarding external factors we have no control over (APIs etc. that listener modules talk to). I agree, we shouldn't try to control what happens there in a resync. What I'm currently thinking of changing is only what happens in the listener itself. If a listener module takes hours to complete after a resync and someone breaks something by triggering a new resync after half an hour, that isn't our problem. However, if the listener itself runs into problems when resyncs are triggered, we should change it so that this can't happen.
Comment 5 Lukas Zumvorde univentionstaff 2022-11-14 13:49:38 CET
I just stumbled upon this issue at another one of our customers. I could reliably reproduce the issue not only in the customer's environments but also in a stock UCS 4 or UCS 5 by running something like:

root@ucs-2781:~# /usr/sbin/univention-directory-listener-ctrl resync faillog
listener shutdown done
root@ucs-2781:~# /usr/sbin/univention-directory-listener-ctrl resync faillog
Modules:
3       app_attributes  /usr/lib/univention-directory-listener/system/app_attributes.py
3       bind    /usr/lib/univention-directory-listener/system/bind.py
0       faillog /usr/lib/univention-directory-listener/system/faillog.py
3       gencertificate  /usr/lib/univention-directory-listener/system/gencertificate.py
3       hosteddomains   /usr/lib/univention-directory-listener/system/hosteddomains.py
3       keytab-member   /usr/lib/univention-directory-listener/system/keytab-member.py
3       keytab  /usr/lib/univention-directory-listener/system/keytab.py
3       ldap-cache-baa04df67e7af6bb0769f5cb7e72dba9     /usr/lib/univention-directory-listener/system/ldap-cache-baa04df67e7af6bb0769f5cb7e72dba9.py
3       ldap_extension  /usr/lib/univention-directory-listener/system/ldap_extension.py
3       ldap_server     /usr/lib/univention-directory-listener/system/ldap_server.py
3       license_uuid    /usr/lib/univention-directory-listener/system/license_uuid.py
3       monitoring-client       /usr/lib/univention-directory-listener/system/monitoring-client.py
3       nagios-client   /usr/lib/univention-directory-listener/system/nagios-client.py
3       nfs-homes       /usr/lib/univention-directory-listener/system/nfs-homes.py
3       nfs-shares      /usr/lib/univention-directory-listener/system/nfs-shares.py
3       nscd_update     /usr/lib/univention-directory-listener/system/nscd.py
3       nss     /usr/lib/univention-directory-listener/system/nss.py
3       pkgdb-watch     /usr/lib/univention-directory-listener/system/pkgdb-watch.py
3       portal_groups   /usr/lib/univention-directory-listener/system/portal_groups.py
3       portal_server   /usr/lib/univention-directory-listener/system/portal_server.py
3       quota   /usr/lib/univention-directory-listener/system/quota.py
3       udm_extension   /usr/lib/univention-directory-listener/system/udm_extension.py
3       umc-service-providers   /usr/lib/univention-directory-listener/system/umc-service-providers.py
3       univention-admin-diary-backend  /usr/lib/univention-directory-listener/system/univention-admin-diary-backend.py
3       univention-saml-groups  /usr/lib/univention-directory-listener/system/univention-saml-groups.py
3       univention-saml-idp-config      /usr/lib/univention-directory-listener/system/univention-saml-idp-config.py
3       univention-saml-servers /usr/lib/univention-directory-listener/system/univention-saml-servers.py
3       univention-saml-simplesamlphp-configuration     /usr/lib/univention-directory-listener/system/univention-saml-simplesamlphp-configuration.py
3       well-known-sid-name-mapping     /usr/lib/univention-directory-listener/system/well-known-sid-name-mapping.py

If there is little enough time between the resyncs (<2 seconds), we see what is shown above. It would be nice to remove this indeterminate behavior, even if it is an edge case.
Comment 6 Philipp Hahn univentionstaff 2022-11-15 11:01:07 CET
(In reply to Lukas Zumvorde from comment #5)
> I just stumbled upon this issue at another one of our customers. I could
> reliably reproduce the issue not only in the customers environments but also
> in a stock UCS 4 or UCS 5 by running something like
> 
> # /usr/sbin/univention-directory-listener-ctrl resync faillog
> # /usr/sbin/univention-directory-listener-ctrl resync faillog

I doubt that you can reproduce this with UCS-5.0 as within that version `resync` will never wait: We use `systemctl {stop,start} univention-directory-listener` there.
Up to UCS-4.4 we used `runit`, which had some custom code that waited for the restart. Handling `runit` itself that way is already tricky, but having multiple processes apply concurrent state changes to `runit` is calling for disaster. Basically, what happens with a "resync" is this:
1. Stop UDL so we can fiddle in its innards
2. Remove a state file
3. Re-start UDL so it can recover from the now inconsistent state
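
The three steps above can be sketched roughly as follows. This is an illustration, not the code of univention-directory-listener-ctrl: the state file location in `STATE_DIR` is an assumption, and the `run` parameter exists only so the sketch can be exercised without touching a real service.

```python
import os
import subprocess

SERVICE = "univention-directory-listener"
# Assumed location of the per-module state files; not confirmed in this report.
STATE_DIR = "/var/lib/univention-directory-listener/handlers"


def resync(module, state_dir=STATE_DIR, run=subprocess.run):
    """Stop UDL, remove the module's state file, then start UDL again."""
    # 1. Stop UDL.
    run(["systemctl", "stop", SERVICE], check=True)
    try:
        # 2. Remove the state file of the module to be resynced.
        try:
            os.remove(os.path.join(state_dir, module))
        except FileNotFoundError:
            pass  # state file already absent
    finally:
        # 3. Start UDL again, even if step 2 failed, so it is not left stopped.
        run(["systemctl", "start", SERVICE], check=True)
```

Nothing in this sequence prevents a second caller from running step 1 while the first is between steps 2 and 3, which is the race discussed in this bug.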

Step 1 alone is dangerous: UDL protects itself from SIGINT/SIGTERM to get some kind of "atomic operations", but `systemd` (as SysV-init did already) will resort to SIGKILL if a process does not terminate in time (~90s). Whatever is happening then gets aborted and leaves UDL in a very inconsistent state: lucky you if that state is only in memory, catastrophic if some of that state reaches persistent disk.
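
To illustrate what protecting a critical section from SIGINT/SIGTERM can look like, here is a generic Python sketch. It is not UDL's actual code, and the name `deferred_termination` is made up for this example. The signals are blocked while the section runs and delivered once it ends. SIGKILL cannot be blocked or handled, which is why the stop timeout defeats this kind of protection.

```python
import signal
from contextlib import contextmanager


@contextmanager
def deferred_termination():
    """Defer SIGINT/SIGTERM until the enclosed block has finished."""
    blocked = {signal.SIGINT, signal.SIGTERM}
    # Block the signals for the calling thread and remember the old mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, blocked)
    try:
        yield
    finally:
        # Restore the old mask; signals that arrived meanwhile are delivered now.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```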

> Q: Doctor, it hurts when I do that.
> A: Then don't do it!

1. Append a " DANGEROUS!" to listener-ctrl
2. Add a warning to the manual

Proposal https://git.knut.univention.de/univention/ucs/-/merge_requests/524