Univention Bugzilla – Bug 49817
Check whether a resync is already in progress before starting another one
Last modified: 2022-11-15 11:01:07 CET
univention-directory-listener-ctrl allows users to start multiple concurrent resyncs, which can cause problems. One problem that I can reproduce regularly is that the second resync runs forever, but phahn hinted that there could be even worse problems if the second resync cancels the first at a bad moment.

IMHO univention-directory-listener-ctrl should check whether a resync is already in progress when it's invoked to start one. I've built very simple logic doing that in the past to work around this problem: I just set a UCR variable when the resync was started and unset it when it was done. There are certainly more elegant ways to accomplish that.

The documentation also doesn't tell the user that concurrent resyncs are not a good idea/not supported by the listener.
https://docs.software-univention.de/manual-4.4.html#domain:listenernotifier:erroranalysis:reinit
I'll open a separate bug for that now, but if it's decided to "just" fix this behaviour, that bug can be closed.
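One possible shape for such a check, as a minimal sketch rather than a proposal for the actual tool: instead of a UCR variable, take an exclusive advisory lock on a file for the duration of the resync. The lock file path, the `resync_lock` helper and the `ResyncInProgress` exception are hypothetical names used only for illustration. Unlike a variable that has to be unset explicitly, the kernel drops the lock when the process exits, even if it crashes.

```python
import fcntl
import os
from contextlib import contextmanager


class ResyncInProgress(RuntimeError):
    """Raised when another caller already holds the resync lock."""


@contextmanager
def resync_lock(path):
    """Hold an exclusive, non-blocking flock on `path` (hypothetical lock file)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting
            # for the other resync to finish.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise ResyncInProgress(path)
        yield
    finally:
        # Closing the descriptor releases the lock if we held it.
        os.close(fd)
```

A caller would wrap the whole resync in `with resync_lock(...)` and report an error to the user when `ResyncInProgress` is raised. This only guards against concurrent invocations of the control tool; it says nothing about work a listener module hands off to external systems.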
Some thoughts:
* this behavior has been in UCS since the first release. I can't remember customers complaining about it.
* this is an expert tool for special "problem solving" use cases, and I think it's obvious that it should be "handled with care"
* I doubt that there is a simple solution: the resync process is an initialization of the listener plugin, which is individually implemented as part of the plugin. There are plugins that just write information to APIs / files which are processed "later". You can't fully control this.

I'd like to close this bug and keep the documentation.
(In reply to Ingo Steuwer from comment #2)
> * I doubt that there is a simple solution: the resync process is an
> initialization of the listener plugin, which is individually implemented as
> part of the plugin. There are plugins that just write information to APIs /
> files which are processed "later". You can't fully control this.
>
> I'd like to close this bug and keep the documentation.

I see your point regarding external factors we have no control over (APIs, etc. spoken to by listener modules). I agree, we shouldn't try to control what happens there in a resync.

What I'm currently thinking of changing is only what happens in the listener itself - if the listener module takes hours to complete after a resync and one breaks something by triggering a new resync after half an hour - that isn't our problem. However, if the listener itself runs into problems by triggering resyncs we should change it so that that can't happen.
I just stumbled upon this issue at another one of our customers. I could reliably reproduce the issue not only in the customer's environments but also in a stock UCS 4 or UCS 5 by running something like

root@ucs-2781:~# /usr/sbin/univention-directory-listener-ctrl resync faillog
listener shutdown done
root@ucs-2781:~# /usr/sbin/univention-directory-listener-ctrl resync faillog
Modules:
3 app_attributes /usr/lib/univention-directory-listener/system/app_attributes.py
3 bind /usr/lib/univention-directory-listener/system/bind.py
0 faillog /usr/lib/univention-directory-listener/system/faillog.py
3 gencertificate /usr/lib/univention-directory-listener/system/gencertificate.py
3 hosteddomains /usr/lib/univention-directory-listener/system/hosteddomains.py
3 keytab-member /usr/lib/univention-directory-listener/system/keytab-member.py
3 keytab /usr/lib/univention-directory-listener/system/keytab.py
3 ldap-cache-baa04df67e7af6bb0769f5cb7e72dba9 /usr/lib/univention-directory-listener/system/ldap-cache-baa04df67e7af6bb0769f5cb7e72dba9.py
3 ldap_extension /usr/lib/univention-directory-listener/system/ldap_extension.py
3 ldap_server /usr/lib/univention-directory-listener/system/ldap_server.py
3 license_uuid /usr/lib/univention-directory-listener/system/license_uuid.py
3 monitoring-client /usr/lib/univention-directory-listener/system/monitoring-client.py
3 nagios-client /usr/lib/univention-directory-listener/system/nagios-client.py
3 nfs-homes /usr/lib/univention-directory-listener/system/nfs-homes.py
3 nfs-shares /usr/lib/univention-directory-listener/system/nfs-shares.py
3 nscd_update /usr/lib/univention-directory-listener/system/nscd.py
3 nss /usr/lib/univention-directory-listener/system/nss.py
3 pkgdb-watch /usr/lib/univention-directory-listener/system/pkgdb-watch.py
3 portal_groups /usr/lib/univention-directory-listener/system/portal_groups.py
3 portal_server /usr/lib/univention-directory-listener/system/portal_server.py
3 quota /usr/lib/univention-directory-listener/system/quota.py
3 udm_extension /usr/lib/univention-directory-listener/system/udm_extension.py
3 umc-service-providers /usr/lib/univention-directory-listener/system/umc-service-providers.py
3 univention-admin-diary-backend /usr/lib/univention-directory-listener/system/univention-admin-diary-backend.py
3 univention-saml-groups /usr/lib/univention-directory-listener/system/univention-saml-groups.py
3 univention-saml-idp-config /usr/lib/univention-directory-listener/system/univention-saml-idp-config.py
3 univention-saml-servers /usr/lib/univention-directory-listener/system/univention-saml-servers.py
3 univention-saml-simplesamlphp-configuration /usr/lib/univention-directory-listener/system/univention-saml-simplesamlphp-configuration.py
3 well-known-sid-name-mapping /usr/lib/univention-directory-listener/system/well-known-sid-name-mapping.py

If there is little enough time between the resyncs (<2 seconds), we see what is shown above. It would be nice to remove this indeterminate behavior, even if it is an edge case.
(In reply to Lukas Zumvorde from comment #5)
> I just stumbled upon this issue at another one of our customers. I could
> reliably reproduce the issue not only in the customers environments but also
> in a stock UCS 4 or UCS 5 by running something like
>
> # /usr/sbin/univention-directory-listener-ctrl resync faillog
> # /usr/sbin/univention-directory-listener-ctrl resync faillog

I doubt that you can reproduce this with UCS-5.0, as in that version `resync` will never wait: we use `systemctl {stop,start} univention-directory-listener` there.

Up to UCS-4.4 we used `runit`, which had some custom code to wait for the restart. Handling `runit` itself that way is already tricky, but having multiple processes apply concurrent state changes to `runit` is calling for disaster.

Basically what happens with a "resync" is this:
1. Stop UDL so we can fiddle in its innards
2. Remove a state file
3. Re-start UDL so it can recover from the now inconsistent state

Step 1 alone is dangerous: while UDL protects itself from SIGINT/SIGTERM to get some kind of "atomic operations", `systemd` (as SysV-init did already) will resort to SIGKILL if a process does not terminate in time (~90s). Whatever is happening then will get aborted and will leave UDL in a very inconsistent state: lucky you if that state is only in memory, catastrophic if some of that state reaches persistent disk.

> Q: Doctor, it hurts when I do that.
> A: Then don't do it!

1. Append a " DANGEROUS!" to listener-ctrl
2. Add a warning to the manual

Proposal: https://git.knut.univention.de/univention/ucs/-/merge_requests/524
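The three resync steps listed above can be sketched roughly as follows. This is an illustration under assumptions, not the code of `univention-directory-listener-ctrl`: the `resync` function, its `run` parameter and the `module_state_file` argument are hypothetical, and the location of the real state file is deliberately left to the caller. Only the `systemctl stop`/`start` calls on `univention-directory-listener` are taken from the comment.

```python
import os
import subprocess

SERVICE = "univention-directory-listener"  # unit name as given in the comment


def resync(module_state_file, run=subprocess.check_call):
    """Stop the listener, remove one module's state file, start it again.

    `module_state_file` is a hypothetical path supplied by the caller.
    `run` is injectable so the sequence can be exercised without systemd.
    """
    # Step 1: stop UDL. If this raises, nothing else is attempted.
    run(["systemctl", "stop", SERVICE])
    try:
        # Step 2: remove the state file; a missing file is not an error.
        try:
            os.remove(module_state_file)
        except FileNotFoundError:
            pass
    finally:
        # Step 3: always try to start UDL again once it has been stopped.
        run(["systemctl", "start", SERVICE])
```

Nothing in this sequence prevents a second caller from running step 1 while the first is between steps 2 and 3, which is the overlap this bug is about. It also does not address the SIGKILL case: the stop in step 1 is only as graceful as the service manager's timeout allows.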