The fetchmail process hangs (or does not restart) every time you change a user object in the UMC.

Jun 27 14:14:45 server systemd[1]: fetchmail.service: Found left-over process 1680 (fetchmail) in control group while starting unit. Ignoring.
Jun 27 14:14:45 server systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 27 14:14:45 server systemd[1]: Starting LSB: init-Script for system wide fetchmail daemon...
Jun 27 14:14:45 server fetchmail[1680]: beendet mit Signal 15
Jun 27 14:14:45 server fetchmail[31072]: fetchmail already started; not starting. ... failed!
Jun 27 14:14:45 server systemd[1]: Started LSB: init-Script for system wide fetchmail daemon.

(The German message "beendet mit Signal 15" means "terminated with signal 15".)

univention-app info
UCS: 5.0-4 errata794
Installed: adconnector=12.0 fetchmail=6.3.26 mailserver=12.0 open-xchange-guard=2.10.6-ucs1 open-xchange-text=7.10.6-ucs2 ox-connector=2.2.6 oxseforucs=7.10.6-ucs9

dpkg -l | grep fetchmail
ii  fetchmail                    6.4.0~beta4-3+deb10u1A~5.0.0.202104091504  amd64  SSL enabled POP3, APOP, IMAP mail gatherer/forwarder
ii  univention-fetchmail         13.0.6-5                                   all    UCS fetchmail integration for UDM
ii  univention-fetchmail-schema  13.0.6-5                                   all    UCS schema package for univention-fetchmail

Dev Q&A card: https://wekan.knut.univention.de/b/DSf93wFtTAyvGCW3u/development-q-and-a/fJRfYMmjeh4Zg8Suu
In the OTRS ticket the customer writes that after changing _any_ attribute of a user (not necessarily a fetchmail attribute), fetchmail stops polling emails, although the fetchmail process is still running.

At first I thought that this was highly unlikely, because the listener wouldn't trigger for just any change. But then I took a look at the listener (fetchmailrc.py): it has only an LDAP »filter='(objectClass=univentionFetchmail)'« but no »attribute=[...]«. So it does indeed rewrite the fetchmail configuration after _any_ change to a user with UDM fetchmail data.

That is not only inefficient but also potentially problematic. It means that in a period of high-frequency changes (e.g. school import, school year change etc.), the fetchmailrc is overwritten again and again. Currently the writes are not atomic (create a temporary file and mv it into place); instead the fetchmailrc is written to directly. So it is possible that the file is in an inconsistent state while the daemon restarts.

It seems unlikely to me that this is the problem of the referenced customer, but two changes should be made:
1. Add »attribute=[...]« to the listener, so it is only triggered when a relevant LDAP attribute changes.
2. Make writes to the fetchmailrc atomic.

I will create separate bugs for that, because while those fixes will lower the probability of the customer's problem occurring, I don't think they will solve it.
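The second point (write to a temporary file, then rename it over the target) could look roughly like the sketch below. This is not the code from fetchmailrc.py: the function name `write_atomically` and its parameters are made up for illustration, and the file mode is an assumption. It relies on `os.replace()` being atomic on POSIX when the temporary file and the target are on the same filesystem. Ownership of the resulting file is not handled here.

```python
import os
import tempfile


def write_atomically(path, content, mode=0o600):
    """Replace `path` so that readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename
    # does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before renaming
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Do not leave a stray temporary file behind if anything failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A daemon that reads the file during a restart would then never see a half-written configuration, only the previous or the new version.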
Do we have any workaround?
While the things listed in the previous comment should be fixed, it's very unlikely that they would cause a problem _every_ time. It is more likely that the customer's problem is a broken configuration.

To debug the customer's problem, they could run:

systemctl stop fetchmail.service
sudo -u fetchmail fetchmail --verbose --nodetach --nosyslog --fetchmailrc /etc/fetchmailrc

If there is a problem with the configuration, it should now be printed to the terminal.
I don't see any configuration error. It looks more like a timing issue. If I change the description of a user, the restart works in 50% of the cases.

I changed the listener script to call "service" instead of the init script. Afterwards I wasn't able to reproduce the problem:

--- fetchmailrc.py.orig 2023-09-21 07:10:24.843304181 +0200
+++ fetchmailrc.py 2023-09-21 07:10:53.623293111 +0200
@@ -237,6 +237,6 @@
 ud.debug(ud.LISTENER, ud.INFO, 'Restarting fetchmail-daemon')
 listener.setuid(0)
 try:
-    listener.run(initscript, ['fetchmail', 'restart'], uid=0)
+    listener.run('/usr/sbin/service', ['service', 'fetchmail', 'restart'], uid=0)
 finally:
     listener.unsetuid()
Created attachment 11129
fetchmail.py.patch

Workaround:

patch -p0 -d /usr/lib/univention-directory-listener/system/ <fetchmail.py.patch
service univention-directory-listener restart
Oh... it looks like calling an old-style init script directly prevents systemd from doing its thing, leading to "Found left-over process <pid> (<process>) in control group while starting unit."... I wonder if we have more cases like this... Most calls have been converted to "systemctl restart <..>.service" or "service <..> restart", but some remain in UCS...
To the implementer: Just talked to Philipp about what our standards are in UCS5: When implementing, please modify the patch to use "systemctl" instead of "service". (And in Debian maintainer scripts (like postinst) use "deb-systemd-invoke".)
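As a rough illustration of that standard, a restart that addresses the unit through "systemctl" could be wrapped as in the sketch below. The helper `restart_unit` and its `runner` parameter are hypothetical and only exist to keep the example testable without a running systemd; inside the listener the call would presumably still go through `listener.run(...)` with root privileges, as in the patch from comment 5.

```python
import subprocess


def restart_unit(unit, runner=subprocess.run):
    """Restart a systemd unit; return True if systemctl exited with status 0."""
    # Address the unit via systemctl instead of calling /etc/init.d/<name> directly.
    argv = ['systemctl', 'restart', unit]
    result = runner(argv, check=False)
    return result.returncode == 0
```

Usage would be `restart_unit('fetchmail.service')`; a False return value leaves it to the caller to log the failure.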
(In reply to Stefan Gohmann from comment #6)
> Created attachment 11129
> fetchmail.py.patch
>
> Workaround:
>
> patch -p0 -d /usr/lib/univention-directory-listener/system/ <fetchmail.py.patch
> service univention-directory-listener restart

Unfortunately, it didn't work. So the next attempt is to change the listener script in the following way:

# listener.run(initscript, ['fetchmail', 'restart'], uid=0)
listener.run('/usr/sbin/service', ['service', 'fetchmail', 'stop'], uid=0)
time.sleep(5)
listener.run('/usr/sbin/service', ['service', 'fetchmail', 'start'], uid=0)
(In reply to Stefan Gohmann from comment #9)
> (In reply to Stefan Gohmann from comment #6)
> > Created attachment 11129
> > fetchmail.py.patch
> >
> > Workaround:
> >
> > patch -p0 -d /usr/lib/univention-directory-listener/system/ <fetchmail.py.patch
> > service univention-directory-listener restart
>
> Unfortunately, it didn't work. So next try is to change the listener script
> in the following way:
>
> # listener.run(initscript, ['fetchmail', 'restart'], uid=0)
> listener.run('/usr/sbin/service', ['service', 'fetchmail', 'stop'], uid=0)
> time.sleep(5)
> listener.run('/usr/sbin/service', ['service', 'fetchmail', 'start'], uid=0)

The partner / customer gave feedback that it works now.
To summarize the above: things have changed in the meantime.

> 1. Add »attribute=[...]« to the listener, so it is only triggered, when a relevant LDAP attribute changes.
> 2. Make writes to the fetchmailrc atomic.

This has been done via
* Bug #56586, which already adjusted the fetchmail listener so it only runs for changes of specific relevant attributes
* Bug #56587, which made the way the fetchmailrc is generated more robust against race conditions (which we should not have here)

Regarding the question of whether calling /etc/init.d/fetchmail could be worse than "systemctl restart fetchmail.service":
* The /var/run/systemd/generator.late/fetchmail.service is automatically generated for /etc/init.d/fetchmail by systemd-sysv-generator
* "/etc/init.d/fetchmail restart" and "systemctl restart fetchmail" do the same thing via /lib/lsb/init-functions.d/40-systemd
* Comment 9 already says that the theory of Comment 6 didn't work out
* univention-fetchmail could attempt to ship a dedicated fetchmail.service, but if upstream doesn't, why should we?

That /etc/init.d/fetchmail is from the fetchmail upstream package (version 6.4.0~beta4-3+deb10u1 in UCS 5.0-x and 6.4.37-1 since UCS 5.2-0). So either we are lucky and that fixed something (I still haven't found the Debian maintainer repo to check the git history), or we may want to follow the advice of Comment 10, just to be sure. But the time.sleep(5) is not so great: if every module does things like this, we kill (synchronous) replication performance.
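One way to soften the cost of a fixed time.sleep(5) would be to poll for a condition with an upper bound, so that the listener only blocks for as long as it actually needs to. The sketch below is an assumption about how that could look and has not been tested against fetchmail; `wait_until` and its parameters are hypothetical names, and what exactly the predicate should check (for example, that the old fetchmail process has exited) is left open.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False  # give up; the caller decides how to proceed
        time.sleep(interval)
```

Placed between the "stop" and "start" calls, this would return as soon as the condition holds and fall back to the five-second worst case only when it does not.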
Ah Julia found https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=981464 which shows that /usr/lib/systemd/user/fetchmail.service exists. But it's a user unit, not sure yet if/how we can use that.
[5.0-10] b01c5692c6d | fix(univention-fetchmail): manage fetchmail service via systemd instead of using SysV init script

Package: univention-fetchmail
Version: 13.0.12-2
Release: errata5.0-10
Scope: errata5.0-10
Package: univention-fetchmail
Version: 13.0.12-3
Release: errata5.0-10
Scope: errata5.0-10

QA:
OK: switch to systemd instead of SysV init script
OK: advisories
OK: manual tests - fetchmail no longer enters the described failed state
<https://errata.software-univention.de/#/?erratum=5.0x1314>