Bug 56585 - Fetchmail restart doesn't work
Status: NEW
Product: UCS
Classification: Unclassified
Component: Mail
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Mail maintainers
Depends on: 56586 56587
Blocks:
Reported: 2023-09-14 06:33 CEST by Stefan Gohmann
Modified: 2023-11-13 08:46 CET (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 4: A User would return the product
User Pain: 0.229
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support: Yes
Flags outvoted (downgraded) after PO Review:
Ticket number: 2023062721000197
Bug group (optional):
Max CVSS v3 score:
troeder: Patch_Available+


Attachments
fetchmail.py.patch (433 bytes, patch)
2023-09-21 07:36 CEST, Stefan Gohmann

Description Stefan Gohmann univentionstaff 2023-09-14 06:33:28 CEST
The fetchmail process hangs (or does not restart) every time a user object is changed in the UMC.

Jun 27 14:14:45 server systemd[1]: fetchmail.service: Found left-over process 1680 (fetchmail) in control group while starting unit. Ignoring.
Jun 27 14:14:45 server systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 27 14:14:45 server systemd[1]: Starting LSB: init-Script for system wide fetchmail daemon...
Jun 27 14:14:45 server fetchmail[1680]: terminated with signal 15
Jun 27 14:14:45 server fetchmail[31072]: fetchmail already started; not starting. ... failed!
Jun 27 14:14:45 server systemd[1]: Started LSB: init-Script for system wide fetchmail daemon.


univention-app info
UCS: 5.0-4 errata794
Installed: adconnector=12.0 fetchmail=6.3.26 mailserver=12.0 open-xchange-guard=2.10.6-ucs1 open-xchange-text=7.10.6-ucs2 ox-connector=2.2.6 oxseforucs=7.10.6-ucs9

dpkg -l | grep fetchmail
ii  fetchmail                                           6.4.0~beta4-3+deb10u1A~5.0.0.202104091504         amd64        SSL enabled POP3, APOP, IMAP mail gatherer/forwarder
ii  univention-fetchmail                                13.0.6-5                                          all          UCS fetchmail integration for UDM
ii  univention-fetchmail-schema                         13.0.6-5                                          all          UCS schema package for univention-fetchmail

Dev Q&A card:
https://wekan.knut.univention.de/b/DSf93wFtTAyvGCW3u/development-q-and-a/fJRfYMmjeh4Zg8Suu
Comment 2 Daniel Tröder univentionstaff 2023-09-14 09:24:04 CEST
In the OTRS ticket the customer writes that after changing _any_ attribute of a user - not necessarily a fetchmail attribute - fetchmail stops polling emails, although the fetchmail process is still running.

First I thought: That seems highly unlikely, because the listener wouldn't trigger for just any change.

But then I took a look at the listener (fetchmailrc.py) and it has only an LDAP »filter='(objectClass=univentionFetchmail)'« but no »attribute=[...]«.

So it does indeed rewrite the fetchmail configuration after _any_ change to a user with UDM fetchmail data.

That is not only inefficient but also potentially problematic.

That means that during periods of frequent changes (e.g. school import, school year change etc.), the fetchmailrc is overwritten again and again.
Currently the writes are not atomic (create a temporary file, then mv it into place); the fetchmailrc is written to directly.
So it is possible that the file is in an inconsistent state while the daemon restarts.

It seems unlikely to me that this is the referenced customer's problem, but two changes should be made:

1. Add »attribute=[...]« to the listener, so it is only triggered when a relevant LDAP attribute changes.
2. Make writes to the fetchmailrc atomic.

I will create separate bugs for that, because while those fixes will lower the probability of the customer's problem occurring, I don't think they will solve it.
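
The two proposed changes can be sketched in plain Python. This is only an illustration, not the code of fetchmailrc.py: the attribute names in RELEVANT_ATTRIBUTES and the helper names is_relevant_change() and write_atomically() are assumptions made for the example, and the real attribute list would have to be taken from the univention-fetchmail schema.

```python
import os
import tempfile

# Assumption: illustrative attribute names only. The real list must be
# taken from the univention-fetchmail LDAP schema.
RELEVANT_ATTRIBUTES = (
    "univentionFetchmailServer",
    "univentionFetchmailProtocol",
    "univentionFetchmailAddress",
    "univentionFetchmailPasswd",
)


def is_relevant_change(old, new, attributes=RELEVANT_ATTRIBUTES):
    """Return True if any watched attribute differs between old and new.

    old and new are attribute dictionaries as a listener handler
    receives them (attribute name -> list of values).
    """
    return any(old.get(attr) != new.get(attr) for attr in attributes)


def write_atomically(path, content, mode=0o600):
    """Replace path so that readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory, so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".fetchmailrc.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The sketch does not set the file owner; a real implementation would also have to keep the ownership the fetchmail daemon expects.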
Comment 3 Stefan Gohmann univentionstaff 2023-09-19 14:55:33 CEST
Do we have any workaround?
Comment 4 Daniel Tröder univentionstaff 2023-09-19 18:09:23 CEST
While the things listed in the previous comment should be fixed, it's very unlikely that they would cause a problem _every_ time.
It is more likely that the customer's problem is caused by a broken configuration.

But to debug the customer's problem, they could run:

systemctl stop fetchmail.service

sudo -u fetchmail fetchmail --verbose --nodetach --nosyslog --fetchmailrc /etc/fetchmailrc

If there is a problem with the configuration, it should now be printed to the terminal.
Comment 5 Stefan Gohmann univentionstaff 2023-09-21 07:32:36 CEST
I don't see any configuration error. It looks more like a timing issue. If I change the description of a user, the restart works in 50% of the cases.

I changed the listener script to call "service" instead of the init script. Afterwards I wasn't able to reproduce the problem:

--- fetchmailrc.py.orig	2023-09-21 07:10:24.843304181 +0200
+++ fetchmailrc.py	2023-09-21 07:10:53.623293111 +0200
@@ -237,6 +237,6 @@
     ud.debug(ud.LISTENER, ud.INFO, 'Restarting fetchmail-daemon')
     listener.setuid(0)
     try:
-        listener.run(initscript, ['fetchmail', 'restart'], uid=0)
+        listener.run('/usr/sbin/service', ['service', 'fetchmail', 'restart'], uid=0)
     finally:
         listener.unsetuid()
Comment 6 Stefan Gohmann univentionstaff 2023-09-21 07:36:17 CEST
Created attachment 11129 [details]
fetchmail.py.patch

Workaround:

patch -p0 -d /usr/lib/univention-directory-listener/system/ <fetchmail.py.patch 
service univention-directory-listener restart
Comment 7 Daniel Tröder univentionstaff 2023-09-21 09:22:04 CEST
Oh... looks like calling an old-style init script directly prevents systemd from doing its thing, leading to "Found left-over process <pid> (<process>) in control group while starting unit."...
I wonder if we have more cases like this... Most calls have been converted to "systemctl restart <..>.service" or "service <..> restart", but some remain in UCS...
Comment 8 Daniel Tröder univentionstaff 2023-09-21 09:30:03 CEST
To the implementer: Just talked to Philipp about what our standards are in UCS5: When implementing, please modify the patch to use "systemctl" instead of "service". (And in Debian maintainer scripts (like postinst) use "deb-systemd-invoke".)
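
A minimal sketch of a restart that goes through systemctl, written with the standard library so it runs outside the listener. The helper name restart_unit() and its runner parameter are hypothetical; inside the listener module the same argument list would presumably be passed to listener.run(), as in the patch from comment 5.

```python
import subprocess


def restart_unit(unit="fetchmail.service", runner=subprocess.call):
    """Restart a systemd unit via systemctl and return the exit status.

    runner is injectable so the command can be inspected without
    actually touching the service (hypothetical helper parameter).
    """
    cmd = ["systemctl", "restart", unit]
    return runner(cmd)
```

Calling restart_unit() with the default runner requires root privileges, just like the listener.run(..., uid=0) call it stands in for.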
Comment 9 Stefan Gohmann univentionstaff 2023-09-22 16:24:35 CEST
(In reply to Stefan Gohmann from comment #6)

Unfortunately, it didn't work. So the next attempt is to change the listener script in the following way:

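        # Replace the direct init script call with an explicit stop and start.
        # Requires "import time" at the top of the listener module.
        # listener.run(initscript, ['fetchmail', 'restart'], uid=0)
        listener.run('/usr/sbin/service', ['service', 'fetchmail', 'stop'], uid=0)
        time.sleep(5)
        listener.run('/usr/sbin/service', ['service', 'fetchmail', 'start'], uid=0)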
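
A variation of this stop/sleep/start sequence could wait until the old fetchmail process has actually exited instead of sleeping for a fixed five seconds. The following is only a sketch under assumptions, not what was deployed: the helper names fetchmail_running() and stop_wait_start() are hypothetical, and it relies on pgrep being installed.

```python
import subprocess
import time


def fetchmail_running():
    """Return True while a process named exactly 'fetchmail' exists."""
    # pgrep exits with status 0 if at least one process matched.
    return subprocess.call(["pgrep", "-x", "fetchmail"],
                           stdout=subprocess.DEVNULL) == 0


def stop_wait_start(run=subprocess.call, is_running=fetchmail_running,
                    timeout=10.0, interval=0.5, sleep=time.sleep):
    """Stop fetchmail, wait for the old process to exit, then start it."""
    run(["service", "fetchmail", "stop"])
    waited = 0.0
    # Poll until the old daemon is gone or the timeout is reached.
    while is_running() and waited < timeout:
        sleep(interval)
        waited += interval
    return run(["service", "fetchmail", "start"])
```

The timeout keeps the listener from blocking indefinitely if the old process never terminates; in that case the start is attempted anyway, as in the fixed-sleep version.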
Comment 10 Stefan Gohmann univentionstaff 2023-10-02 11:13:23 CEST
(In reply to Stefan Gohmann from comment #9)

The partner / customer gave feedback that it works now.