Univention Bugzilla – Bug 50660
replication.py: UDL not stopped on disk full
Last modified: 2022-03-16 14:33:02 CET
check_file_system_space() in /usr/lib/univention-directory-listener/system/replication.py checks the free disk space and stops UDL if it is too low:

> 775 listener.run('/etc/init.d/univention-directory-listener', ['univention-directory-listener', 'stop'], uid=0, wait=True)

Before doing this the module tries to send an email:

> 770 s = smtplib.SMTP()
> 771 s.connect()
> 772 s.sendmail(sender, [recipient], msg.as_string())
> 773 s.close()

If the MTA is not available, the module crashes and UDL is *not* stopped:

> 18.12.19 16:53:19.833 LISTENER ( ERROR ) : replication: Critical disk space. The Univention LDAP Listener was stopped
> Traceback (most recent call last):
>   File "/usr/lib/univention-directory-listener/system/replication.py", line 783, in handler
>     check_file_system_space()
>   File "/usr/lib/univention-directory-listener/system/replication.py", line 771, in check_file_system_space
>     s.connect()
>   File "/usr/lib/python2.7/smtplib.py", line 316, in connect
>     self.sock = self._get_socket(host, port, self.timeout)
>   File "/usr/lib/python2.7/smtplib.py", line 291, in _get_socket
>     return socket.create_connection((host, port), timeout)
>   File "/usr/lib/python2.7/socket.py", line 575, in create_connection
>     raise err
> socket.error: [Errno 111] Connection refused
> 18.12.19 16:53:19.834 LISTENER ( WARN ) : handler: replication (failed)

This can be reproduced easily by:

  ucr set ldap/replication/filesystem/check=true ldap/replication/filesystem/limit=40187996
  /etc/init.d/postfix stop
  /etc/init.d/univention-directory-listener restart
  udm users/user modify --dn "uid=Administrator,cn=users,$(ucr get ldap/base)" --set description="$(date)"
  tail -f /var/log/univention/listener.log

Also there are 2 UCRVs:

> # ucr search listener/freespace ldap/replication/filesystem/limit
> ldap/replication/filesystem/limit: 40187996
> This variable configures the lower limit for free space in the directory '/var/lib/univention-ldap/', when replication will be stopped. Default is 10 [MiB].

This is implemented in `replication.py`. The module only exists on Backups and Slaves. The module is always executed first. If the check triggers, UDL is stopped via `service stop udl`.

> listener/freespace: 10
> This variable configures the lower limit for free space in the directories '/var/lib/univention-ldap/' and '/var/lib/univention-directory-listener/', when the Listener will be stopped. Default is 10 [MiB].

This is implemented in the main loop of UDL. This is also checked on Master and Member. If the check fails, UDL abort()s:

management/univention-directory-listener/src/notifier.c
> 89 abort();

But it is then restarted by `systemd` in an endless loop, which fills the remaining disk space with error messages.

If both limits are set to the same value, both implementations will race.

For the customer the logfile contains the beginning of the traceback, but it is overwritten in the middle by the next incarnation of UDL:

> 16.12.19 10:05:44.816 LISTENER ( ERROR ) : replication: Critical disk space. The Univention LDAP Listener was stopped
> Traceback (most recent call last):
>   File "/usr/lib/univention-directory-listener/system/replication.py", line 783, in handler
>     check_file_system_space()
>   File "/usr/lib/univention-directory16.12.19 10:05:51.019 DEBUG_INIT

Tasks:
1. Catch the exception in replication.py to always stop UDL
2. Or remove the code from replication.py completely now that UDL has a check itself
3. Make systemd not restart UDL in that case; either by UDL stopping itself or by using RestartPreventExitStatus= or by ...
4. Document to set listener/freespace << ldap/replication/filesystem/limit
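
For illustration, Task 1 could look roughly like the following. This is only a minimal sketch and not the actual patch: `send_warning_mail`, `warn_and_stop`, the `stop_listener` and `smtp_factory` parameters and the mail subject are hypothetical names chosen here. The idea is that sending the mail is best effort, and the stop action sits in a `finally:` block so that it runs even if the MTA refuses the connection.

```python
import smtplib
from email.mime.text import MIMEText


def send_warning_mail(sender, recipient, text, smtp_factory=smtplib.SMTP):
    """Try to mail the warning; return True on success, False on any failure."""
    msg = MIMEText(text)
    msg['Subject'] = 'Critical disk space'  # hypothetical subject line
    msg['From'] = sender
    msg['To'] = recipient
    try:
        s = smtp_factory()
        try:
            s.connect()  # this is the call that raised "Connection refused"
            s.sendmail(sender, [recipient], msg.as_string())
        finally:
            s.close()
    except Exception:
        # Best effort only: a missing MTA must not abort the handler.
        return False
    return True


def warn_and_stop(sender, recipient, text, stop_listener, smtp_factory=smtplib.SMTP):
    """Send the warning if possible, then stop the listener in every case."""
    try:
        return send_warning_mail(sender, recipient, text, smtp_factory)
    finally:
        # Runs whether or not the mail could be sent.
        stop_listener()
```

In replication.py, `stop_listener` would presumably wrap the existing `listener.run(...)` call quoted above; it is passed in as a callable here only so that the sketch runs outside of UDL.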
Patch in git:phahn/replication
I had a similar problem during product testing for UCS 5, though not with replication.py but with the S4 connector. The listener module "s4-connector.py" was writing the changes to "/var/lib/univention-connector/s4" faster than the connector was processing them. This filled up all space on the drive. While the check for free space in the main loop worked, systemd restarted the service and the little space left was used up with log messages.

As a side note: 10 MiB as the default value feels a bit low? With a higher value, I would have hoped the S4 connector could have freed up some space while the listener was stopped.
*** Bug 53413 has been marked as a duplicate of this bug. ***