Univention Bugzilla – Bug 56423
Inconsistent diskcache in id-connector stops replication
Last modified: 2023-09-13 16:25:16 CEST
If the diskcache in the id-connector is inconsistent, e.g. due to a power outage, items from the cache cannot be unpickled. In queues.log:

2023-08-11 15:13:20 ERROR [InQueue.distribute_loop:356] During preprocessing of file (1/51132) '2023-08-10-17-43-59-023977.json': Ran out of input
Traceback (most recent call last):
  File "/ucsschool-id-connector/src/ucsschool_id_connector/queues.py", line 327, in distribute_loop
    new_path = await self.preprocess_file(path)
  File "/ucsschool-id-connector/src/ucsschool_id_connector/queues.py", line 288, in preprocess_file
    changed |= any([await coro for coro in result_coros])
  File "/ucsschool-id-connector/src/ucsschool_id_connector/queues.py", line 288, in <listcomp>
    changed |= any([await coro for coro in result_coros])
  File "/ucsschool-id-connector/src/plugins/plugins/listener_objects.py", line 172, in preprocess_add_mod_object
    obj.old_data = self.get_old_data(obj)
  File "/ucsschool-id-connector/src/plugins/plugins/listener_objects.py", line 139, in get_old_data
    return self.old_data_db.get(obj.id)
  File "/ucsschool-id-connector/src/ucsschool_id_connector/db.py", line 82, in get
    value = self._cache.get(key, default, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/diskcache/core.py", line 1173, in get
    value = self._disk.fetch(mode, filename, db_value, read)
  File "/usr/lib/python3.8/site-packages/diskcache/core.py", line 282, in fetch
    return pickle.load(reader)
EOFError: Ran out of input

Each time the LDAP object with the inconsistent cache entry is processed, the InQueue stops. As an administrator, you can move aside the JSON file mentioned in the log file, but the problem may reoccur if the object is modified again. It is not obvious to me how this situation should be handled. The old object in the cache is invalid. Should we simply remove the corrupt cache entry? Should we write the current new object to the cache? How will the id-connector plugin logic deal with each solution?
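The failure mode can be reproduced in isolation: diskcache stores larger values as pickle files on disk, and a truncated or zero-byte value file makes `pickle.load()` raise exactly the `EOFError` seen in queues.log. A minimal sketch (the `BytesIO` stands in for a truncated cache value file):

```python
import io
import pickle

# A zero-byte stream simulates a cache value file truncated by a power outage.
truncated = io.BytesIO(b"")

try:
    pickle.load(truncated)
except EOFError as exc:
    # This is the same error diskcache's Disk.fetch() propagates to the caller.
    print(exc)  # -> Ran out of input
```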
As a workaround, the corrupt entries can be removed from the cache:

# univention-app shell ucsschool-id-connector
# python3
from ucsschool_id_connector.db import OldDataDB
from ucsschool_id_connector.models import ListenerUserOldDataEntry
from ucsschool_id_connector.constants import OLD_DATA_DB_PATH

odb = OldDataDB(OLD_DATA_DB_PATH, ListenerUserOldDataEntry)
# Sanity check: should match the size of
# /var/lib/univention-appcenter/apps/ucsschool-id-connector/data/old_data_db/cache.db
odb._cache.volume()

for key in list(odb._cache):
    try:
        odb._cache.get(key)
    except EOFError:
        print(key)
        # del odb._cache[key]  # uncomment to delete the corrupt entry
odb._cache.close()
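Regarding a backup strategy (see suggestion 2 below): a consistent snapshot of the SQLite part of the cache can be taken with Python's online backup API, which is safe even while the id-connector is writing. This is only a sketch; the paths are taken from this report, and note that diskcache also keeps large values in sidecar files next to cache.db, which would have to be copied as well for a complete backup.

```python
import sqlite3
from pathlib import Path

# Paths as used by the ucsschool-id-connector app (see the sanity check above);
# the backup destination is an assumption.
SRC = "/var/lib/univention-appcenter/apps/ucsschool-id-connector/data/old_data_db/cache.db"
DST = "/var/backup/old_data_db-cache.db"


def backup_cache_db(src: str = SRC, dst: str = DST) -> None:
    """Snapshot cache.db using SQLite's page-by-page online backup API."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(src)
    target = sqlite3.connect(dst)
    try:
        # Copies the database atomically, even if another process writes to it.
        source.backup(target)
    finally:
        source.close()
        target.close()
```

A cron job could call this once a day; restoring then means stopping the app, putting the snapshot (and the sidecar value files) back in place, and restarting.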
Corrupted "old data" is problematic: move and remove operations cannot be performed safely without it, because the old state of the object is unknown. 1. IMHO we should update the documentation to inform operators of this, so they can employ a backup & restore strategy. 2. A cron job could also be installed that automatically creates daily backups in /var/backup on the host. 3. The code should be made more robust against this kind of situation by removing the inconsistent entry and performing a "resync" of the affected object. It should inform the operator with an ERROR message that the object may not be synced correctly anymore (incomplete (re)moves).
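Suggestion 3 could look roughly like the following sketch: a defensive wrapper around the diskcache lookup that the real `OldDataDB.get()` in db.py would delegate to. The function name and the generic `cache`/`key` parameters are placeholders, not the actual API; on a corrupt entry it deletes the entry, logs an ERROR so the operator knows a resync is needed, and returns the default so the caller treats the object as having no known old data.

```python
import logging
import pickle

logger = logging.getLogger("ucsschool-id-connector")


def get_old_data_safe(cache, key, default=None):
    """Fetch a cached entry, dropping it with an ERROR log if it is corrupt."""
    try:
        return cache.get(key, default)
    except (EOFError, pickle.UnpicklingError) as exc:
        logger.error(
            "Corrupt cache entry for key %r (%s); removing it. The object may "
            "not be synced correctly anymore (incomplete (re)moves) - a resync "
            "of this object is recommended.",
            key,
            exc,
        )
        try:
            # Deleting only touches the SQLite row, so it works even though
            # the pickled value itself cannot be read.
            del cache[key]
        except KeyError:
            pass
        return default
```

Whether the caller should then write the current object into the cache, or leave the entry absent until the resync completes, is exactly the open question from the description above.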