Bug 51772 - make error handling of corrupted files in directory logger more robust
make error handling of corrupted files in directory logger more robust
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: LDAP
UCS 4.4
Other Linux
: P5 normal (vote)
: UCS 5.0-2-errata
Assigned To: Julia Bremer
Dirk Wiesenthal
https://git.knut.univention.de/univen...
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2020-08-05 13:32 CEST by Daniel Tröder
Modified: 2022-09-29 12:38 CEST (History)
13 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 4: A User would return the product
User Pain: 0.114
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2022022121000261
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Tröder univentionstaff 2020-08-05 13:32:56 CEST
When the directory logger listener module encounters an empty file, it crashes.
When it starts the next time (which happens on *every* LDAP modification) it crashes again, when reading the same file.
As this keep happening, the number files to read (created by the OpenLDAP overlay) grows, as they are never consumed and deleted.
After a while there are a lot of files (in a customers case 500.000). Such a large directory causes a lot of I/O when read.
When the empty file is removed, the listener module will delete all other files and _not_ log them, as the current DN does not match that of the opened file. The information is lost.
An empty or otherwise corrupt file can for example be created, when the disk is full.

The concrete error that happend on a customer system, when opening an empty file was:
-----------------
Traceback (most recent call last):
  File "/usr/lib/univention-directory-listener/system/directory_logger.py", line 205, in handler
    (timestamp, dellog_id, modifier, action) = process_dellog(dn)
  File "/usr/lib/univention-directory-listener/system/directory_logger.py", line 121, in process_dellog
    (dellog_stamp, dellog_id, dellog_dn, modifier, action) = [line.rstrip() for line in f]
ValueError: need more than 0 values to unpack
------------------

1. Make the directory logger listener module more robust in handling errors.
2. Consider the possibility to recover from errors, and log at least incomplete information (if the complete info is not available anymore).
Comment 1 Ingo Steuwer univentionstaff 2020-08-05 15:14:53 CEST
I tried to make the subject more specific.
Comment 4 Dirk Schnick univentionstaff 2022-02-24 17:20:22 CET
This happened again in a customer environment. Unfortunately, with exactly the deletion that the customer wanted to track.
Ticket attached and customer added

I changes What type to 5 as the directory logger should log all LDAP actions; if this happens the key scenario is broken. We break our promise with that app.
Also I have changed the feel of the customer, as the worst case happened. The action the customer needed to track was running into that bug. I was able to solve it but we should fix that.
Comment 7 Èric Monné Mesalles univentionstaff 2022-05-11 12:59:48 CEST
Added a verification of the format of the log files. I've the file is corrupted, the service will produce a log but not parse it and avoiding crashing during the parsing of the corrupted files. 
Fixed on version `11.0.3-2A` of the univention-directory-logger package 
see commit 768df113
Comment 8 Dirk Wiesenthal univentionstaff 2022-06-06 01:15:18 CEST
From my understanding, the bug is two fold:

1. A corrupt file raises a traceback, thus the handling of that operation is aborted
2. Cleaning up the file will result in the software deleting all but one of the not-yet-logged delete operations that are stored in the "correct" files

The patch proposed solves the first issue. But the second issue remains.

I suggest the following:

1. If a ValueError occurs, do not simply "continue". Log the filename. Delete the file
2. If dellog == dn: delete the file. Otherwise: Don't! (except for ValueError in 1.)

Now, why are there correct files that would get deleted in the first place? They exist because the dellog overlay module of OpenLDAP creates them and the univention-directory-logger did not consume them. This is due to the ValueError that shall be handled in this bug. But it could be something else in the future: Whenever we have an error in the handler() function, those files may not be processed. Any future call will not be able to handle them because this call is not meant for that file from the past.

The univention-directory-logger is a listener module. And listener modules only get one shot at every LDAP operation. Either they succeed or they fail.

I therefore suggest to somehow keep these unprocessed files, not delete them. One idea:

Iterate over all the files and rename them to pathname + ".fail" in the postrun() of the listener: https://docs.software-univention.de/developer-reference-5.0.html#udl:mod:postrun
(Please also log that)

But make sure to not process .fail files before _parse_dellog_file:

if pathname.endswith(".fail"): continue
Comment 9 Dirk Wiesenthal univentionstaff 2022-06-15 12:22:57 CEST
Changes successfully reverted.

Reason for revert:
Now that we do not delete all unneeded files, there seems to be no mechanism to remove those files with ignored DNs, i.e., the ones in cn=temporary,cn=univention.

I may reconsider this requirement. As we do this:

dellist = sorted(os.listdir(dellog))

and, apparently, the names are essentially timestamps, we may still work with the "finally: os.unlink" code.

But we should see in 5.0-2, maybe 5.0-2-errata.
Comment 10 Julia Bremer univentionstaff 2022-09-22 21:42:17 CEST
d46ab67e00 Bug #51772: fix yaml
f9d6b037bb Bug #51772: YAML
201ebbb988 Bug #51772: Delete all past unprocessed files per iteration or else they will never be cleaned up
4e94704b08 Revert "Bug #51772: Revert package changes"


Package: univention-directory-logger
Version: 11.0.3-5A~5.0.0.202209222130
Branch: ucs_5.0-0
Scope: errata5.0-2


I reverted the last revert and added an os.unlink at the end to clean up the deletes that were not processed by the listener.
Comment 11 Dirk Wiesenthal univentionstaff 2022-09-23 11:01:23 CEST
YAML: OK
Changes: OK
Tests: OK

It should be noted that certain objects do not pass the listener. Including DNs like
cn=test,cn=uid,cn=temporary,cn=univention,dc=...

This means that after some UDM actions, one file may remain in the dellog directory until the next LDAP change.