Bug 56997 - univention-lastbind.py lacks error handling and logging
Status: NEW
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Assigned To: UCS maintainers
Reported: 2024-01-22 10:06 CET by Julia Bremer
Modified: 2024-01-22 10:07 CET (History)
What kind of report is it?: Bug Report
What type of bug is this?: 4: Minor Usability: Impairs usability in secondary scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.046
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2024011121000189
Bug group (optional): Error handling, Usability
Max CVSS v3 score:


Description Julia Bremer univentionstaff 2024-01-22 10:06:06 CET
The univention-lastbind.py script is used when slapo_lastbind is activated.
slapo_lastbind writes the attribute authTimestamp into the local LDAP only, whenever an LDAP bind happens on a server.
This attribute is not replicated, by design.

If a customer wants to collect all authTimestamp attributes in the domain to find out when a user last logged in, we ship the script univention_lastbind.py in the product. It is documented that this script should be run from a cron job.

This script does not have sufficient error handling, which becomes apparent in larger customer environments.
The script iterates over all domaincontroller_[master|backup|slave] hosts and creates an LDAP connection to each of them.

It only catches ldap.SERVER_DOWN and then prints a warning. Nothing else.
This might be OK for a small domain, where the script can easily be rerun after an error.
In this customer environment, the script takes about 10 hours to run. If an error occurs on any one of the more than 300 servers, all progress is lost.
For example, this customer had one domain controller that was used only for testing, and the password of the primary's machine account had not been replicated to it. This raised an ldap.INVALID_CREDENTIALS exception, which was not caught.
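
A minimal sketch of what per-server error handling during connection setup could look like. This is not the shipped code: `open_connections` and the injected `connect` callable are hypothetical names, and the exception classes are stand-ins so the example runs without python-ldap. In python-ldap itself, `ldap.SERVER_DOWN` and `ldap.INVALID_CREDENTIALS` both derive from `ldap.LDAPError`, so catching the base class would cover both cases described above.

```python
import logging

log = logging.getLogger("univention_lastbind")


# Stand-ins so the sketch runs without python-ldap. In the real script these
# would be ldap.LDAPError, ldap.SERVER_DOWN and ldap.INVALID_CREDENTIALS.
class LDAPError(Exception):
    pass


class ServerDown(LDAPError):
    pass


class InvalidCredentials(LDAPError):
    pass


def open_connections(hosts, connect):
    """Try every host and return (connections, failures) instead of aborting."""
    connections = {}
    failures = {}
    for host in hosts:
        try:
            connections[host] = connect(host)
        except LDAPError as exc:
            # Catching the base class keeps one misconfigured server
            # from ending the whole run.
            log.warning("Skipping %s: %s", host, type(exc).__name__)
            failures[host] = exc
    return connections, failures
```

Returning the failures alongside the connections would let the caller report all unreachable servers at the end of the run instead of stopping at the first one.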

Later in the script, all of these LDAP connections are used to get "authTimestamp" for each user in the domain.
The "lo.get" calls at that point have no error handling at all.
If a connection has been closed in the meantime (likely, because some connections were opened hours earlier), the script fails and all progress is lost.
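
A sketch of a lookup that tolerates stale connections: one reconnect attempt per server, then skip that server and carry on. The names `get_with_retry`, `newest_auth_timestamp` and `reconnect` are hypothetical, `LDAPError` is a stand-in for `ldap.LDAPError`, and the connection objects are assumed to offer `get(dn, attr=[...])` returning a dict of attribute name to list of byte strings.

```python
import logging

log = logging.getLogger("univention_lastbind")


class LDAPError(Exception):
    """Stand-in for ldap.LDAPError so the sketch runs without python-ldap."""


def get_with_retry(host, user_dn, connections, reconnect):
    """One lookup, with a single reconnect if the old connection has gone stale."""
    try:
        return connections[host].get(user_dn, attr=["authTimestamp"])
    except LDAPError as exc:
        log.warning("%s: lookup failed (%s), reconnecting", host, type(exc).__name__)
    try:
        connections[host] = reconnect(host)
        return connections[host].get(user_dn, attr=["authTimestamp"])
    except LDAPError as exc:
        # Give up on this server for this user, but keep the run alive.
        log.error("%s: giving up on %s (%s)", host, user_dn, type(exc).__name__)
        return {}


def newest_auth_timestamp(user_dn, connections, reconnect):
    """Return the newest authTimestamp found on any server, or None."""
    newest = None
    for host in list(connections):
        attrs = get_with_retry(host, user_dn, connections, reconnect)
        for value in attrs.get("authTimestamp", []):
            # Generalized Time values of the same form (YYYYmmddHHMMSSZ)
            # sort chronologically when compared as strings.
            if newest is None or value > newest:
                newest = value
    return newest
```

With this shape, a server that drops its connection after several hours costs at most the lookups made while it is unreachable, not the whole run.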

Furthermore, Support has no easy way to find out what went wrong.
I assume the traceback is sent via cron mail, but there should also be some kind of logging (and don't forget the logrotate!).
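
A sketch of how file logging and a verbose flag might be wired up. Everything named here is an assumption: the logger name, the `--logfile` option and its default path are hypothetical and not taken from the shipped script. `logging.handlers.WatchedFileHandler` is used because it reopens the log file after an external logrotate has moved it.

```python
import argparse
import logging
import logging.handlers


def setup_logging(logfile, verbose=False):
    """Log to a file that logrotate may rename, and optionally to stderr."""
    log = logging.getLogger("univention_lastbind")
    log.setLevel(logging.DEBUG if verbose else logging.INFO)
    # WatchedFileHandler notices when the file was rotated away and reopens it.
    file_handler = logging.handlers.WatchedFileHandler(logfile)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    log.addHandler(file_handler)
    if verbose:
        # StreamHandler defaults to stderr, useful for interactive runs.
        log.addHandler(logging.StreamHandler())
    return log


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="lastbind logging sketch")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log debug output, also to stderr")
    # Hypothetical default path, chosen only for illustration.
    parser.add_argument("--logfile", default="/var/log/univention/lastbind.log")
    return parser.parse_args(argv)
```

A matching logrotate rule for whatever path is finally chosen would still have to be shipped with the package.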



TL;DR: The script should continue on all/most errors (or be configurable to do so), and Support needs some logging and a verbose flag.

PS: This script could possibly be parallelized; it should not take that long.
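
One possible shape for the parallelization, as a sketch under assumptions: the work is network-bound, so a thread pool is used, and each server is asked once for all users via a hypothetical `fetch(host)` callable returning `{user_dn: timestamp}`. That differs from the per-user lookups described above and is only meant to illustrate the idea.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_parallel(hosts, fetch, max_workers=16):
    """Run fetch(host) on many servers at once and merge the results.

    Returns (newest, failures): the newest timestamp per user DN, and a
    mapping of host -> exception for every server that failed.
    """
    newest = {}
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                result = future.result()
            except Exception as exc:
                # A failing server must not discard the other servers' results.
                failures[host] = exc
                continue
            for user_dn, timestamp in result.items():
                if user_dn not in newest or timestamp > newest[user_dn]:
                    newest[user_dn] = timestamp
    return newest, failures
```

The merge runs in the main thread only, so the `newest` dictionary needs no locking.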