Univention Bugzilla – Bug 50820
Provisioning App: error in listener stops processing of queue
Last modified: 2020-02-25 15:47:01 CET
When there is an error in the handling of an item in the listener queue, a traceback is logged and all processing stops. Five seconds later the same happens again, and this repeats indefinitely until the disk holding the logfile is full and other processes crash.

1. The listener's error handling must be more robust. It should handle the remaining queue items.
2. The listener should have a (logarithmic?) back-off algorithm for retrying the same problematic queue item after 10s, 30s, 1min, 10min, 1h.
3. The problematic queue item's file name must be printed to the logfile, so an administrator has the means to manually remove it.
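The back-off requested in point 2 could be sketched roughly as follows. This is only an illustration of the proposal, not code from the app: the names `BACKOFF_SCHEDULE` and `backoff_delay` are hypothetical, and the delays are simply the ones listed above (10s, 30s, 1min, 10min, 1h), with the last one repeated for any further failures.

```python
# Hypothetical sketch: delays (in seconds) taken from the proposal above.
BACKOFF_SCHEDULE = (10, 30, 60, 600, 3600)


def backoff_delay(failures, schedule=BACKOFF_SCHEDULE):
    """Return how long to wait after `failures` consecutive failures of one item."""
    if failures < 1:
        # Nothing has failed yet, so there is no reason to wait.
        return 0
    # Clamp to the last entry so the delay stays at 1h once the schedule is used up.
    index = min(failures, len(schedule)) - 1
    return schedule[index]
```

With this schedule, the first failure of an item would be retried after 10 seconds, the fifth and every later one after an hour, which would keep the logfile from growing by one traceback every five seconds.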
Fixed with the last line in the listener_trigger:

run_on_files(objs, run, pause_after_errors_num=3, pause_after_errors_length=60)

(In reply to Daniel Tröder from comment #0)
> 1. The listener's error handling must be more robust. It should handle the
> remaining queue items.

Not done, as we do not know whether the failed task was important for the subsequent tasks (like creating a context).

> 2. The listener should have a (logarithmic?) back-off algorithm for retrying
> the same problematic queue item after 10s, 30s, 1min, 10min, 1h.

Currently there is no logarithmic increase in the sleep time and it is not configurable.

> 3. The problematic queue item's file name must be printed to the logfile, so
> an administrator has the means to manually remove it.

Error while processing /var/lib/univention-appcenter/apps/ox-connector/data/listener/2020-02-24-23-40-30-263937.json
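For readers unfamiliar with the behavior described in this comment, the following is a rough sketch of how such a pause could work. It is not the actual `run_on_files` implementation: the function `run_queue`, its `errors_so_far` and `sleep` parameters, and the logger name are assumptions made for illustration. Only the two `pause_after_errors_*` parameter names and the log message format come from this report.

```python
import logging
import time

logger = logging.getLogger("listener-sketch")  # hypothetical logger name


def run_queue(paths, run, pause_after_errors_num=3, pause_after_errors_length=60,
              errors_so_far=0, sleep=time.sleep):
    """Process queue files in order and stop at the first failure.

    Returns the updated number of consecutive failed passes (0 after a clean pass).
    """
    for path in paths:
        try:
            run(path)
        except Exception:
            errors_so_far += 1
            # Name the file so an administrator can inspect or remove it.
            logger.exception("Error while processing %s", path)
            if errors_so_far >= pause_after_errors_num:
                # Recurring errors: pause instead of retrying immediately.
                sleep(pause_after_errors_length)
            # Later items may depend on this one, so do not continue.
            return errors_so_far
    return 0
```

In this sketch the caller passes the returned count back in on the next trigger, so the third consecutive failure leads to a 60 second pause, and a pass without errors resets the count.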
(In reply to Dirk Wiesenthal from comment #1)
> Fixed with the last line in the listener_trigger:
> run_on_files(objs, run, pause_after_errors_num=3,
> pause_after_errors_length=60)
>
> (In reply to Daniel Tröder from comment #0)
> > 1. The listener's error handling must be more robust. It should handle the
> > remaining queue items.
>
> Not done, as we do not know whether the failed task was important for the
> subsequent tasks (like creating a context).

OK: we decided to make the behavior this way, as it is safer.

> > 2. The listener should have a (logarithmic?) back-off algorithm for retrying
> > the same problematic queue item after 10s, 30s, 1min, 10min, 1h.
>
> Currently there is no logarithmic increase in the sleep time and it is not
> configurable.

OK: sleeps 60s when recurring errors are detected.

> > 3. The problematic queue item's file name must be printed to the logfile, so
> > an administrator has the means to manually remove it.
>
> Error while processing
> /var/lib/univention-appcenter/apps/ox-connector/data/listener/2020-02-24-23-40-30-263937.json

OK.