Bug 53821 - OpenLDAP slapd-translog resets last_id:=-1 (4.4)
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 4.4
Hardware: amd64 Linux
Importance: P5 major
Target Milestone: UCS 4.4-8-errata
Assigned To: Julia Bremer
QA Contact: Arvid Requate
Depends on: 41687 51910 51911 54203
Blocks:
Reported: 2021-09-22 12:01 CEST by Philipp Hahn
Modified: 2022-01-12 16:33 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.400
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019032821000494, 2021092221000186
Bug group (optional): Error handling, Troubleshooting
Max CVSS v3 score:


Description Philipp Hahn univentionstaff 2021-09-22 12:01:17 CEST
Twice now, in two large environments, some bug has caused /var/lib/univention-ldap/last_id to become invalid.

slapo-translog then fails to get the last_id from said file.
The fallback code also fails to parse /var/lib/univention-ldap/notify/transaction to get the last ID.
(BUG: /var/lib/univention-ldap/listener/listener.priv and …/listener MUST be checked, too, as there might be pending transactions already written by slapo-translog but not yet processed by UDN!)

Due to the missing last_id, the literal "<TransID>" is used instead in "…/listener/listener".
This confuses UDL, which will propagate those transactions with TID=0 to "…/notify/transaction" and "reqSession=0,cn=translog".

UDL luckily does not process those TID=0 transactions because TID=0 < /var/lib/univention-directory-listener/notifier_id

1. slapo-translog should ABORT `slapd` instead of using "<TransID>"
2. Fix the logic bug which leads to this situation, where `last_id` gets lost
3. Fix logic to also handle …/listener/listener{,.priv}
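
Points 1 and 3 could be sketched roughly as follows. This is only an illustration in Python, not the actual slapo-translog code. The names `parse_tid` and `recover_last_id` are hypothetical, and the sketch assumes that every line in the candidate files starts with the numeric transaction ID followed by a space. Taking the maximum over all files is a simplification.

```python
import os


def parse_tid(line):
    """Return the leading transaction ID of a line, or None if it is not a positive integer."""
    first = line.split(" ", 1)[0].strip()
    # Reject placeholders such as "<TransID>" and IDs below 1.
    if first.isascii() and first.isdigit() and int(first) >= 1:
        return int(first)
    return None


def recover_last_id(paths):
    """Return the highest valid transaction ID found in any of the given files.

    Raises RuntimeError instead of falling back to a placeholder, so the
    caller can abort rather than write transactions with a bogus ID.
    """
    best = None
    for path in paths:
        if not os.path.exists(path):
            continue  # a missing file is simply skipped in this sketch
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                tid = parse_tid(line)
                if tid is not None and (best is None or tid > best):
                    best = tid
    if best is None:
        raise RuntimeError("no valid transaction ID found; refusing to continue")
    return best
```

A caller would pass the transaction file together with …/listener/listener and …/listener/listener.priv, and stop the service when `RuntimeError` is raised.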

Alternative:
Change UDN to ignore the TID in first column and let UDN (on the Master) assign the TID. I see no reason why `slapo-translog` has to do this. This would simplify the slapo-translog code as we can remove all code related to last_id.
UDN on the other hand already has code for parsing the different files and is in the best position to assign the TID. 
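
A minimal sketch of that alternative, under the assumption that pending lines have the form "<first column> <rest>" and that the notifier knows the last ID it handed out. `assign_tids` is a hypothetical name and not part of UDN.

```python
def assign_tids(pending_lines, last_tid):
    """Renumber pending lines, ignoring whatever the first column contains.

    Returns the renumbered lines and the new last transaction ID.
    """
    result = []
    for line in pending_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        # Drop the first column (possibly "<TransID>" or 0) and keep the rest.
        _, _, rest = line.partition(" ")
        last_tid += 1
        result.append(f"{last_tid} {rest}")
    return result, last_tid
```

With this approach the writer of the pending file never needs to know `last_id`, so it cannot lose it.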

+++ This bug was initially created as a clone of Bug #41687 +++
Comment 1 Dirk Schnick univentionstaff 2021-09-22 14:40:40 CEST
Added actual ticket no and cust id;

This happened today because the MDB was full; fixing bug 50665 would prevent running into this situation.
Comment 3 Arvid Requate univentionstaff 2021-10-13 10:41:26 CEST
The errata update https://errata.software-univention.de/#/?erratum=5.0x75 should address this.
Has that been installed in the case of Ticket #2021092221000186 ?
Comment 4 Philipp Hahn univentionstaff 2021-10-13 11:02:31 CEST
(In reply to Arvid Requate from comment #3)
> The errata update https://errata.software-univention.de/#/?erratum=5.0x75
> should address this.
> Has that been installed in the case of Ticket #2021092221000186 ?

No, as you only fixed it for 5.0-0 but *never* for 4.4-8 which most of our customers are still using!
Comment 7 Julia Bremer univentionstaff 2021-12-02 16:48:19 CET
I cherry-picked these commits from 5.0.
I skipped the advisory and changelog changes,
and I skipped commit e223cdad7cc82658e5 because its systemd config changes don't apply to the runit service in 4.4.

f1b267e8b5 Bug #53821: yaml
4cdd3ccd6e Bug #53821: changelog
a794013870 test[udn]: Test univention-translog --listener-private-file
17b3307128 Bug #51911: Please the PEP8 gods
4082e9843a fixup! Bug #51911: Improve fix_renumber
0bc10f2dee fixup! Bug #51911: Don't try to cache TIDs < 1
a9e745e9c0 Bug #51911: Improve readability of log outout
d4ab094a83 Bug #51911: Improve fix_renumber
43fe4ad843 Bug #51911: Also check listener.priv
f7d35f71f0 Bug #51911: Fix element order in transaction syntax error handling
bbedaa0583 Bug #51911: Don't try to read TIDs < 1
f23f2e82f2 Bug #51911: Don't try to cache TIDs < 1
efcc4a1850 Bug #51911: Abort if Transaction ID from listener/listener.priv doesn't increase
811740a203 Bug #51911: Abort if Transaction ID from listener/listener.priv cannot be parsed
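
For illustration only, the two "Abort if ..." commits amount to a check along these lines. This is a hypothetical Python sketch, not the cherry-picked code. `check_increasing` and the assumed line format (transaction ID in the first space-separated column) are assumptions.

```python
def check_increasing(lines, last_tid):
    """Return the last transaction ID after validating all lines.

    Raises ValueError if an ID cannot be parsed or does not increase.
    """
    for number, line in enumerate(lines, start=1):
        first = line.split(" ", 1)[0]
        try:
            tid = int(first)
        except ValueError:
            raise ValueError(
                f"line {number}: cannot parse transaction ID {first!r}"
            ) from None
        if tid <= last_tid:
            raise ValueError(
                f"line {number}: transaction ID {tid} does not increase past {last_tid}"
            )
        last_tid = tid
    return last_tid
```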

Successful build
Package: univention-directory-notifier
Version: 13.0.3-4A~4.4.0.202112021635
Branch: ucs_4.4-0
Scope: errata4.4-8


I also merged the revisions r19406, r19407, r19408, r19409 to openldap in 4.4
Successful build
Package: openldap
Version: 2.4.45+dfsg-1~bpo9+1A~4.4.0.202112020939
Branch: ucs_4.4-0
Scope: errata4.4-8
Comment 10 Philipp Hahn univentionstaff 2021-12-07 07:49:48 CET
Regression: Bug #54203 for UCS-5.0 got backported as 0bc10f2dee5 and f23f2e82f27
Comment 11 Julia Bremer univentionstaff 2021-12-08 09:20:26 CET
Successful build
Package: univention-directory-notifier
Version: 13.0.3-6A~4.4.0.202112080915
Branch: ucs_4.4-0
Scope: errata4.4-8

Reverted both of those commits:

79b6a4c10e fixup! Bug #53821: version bump
9309f9cef4 Revert "Bug #51911: Don't try to cache TIDs < 1"
7a066278f2 Bug #53821: version bump
55348a4236 Revert "Bug #51911: Don't try to cache TIDs < 1"
Comment 12 Julia Bremer univentionstaff 2021-12-23 10:40:19 CET
For completeness I also cherry-picked the changes from Bug #54203: 

477f4718d5 Bug #54208: Make sure notifier_cache_size is not negative
427658608b Bug #54203: Fix partial import when index < notifier_cache_size

Successful build
Package: univention-directory-notifier
Version: 13.0.3-8A~4.4.0.202112231030
Branch: ucs_4.4-0
Scope: errata4.4-8
Comment 13 Arvid Requate univentionstaff 2022-01-11 19:03:22 CET
Verified:
* 10_translog_overlay.quilt is the same now in errata4.4-8 as in errata5.0-1
* Cherrypicks for univention-directory-notifier (diffed all commits)
* Package updates
* Advisories