Bug 51911 - UDN fails to parse <TransID>
UDN fails to parse <TransID>
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Notifier (univention-directory-notifier)
UCS 4.4
Other Linux
: P5 normal (vote)
: UCS 5.0-0-errata
Assigned To: Arvid Requate
Philipp Hahn
:
Depends on: 41687
Blocks: 51910 53821 54203
  Show dependency treegraph
 
Reported: 2020-08-26 17:24 CEST by Philipp Hahn
Modified: 2021-12-07 07:28 CET (History)
6 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.400
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2020082521000273
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Hahn univentionstaff 2020-08-26 17:24:33 CEST
OpenLDAP/translog may write "<TransID>" to "/var/lib/univention-ldap/listener/listener" it it fails to parse the last transaction from either "/var/lib/univention-ldap/last_id" or "/var/lib/univention-ldap/notify/transaction". If that happens UDN fails to parse the transaction ID and uses TID=0 (See Bug #41687 comment 2 item 17)

UDN then copies those lines from "listener/listener" to "notify/transaction" with TID=0.

If "last_id" is still broken after that "translog.c" will parse "TID=0" from that file and will assign 0 to all following transactions.

Replication then is broken as TID=0 is not considered a valid transaction ID by several tools: secondary systems will stop replication and/or UDN as UDN will announce TID=0 as the latest transaction from then on.

univention-translog will also fail as TID=0 is considered invalid CheckGeneric.loop:

 875                 if not 0 < rec.tid:
 876                     self.error("Invalid transaction id", syntax=True)
 877                     continue

1. UDN should handle "<TransID>"
2. univention-translog should handle "<TransID>" and "0".
Comment 1 Philipp Hahn univentionstaff 2020-08-26 17:28:45 CEST
(Future) design question: why does "translog" assign the transaction ID?
The current code in "translog.c" has many issues (See Bug #51910) and fails to determine the last used transaction ID in several cases, as it must check multiple locations:
- /var/lib/univention-ldap/last_id
- /var/lib/univention-ldap/listener/listener
- /var/lib/univention-ldap/listener/listener.priv
- /var/lib/univention-ldap/notify/transaction

If translog.c would instead always use "0" and UDN (only on the Master) assignes the next TID, that would simplify the code a lot:
- a lot of code could be removed from translog.c
- UDN has to track the last transaction ID anyway and already has code for that.
Comment 3 Nico Gulden univentionstaff 2020-12-01 12:41:53 CET
There has not been any recent activity on this bug. Has the problem been seen somewhere else as well in the meantime or has its assessment changed?
Comment 4 Philipp Hahn univentionstaff 2020-12-01 14:16:46 CET
(In reply to Nico Gulden from comment #3)
> There has not been any recent activity on this bug. Has the problem been
> seen somewhere else as well in the meantime or has its assessment changed?

It only ever happened once so far, but in that case it was a complete disaster. UCS should reliable by default and not a ticking time bomb.

The bug is not assigned to any developer, neither is the Taiga task.
It's a pullcord bug from Q3 and we're nearing the end of Q4.
So please assign it to a developer if you want to see progress.
Comment 5 Arvid Requate univentionstaff 2020-12-01 17:34:56 CET
I also saw this once on my VMs, and yes, that's totally broken then.
Comment 6 Arvid Requate univentionstaff 2021-08-02 18:43:17 CEST
I've adjusted these points:

1. Make univention-directory-notifier abort if the TID is not a valid integer
2. Don't restart univention-directory-notifier forever. Limit to 50 restarts in a window of 1000s
3. Make univention-translog handle values "<TransID>" and 0
4. Make univention-translog check listener/listener.priv too

Commits:
d9b6bb5c53 | Abort if Transaction ID from listener/listener.priv cannot be parsed
3878737d4e | Abort if Transaction ID from listener/listener.priv doesn't increase
ec5d2e2250 | Don't try to cache TIDs < 1
8a26ec31d6 | Don't try to read TIDs < 1
e223cdad7c | Limit to 50 automatic restarts within time interval of 1000s
efe7aecf3f | Fix element order in transaction syntax error handling
25f24a2281 | Also check listener.priv
8a054da0e3 | Improve fix_renumber
83fc309575 | Improve readability of log outout
e932001ad8 | Merge branch 'arequate/Bug51911-fix-notifier-TransID' into 5.0-0
a600a21dd8 | Changelog
5192c92f5b | Advisory
966136b751 | Update advisory text
841b665696 | Fix advisory package name
bb2dc8fa80 | fixup! Bug #51911: Don't try to cache TIDs < 1
70134d91be | Advisory update
287f84977f | fixup! Bug #51911: Improve fix_renumber
daec2a463c | Advisory update

Package: univention-directory-notifier
Version: 14.0.5-1A~5.0.0.202108021826
Branch: ucs_5.0-0
Scope: errata5.0-0
Comment 7 Philipp Hahn univentionstaff 2021-08-24 18:17:20 CEST
(In reply to Arvid Requate from comment #6)
OK d9b6bb5c53 | Abort if Transaction ID from listener/listener.priv cannot be parsed
OK 3878737d4e | Abort if Transaction ID from listener/listener.priv doesn't increase
OK ec5d2e2250 | Don't try to cache TIDs < 1
OK bb2dc8fa80 | fixup! Bug #51911: Don't try to cache TIDs < 1
OK 8a26ec31d6 | Don't try to read TIDs < 1
   this was my "Bug #41687 udn: Remove useless fetch of last transaction"
OK e223cdad7c | Limit to 50 automatic restarts within time interval of 1000s
  see <man:systemd.unit(5)>
OK efe7aecf3f | Fix element order in transaction syntax error handling
FIXED 25f24a2281 | Also check listener.priv
~OK 8a054da0e3 | Improve fix_renumber
  lacks refactoring
  `tid = -1` should have remained `tid = 0` which already 0 is not a valid TID.
> 287f84977f | fixup! Bug #51911: Improve fix_renumber
> c710b05c0d | Please the PEP8 gods
FIXED added unit test

OK 83fc309575 | Improve readability of log outout
OK a600a21dd8 | Changelog
FIXED 5192c92f5b | Advisory
FIXED 966136b751 | Update advisory text
FIXED 841b665696 | Fix advisory package name
FIXED 70134d91be | Advisory update
FIXED daec2a463c | Advisory update


OK: apt install -t apt univention-directory-notifier

FIXED: univention-directory-notifier.yaml c3c7a9abbf b13e8ebaa4
OK: errata-announce  -V --only univention-directory-notifier.yaml

OK:
  systemctl stop univention-directory-notifier slapd
  truncate -s 0 /var/lib/univention-ldap/last_id
  systemctl start slapd
  udm "computers/$(ucr get server/role)" modify --dn "$(ucr get ldap/hostdn)" --set description=$RANDOM
  udm "computers/$(ucr get server/role)" modify --dn "$(ucr get ldap/hostdn)" --set description=$RANDOM
  cat /var/lib/univention-ldap/listener/listener
  cat /var/lib/univention-ldap/last_id
  /usr/share/univention-directory-notifier/univention-translog -n check -f -S
  /usr/sbin/univention-directory-notifier -F -d 2 -v 3
  Ctrl-C
  cat /var/lib/univention-ldap/listener/listener
  cat /var/lib/univention-ldap/last_id
  tail /var/lib/univention-ldap/notify/transaction

Package: univention-directory-notifier
Version: 14.0.5-3A~5.0.0.202108241814
Branch: ucs_5.0-0
Scope: errata5.0-0

[5.0-0] f570ad2fd8 Bug #51911: univention-directory-notifier 14.0.5-3A~5.0.0.202108241814
 doc/errata/staging/univention-directory-notifier.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


(In reply to Philipp Hahn from comment #0)
> 1. UDN should handle "<TransID>"
OK
> 2. univention-translog should handle "<TransID>" and "0".
OK
Comment 8 Arvid Requate univentionstaff 2021-08-24 19:13:47 CEST
576aaf6358 | Adjusted advisory wording
Comment 9 Philipp Hahn univentionstaff 2021-08-25 17:53:07 CEST
<https://errata.software-univention.de/#/?erratum=5.0x74>