Bug 50668 - If "univention-translog check --fix" fails then it should not say "all systems must be re-joined"
Status: NEW
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on:
Blocks:
Reported: 2019-12-20 11:42 CET by Arvid Requate
Modified: 2020-08-26 09:05 CEST

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 2: Improvement: Would be a product improvement
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.023
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019121921000565
Bug group (optional):
Max CVSS v3 score:


Attachments

Description Arvid Requate univentionstaff 2019-12-20 11:42:15 CET
In Ticket #2019121921000565, Univention Support had to deal with a large environment where the hard disk of the DC Master had run out of space and the "transaction" file got corrupted. The attempt to fix this by running "univention-translog check --fix" failed, because the tool did not know how to handle some basically trivial inconsistencies. I created Bug 50666 to improve the capabilities of the tool.

What made the situation worse for Support was that the tool printed this message:
=========================================================================
UCS Master must be reset and all other UCS systems must be re-joined!

See <https://help.univention.com/t/how-to-reset-listener-notifier-replication/11710> for more details.
=========================================================================

We should improve the wording here: "must" is a very strong word. Resetting the Master and re-joining all other systems is rather the last available option, and one that should not be chosen lightly in very large domains. Instead, the message should recommend first inspecting the "transaction" file manually, trying to fix the first reported issue by hand, and then re-running "check --fix".


This is the full output:
=========================================================================
root@master:/var/log/univention # /usr/share/univention-directory-notifier/univention-translog check --fix
2019-12-19 12:04:22,994:ERROR:/var/lib/univention-ldap/notify/transaction:3493468:'3492355 cn=43637,cn=uidNumber,cn=temporary,cn=univention,dc=dom,dc=net a\n': Repeated line after '3493467 cn=foo123 cn=43637,cn=gidNumber,cn=temporary,cn=univention,dc=dom,dc=net a'
2019-12-19 12:04:27,901:ERROR:/var/lib/univention-ldap/notify/transaction:4005736:'4004852 cn={326A40A8-2DFB-4BDA-87A9-AAECAC56460C},cn=Policies,cn=System,dc=dom,dc=net m\n': Hole after '4004622 cn=xyz123,cn=computers,ou=abc123,dc=dom,dc=net m'

/var/lib/univention-ldap/notify/transaction needs fixing:
- the transactions are not sorted uniquely
2019-12-19 12:05:06,044:ERROR:/var/lib/univention-ldap/notify/transaction:3493468:'3493467 cn=foo123 cn=43637,cn=gidNumber,cn=temporary,cn=univention,dc=dom,dc=net a\n': Repeated line after '3493467 cn=foo123,cn=groups,ou=abc123,dc=dom,dc=net m'
2019-12-19 12:05:12,174:ERROR:/var/lib/univention-ldap/notify/transaction:4004624:'4004852 cn={326A40A8-2DFB-4BDA-87A9-AAECAC56460C},cn=Policies,cn=System,dc=dom,dc=net m\n': Hole after '4004622 cn=xyz123,cn=computers,ou=abc123,dc=dom,dc=net m'
- still contains duplicate transactions after unique sorting!
UCS Master must be reset and all other UCS systems must be re-joined!

See <https://help.univention.com/t/how-to-reset-listener-notifier-replication/11710> for more details.
=========================================================================
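The manual inspection suggested above starts with finding the first line that breaks the sequence. The following is not part of univention-translog; it is a minimal Python sketch, assuming each line of the transaction file has the form "<id> <dn> <command>" (as in the log lines above) and that consecutive transaction IDs should differ by exactly one. The function name find_issues and the issue labels are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def find_issues(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Report suspicious lines of a notifier transaction file (hypothetical helper).

    Assumption: every line looks like "<id> <dn> <command>", where <id> is the
    first space-separated token and consecutive IDs differ by exactly one.
    Returns (line_number, kind, line) tuples; kind is one of
    "unparsable", "repeated", "out_of_order" or "hole".
    """
    issues: List[Tuple[int, str, str]] = []
    previous_id: Optional[int] = None  # last ID accepted as in sequence
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        head = line.split(" ", 1)[0]
        if not head.isdecimal():
            issues.append((lineno, "unparsable", line))
            continue
        current_id = int(head)
        if previous_id is not None:
            if current_id == previous_id:
                issues.append((lineno, "repeated", line))
                continue  # keep comparing against the last good ID
            if current_id < previous_id:
                issues.append((lineno, "out_of_order", line))
                continue
            if current_id > previous_id + 1:
                # IDs were skipped; still accept this ID as the new baseline
                issues.append((lineno, "hole", line))
        previous_id = current_id
    return issues
```

It only reads the file and never modifies it, e.g. `with open("/var/lib/univention-ldap/notify/transaction") as f: print(find_issues(f)[:1])`. The first reported issue is usually the one to look at, because later reports may be consequences of it.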
Comment 1 Philipp Hahn univentionstaff 2020-08-26 09:05:31 CEST
univention-translog was and is designed as a tool to be used by regular administrators to help them fix SOME translog-related issues; as such, it recommends re-joining the systems by default.

If, on the other hand, you are willing to spend extra time and money on fixing the problem in a different way, you are free to do so; in most cases, that requires expert knowledge which regular administrators do not have.