Bug 51910 - Missing error handling in translog leads to data loss
Missing error handling in translog leads to data loss
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: LDAP
UCS 4.4
Other Linux
: P5 normal (vote)
: UCS 5.0-0-errata
Assigned To: Arvid Requate
Philipp Hahn
:
: 53621 (view as bug list)
Depends on: 51911
Blocks: 53821
  Show dependency treegraph
 
Reported: 2020-08-26 17:16 CEST by Philipp Hahn
Modified: 2021-10-13 11:54 CEST (History)
6 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.400
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2020082521000273
Bug group (optional):
Max CVSS v3 score:


Attachments
Recovery procedure (1.31 KB, text/plain)
2020-08-26 17:16 CEST, Philipp Hahn
Details
Reworked translog.c (15.38 KB, patch)
2020-08-26 18:19 CEST, Philipp Hahn
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Hahn univentionstaff 2020-08-26 17:16:17 CEST
Created attachment 10465 [details]
Recovery procedure

Reproducer:

/etc/init.d/univention-directory-notifier stop  # needed to see temporary state
: > /var/lib/univention-ldap/last_id
udm "computers/$(domaincontroller_master)" modify --dn "$(ucr get ldap/hostdn)" --set description="$(date)"
tail /var/lib/univention-ldap/listener/listener
  <TransID> cn=m34,cn=dc,cn=computers,dc=phahn,dc=dev m
tail /var/lib/univention-ldap/last_id
  -1
/etc/init.d/univention-directory-notifier start
tail -n 1 /var/lib/univention-ldap/notify/transaction
  0 cn=m34,cn=dc,cn=computers,dc=phahn,dc=dev m
tail /var/lib/univention-ldap/last_id
  -1



1. Bug #47835 leads to slapd crashes

If "/var/lib/univention-ldap/last_id" is corrupt "translog.c" reverts to parsing "/var/lib/univention-ldap/notify/transaction". This has multiple issues:

2. Parsing fails and always return -1.

3. If there are already pending transactions in "/var/lib/univention-ldap/listener/listener", the last transaction ID is calculated wrong!

4. It may happen that "notifier/trannsaction" already contains "broken" transactions, e.g. some with "tid=0". In that case get_last_id() returns 0 and translog_response() uses the string "<TransID>" instead of a transaction number. That string breaks both "univention-translog" and "management/univention-directory-notifier/src/notify.c#split_transaction_buffer()", which only work with *integers*.

5. Bug #41687 comment 2 item 17 fails to detect this and silently uses tid=0: UDN then copies the pending transactions from "listener/listener" to "notify/transaction" replacing that "<TransID>" with "tid=0".


We were able to recover those transactions by performing the attached procedure: It copies the broken transactions back from "notify/transaction" to "listener/listener", temporarily replaces all invalid TIDs with "1" and uses "univention-translog check --fix" to get them re-numbered properly.


translog.c has many more issues:

6. In translog_response() for the else-case of "lastid > -1" the fall-back string "<TransID>" is only used for the for the first line, but not for the 2nd line of the MODRDN operation. ("<TransID>" is used until the first MODRDN operation, after that translog.c would use increasing TIDs starting from 0 again.

7. There is not error handling for buffer overflows for all snprintf() calls.

8. The use of "lockf(F_TEST)" in fopen_lock() is at best dubios (actually unneeded)

9. fopen_lock() assigned NULL to the pointer, not to the *pointed* location:
  257 -»··»···l_file  = NULL;
  257 +»··»···*l_file = NULL;

10. There two cases where NULL is assigned to *local* variables in fclose_lock()

11. In fclose_lock() "l_file" is de-referenced before the check.

12. Calling "flock(F_ULOCK)" is pointless as the file is "fclose()"ed anyway.

13. The code for WITH_STAMP should be removed finally.

14. There is no error message when writing "last_id" fails.

15. While the path for "listener/listener" is configured through the arguments, the paths for "last_id" and "notify/transaction" are hard-coded (in multiple places).

16. fclose_lock() relies on some internal knowledge about FILE.

17. Memory-leak in translog_config() due to "Debug(…, ch_strdup(…), …)

18. EOF is handled wrong in get_last_id(): fgetc() returns "int", not "char"! (Similar to #41687 commen 2 item 24)
Comment 1 Philipp Hahn univentionstaff 2020-08-26 18:19:52 CEST
Created attachment 10466 [details]
Reworked translog.c

The attached patch replaces "10_translog_overlay.quilt" and should fix most of the issues.

It compiles okay and survived a short test in my VM.

TODO: Handle the case better when fprintf()+fclose() fails due ENOSPC.
Comment 3 Arvid Requate univentionstaff 2020-12-16 19:29:34 CET
What's going on here, there is a commit r19129 for this bug in svn, which will go out with the next errata release, completely without any QA?

------------------------------------------------------------------------
r19129 | phahn | 2020-08-26 19:39:52 +0200 (Mi, 26. Aug 2020) | 1 Zeile

Bug #51910 OpenLDAP: Convert .patch to .quilt
------------------------------------------------------------------------

This now blocks security Bug 52406. What now?
Comment 4 Arvid Requate univentionstaff 2020-12-17 23:02:17 CET
Ok, Felix added this Bug to the openldap.yaml so the release is blocked until checked.
Comment 5 Philipp Hahn univentionstaff 2020-12-18 09:13:13 CET
(In reply to Arvid Requate from comment #3)
> What's going on here, there is a commit r19129 for this bug in svn, which
> will go out with the next errata release, completely without any QA?
> 
> ------------------------------------------------------------------------
> r19129 | phahn | 2020-08-26 19:39:52 +0200 (Mi, 26. Aug 2020) | 1 Zeile
> 
> Bug #51910 OpenLDAP: Convert .patch to .quilt
> ------------------------------------------------------------------------
> 
> This now blocks security Bug 52406. What now?

1. Have a look at <http://10.200.17.11/4.4-7/slapd_2.4.45%2Bdfsg-1~bpo9%2B1A~4.4.0.202012151059_amd64.dchdiff>: you only see additional patches because my change split the nested patches into separate .quilt files

2. Have a look at <https://nissedal.knut.univention.de/websvn/comp.php?repname=patches&compare[]=%2Fopenldap%2F4.4-0-0-ucs%2F2.4.45%2Bdfsg-1%7Ebpo9%2B1-errata4.4-5@19128&compare[]=%2Fopenldap%2F4.4-0-0-ucs%2F2.4.45%2Bdfsg-1%7Ebpo9%2B1-errata4.4-5%2F@19129>: It only splits the nested patches into separate .quitl files

(In reply to Arvid Requate from comment #4)
> Ok, Felix added this Bug to the openldap.yaml so the release is blocked until checked.

This is wrong: The CODE change to resolve THIS bug is NOT part of the pushed change! I only changed how our patches are stored in SVN because this cost ME extra time while working on this PULLCORD event back in August: To save OTHERS from also needing to spent that time again and again I only pushed that change.

Our customers are still VULNERABLE to THIS bug: They still have NOT received the value of this work.

But at least some colleges might appreciate my work not going to waste completely as it might save their time when next the have to work on OpenLDAP.
Comment 6 Erik Damrose univentionstaff 2021-01-05 19:04:21 CET
I checked the modifications done in svn r19129, no changes have been introduced. Patch and quilt files have been applied to the most recent build. I removed this bug from the yaml file.
Comment 7 Erik Damrose univentionstaff 2021-08-02 09:58:50 CEST
*** Bug 53621 has been marked as a duplicate of this bug. ***
Comment 8 Julia Bremer univentionstaff 2021-08-02 10:08:40 CEST
Since 27.07 many jenkins jobs hang while waiting for the LDAP replication.
The tests are not usable in this state.

In the slapd log file we can see over and over:
Aug 02 09:53:12 master071 slapd[29681]: OVER: Could not write file /var/lib/univention-ldap/last_id.new
Comment 9 Arvid Requate univentionstaff 2021-08-02 18:32:36 CEST
------------------------------------------------------------------------
r19406 | Several fixes from Philipp

* Use PATH_MAX instead of MAX_PATH_LEN
* Remove unused WITH_STAMP code
* Fix fopen_lock
* Fix fclose_lock
* Refactor get_last_id into subfunctions
* Check listener/listener if getting last_id failed
* Consistently write <TransID> if last_id could not be determined as all
* Log error if listener/listener cannot be written
* Update last_id atomically
* Avoid useless strdup in error message
* Some whitespace cleanup
------------------------------------------------------------------------
r19407 | Also consider listener.priv

* consider listener/listener.priv
* fix typo and improve log message
------------------------------------------------------------------------
r19408 | Fix variable name in last commit

------------------------------------------------------------------------
r19409 |  Fix return code check

------------------------------------------------------------------------
cb346999cf | Advisory
7c4d58e552 | Advisory update
966136b751 | Update advisory text

Package: openldap
Version: 2.4.47+dfsg-3+deb10u6A~5.0.0.202108021020
Branch: ucs_5.0-0
Scope: errata5.0-0
Comment 10 Philipp Hahn univentionstaff 2021-08-24 18:38:08 CEST
(In reply to Arvid Requate from comment #9)
OK: r19406 | Several fixes from Philipp
OK: r19407 | Also consider listener.priv
OK: reading priority
  /var/lib/univention-ldap/last_id
  /var/lib/univention-ldap/listener/listener
  /var/lib/univention-ldap/listener/listener.priv
  /var/lib/univention-ldap/notify/transaction

OK: r19408 | Fix variable name in last commit
OK: r19409 |  Fix return code check

OK:
> cb346999cf | Advisory
> 7c4d58e552 | Advisory update
> 966136b751 | Update advisory text

OK: apt install -t apt slapd
OK: journalctl -u slapd.service
OK:
  systemctl stop slapd.service
  truncate -s 0 /var/lib/univention-ldap/last_id
  systemctl start slapd.service
  journalctl -u slapd.service
> Aug 24 18:06:30 m38 slapd[24873]: OVER: Could not find last ID, lastid seems to be: "-1"

OK:
  systemctl stop slapd.service univention-directory-notifier.service
  rm -f /var/lib/univention-ldap/last_id
  tail -1 ../notify/transaction > listener.priv # +1
  tail -1 ../notify/transaction > listener # +2
  chown listener:root listener.priv
  systemctl start slapd.service
  udm "computers/$(ucr get server/role)" modify --dn "$(ucr get ldap/hostdn)" --set description=$RANDOM
  journalctl -u slapd.service
  tail -n 1 /var/lib/univention-ldap/{notify/transaction,listener/listener{,.priv},last_id}

OK: errata-announce -V --onyl openldap.yaml
OK: openldap.yaml