Univention Bugzilla – Bug 51910
Missing error handling in translog leads to data loss
Last modified: 2021-10-13 11:54:52 CEST
Created attachment 10465 [details] Recovery procedure Reproducer: /etc/init.d/univention-directory-notifier stop # needed to see temporary state : > /var/lib/univention-ldap/last_id udm "computers/$(domaincontroller_master)" modify --dn "$(ucr get ldap/hostdn)" --set description="$(date)" tail /var/lib/univention-ldap/listener/listener <TransID> cn=m34,cn=dc,cn=computers,dc=phahn,dc=dev m tail /var/lib/univention-ldap/last_id -1 /etc/init.d/univention-directory-notifier start tail -n 1 /var/lib/univention-ldap/notify/transaction 0 cn=m34,cn=dc,cn=computers,dc=phahn,dc=dev m tail /var/lib/univention-ldap/last_id -1 1. Bug #47835 leads to slapd crashes If "/var/lib/univention-ldap/last_id" is corrupt "translog.c" reverts to parsing "/var/lib/univention-ldap/notify/transaction". This has multiple issues: 2. Parsing fails and always return -1. 3. If there are already pending transactions in "/var/lib/univention-ldap/listener/listener", the last transaction ID is calculated wrong! 4. It may happen that "notifier/trannsaction" already contains "broken" transactions, e.g. some with "tid=0". In that case get_last_id() returns 0 and translog_response() uses the string "<TransID>" instead of a transaction number. That string breaks both "univention-translog" and "management/univention-directory-notifier/src/notify.c#split_transaction_buffer()", which only work with *integers*. 5. Bug #41687 comment 2 item 17 fails to detect this and silently uses tid=0: UDN then copies the pending transactions from "listener/listener" to "notify/transaction" replacing that "<TransID>" with "tid=0". We were able to recover those transactions by performing the attached procedure: It copies the broken transactions back from "notify/transaction" to "listener/listener", temporarily replaces all invalid TIDs with "1" and uses "univention-translog check --fix" to get them re-numbered properly. translog.c has many more issues: 6. In translog_response() for the else-case of "lastid > -1" the fall-back string "<TransID>" is only used for the for the first line, but not for the 2nd line of the MODRDN operation. ("<TransID>" is used until the first MODRDN operation, after that translog.c would use increasing TIDs starting from 0 again. 7. There is not error handling for buffer overflows for all snprintf() calls. 8. The use of "lockf(F_TEST)" in fopen_lock() is at best dubios (actually unneeded) 9. fopen_lock() assigned NULL to the pointer, not to the *pointed* location: 257 -»··»···l_file = NULL; 257 +»··»···*l_file = NULL; 10. There two cases where NULL is assigned to *local* variables in fclose_lock() 11. In fclose_lock() "l_file" is de-referenced before the check. 12. Calling "flock(F_ULOCK)" is pointless as the file is "fclose()"ed anyway. 13. The code for WITH_STAMP should be removed finally. 14. There is no error message when writing "last_id" fails. 15. While the path for "listener/listener" is configured through the arguments, the paths for "last_id" and "notify/transaction" are hard-coded (in multiple places). 16. fclose_lock() relies on some internal knowledge about FILE. 17. Memory-leak in translog_config() due to "Debug(…, ch_strdup(…), …) 18. EOF is handled wrong in get_last_id(): fgetc() returns "int", not "char"! (Similar to #41687 commen 2 item 24)
Created attachment 10466 [details] Reworked translog.c The attached patch replaces "10_translog_overlay.quilt" and should fix most of the issues. It compiles okay and survived a short test in my VM. TODO: Handle the case better when fprintf()+fclose() fails due ENOSPC.
What's going on here, there is a commit r19129 for this bug in svn, which will go out with the next errata release, completely without any QA? ------------------------------------------------------------------------ r19129 | phahn | 2020-08-26 19:39:52 +0200 (Mi, 26. Aug 2020) | 1 Zeile Bug #51910 OpenLDAP: Convert .patch to .quilt ------------------------------------------------------------------------ This now blocks security Bug 52406. What now?
Ok, Felix added this Bug to the openldap.yaml so the release is blocked until checked.
(In reply to Arvid Requate from comment #3) > What's going on here, there is a commit r19129 for this bug in svn, which > will go out with the next errata release, completely without any QA? > > ------------------------------------------------------------------------ > r19129 | phahn | 2020-08-26 19:39:52 +0200 (Mi, 26. Aug 2020) | 1 Zeile > > Bug #51910 OpenLDAP: Convert .patch to .quilt > ------------------------------------------------------------------------ > > This now blocks security Bug 52406. What now? 1. Have a look at <http://10.200.17.11/4.4-7/slapd_2.4.45%2Bdfsg-1~bpo9%2B1A~4.4.0.202012151059_amd64.dchdiff>: you only see additional patches because my change split the nested patches into separate .quilt files 2. Have a look at <https://nissedal.knut.univention.de/websvn/comp.php?repname=patches&compare[]=%2Fopenldap%2F4.4-0-0-ucs%2F2.4.45%2Bdfsg-1%7Ebpo9%2B1-errata4.4-5@19128&compare[]=%2Fopenldap%2F4.4-0-0-ucs%2F2.4.45%2Bdfsg-1%7Ebpo9%2B1-errata4.4-5%2F@19129>: It only splits the nested patches into separate .quitl files (In reply to Arvid Requate from comment #4) > Ok, Felix added this Bug to the openldap.yaml so the release is blocked until checked. This is wrong: The CODE change to resolve THIS bug is NOT part of the pushed change! I only changed how our patches are stored in SVN because this cost ME extra time while working on this PULLCORD event back in August: To save OTHERS from also needing to spent that time again and again I only pushed that change. Our customers are still VULNERABLE to THIS bug: They still have NOT received the value of this work. But at least some colleges might appreciate my work not going to waste completely as it might save their time when next the have to work on OpenLDAP.
I checked the modifications done in svn r19129, no changes have been introduced. Patch and quilt files have been applied to the most recent build. I removed this bug from the yaml file.
*** Bug 53621 has been marked as a duplicate of this bug. ***
Since 27.07 many jenkins jobs hang while waiting for the LDAP replication. The tests are not usable in this state. In the slapd log file we can see over and over: Aug 02 09:53:12 master071 slapd[29681]: OVER: Could not write file /var/lib/univention-ldap/last_id.new
------------------------------------------------------------------------ r19406 | Several fixes from Philipp * Use PATH_MAX instead of MAX_PATH_LEN * Remove unused WITH_STAMP code * Fix fopen_lock * Fix fclose_lock * Refactor get_last_id into subfunctions * Check listener/listener if getting last_id failed * Consistently write <TransID> if last_id could not be determined as all * Log error if listener/listener cannot be written * Update last_id atomically * Avoid useless strdup in error message * Some whitespace cleanup ------------------------------------------------------------------------ r19407 | Also consider listener.priv * consider listener/listener.priv * fix typo and improve log message ------------------------------------------------------------------------ r19408 | Fix variable name in last commit ------------------------------------------------------------------------ r19409 | Fix return code check ------------------------------------------------------------------------ cb346999cf | Advisory 7c4d58e552 | Advisory update 966136b751 | Update advisory text Package: openldap Version: 2.4.47+dfsg-3+deb10u6A~5.0.0.202108021020 Branch: ucs_5.0-0 Scope: errata5.0-0
(In reply to Arvid Requate from comment #9) OK: r19406 | Several fixes from Philipp OK: r19407 | Also consider listener.priv OK: reading priority /var/lib/univention-ldap/last_id /var/lib/univention-ldap/listener/listener /var/lib/univention-ldap/listener/listener.priv /var/lib/univention-ldap/notify/transaction OK: r19408 | Fix variable name in last commit OK: r19409 | Fix return code check OK: > cb346999cf | Advisory > 7c4d58e552 | Advisory update > 966136b751 | Update advisory text OK: apt install -t apt slapd OK: journalctl -u slapd.service OK: systemctl stop slapd.service truncate -s 0 /var/lib/univention-ldap/last_id systemctl start slapd.service journalctl -u slapd.service > Aug 24 18:06:30 m38 slapd[24873]: OVER: Could not find last ID, lastid seems to be: "-1" OK: systemctl stop slapd.service univention-directory-notifier.service rm -f /var/lib/univention-ldap/last_id tail -1 ../notify/transaction > listener.priv # +1 tail -1 ../notify/transaction > listener # +2 chown listener:root listener.priv systemctl start slapd.service udm "computers/$(ucr get server/role)" modify --dn "$(ucr get ldap/hostdn)" --set description=$RANDOM journalctl -u slapd.service tail -n 1 /var/lib/univention-ldap/{notify/transaction,listener/listener{,.priv},last_id} OK: errata-announce -V --onyl openldap.yaml OK: openldap.yaml
<https://errata.software-univention.de/#/?erratum=5.0x75>