Bug 50738 - Add Nagios check for corruption of Samba LDB files
Status: NEW
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Samba maintainers
Depends on:
Blocks:
Reported: 2020-01-24 12:34 CET by Arvid Requate
Modified: 2023-06-20 10:13 CEST (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.200
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2020012221000684, 2023051521000363
Bug group (optional):
Max CVSS v3 score:


Description Arvid Requate univentionstaff 2020-01-24 12:34:38 CET
In Ticket #2020012221000684 a customer reported corruption of the file /var/lib/samba/private/sam.ldb.d/DC=DOMAINDNSZONES,DC=CUSTOMERDOMAIN,DC=LOCAL.ldb.


In this case the structure of the TDB key-value database file was corrupt:


root@ucs-master:/var/log/univention# tdbtool /var/lib/samba/private/sam.ldb.d/DC\=DOMAINDNSZONES\,DC\=CUSTOMERDOMAIN\,DC\=LOCAL.ldb 
tdb> check
Hashes do not match records
Integrity check for the opened database failed.
tdb> 


To make things worse, the last successfully written backup file under /var/univention-backup/samba was more than one month old. Apparently the error emails from the cron backup job went unnoticed (Bug #49399, http://errata.software-univention.de/ucs/4.4/162.html).

We should help customers notice this ASAP by adding a Nagios check that verifies the consistency of the TDB structure of all LDB files.
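
As a starting point for discussion, such a check could look roughly like the following Python sketch. It is not an existing UCS plugin. It assumes that tdbtool accepts the command non-interactively as `tdbtool <file> check`, that all partition files live in /var/lib/samba/private/sam.ldb.d and end in .ldb, and that a healthy file produces a line containing "integrity is OK"; only the failure text is taken from the output quoted above. The names classify, run_tdbtool and check_all are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios plugin that runs `tdbtool <file> check` on Samba LDB files."""
import glob
import os
import subprocess

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Directory taken from the report; assumed to hold all partition files.
LDB_DIR = "/var/lib/samba/private/sam.ldb.d"

# Failure text as quoted in the report. The success text is an assumption.
FAIL_MARKER = "Integrity check for the opened database failed"
OK_MARKER = "integrity is OK"


def classify(output):
    """Map the text printed by tdbtool to a Nagios state."""
    if FAIL_MARKER in output:
        return CRITICAL
    if OK_MARKER in output:
        return OK
    return UNKNOWN


def run_tdbtool(path):
    """Run the check non-interactively and return stdout and stderr combined."""
    try:
        proc = subprocess.run(
            ["tdbtool", path, "check"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Neither marker matches this text, so the file is reported as UNKNOWN.
        return "could not run tdbtool: %s" % exc
    return proc.stdout


def check_all(paths, runner=run_tdbtool):
    """Return (state, message) for a list of LDB files."""
    if not paths:
        return UNKNOWN, "UNKNOWN: no LDB files found"
    states = {path: classify(runner(path)) for path in paths}
    bad = sorted(os.path.basename(p) for p, s in states.items() if s == CRITICAL)
    odd = sorted(os.path.basename(p) for p, s in states.items() if s == UNKNOWN)
    if bad:
        return CRITICAL, "CRITICAL: TDB check failed for " + ", ".join(bad)
    if odd:
        return UNKNOWN, "UNKNOWN: unexpected tdbtool output for " + ", ".join(odd)
    return OK, "OK: %d LDB files passed the TDB check" % len(paths)


def main():
    """Check every *.ldb file in LDB_DIR and print a one-line Nagios status."""
    state, message = check_all(sorted(glob.glob(os.path.join(LDB_DIR, "*.ldb"))))
    print(message)
    return state
```

An actual plugin would end with `sys.exit(main())` so that Nagios sees the state as the exit code. The sketch parses the output text instead of trusting the exit status of tdbtool, because the report does not show what exit status a failed check produces.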
Comment 1 Arvid Requate univentionstaff 2020-01-24 12:37:52 CET
For reference here is the syslog message that showed the problem:

===========================================================================
Jan 22 13:54:19 master named[12779]: samba_dlz: added rdataset foomar.customerdomain.local 'foobar.customerdomain.local.#0111200#011IN#011A#01110.12.11.10'
Jan 22 13:54:19 master named[12779]: samba_dlz: ldb: ltdb: tdb(/var/lib/samba/private/sam.ldb.d/DC=DOMAINDNSZONES,DC=CUSTOMERDOMAIN,DC=LOCAL.ldb): tdb_rec_read bad magic 0xd9fee666 at offset=5188824
Jan 22 13:54:19 master named[12779]: samba_dlz:
Jan 22 13:54:19 master named[12779]: samba_dlz: failed to commit a transaction for zone customerdomain.local
Jan 22 13:54:19 master named[12779]: sdlz closeversion on origin customerdomain.local failed
Jan 22 13:54:19 master named[12779]: ../../../lib/dns/db.c:459: ENSURE(*versionp == ((void *)0)) failed, back trace
Jan 22 13:54:19 master named[12779]: #0 0x56463ba1d560 in ??
Jan 22 13:54:19 master named[12779]: #1 0x7f3397aac97a in ??
Jan 22 13:54:19 master named[12779]: #2 0x7f339915d6b2 in ??
Jan 22 13:54:19 master named[12779]: #3 0x56463ba4a51e in ??
Jan 22 13:54:19 master named[12779]: #4 0x7f3397ad09f3 in ??
Jan 22 13:54:19 master named[12779]: #5 0x7f3396dff4a4 in ??
Jan 22 13:54:19 master named[12779]: #6 0x7f3396250d0f in ??
Jan 22 13:54:19 master named[12779]: exiting (due to assertion failure)
===========================================================================
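
The "tdb_rec_read bad magic" line in the log above has a regular shape, so a monitoring script could also look for it in syslog. The following Python sketch only illustrates that idea. The regular expression is derived from the single message quoted here, so other TDB error variants will not match, and the function names parse_tdb_error and scan are made up for this example.

```python
import re

# Pattern derived from the one log line quoted in this comment.
TDB_BAD_MAGIC = re.compile(
    r"tdb\((?P<path>[^)]+)\): tdb_rec_read bad magic "
    r"(?P<magic>0x[0-9a-fA-F]+) at offset=(?P<offset>\d+)"
)


def parse_tdb_error(line):
    """Return path, magic and offset of a bad-magic message, or None."""
    match = TDB_BAD_MAGIC.search(line)
    if match is None:
        return None
    return {
        "path": match.group("path"),
        "magic": match.group("magic"),
        "offset": int(match.group("offset")),
    }


def scan(lines):
    """Collect all bad-magic messages from an iterable of log lines."""
    return [hit for hit in map(parse_tdb_error, lines) if hit is not None]
```

Feeding the lines of a syslog file to `scan()` would return one entry per affected message, including the path of the LDB file that reported the error.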