Univention Bugzilla – Bug 33904
Samba 4 segmentation fault in DRS replication
Last modified: 2017-04-20 19:59:18 CEST
Created attachment 5727 [details] core From time to time the DRS replication stops working in the S4 backup test run: Bug #33050 It seems that it is triggered by the case tests/10_ldap/61getent_crash. The bt full output from gdb (the gdb version is from squeeze-backports): [New LWP 10350] [Thread debugging using libthread_db enabled] Core was generated by `samba -i -M single'. Program terminated with signal 6, Aborted. #0 0x00007f1f31cab1b5 in raise () from /lib/libc.so.6 #0 0x00007f1f31cab1b5 in raise () from /lib/libc.so.6 No symbol table info available. #1 0x00007f1f31cadfc0 in abort () from /lib/libc.so.6 No symbol table info available. #2 0x00007f1f35c83cb0 in smb_panic_default (why=<optimized out>) at ../lib/util/fault.c:149 No locals. #3 smb_panic (why=0x7f1f35c9007b "internal error") at ../lib/util/fault.c:162 No locals. #4 0x00007f1f35c83e82 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:77 counter = 1 #5 sig_fault (sig=11) at ../lib/util/fault.c:88 No locals. #6 <signal handler called> No symbol table info available. #7 0x00007f1f2038e783 in replmd_check_urgent_objectclass (objectclass_el=0x0, situation=REPL_URGENT_ON_UPDATE) at ../source4/dsdb/samdb/ldb_modules/repl_meta_data.c:205 j = 2 #8 0x00007f1f20394c89 in replmd_update_rpmd (module=<optimized out>, schema=<optimized out>, req=<optimized out>, rename_attrs=<optimized out>, msg=0x7f1f376cac90, seq_num=<optimized out>, t=1389593127, is_urgent=0x7fff695ae6cf, rodc=0x7fff695ae6ce) at ../source4/dsdb/samdb/ldb_modules/repl_meta_data.c:1432 omd_value = <optimized out> ndr_err = <optimized out> omd = {version = 0, reserved = 0, ctr = {ctr1 = {count = 843032434, reserved = 32543, array = 0x7f1f32621219}}} i = <optimized out> now = <optimized out> our_invocation_id = 0x7f1f378f8f30 ret = <optimized out> attrs = 0x7fff695ae600 attrs1 = {0x7f1f2039a1b0 "replPropertyMetaData", 0x7f1f2039a352 "*", 0x0} attrs2 = {0x7f1f2039a1a5 "uSNChanged", 0x7f1f2039a346 "objectClass", 0x7f1f2039a144 "instanceType", 0x0} res = 0x7f1f37d11c10 ldb = 0x7f1f36c39690 objectclass_el = 0x0 situation = REPL_URGENT_ON_UPDATE __FUNCTION__ = "replmd_update_rpmd" #9 0x00007f1f20398b59 in replmd_modify (module=0x7f1f36cc1830, req=0x7f1f3771aba0) at ../source4/dsdb/samdb/ldb_modules/repl_meta_data.c:2455 ldb = <optimized out> ac = <optimized out> down_req = <optimized out> msg = 0x7f1f376cac90 t = 1389593127 ret = <optimized out> is_urgent = false rodc = false guid_blob = <optimized out> sd_propagation_control = <optimized out> #10 0x00007f1f232151e4 in dsdb_module_modify (module=0x7f1f36cc1830, message=<optimized out>, dsdb_flags=4194304, parent=0x7f1f377665e0) at ../source4/dsdb/samdb/ldb_modules/util.c:447 ops = 0x0 mod_req = 0x7f1f3771aba0 ret = 0 ldb = 0x7f1f36c39690 tmp_ctx = <optimized out> __FUNCTION__ = "dsdb_module_modify" #11 0x00007f1f20391fc6 in replmd_delete_internals (module=0x7f1f36cc1830, req=0x7f1f377665e0, re_delete=<optimized out>) at ../source4/dsdb/samdb/ldb_modules/repl_meta_data.c:3309 ret = <optimized out> retb = <optimized out> old_dn = 0x7f1f376cad10 new_dn = 0x7f1f37d11fc0 rdn_name = 0x7f1f377f6ae0 "cn" rdn_value = 0x7f1f3776a8f8 new_rdn_value = 0x7fff695ae840 guid = {time_low = 940274416, time_mid = 32543, time_hi_and_version = 0, clock_seq = ")\204", node = "`2\037\177\000"} ldb = 0x7f1f36c39690 schema = 0x7f1f3865a0e0 msg = 0x7f1f3776aab0 old_msg = <optimized out> el = 0x7f1f37769850 tmp_ctx = 0x7f1f377f69d0 res = 0x7f1f377626d0 parent_res = 0x7fff695aea40 preserved_attrs = {0x7f1f2039a4d5 "nTSecurityDescriptor", 0x7f1f2039a516 "attributeID", 0x7f1f2039a522 "attributeSyntax", 0x7f1f2039a532 "dNReferenceUpdate", 0x7f1f2039a544 "dNSHostName", 0x7f1f2039a550 "flatName", 0x7f1f2039a559 "governsID", 0x7f1f2039a563 "groupType", 0x7f1f2039a144 "instanceType", 0x7f1f2039a56d "lDAPDisplayName", 0x7f1f2039a57d "legacyExchangeDN", 0x7f1f2039a018 "isDeleted", 0x7f1f2039a022 "isRecycled", 0x7f1f2039a268 "lastKnownParent", 0x7f1f2039a278 "msDS-LastKnownRDN", 0x7f1f2039a58e "mS-DS-CreatorSID", 0x7f1f2039a59f "mSMQOwnerID", 0x7f1f2039a5ab "nCName", 0x7f1f2039a346 "objectClass", 0x7f1f2039a5b2 "distinguishedName", 0x7f1f2039a183 "objectGUID", 0x7f1f2039a5c4 "objectSid", 0x7f1f2039a5ce "oMSyntax", 0x7f1f2039a5d7 "proxiedObjectName", 0x7f1f2039a1f4 "name", 0x7f1f2039a1b0 "replPropertyMetaData", 0x7f1f2039a5e9 "sAMAccountName", 0x7f1f2039a5f8 "securityIdentifier", 0x7f1f2039a60b "sIDHistory", 0x7f1f2039a616 "subClassOf", 0x7f1f2039a22f "systemFlags", 0x7f1f2039a621 "trustPartner", 0x7f1f2039a62e "trustDirection", 0x7f1f2039a63d "trustType", 0x7f1f2039a647 "trustAttributes", 0x7f1f2039a657 "userAccountControl", 0x7f1f2039a1a5 "uSNChanged", 0x7f1f2039a19a "uSNCreated", 0x7f1f2039a480 "whenCreated", 0x7f1f2039a18e "whenChanged", 0x0} i = <optimized out> el_count = 918764704 deletion_state = OBJECT_TOMBSTONE next_deletion_state = OBJECT_TOMBSTONE __FUNCTION__ = "replmd_delete_internals" #12 0x00007f1f20392d39 in replmd_replicated_apply_isDeleted (ar=0x7f1f380b7db0) at ../source4/dsdb/samdb/ldb_modules/repl_meta_data.c:4718 del_req = 0x7f1f377665e0 tmp_ctx = <optimized out> deleted_objects_dn = 0x7f1f37761a50 msg = <optimized out> ret = 0 #13 0x00007f1f20392fde in replmd_op_callback (req=<optimized out>, ares=0x7f1f380bb6a0) at ../source4/dsdb/samdb/ldb_modules/repl_meta_data.c:526 ret = <optimized out> ac = 0x7f1f380b7db0 modified_partition = 0x7f1f31d8c220 partition_ctrl = 0x7f1f3742dd10 controls = 0x0 __FUNCTION__ = "replmd_op_callback" #14 0x00007f1f20390fb5 in replmd_op_possible_conflict_callback (req=0x7f1f3788a0a0, ares=0x7f1f38235f40, callback=0x7f1f20392e60 <replmd_op_callback>) at ../source4/dsdb/samdb/ldb_modules/repl_meta_data.c:3606 conflict_dn = <optimized out> res = <optimized out> attrs = {0x7f1f2039a1b0 "replPropertyMetaData", 0x7f1f2039a183 "objectGUID", 0x0} ret = <optimized out> omd_value = <optimized out> omd = {version = 0, reserved = 0, ctr = {ctr1 = {count = 835648624, reserved = 32543, array = 0x7f1f3742dd10}}} rmd = <optimized out> ndr_err = <optimized out> rodc = <optimized out> rmd_name = <optimized out> omd_name = <optimized out> msg = <optimized out> __FUNCTION__ = "replmd_op_possible_conflict_callback" #15 0x00007f1f1ca64922 in ?? () from /usr/lib//ldb/modules/ldb/tdb.so No symbol table info available. #16 0x00007f1f31fe3094 in tevent_common_loop_timer_delay () from /usr/lib/libtevent.so.0 No symbol table info available. #17 0x00007f1f31fe3fcb in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #18 0x00007f1f31fe26d6 in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #19 0x00007f1f31fde8bd in _tevent_loop_once () from /usr/lib/libtevent.so.0 No symbol table info available. #20 0x00007f1f3261f195 in ldb_wait () from /usr/lib/libldb.so.1 No symbol table info available. #21 0x00007f1f3261f91b in ldb_extended () from /usr/lib/libldb.so.1 No symbol table info available. #22 0x00007f1f3337d1d5 in dsdb_replicated_objects_commit (ldb=0x7f1f36c39690, working_schema=0x0, objects=0x7f1f382ddc10, notify_uSN=0x7f1f37f82c30) at ../source4/dsdb/repl/replicated_objects.c:773 ext_res = <optimized out> cur_schema = 0x0 new_schema = <optimized out> ret = 0 seq_num1 = <optimized out> seq_num2 = <optimized out> used_global_schema = false tmp_ctx = 0x7f1f371410b0 __FUNCTION__ = "dsdb_replicated_objects_commit" #23 0x00007f1f26b58fda in dreplsrv_op_pull_source_apply_changes_trigger (ctr6=<optimized out>, ctr1=<optimized out>, ctr_level=<optimized out>, r=<optimized out>, req=<optimized out>) at ../source4/dsdb/repl/drepl_out_helpers.c:717 state = 0x7f1f36f71000 rf1 = {blobsize = 274, consecutive_sync_failures = 0, last_success = 130339146210000000, last_attempt = 130340667260000000, result_last_attempt = {w = 0}, other_info = 0x7f1f37655f40, other_info_length = 66, replica_flags = 116, schedule = '\021' <repeats 84 times>, reserved = 0, highwatermark = {tmp_highest_usn = 5604, reserved_usn = 0, highest_usn = 5068}, source_dsa_obj_guid = {time_low = 618272083, time_mid = 53081, time_hi_and_version = 16778, clock_seq = "\273\060", node = "\005S\261\237\243", <incomplete sequence \357>}, source_dsa_invocation_id = {time_low = 3851307759, time_mid = 43312, time_hi_and_version = 17618, clock_seq = "\260\242", node = "\210;k\331\071S"}, transport_guid = {time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}} service = 0x7f1f3833d090 linked_attributes_count = 0 linked_attributes = 0x0 dsdb_repl_flags = <optimized out> partition = <optimized out> drsuapi = 0x7f1f37d6cc00 schema = 0x7f1f3865a0e0 first_object = 0x7f1f37b9fab0 uptodateness_vector = 0x0 status = {w = 0} nt_status = <optimized out> working_schema = 0x0 mapping_ctr = 0x7f1f37e11730 object_count = 133 objects = 0x7f1f382ddc10 more_data = true #24 dreplsrv_op_pull_source_get_changes_done (subreq=<optimized out>) at ../source4/dsdb/repl/drepl_out_helpers.c:599 req = 0x7f1f371e1df0 state = <optimized out> status = <optimized out> r = 0x7f1f373604c0 ctr_level = <optimized out> ctr1 = <optimized out> ctr6 = <optimized out> extended_ret = <optimized out> #25 0x00007f1f32a5270c in dcerpc_binding_handle_call_done (subreq=0x7f1f37072820) at ../librpc/rpc/binding_handle.c:492 req = <optimized out> state = <optimized out> h = 0x7f1f3786bf90 error = {v = 0} out_flags = <optimized out> ndr_err = <optimized out> #26 0x00007f1f31fdf4e9 in tevent_common_loop_immediate () from /usr/lib/libtevent.so.0 No symbol table info available. #27 0x00007f1f31fe3fbf in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #28 0x00007f1f31fe26d6 in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #29 0x00007f1f31fde8bd in _tevent_loop_once () from /usr/lib/libtevent.so.0 No symbol table info available. #30 0x00007f1f31fde93b in tevent_common_loop_wait () from /usr/lib/libtevent.so.0 No symbol table info available. #31 0x00007f1f31fe2646 in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #32 0x00007f1f362df369 in binary_smbd_main (argc=<optimized out>, argv=<optimized out>, binary_name=<optimized out>) at ../source4/smbd/server.c:512 opt_daemon = <optimized out> opt_interactive = <optimized out> opt = <optimized out> pc = <optimized out> static_init = {0x7f1f34ddbd40 <server_service_auth_init>, 0x7f1f34ddd7b0 <server_service_echo_init>, 0} shared_init = <optimized out> event_ctx = <optimized out> status = <optimized out> model = <optimized out> max_runtime = 0 st = {st_dev = 10, st_ino = 3, st_nlink = 1, st_mode = 8592, st_uid = 0, st_gid = 5, __pad0 = 0, st_rdev = 34816, st_size = 0, st_blksize = 1024, st_blocks = 0, st_atim = {tv_sec = 1389593104, tv_nsec = 936000000}, st_mtim = {tv_sec = 1389593104, tv_nsec = 936000000}, st_ctim = {tv_sec = 1389592355, tv_nsec = 936000000}, __unused = {0, 0, 0}} long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7f1f323f2d60, val = 0, descrip = 0x7f1f362e1ed8 "Help options:", argDescrip = 0x0}, {longName = 0x7f1f362e1ee6 "daemon", shortName = 68 'D', argInfo = 0, arg = 0x0, val = 1000, descrip = 0x7f1f362e1eed "Become a daemon (default)", argDescrip = 0x0}, {longName = 0x7f1f362e1f07 "interactive", shortName = 105 'i', argInfo = 0, arg = 0x0, val = 1001, descrip = 0x7f1f362e23b8 "Run interactive (not a daemon)", argDescrip = 0x0}, {longName = 0x7f1f362e1f22 "model", shortName = 77 'M', argInfo = 1, arg = 0x0, val = 1002, descrip = 0x7f1f362e1f13 "Select process model", argDescrip = 0x7f1f362e1f28 "MODEL"}, {longName = 0x7f1f362e1f2e "maximum-runtime", shortName = 0 '\000', argInfo = 2, arg = 0x7fff695af5fc, val = 0, descrip = 0x7f1f362e23d8 "set maximum runtime of the server process, till autotermination", argDescrip = 0x7f1f362e1f3e "seconds"}, {longName = 0x7f1f362e1f46 "show-build", shortName = 98 'b', argInfo = 0, arg = 0x0, val = 1003, descrip = 0x7f1f362e1f51 "show build info", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7f1f364e4d60, val = 0, descrip = 0x7f1f362e1f61 "Common Samba options:", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7f1f364e4f20, val = 0, descrip = 0x7f1f362e1f77 "Version options:", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} #33 0x00007f1f31c97c8d in __libc_start_main () from /lib/libc.so.6 No symbol table info available. #34 0x00007f1f362db5f9 in _start () No symbol table info available.
Created attachment 5728 [details] core
Created attachment 5729 [details] samba.log Samba logfile with debug level 12. I've started the samba process with "-i -M single".
in replmd_check_urgent_objectclass objectclass_el=0x0 source4/dsdb/samdb/ldb_modules/repl_meta_data.c:205: objectclass_el->num_values
The dreplsrv server process crashed here because the attribute "objectclass" was not present in the data structure of the following object replicated from the master. It is present on the master though, so it's currently unclear why it didn't get replicated: =============================================================================== # record 1 dn: cn=supergroup\0ADEL:84d4159b-4b1c-4b08-ba30-ded58028a0bd,CN=Deleted Objects,DC=autotest093,DC=local objectClass: top objectClass: group instanceType: 4 whenCreated: 20140111115020.0Z uSNCreated: 5067 objectGUID: 84d4159b-4b1c-4b08-ba30-ded58028a0bd objectSid: S-1-5-21-406819353-333020689-3686141061-1375 sAMAccountName: supergroup groupType: -2147483646 isDeleted: TRUE lastKnownParent: CN=Groups,DC=autotest093,DC=local isRecycled: TRUE cn:: c3VwZXJncm91cApERUw6ODRkNDE1OWItNGIxYy00YjA4LWJhMzAtZGVkNTgwMjhhMGJk name:: c3VwZXJncm91cApERUw6ODRkNDE1OWItNGIxYy00YjA4LWJhMzAtZGVkNTgwMjhhMGJk whenChanged: 20140111115021.0Z uSNChanged: 5069 distinguishedName: cn=supergroup\0ADEL:84d4159b-4b1c-4b08-ba30-ded58028a0bd,CN =Deleted Objects,DC=autotest093,DC=local ===============================================================================
The samba.log shows that is due to an earlier add operation (uSNChanged: 4609) which lacked the objectClass: ============================================================================== DRS replication add message: dn: cn=supergroup\0ADEL:84d4159b-4b1c-4b08-ba30-ded58028a0bd,CN=Deleted Objects,DC=autotest093,DC=local changetype: add cn:: c3VwZXJncm91cApERUw6ODRkNDE1OWItNGIxYy00YjA4LWJhMzAtZGVkNTgwMjhhMGJk instanceType: 4 whenChanged: 20140111115021.0Z uSNCreated: 4609 isDeleted: TRUE uSNChanged: 4609 name:: c3VwZXJncm91cApERUw6ODRkNDE1OWItNGIxYy00YjA4LWJhMzAtZGVkNTgwMjhhMGJk objectGUID: 84d4159b-4b1c-4b08-ba30-ded58028a0bd replPropertyMetaData:: AQAAAAAAAAAIAAAAAAAAAAEAAgABAAAA/MDhCAMAAADvSo7lMKnSRLC iiDtr2TlTyxMAAAAAAAABEgAAAAAAADAAAgABAAAA/cDhCAMAAADvSo7lMKnSRLCiiDtr2TlTzRMA AAAAAAABEgAAAAAAAAEACQACAAAA/cDhCAMAAADvSo7lMKnSRLCiiDtr2TlTzRMAAAAAAAABEgAAA AAAAC4BCQADAAAA/cDhCAMAAADvSo7lMKnSRLCiiDtr2TlTzRMAAAAAAAABEgAAAAAAAA0DCQABAA AA/cDhCAMAAADvSo7lMKnSRLCiiDtr2TlTzRMAAAAAAAABEgAAAAAAAA4DCQACAAAA/cDhCAMAAAD vSo7lMKnSRLCiiDtr2TlTzRMAAAAAAAABEgAAAAAAAAoICQABAAAA/cDhCAMAAADvSo7lMKnSRLCi iDtr2TlTzRMAAAAAAAABEgAAAAAAAAMAAAACAAAA/cDhCAMAAADvSo7lMKnSRLCiiDtr2TlTzRMAA AAAAAABEgAAAAAAAA== lastKnownParent: <GUID=aeb34709-b4cd-47f0-9b7b-e3876e4bcd78>;CN=Groups,DC=auto test093,DC=local isRecycled: TRUE ==============================================================================
ucs-test.log shows 267 delete operations on this opbject and there are 66 cn=supergroup objects under "Deleted Objects" on the DC backup. So, this probably was delete operation #67 of 267, which seems to support the initial assumption that this might be caused by a race condition.
Created attachment 5730 [details] drsuapi_DsGetNCChanges.log This is the list of changes the master sends, and objectClass is not part of that. Looks like the master thinks the backup already got the objectClass: * The master shows highest_usn = 5068 in its uptodateness vector, probably that's the state of replication for backup093. * The current object has some attribtes with USN = 5067 and some with usn = 5069. * The objectClass attribute has originating_usn = 5067, local_usn = 5067, so it isn't considered in the replication. The whole object was created at USN 5067. Simply patching the segfault on the receiver side (backup093) will not fix the general issue here. But it's probably pretty hard to determine the root cause for this situation.
A similar segfault occured again on Bug 33977, which promted an upstream patch. This should fix the segfault also in this case, but as explained above, this doesn't address the original usnChanged issue.
All objects lacking objectClass in hte domain of Ticket# 2014011421000077 have been "\0ADEL:" objects. Significant statistical bias.
This issue has been filed against UCS 3.2 running Samba 4.1.0 UCS 3.2 is out of maintenance and many UCS components have vastly changed in later releases. Thus, this issue is now being closed. If this is still an issue in newer UCS versions, please use "Clone this bug". In this case please provide detailed information on how this issue is affecting you.