Univention Bugzilla – Bug 42662
Sometimes objects are not properly moved during a mass move
Last modified: 2019-01-03 07:11:21 CET
Created attachment 8094 [details] /var/lib/univention-ldap/listener/listener of an LDAP slave

During mass move operations we observed that the source object of a move operation is not always removed on systems that replicate from an LDAP backup instance. The failed move operation is reported with a 'move_to without history' error line in the listener.log of slave and member systems. The move changes only the first part of the DN (the uid in this case), so source and destination of the move are in the same container and are visible on all affected LDAP instances. The moved objects had been created by a mass import of an LDIF file with slapadd, and the objects still have different entryUUID values in the different LDAP directories. The behaviour described above could not be reproduced on LDAP backup instances, and could not be reproduced for objects that don't have the entryUUID mismatch.
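For illustration, a minimal sketch of how such an entryUUID mismatch could be detected once the values have been collected from each instance. The helper name and the input format are hypothetical; in practice the values would come from an ldapsearch for the operational attribute entryUUID on each host:

```python
from collections import Counter


def find_entryuuid_mismatches(uuids_by_host):
    """Given a mapping host -> entryUUID for one DN, return the set of
    hosts whose entryUUID differs from the majority value.

    Hypothetical helper for illustration only, not part of UCS.
    """
    counts = Counter(uuids_by_host.values())
    majority_uuid, _ = counts.most_common(1)[0]
    return {host for host, uuid in uuids_by_host.items() if uuid != majority_uuid}


# Example: each instance imported the object separately via slapadd, so a
# slave may carry its own locally generated entryUUID for the same DN.
mismatched = find_entryuuid_mismatches({
    "master": "3e9f0c1a-0000-0000-0000-0000000000a1",
    "backup": "3e9f0c1a-0000-0000-0000-0000000000a1",
    "slave": "7c02d911-0000-0000-0000-0000000000f4",
})
```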
Created attachment 8095 [details] listener.log of an LDAP slave
One addition: None of the affected objects is in the local listener cache, because they are filtered.
Via the filter mechanism implemented for Bug 38823? The wording of the original feature request suggests that this was intended for use "on member server instances" and not on LDAP slaves, let alone DC Backups.
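A simplified sketch of the filtering idea, assuming (as the comments suggest) that objects matching the configured filter are simply never written to the listener cache. The matching below is a toy equality check standing in for a real LDAP filter, and the function name is hypothetical:

```python
def should_cache(entry, cache_filter):
    """Return False if the entry matches the configured filter and is
    therefore skipped by the listener cache.

    cache_filter is a toy (attribute, value) pair standing in for the
    LDAP filter configured via listener/cache/filter (Bug 38823).
    """
    attr, value = cache_filter
    return value not in entry.get(attr, [])


entry = {"objectClass": ["posixAccount"], "uid": ["user0001"]}
# Objects matching the filter never enter the cache, so a later modrdn on
# them reaches the listener "without history".
cached = should_cache(entry, ("objectClass", "posixAccount"))
```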
The described problem has been reproduced by the customer after the entryUUID values for the affected objects had been synced.
Created attachment 8098 [details] listener.log and transaction of a backup instance
I was able to reproduce the described behaviour on slaves in my test environment with a probability of about 50% to 100%. The LDAP directory has been populated using slapadd -q and an LDIF file with 30k objects on each system. Slapd, notifier and listener were stopped until the last slapadd finished. This results in objects which have the same DN across the complete environment but different entryUUID values. The mass rename is then started using a script that, as a first step, enforces an entryUUID sync (a MOD_REPLACE on univentionObjectType) and, as a second step, performs the move of the object. None of the affected objects is in the listener cache due to the 'listener/cache/filter' settings. I was not able to reproduce this on a slave connected to the master for replication. I was also unable to reproduce this on a slave if the listener is stopped while the moves are executed and restarted after the backup instance in use has finished replication.
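The two-step script described above can be sketched roughly as follows. All names here are illustrative: MOD_REPLACE is a stand-in for ldap.MOD_REPLACE, and a real script would send these operations via python-ldap rather than just building them:

```python
MOD_REPLACE = 2  # stand-in for ldap.MOD_REPLACE


def build_uuid_sync_mod(object_type):
    """Step 1: a MOD_REPLACE on univentionObjectType. Per the comment
    above, replicating this modification is what forces the entryUUID
    sync on the downstream instances."""
    return [(MOD_REPLACE, "univentionObjectType", [object_type])]


def build_move(old_dn, new_uid):
    """Step 2: the actual rename. Only the first RDN component (uid)
    changes, so source and destination stay in the same container."""
    parent = old_dn.split(",", 1)[1]
    return (old_dn, "uid=%s" % new_uid, parent)


mod = build_uuid_sync_mod(b"users/user")
old, newrdn, parent = build_move(
    "uid=user0001,cn=users,dc=example,dc=com", "renamed0001")
```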
Created attachment 8402 [details] listener.log of a move_to without history
I'm trying to catch up with the status here... Comment 2 says:

> One addition: All affected objects are not in the local listener cache because they are filtered.

Ok, so if they are not cached, then it's always a move without history; where else should the listener get the history from? The current listener implementation doesn't look at the local LDAP. I guess it should though, if the listener cache has effectively been disabled.

And a question regarding Comment 8:

> Breakpoint 1, change_update_dn (trans=0x7fffffffddb0) at change.c:787
> 787 if (rv == LDAP_NO_SUCH_OBJECT) {
> $41 = 0
> $42 = 114 'r'

Can you elaborate what $41 and $42 refer to here? From the code context I can only guess that
* $41 refers to rv
* $42 refers to trans->prev.notify.command

> So the Slave successfully retrieved the entry from the backup - while it
> should be gone there already, as the listener only writes its cascaded
> transaction log *after* replication.py has finished updating the local LDAP.
>
> LMDB is described as "Fully-transactional, full ACID semantics with MVCC"
> - is this an ACID problem in OpenLDAP?

If we really have to dig in that direction, and we assume that LMDB itself is ACID compliant, I could remotely imagine two issues in the way LMDB is used in UCS/OpenLDAP:

a) The translog overlay executes before the LMDB transaction is committed.
b) The listener's LDAP search connection holds a long-running LMDB read transaction, in which case it would theoretically not see later updates.

I guess the first race condition could be checked in an experimental setup in which the translog overlay is adjusted to sleep 300 seconds before continuing. I would be very surprised if b) were the case; many use cases of OpenLDAP would potentially suffer from that.
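Hypothesis b) can be illustrated with a toy MVCC store (pure Python, not LMDB itself): a reader holding a long-running read transaction keeps seeing the snapshot taken when the transaction began, regardless of commits made afterwards:

```python
class ToyMVCCStore:
    """Minimal copy-on-write store mimicking MVCC snapshot isolation.
    Illustrative only; real LMDB pins readers to a B-tree root, it does
    not copy the whole data set."""

    def __init__(self):
        self._data = {}

    def begin_read(self):
        # A read transaction pins the state as of this moment.
        return dict(self._data)

    def commit_write(self, key, value):
        # Writers publish a new version; existing readers are unaffected.
        self._data = {**self._data, key: value}


store = ToyMVCCStore()
store.commit_write("uid=user0001", "present")
snapshot = store.begin_read()              # long-running read txn starts
store.commit_write("uid=user0001", None)   # entry removed ("moved away")
stale = snapshot["uid=user0001"]           # reader still sees "present"
fresh = store.begin_read()["uid=user0001"]  # a new reader sees the removal
```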
This issue has been filed against UCS 3.3. The maintenance with bug and security fixes for UCS 3.3 ended on 31st of December 2016. Customers still on UCS 3.3 are encouraged to update to UCS 4.3. Please contact your partner or Univention for any questions. If this issue still occurs in newer UCS versions, please use "Clone this bug" or simply reopen the issue. In this case please provide detailed information on how this issue is affecting you.