Univention Bugzilla – Bug 35741
Test 25reconnect_uldap failed in jenkins
Last modified: 2014-11-26 06:54:47 CET
During autotest-090-master-s3 the instance got stuck while performing: 25reconnect_uldap from the ucs-test.log: *** BEGIN *** ['/usr/bin/python2.7', '25reconnect_uldap'] *** 4274 *** 10_ldap/25reconnect_uldap *** Test reconnect mechanism of uldap ***
The strace for the process give the following endless output: select(0, NULL, NULL, NULL, {0, 4286}) = 0 (Timeout) select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout) select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout) ... If you run the script from the shell it also gets stuck after printing: ************************************************************************ Test: connect, stop, - start after 9 seconds, search ************************************************************************ [ ok ] Stopping ldap server(s): slapd ...done. [ ok ] Starting ldap server(s): slapd ...done. [ ok ] Checking Schema ID: ...done. And the endless strace for this process looks the same: select(0, NULL, NULL, NULL, {0, 5463}) = 0 (Timeout) select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout) select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout) ...
Created attachment 6077 [details] verbose strace
Created attachment 6078 [details] verbose stdout
r53174: I added a missing wait to a subprocess call.
"gdb -p %1" shows the process with a single thread hanging in /usr/lib/python2.7/dist-packages/ldap/ldapobject.py:766 def reconnect(self,uri): while self._pending_reconnect: time.sleep(0.01) else: self._pending_reconnect = 1 "timeval.tv_usec = 10_000" equals those 0.01s. Adding that .wait()ing for the restart terminates the stall, but the test fails then for the negative check (delaying 11s while only 10s are allowed).
python-ldap_2.4.9 broke it: * ldapobject.ReconnectLDAPObject.reconnect() now does kind of an internal locking to pause other threads while reconnecting is pending. UCS-4.0 uses 2.4.10, which is the latest version in Debian. Debian-RFP 2.4.15: <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=742487> May-be fixed by 2.4.13: <https://mail.python.org/pipermail/python-ldap/2014q1/003327.html> * ldapobject.ReconnectLDAPObject now uses internal class-wide lock for serializing reconnects * Method signature of ReconnectLDAPObject.reconnect() changed to be able to call it with separate retry_max and retry_delay values See <https://mail.python.org/pipermail/python-ldap/2013q3/003265.html> for a patch; this alone does not seem to fix the bug. A quick test with a self-compiled 2.4.15 also failed:The process is then multi-threaded and hangs in some lock. The thread links to <https://bugs.launchpad.net/ldapuserfolder/+bug/650371>.
See Bug #35841 for a patch for python-ldap. The patch is not enough to fix the test case, as some internal working seems to have changes. (In reply to Stefan Gohmann from comment #4) > r53174: I added a missing wait to a subprocess call. this patch needs to be reverted, as it breaks the whole idea of the test case: slapd is (re-)started *while* the main test case is running to test the reconnect behavior. By adding the .wait() the tests runs only after the slapd is fully working again.
r53510 | Bug #35741 test: 10_ldap/25reconnect_uldap: revert subprocess.call.wait r53174 was reverted as Bug #35841 and Bug #35852 have been fixed. Package: ucs-test Version: 5.0.9-1.794.201409100922 Branch: ucs_4.0-0 No changelog as this is only a revert, so no visible source code change.
OK, looks good.
UCS 4.0-0 has been released: http://docs.univention.de/release-notes-4.0-0-en.html http://docs.univention.de/release-notes-4.0-0-de.html If this error occurs again, please use "Clone This Bug".