Bug 35741 - Test 25reconnect_uldap failed in jenkins
Test 25reconnect_uldap failed in jenkins
Status: CLOSED FIXED
Product: UCS Test
Classification: Unclassified
Component: General
unspecified
Other Linux
: P5 normal (vote)
: UCS 4.0
Assigned To: Philipp Hahn
Stefan Gohmann
: interim-2
Depends on: 35841 35852
Blocks:
  Show dependency treegraph
 
Reported: 2014-08-28 13:34 CEST by Alexander Kramer
Modified: 2014-11-26 06:54 CET (History)
2 users (show)

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
verbose strace (241.12 KB, text/plain)
2014-08-28 14:50 CEST, Alexander Kramer
Details
verbose stdout (2.81 KB, text/plain)
2014-08-28 14:51 CEST, Alexander Kramer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Kramer univentionstaff 2014-08-28 13:34:52 CEST
During autotest-090-master-s3 the instance got stuck while performing:
25reconnect_uldap

from the ucs-test.log:
*** BEGIN *** ['/usr/bin/python2.7', '25reconnect_uldap'] ***
4274 *** 10_ldap/25reconnect_uldap *** Test reconnect mechanism of uldap ***
Comment 1 Alexander Kramer univentionstaff 2014-08-28 14:39:59 CEST
The strace for the process give the following endless output:
select(0, NULL, NULL, NULL, {0, 4286})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
...

If you run the script from the shell it also gets stuck after printing:
************************************************************************
Test: connect, stop,  - start after 9 seconds, search
************************************************************************
[ ok ] Stopping ldap server(s): slapd ...done.
[ ok ] Starting ldap server(s): slapd ...done.
[ ok ] Checking Schema ID: ...done.

And the endless strace for this process looks the same:
select(0, NULL, NULL, NULL, {0, 5463})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
...
Comment 2 Alexander Kramer univentionstaff 2014-08-28 14:50:31 CEST
Created attachment 6077 [details]
verbose strace
Comment 3 Alexander Kramer univentionstaff 2014-08-28 14:51:00 CEST
Created attachment 6078 [details]
verbose stdout
Comment 4 Stefan Gohmann univentionstaff 2014-08-30 06:37:27 CEST
r53174: I added a missing wait to a subprocess call.
Comment 5 Philipp Hahn univentionstaff 2014-09-02 16:00:56 CEST
"gdb -p %1" shows the process with a single thread hanging in

/usr/lib/python2.7/dist-packages/ldap/ldapobject.py:766
  def reconnect(self,uri):
    while self._pending_reconnect:
      time.sleep(0.01)
    else:
      self._pending_reconnect = 1

"timeval.tv_usec = 10_000" equals those 0.01s.

Adding that .wait()ing for the restart terminates the stall, but the test fails then for the negative check (delaying 11s while only 10s are allowed).
Comment 6 Philipp Hahn univentionstaff 2014-09-03 17:33:15 CEST
python-ldap_2.4.9 broke it:
* ldapobject.ReconnectLDAPObject.reconnect() now does kind of
  an internal locking to pause other threads while reconnecting
  is pending.

UCS-4.0 uses 2.4.10, which is the latest version in Debian.
Debian-RFP 2.4.15: <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=742487>

May-be fixed by 2.4.13:
<https://mail.python.org/pipermail/python-ldap/2014q1/003327.html>
* ldapobject.ReconnectLDAPObject now uses internal class-wide
  lock for serializing reconnects
* Method signature of ReconnectLDAPObject.reconnect() changed to be able
  to call it with separate retry_max and retry_delay values

See <https://mail.python.org/pipermail/python-ldap/2013q3/003265.html> for a patch; this alone does not seem to fix the bug.

A quick test with a self-compiled 2.4.15 also failed:The process is then multi-threaded and hangs in some lock.

The thread links to <https://bugs.launchpad.net/ldapuserfolder/+bug/650371>.
Comment 7 Philipp Hahn univentionstaff 2014-09-08 13:57:57 CEST
See Bug #35841 for a patch for python-ldap.
The patch is not enough to fix the test case, as some internal working seems to have changes.

(In reply to Stefan Gohmann from comment #4)
> r53174: I added a missing wait to a subprocess call.
this patch needs to be reverted, as it breaks the whole idea of the test case: slapd is (re-)started *while* the main test case is running to test the reconnect behavior. By adding the .wait() the tests runs only after the slapd is fully working again.
Comment 8 Philipp Hahn univentionstaff 2014-09-10 09:36:40 CEST
r53510 | Bug #35741 test: 10_ldap/25reconnect_uldap: revert subprocess.call.wait
 r53174 was reverted as Bug #35841 and Bug #35852 have been fixed.

Package: ucs-test
Version: 5.0.9-1.794.201409100922
Branch: ucs_4.0-0

No changelog as this is only a revert, so no visible source code change.
Comment 9 Stefan Gohmann univentionstaff 2014-10-01 15:57:39 CEST
OK, looks good.
Comment 10 Stefan Gohmann univentionstaff 2014-11-26 06:54:47 CET
UCS 4.0-0 has been released:
 http://docs.univention.de/release-notes-4.0-0-en.html
 http://docs.univention.de/release-notes-4.0-0-de.html

If this error occurs again, please use "Clone This Bug".