Bug 37299 - Test 10_ldap.28reconnect_univention-ldapsearch.test failed in jenkins
Status: CLOSED FIXED
Product: UCS Test
Classification: Unclassified
Component: LDAP
Version: unspecified
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Dmitry Galkin
Depends on: 37631
Reported: 2014-12-11 10:04 CET by Dirk Wiesenthal
Modified: 2023-03-25 06:55 CET

What kind of report is it?: Development Internal

Description Dirk Wiesenthal univentionstaff 2014-12-11 10:04:31 CET
The test fails nondeterministically. It happens in various server roles and Samba configurations. I am quite sure that nothing is wrong with the UCS installation itself.
Comment 1 Stefan Gohmann univentionstaff 2014-12-15 09:11:45 CET
Please have a look. It looks like a timing issue.
Comment 2 Dmitry Galkin univentionstaff 2014-12-16 15:00:14 CET
(In reply to Stefan Gohmann from comment #1)
> Please have a look. It looks like a timing issue.

This is hard to reproduce because the test mostly passes, but the problem seems to be that slapd needs some time to actually start. I added a short sleep after each 'slapd start'; after that the test never failed in my environments.

r56898:
  * 10_ldap/28reconnect_univention-ldapsearch: wait a bit for slapd
    to be started (Bug #37299).
Comment 3 Dmitry Galkin univentionstaff 2014-12-16 15:04:18 CET
(In reply to Dmitry Galkin from comment #2)
> Added a bit of sleep time for
> each 'slapd start', after that test never failed in my envs.

By the way, I noticed that the '10_ldap/25reconnect_uldap' test has a '_wait_for_slapd_to_be_started' function with a 5-second sleep in it.
Comment 4 Ammar Najjar univentionstaff 2014-12-17 12:17:55 CET
In the latest Jenkins run, the script "10_ldap/28reconnect_univention-ldapsearch" failed in a multi-server environment:

http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Multiserver/SambaVersion=s4-only-master/lastCompletedBuild/testReport/10_ldap/28reconnect_univention-ldapsearch/test/

ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
error 2014-12-16 06:16:45	 Search was not successful: ldap/client/retry/count=11 and slapd restart delay is 9
error 2014-12-16 06:16:45	 **************** Test failed above this line (1) ****************

It also failed on a UCS 4.0 single server:

1st try:

Stopping ldap server(s): slapd ...done.
ldap_start_tls: Can't contact LDAP server (-1)
error 2014-12-17 06:13:58        First default test with a delay of 5 seconds failed
error 2014-12-17 06:13:58        **************** Test failed above this line (1) ****************

2nd try:
Starting ldap server(s): slapd ...done.
Checking Schema ID: ...done.
Setting ldap/client/retry/count
Stopping ldap server(s): slapd ...done.
ldap_start_tls: Can't contact LDAP server (-1)
error 2014-12-17 06:14:47        Search was not successful: ldap/client/retry/count=11 and slapd restart delay is 9
error 2014-12-17 06:14:47        **************** Test failed above this line (1) ****************


A suggestion regarding wait_for_slapd(): to make sure that slapd is started, instead of using `sleep`, check if slapd is running using something like `pgrep slapd`, `ps -ef | grep slapd`, or any similar method in a waiting loop.
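
A minimal sketch of such a waiting loop, written in Python. The helper names wait_until and slapd_is_running are hypothetical and not taken from the test. Note that a pgrep match only shows that a process named slapd exists, not that it already accepts connections.

```python
import subprocess
import time


def slapd_is_running():
    """Return True if a process named exactly 'slapd' exists (pgrep exits 0)."""
    return subprocess.call(["pgrep", "-x", "slapd"], stdout=subprocess.DEVNULL) == 0


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds have passed.

    Returns True as soon as the predicate holds, False if the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

After a 'slapd start' the test could call wait_until(slapd_is_running, timeout=30) and fail with a clear message if it returns False, instead of sleeping for a fixed time.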
Comment 5 Dmitry Galkin univentionstaff 2014-12-19 12:09:34 CET
(In reply to Ammar Najjar from comment #4)
> In the latest Jenkins run, script
> "10_ldap/28reconnect_univention-ldapsearch" failed on a multi server
> environment:

Ammar, I've checked all the most recent ucs@school test results, including:

UCS 4.0 Multiserver all tests:
http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Multiserver/13/testReport/

UCS 4.0 Singleserver all tests:
http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Singleserver/13/testReport/

The test passed in both.
I guess you looked at the following test results instead:

UCS 4.0 Multiserver all tests:
http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Multiserver/12/testReport/

UCS 4.0 Singleserver all tests:
http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Singleserver/12/testReport/

Both of those were run with svn revision 56783, but the fix was committed as r56898. Compare the commit time (2014-12-16 14:56:52) with the Jenkins job start time (16.12.2014 11:39:49): the job was started before the commit.

Anyway, I've added a pgrep for slapd after the start as an extra check:

r57008 | dgalkin | 2014-12-19 11:52:12 +0100 (Fr, 19. Dez 2014) | 3 Zeilen
  * 10_ldap/28reconnect_univention-ldapsearch: added pgrep for slapd after
    its restart (Bug #37299).
Comment 6 Dmitry Galkin univentionstaff 2015-01-12 09:38:15 CET
The test still fails nondeterministically, on different roles. Any suggestions?
Comment 7 Philipp Hahn univentionstaff 2015-01-12 11:03:58 CET
(In reply to Dmitry Galkin from comment #6)
> The test still fails nondeterministically, on different roles. Perhaps any
> suggestions?

Most probably a timing problem: I guess the slapd restart sometimes takes a few seconds longer due to VM load or load from other processes.

Wait longer for slapd to be running again?
At least add more debug output with time-stamps to help diagnose failures?
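
Both ideas can be combined in one helper. The following is only a sketch in Python: wait_for_port and log are hypothetical names, and the host and port to probe are parameters because the address slapd listens on depends on the setup. It keeps trying to open a TCP connection and prints a timestamped line per attempt, so a failed run shows how long slapd was unreachable.

```python
import socket
import time
from datetime import datetime


def log(message):
    """Print a debug line prefixed with a millisecond-resolution timestamp."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    print("debug %s\t %s" % (stamp, message))


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Try to open a TCP connection until it succeeds or `timeout` expires.

    Returns True once a connection could be opened, False otherwise.
    """
    deadline = time.monotonic() + timeout
    attempt = 0
    while True:
        attempt += 1
        try:
            # The connection is closed again immediately; only reachability matters.
            with socket.create_connection((host, port), timeout=interval):
                log("attempt %d: %s:%d accepts connections" % (attempt, host, port))
                return True
        except OSError as exc:
            log("attempt %d: %s:%d not reachable (%s)" % (attempt, host, port, exc))
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A successful TCP connect still does not prove that a search would succeed, but it is a stronger signal than a fixed sleep.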
Comment 8 Dmitry Galkin univentionstaff 2015-01-12 13:00:07 CET
(In reply to Philipp Hahn from comment #7)
> Most probably a timing problem: I guess the slapd restart sometimes take a
> few seconds more to start due to VM load or load by other processes.
> 
> Wait longer for slapd to be running again?
> At least add more debug output with time-stamps to help diagnose failures?

Sometimes it actually fails without performing the 'ldap/client/retry/count' number of attempts, as here: http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/Autotest%20MultiEnv/110/SambaVersion=s4,Systemrolle=slave/testReport/junit/10_ldap/28reconnect_univention-ldapsearch/test/

There, 'ldap/client/retry/count' is initially 10, yet the test seems to fail after just one ldapsearch attempt. I've added more debug output for now.
Comment 9 Dmitry Galkin univentionstaff 2015-01-16 12:45:21 CET
(In reply to Dmitry Galkin from comment #8)
> Sometimes actually it fails without performing the 'ldap/client/retry/count'
> number of attempts, 
> 
> where the actual 'ldap/client/retry/count' is 10 initially, test seem to
> fail after just one ldapsearch attempt. Added more debug output for now.

It seems that univention-ldapsearch sometimes starts before slapd has completely stopped, so the search is interrupted by the slapd shutdown and the test fails.

I've changed the slapd stop to a force-stop, and univention-ldapsearch is now started only after the server has stopped plus a 2-second sleep. I also adjusted some other timings.

r57347:
  * Bug #37299: 10_ldap/28reconnect_univention-ldapsearch: don't perform
    univention-ldapsearch before slapd being stopped. Timings adjustment.
Comment 10 Philipp Hahn univentionstaff 2015-01-23 10:11:40 CET
Failed again @ s3-master: <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/Autotest%20MultiEnv/121/SambaVersion=s3,Systemrolle=master/testReport/junit/10_ldap/28reconnect_univention-ldapsearch/test/>

debug 2015-01-22 18:59:59	 Performing univention-ldapsearch
ldap_start_tls: Can't contact LDAP server (-1)
error 2015-01-22 18:59:59	 Search was not successful: ldap/client/retry/count=11 and slapd restart delay is 7
Comment 11 Dmitry Galkin univentionstaff 2015-01-27 12:57:33 CET
I've rewritten the test in Python:

r57569:
  * Bug #37299: 10_ldap/26reconnect_univention_ldapsearch: switched test to
    python.

(The part that runs a process with a timeout can probably be done better.)
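
One possible simplification, assuming the test can run under Python 3.5 or later, where subprocess.run() accepts a timeout. This is a sketch and not the code committed in r57569; run_with_timeout is a hypothetical helper name.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd and return (returncode, output).

    stderr is merged into the output. If the command does not finish within
    `timeout` seconds it is killed and (None, partial_output) is returned.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() has already killed the child before re-raising.
        return None, exc.output or b""
    return proc.returncode, proc.stdout
```

The caller can then tell "search failed" (non-zero return code) apart from "search hung" (None) without managing the child process by hand.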


The test still fails sometimes, mostly in Amazon EC2 (http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/Autotest%20MultiEnv/125/SambaVersion=s3,Systemrolle=master/testReport/junit/10_ldap/26reconnect_univention_ldapsearch/test/) and rarely in local dev envs.

I've opened Bug #37631 because univention-ldapsearch performs only one attempt instead of the number specified via the UCR variable.
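
To make the expectation concrete, here is a sketch of the retry behaviour the test relies on. It is not the univention-ldapsearch implementation: search_with_retries is a hypothetical name, and treating the count as the total number of attempts (rather than retries after the first one) is an assumption.

```python
import time


def search_with_retries(search, retry_count, delay=1.0):
    """Call search() up to retry_count times and return the attempts used.

    `search` is any callable that returns True on success.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, retry_count + 1):
        if search():
            return attempt
        # Do not sleep after the final failed attempt.
        if attempt < retry_count:
            time.sleep(delay)
    raise RuntimeError("search failed after %d attempts" % retry_count)
```

With a count of 10, a search that only succeeds once slapd is back should keep trying through the restart delay; giving up after the first "Can't contact LDAP server" matches the failures seen in Jenkins.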
Comment 12 Stefan Gohmann univentionstaff 2015-02-04 06:11:21 CET
(In reply to Dmitry Galkin from comment #11)
> I've re-wrote the test in python:
> 
> r57569:
>   * Bug #37299: 10_ldap/26reconnect_univention_ldapsearch: switched test to
>     python.
> 
> (Probably, the part with Process running with timeout can be done better.)
> 
> 
> The test still fails sometimes, mostly in Amazon EC2
> (http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/
> Autotest%20MultiEnv/125/SambaVersion=s3,Systemrolle=master/testReport/junit/
> 10_ldap/26reconnect_univention_ldapsearch/test/) and rarely in local dev
> envs.
> 
> I've opened Bug #37631 as univention-ldapsearch performs only 1 attempt
> instead of specified via UCR var.

OK, if the test case fails and Bug #37631 is the reason, we should disable the test case until Bug #37631 has been fixed. I'll tag Bug #37631 to 4.0-1-errata.
Comment 13 Dmitry Galkin univentionstaff 2015-02-06 11:24:46 CET
(In reply to Stefan Gohmann from comment #12)
> OK, if the test case fails and Bug #37631 is the reason, we should disable
> the test case until Bug #37631 has been fixed. I'll tag Bug #37631 to
> 4.0-1-errata.

r57818:
  * 10_ldap/26reconnect_univention_ldapsearch: skip test (Bug #37299).
Comment 14 Stefan Gohmann univentionstaff 2016-10-12 07:48:19 CEST
No separate QA is needed for this bug.