Univention Bugzilla – Bug 37299
Test 10_ldap.28reconnect_univention-ldapsearch.test failed in jenkins
Last modified: 2023-03-25 06:55:23 CET
The test fails nondeterministically. It happens across various server roles and Samba configurations. I am quite sure that everything is okay with this UCS installation.
Please have a look. It looks like a timing issue.
(In reply to Stefan Gohmann from comment #1)
> Please have a look. It looks like a timing issue.
This is a bit hard to reproduce because the test mostly passes, but the problem seems to be that slapd needs some time to actually start. I added a short sleep after each 'slapd start'; since then the test has never failed in my environments.
r56898:
* 10_ldap/28reconnect_univention-ldapsearch: wait a bit for slapd to be started (Bug #37299).
(In reply to Dmitry Galkin from comment #2)
> Added a bit of sleep time for
> each 'slapd start', after that test never failed in my envs.
Btw, I've noticed that the '10_ldap/25reconnect_uldap' test has a '_wait_for_slapd_to_be_started' function with a 5 sec sleep in it.
In the latest Jenkins run, script "10_ldap/28reconnect_univention-ldapsearch" failed on a multi server environment:
http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Multiserver/SambaVersion=s4-only-master/lastCompletedBuild/testReport/10_ldap/28reconnect_univention-ldapsearch/test/
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
ldap_start_tls: Can't contact LDAP server (-1)
error 2014-12-16 06:16:45 Search was not successful: ldap/client/retry/count=11 and slapd restart delay is 9
error 2014-12-16 06:16:45 **************** Test failed above this line (1) ****************
and on a UCS 4.0 single server it also failed:
1st try:
Stopping ldap server(s): slapd ...done.
ldap_start_tls: Can't contact LDAP server (-1)
error 2014-12-17 06:13:58 First default test with a delay of 5 seconds failed
error 2014-12-17 06:13:58 **************** Test failed above this line (1) ****************
2nd try:
Starting ldap server(s): slapd ...done.
Checking Schema ID: ...done.
Setting ldap/client/retry/count
Stopping ldap server(s): slapd ...done.
ldap_start_tls: Can't contact LDAP server (-1)
error 2014-12-17 06:14:47 Search was not successful: ldap/client/retry/count=11 and slapd restart delay is 9
error 2014-12-17 06:14:47 **************** Test failed above this line (1) ****************
A suggestion regarding wait_for_slapd(): to make sure that slapd is started, instead of using `sleep`, check if slapd is running using something like `pgrep slapd`, `ps -ef | grep slapd`, or any similar method in a waiting loop.
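The waiting loop suggested above could look roughly like the following Python sketch. It is not the code from the test: the names `wait_until` and `slapd_is_running`, the default timeout, and the polling interval are all made up for illustration, and it assumes Python 3 with `pgrep` available on the system.

```python
import subprocess
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True as soon as the predicate holds, False on timeout.
    The default timeout and interval are arbitrary illustration values.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def slapd_is_running():
    """Return True if pgrep finds a process named exactly 'slapd'.

    Hypothetical helper: pgrep exits with status 0 when at least one
    process matches. This only shows that the process exists, not that
    it already accepts connections.
    """
    return subprocess.call(
        ["pgrep", "-x", "slapd"], stdout=subprocess.DEVNULL
    ) == 0
```

After a 'slapd start', a call such as `wait_until(slapd_is_running, timeout=60)` would then replace the fixed sleep and return as soon as the process shows up.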
(In reply to Ammar Najjar from comment #4)
> In the latest Jenkins run, script
> "10_ldap/28reconnect_univention-ldapsearch" failed on a multi server
> environment:
Ammar, I've checked all the most recent ucs@school test results, including:
UCS 4.0 Multiserver all tests: http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Multiserver/13/testReport/
UCS 4.0 Singleserver all tests: http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Singleserver/13/testReport/
--> and this test passed there.
I guess you may have looked at the following test results:
UCS 4.0 Multiserver all tests: http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Multiserver/12/testReport/
UCS 4.0 Singleserver all tests: http://jenkins.knut.univention.de:8080/job/UCSschool%204.0/job/UCSschool%204.0%20Singleserver/12/testReport/
Both were run with svn revision 56783, but the fix was committed as r56898. You can also compare the commit time (2014-12-16 14:56:52) with the Jenkins job run time (16.12.2014 11:39:49): the job was started before the actual commit.
Anyway, I've added pgrepping for slapd after start as an extra check:
r57008 | dgalkin | 2014-12-19 11:52:12 +0100 (Fr, 19. Dez 2014) | 3 Zeilen
* 10_ldap/28reconnect_univention-ldapsearch: added pgrep for slapd after its restart (Bug #37299).
The test still fails nondeterministically, on different roles. Perhaps any suggestions?
(In reply to Dmitry Galkin from comment #6)
> The test still fails nondeterministically, on different roles. Perhaps any
> suggestions?
Most probably a timing problem: I guess the slapd restart sometimes takes a few seconds longer due to VM load or load from other processes.
Wait longer for slapd to be running again?
At least add more debug output with time-stamps to help diagnose failures?
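Time-stamped debug output of the kind asked for above could be produced with a small helper like this Python sketch. The function names `debug` and `timed` are hypothetical, and the line format only imitates the "debug YYYY-MM-DD HH:MM:SS message" lines seen in the Jenkins logs, with milliseconds added as an assumption so that short delays become visible.

```python
import sys
import time
from datetime import datetime


def debug(message, stream=sys.stderr):
    """Write one 'debug <timestamp> <message>' line to the stream."""
    # Millisecond precision is an addition for illustration; the
    # Jenkins output quoted in this bug only shows whole seconds.
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    stream.write("debug %s %s\n" % (stamp, message))


def timed(label, func, *args, stream=sys.stderr):
    """Call func(*args), logging the start and the elapsed time."""
    start = time.monotonic()
    debug("start: %s" % label, stream)
    try:
        return func(*args)
    finally:
        elapsed = time.monotonic() - start
        debug("end: %s after %.2fs" % (label, elapsed), stream)
```

Wrapping each step (stopping slapd, starting slapd, running the search) in `timed(...)` would show in the log how long each one took on a loaded VM.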
(In reply to Philipp Hahn from comment #7)
> Most probably a timing problem: I guess the slapd restart sometimes takes a
> few seconds longer due to VM load or load from other processes.
>
> Wait longer for slapd to be running again?
> At least add more debug output with time-stamps to help diagnose failures?
Sometimes it actually fails without performing the 'ldap/client/retry/count' number of attempts, as here:
http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/Autotest%20MultiEnv/110/SambaVersion=s4,Systemrolle=slave/testReport/junit/10_ldap/28reconnect_univention-ldapsearch/test/
There 'ldap/client/retry/count' is initially 10, yet the test seems to fail after just one ldapsearch attempt. Added more debug output for now.
(In reply to Dmitry Galkin from comment #8)
> Sometimes it actually fails without performing the 'ldap/client/retry/count'
> number of attempts,
>
> There 'ldap/client/retry/count' is initially 10, yet the test seems to
> fail after just one ldapsearch attempt. Added more debug output for now.
It seems that univention-ldapsearch sometimes starts before slapd has completely stopped, so the search is interrupted by the slapd shutdown and the test fails. I've changed slapd stop to force-stop, and univention-ldapsearch is now started only once the server is stopped, plus a 2 sec sleep. Also adjusted some other timings.
r57347:
* Bug #37299: 10_ldap/28reconnect_univention-ldapsearch: don't perform univention-ldapsearch before slapd being stopped. Timings adjustment.
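One way to check that the server is really down before starting the search, instead of relying on a fixed sleep, is to probe the listening port. This is a different technique from the force-stop plus sleep used in r57347, shown only as a Python sketch: the function names are hypothetical, and the host and port have to be whatever slapd listens on in the environment under test (389 is the standard LDAP port, which a given setup may not use).

```python
import socket
import time


def port_is_open(host, port, connect_timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, and so on.
        return False


def wait_for_port_state(host, port, want_open, timeout=30.0, interval=0.25):
    """Poll until the port is open (or closed) as requested.

    Returns True if the wanted state was reached within `timeout`
    seconds, False otherwise. Timeout and interval are illustration
    values, not taken from the test.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if port_is_open(host, port) == want_open:
            return True
        time.sleep(interval)
    return port_is_open(host, port) == want_open
```

With this, the test could call `wait_for_port_state(host, port, want_open=False)` after stopping slapd and `want_open=True` after starting it. A closed port still does not prove that the process has exited, so a process check may be needed as well.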
Failed again @ s3-master:
<http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/Autotest%20MultiEnv/121/SambaVersion=s3,Systemrolle=master/testReport/junit/10_ldap/28reconnect_univention-ldapsearch/test/>
debug 2015-01-22 18:59:59 Performing univention-ldapsearch
ldap_start_tls: Can't contact LDAP server (-1)
error 2015-01-22 18:59:59 Search was not successful: ldap/client/retry/count=11 and slapd restart delay is 7
I've rewritten the test in Python:
r57569:
* Bug #37299: 10_ldap/26reconnect_univention_ldapsearch: switched test to python.
(The part that runs a process with a timeout can probably be done better.)
The test still fails sometimes, mostly in Amazon EC2 (http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/Autotest%20MultiEnv/125/SambaVersion=s3,Systemrolle=master/testReport/junit/10_ldap/26reconnect_univention_ldapsearch/test/) and rarely in local dev envs.
I've opened Bug #37631 because univention-ldapsearch performs only 1 attempt instead of the number specified via the UCR variable.
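For the part that runs a process with a timeout, one possible simplification is the `timeout` argument of `subprocess.run`. This is only a sketch and not the code from r57569: it assumes Python 3.5 or newer (where `subprocess.run` exists), and the helper name `run_with_timeout` is made up.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd and return (returncode, output).

    On timeout, subprocess.run kills the child and raises
    TimeoutExpired; this hypothetical helper then returns None as the
    return code together with whatever output was captured so far.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            universal_newlines=True,   # decode output to str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        out = exc.output
        # Depending on the Python version the partial output may
        # still be bytes, so decode it defensively.
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return None, out or ""
    return proc.returncode, proc.stdout
```

The test could pass the search command it already uses as `cmd` and treat a `None` return code as "search did not finish in time", which keeps that case separate from a non-zero exit of the search itself.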
(In reply to Dmitry Galkin from comment #11)
> I've rewritten the test in Python:
>
> r57569:
> * Bug #37299: 10_ldap/26reconnect_univention_ldapsearch: switched test to
> python.
>
> (The part that runs a process with a timeout can probably be done better.)
>
> The test still fails sometimes, mostly in Amazon EC2
> (http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-0/job/Autotest%20MultiEnv/125/SambaVersion=s3,Systemrolle=master/testReport/junit/10_ldap/26reconnect_univention_ldapsearch/test/)
> and rarely in local dev envs.
>
> I've opened Bug #37631 because univention-ldapsearch performs only 1 attempt
> instead of the number specified via the UCR variable.
OK, if the test case fails and Bug #37631 is the reason, we should disable the test case until Bug #37631 has been fixed. I'll tag Bug #37631 to 4.0-1-errata.
(In reply to Stefan Gohmann from comment #12)
> OK, if the test case fails and Bug #37631 is the reason, we should disable
> the test case until Bug #37631 has been fixed. I'll tag Bug #37631 to
> 4.0-1-errata.
r57818:
* 10_ldap/26reconnect_univention_ldapsearch: skip test (Bug #37299).
No separate QA is needed for this bug.