Univention Bugzilla – Bug 35153
Add generic timeout for ucs-test cases
Last modified: 2016-02-12 12:29:07 CET
Sometimes a test hangs due to external problems:

# ps axfu
root 1469 0.0 0.5 79356 21272 ? S Jun16 0:01 | \_ /usr/bin/python /usr/sbin/ucs-test -E dangerous -F junit -l ucs-test.log
root 15098 0.0 0.0 12496 1976 ? S Jun16 0:00 | \_ /bin/bash 28errors
root 16020 0.0 0.1 30660 7488 ? S Jun16 0:00 | \_ /usr/bin/python2.6 /usr/sbin/univention-config-registry unset local/repository repository/online/prefix
root 16021 0.0 0.2 36136 9188 ? S Jun16 0:00 | \_ /usr/bin/python2.6
# lsof -p 16021
python2.6 16021 root 3u IPv4 94614 0t0 TCP member096.autotest096.local:54604->download2.software-univention.de:www (ESTABLISHED)

A timeout should be added to /usr/share/ucs-test/runner to terminate a test as failed after a default timeout of 120s, similar to /usr/bin/timeout.
A header should be added to the ucs-test meta data, which can be used to specify a different timeout.

Histogram of run time over "UCS 3.2 Autotest MultiEnv":

$ sed -rne 's/.*time="([0-9]*)[0-9](\.[0-9]+)?".*/\1_/p' SambaVersion/*/Systemrolle/*/test-reports/*/* | sort -n | uniq -c
   5672 _
    363 1_
    122 2_
    111 3_
     96 4_
     54 5_
     46 6_
     43 7_
     41 8_
     21 9_
     23 10_
     17 11_
     12 12_
     15 13_
      8 14_
      7 15_
     10 16_
      6 17_
     12 18_
     18 19_
      2 20_
      3 21_
      7 22_
      3 23_
      4 24_
      4 25_
     14 26_
      2 28_
      1 29_
      5 30_
      3 31_
      2 32_
      3 33_
      2 34_
      2 35_
      1 36_
      3 39_
      1 43_
      1 44_
      1 46_
      8 55_
      4 56_
      2 58_
      1 68_
      3 69_
      1 71_
      1 74_
      1 77_
      1 82_
      2 89_
      2 90_
      1 92_
      1 93_
      1 134_
      1 165_
      1 234_
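For readers who find the sed expression hard to parse: it drops the last integer digit of each time="..." attribute, so every bucket is 10 seconds wide ("_" is 0 to 9 s, "1_" is 10 to 19 s, "234_" is 2340 to 2349 s). The following is a rough Python equivalent of the bucketing step only. The function name bucket_counts and the sample lines are made up for illustration and are not part of ucs-test.

```python
import re
from collections import Counter

# Same pattern as the sed call: capture all integer digits except the last one.
TIME_ATTR = re.compile(r'.*time="([0-9]*)[0-9](\.[0-9]+)?"')


def bucket_counts(lines):
    """Count lines per 10-second bucket, labelled like the sed output."""
    counts = Counter()
    for line in lines:
        match = TIME_ATTR.match(line)
        if match:
            counts[match.group(1) + "_"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical report lines, only to show the bucketing.
    sample = ['<testcase time="134.2">', '<testcase time="3.5">', '<testcase time="12">']
    for bucket, count in sorted(bucket_counts(sample).items()):
        print(count, bucket)
```

Read this way, the histogram shows that the large majority of runs finish in under 10 seconds, while a handful take more than 1000 seconds.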
I think it is difficult to determine how long such a test will be run.
I still think we should add a default maximum timeout of about 5 minutes per single test, which can be overridden on a case-by-case basis using a new tag. If a long-running test is written, then
## timeout: 99999
should be used.
(In reply to Stefan Gohmann from comment #1)
> I think it is difficult to determine how long such a test will be run.

Another case of a test being stuck and running for 2 days by now:

9401 ? R 3049:07 | | \_ /usr/bin/python /usr/bin/ldapsearch-wrapper
# ps u 9316
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 9316 0.0 0.0 12656 3424 ? S Mai08 0:00 /bin/bash 20grouplist
# date
Mo 11. Mai 02:20:04 EDT 2015

I'm just asking for a default timeout of 1h with the possibility to overwrite that default with a ucs-test tag if the test is supposed to run up to "a day/month/year/century"
# date
Mi 22. Jul 03:03:11 EDT 2015
# ps xfu
root 4854 0.0 1.1 90964 42744 ? S Jul20 0:17 | \_ /usr/bin/python /usr/sbin/ucs-test -E d
root 25343 0.0 0.0 13176 3864 ? S Jul20 0:05 | \_ /bin/bash 41samba-tool_user_passwor
root 18494 0.0 0.0 13176 2692 ? S 03:02 0:00 | \_ /bin/bash 41samba-tool_user_pas
root 18495 0.2 0.0 12640 3300 ? S 03:02 0:00 | \_ /bin/bash /usr/bin/univenti
root 18535 0.0 0.0 7012 428 ? S 03:03 0:00 | | \_ sleep 1
root 18496 0.0 0.0 11352 1272 ? S 03:02 0:00 | \_ sed -n s/^sambaPwdLastSet:

slapd crashed because of OOM,
Listener fails to start because of failed.ldif,
test hangs since 2 days.

I still recommend to set a default time-out of 1h for every single test and to add a test-tag to specify a different timeout for a specific test, possible with 0=infinite.
(In reply to Philipp Hahn from comment #4)
> slapd crashed because of OOM,
> Listener fails to start because of failed.ldif,
> test hangs since 2 days.
>
> I still recommend to set a default time-out of 1h for every single test and
> to add a test-tag to specify a different timeout for a specific test,
> possible with 0=infinite.

→ Reopen
r62347 | Bug #35153 test: Add timeout mechanism

Implement global --timeout parameter and test-tag '## timeout: '
Default timeout 1h, 0=infinity

Package: ucs-test
Version: 5.0.160-1.1099.201507231018
Branch: ucs_4.0-0
Scope: errata4.0-2

Test:
echo /Td6WFoAAATm1rRGAgAhARYAAAB0L+Wj4AH/ALldAGOcPA/ijPqlg7acbYUIWGLStQ+W0/5wGJ+jRoE3qucyWuaHpnDwewQd1fJAXgbtNRvhHjBtgG3MxAcuzTeedsJUpv0zRSRBYf8FZrG8sGqiMrA1GyvX3YMEBqbzLwZF1keM+XZ0NWIzgjSUob4zs8Tdpl3xyfoUWYF3ZLNOwhVI0Hu3DohHuqVrX5Dx4zuFOdTlIeNOioMlDi29VtthtXgq8JaEnuu4Vg5NKMvG8pqQFQDkAouvDRoAAAAAAHI0DANdM3r4AAHVAYAEAAAWbpZWscRn+wIAAAAABFla | base64 -d | xz -d | cpio -i --make-directories
ucs-test -vv -s test -t 0 # 01 fails, 02 succeeds
ucs-test -vv -s test -t 2 # both fail
ucs-test -vv -s test # -t 3600 # 01 fails, 02 succeeds
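To illustrate the semantics described in the commit message (1h default, 0 meaning no limit), here is a minimal sketch in present-day Python 3. It is not the code from r62347; the function name run_with_timeout and the constant DEFAULT_TIMEOUT are assumptions made for this example, and how ucs-test combines --timeout with the '## timeout: ' tag is not covered here.

```python
import subprocess
import sys

DEFAULT_TIMEOUT = 3600  # seconds; the 1h default named in the commit message


def run_with_timeout(cmd, timeout=DEFAULT_TIMEOUT):
    """Run cmd and return (exit_code, timed_out); timeout=0 waits forever."""
    limit = None if timeout == 0 else timeout  # 0 = infinity
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=limit), False
    except subprocess.TimeoutExpired:
        proc.kill()  # only the direct child; forked grandchildren survive
        proc.wait()
        return proc.returncode, True


if __name__ == "__main__":
    # A stand-in "test" that sleeps longer than the 2 second limit.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, timeout=2))
```

Note that this simple form signals only the direct child process, so anything the test has forked keeps running.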
FYI: When a test is killed, it has no chance to perform its clean-up. This leads to situations where subsequent tests fail because (for example) the licence limit is exceeded.
I had two long running tests. These tests were killed after about five hours. The user import ended after five hours. I added
## timeout: 0
to the test cases (r65212), but they were again killed (signal 15) after about five hours.

I added now '-t 0' to the test call (r65214) and it works now.
(In reply to Stefan Gohmann from comment #8)
> I had two long running tests. These tests were killed after about five
> hours. The user import ended after five hours. I added
> ## timeout: 0
> to the test cases (r65212), but they were again killed (signal 15) after
> about five hours.
>
> I added now '-t 0' to the test call (r65214) and it works now.

r65269 | Bug #35153 test: Fix timeout handling

YAML converts '0' to 0, which is treated as False.

Package: ucs-test
Version: 5.0.174-1.1329.201511061053
Branch: ucs_4.0-0
Scope: ucs4.0-4

Package: ucs-test
Version: 6.0.16-1.1328.201511061053
Branch: ucs_4.1-0
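The class of bug named in the commit message can be reproduced without ucs-test. The sketch below is not the actual ucs-test code: the dictionary stands in for a parsed test header, and timeout_buggy and timeout_fixed are hypothetical names. It only shows why an integer 0 silently falls back to the default when the value is tested for truthiness, and how an explicit None check avoids that.

```python
DEFAULT_TIMEOUT = 3600  # seconds


def timeout_buggy(header):
    # 'or' falls back for every falsy value, including the integer 0,
    # so "## timeout: 0" ends up meaning "use the default".
    return header.get("timeout") or DEFAULT_TIMEOUT


def timeout_fixed(header):
    # Only a missing value selects the default; 0 is kept as "no limit".
    value = header.get("timeout")
    return DEFAULT_TIMEOUT if value is None else int(value)


if __name__ == "__main__":
    header = {"timeout": 0}  # what a YAML parser yields for "timeout: 0"
    print(timeout_buggy(header), timeout_fixed(header))
```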
r66268 | Bug #35153 test: Kill process group on timeout

Also kill forked child processes and always terminate even when pipes are kept open

Package: ucs-test
Version: 6.0.27-13.1354.201512101751
Branch: ucs_4.1-0
Scope: errata4.1-0
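As an illustration of the process-group approach, here is a minimal Python 3 sketch for POSIX systems. It is not the code from r66268; run_in_group is a hypothetical name, and the sketch sends SIGKILL for simplicity, whereas the kills reported earlier in this bug used signal 15. It also does not read from pipes, so the "pipes kept open" case from the commit message is not modelled.

```python
import os
import signal
import subprocess


def run_in_group(cmd, timeout):
    """Run cmd in its own session; on timeout kill its whole process group."""
    # start_new_session=True calls setsid() in the child, which makes it
    # the leader of a new process group whose id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        # Signal the group, so forked children die with the test itself.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return proc.returncode, True


if __name__ == "__main__":
    # The shell forks a background sleep, similar to the "sleep 1" child
    # visible in the ps output earlier in this bug.
    print(run_in_group(["/bin/sh", "-c", "sleep 30 & wait"], timeout=1))
```

Because the signal goes to the group rather than to a single pid, a background child such as the sleep above does not outlive the timed-out test.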
The timeout works as expected.
Released to unmaintained errata.