Bug 35153 - Add generic timeout for ucs-test cases
Status: CLOSED FIXED
Product: UCS Test
Classification: Unclassified
Component: Framework
Version: unspecified
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.0-3-errata
Assigned To: Philipp Hahn
QA Contact: Stefan Gohmann
Depends on:
Blocks:
Reported: 2014-06-18 09:39 CEST by Philipp Hahn
Modified: 2016-02-12 12:29 CET (History)
CC List: 2 users

Description Philipp Hahn univentionstaff 2014-06-18 09:39:42 CEST
Sometimes a test hangs due to external problems:
# ps axfu
root      1469  0.0  0.5  79356 21272 ?        S    Jun16   0:01  |       \_ /usr/bin/python /usr/sbin/ucs-test -E dangerous -F junit -l ucs-test.log
root     15098  0.0  0.0  12496  1976 ?        S    Jun16   0:00  |           \_ /bin/bash 28errors
root     16020  0.0  0.1  30660  7488 ?        S    Jun16   0:00  |               \_ /usr/bin/python2.6 /usr/sbin/univention-config-registry unset local/repository repository/online/prefix
root     16021  0.0  0.2  36136  9188 ?        S    Jun16   0:00  |                   \_ /usr/bin/python2.6
# lsof -p 16021
python2.6 16021 root    3u  IPv4  94614      0t0    TCP member096.autotest096.local:54604->download2.software-univention.de:www (ESTABLISHED)

A timeout should be added to /usr/share/ucs-test/runner that terminates a test and marks it as failed after a default timeout of 120s, similar to /usr/bin/timeout.
A header should be added to the ucs-test metadata, which can be used to specify a different timeout.
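A minimal sketch of such a runner-side timeout, assuming the test is started as a child process through Python's subprocess module. `run_test` and `PROPOSED_TIMEOUT` are hypothetical names, not taken from /usr/share/ucs-test/runner. Like /usr/bin/timeout, it only kills the direct child; forked grandchildren are not covered.

```python
import subprocess

# Hypothetical constant: the 120 s default proposed above.
PROPOSED_TIMEOUT = 120


def run_test(argv, timeout=PROPOSED_TIMEOUT):
    """Run one test command and return (passed, reason).

    If the test runs longer than `timeout` seconds, subprocess.run() kills the
    child process and the test is reported as failed instead of hanging.
    """
    try:
        completed = subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timeout after %ss" % timeout
    if completed.returncode != 0:
        return False, "exit code %d" % completed.returncode
    return True, "ok"
```

For example, `run_test(["sleep", "5"], timeout=1)` returns `(False, 'timeout after 1s')` instead of blocking the rest of the test run.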

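Histogram of test run times over "UCS 3.2 Autotest MultiEnv" (the sed expression drops the last digit of each JUnit time="…" value, so each line counts the tests in a 10-second bucket: "_" is 0 to 9 s, "1_" is 10 to 19 s, and so on):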
$ sed -rne 's/.*time="([0-9]*)[0-9](\.[0-9]+)?".*/\1_/p' SambaVersion/*/Systemrolle/*/test-reports/*/* | sort -n | uniq -c
   5672 _
    363 1_
    122 2_
    111 3_
     96 4_
     54 5_
     46 6_
     43 7_
     41 8_
     21 9_
     23 10_
     17 11_
     12 12_
     15 13_
      8 14_
      7 15_
     10 16_
      6 17_
     12 18_
     18 19_
      2 20_
      3 21_
      7 22_
      3 23_
      4 24_
      4 25_
     14 26_
      2 28_
      1 29_
      5 30_
      3 31_
      2 32_
      3 33_
      2 34_
      2 35_
      1 36_
      3 39_
      1 43_
      1 44_
      1 46_
      8 55_
      4 56_
      2 58_
      1 68_
      3 69_
      1 71_
      1 74_
      1 77_
      1 82_
      2 89_
      2 90_
      1 92_
      1 93_
      1 134_
      1 165_
      1 234_
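The sed pipeline above can be approximated in Python to check how a candidate default timeout would cut into these runs. This is a sketch assuming each JUnit report line carries at most one relevant time="…" attribute (sed's greedy `.*` would pick the last one, `re.search` picks the first); `histogram` and `runs_at_least` are hypothetical helpers.

```python
import collections
import re

# Same idea as the sed expression: capture every digit of time="..." except the
# last one, so 5.2 s -> bucket 0, 12.0 s -> bucket 1, 2345 s -> bucket 234.
TIME_RE = re.compile(r'time="([0-9]*)[0-9](\.[0-9]+)?"')


def histogram(lines):
    """Hypothetical helper: count run times in 10-second buckets.

    Bucket N covers N*10 s up to just under (N+1)*10 s.
    """
    counts = collections.Counter()
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            counts[int(match.group(1) or "0")] += 1
    return counts


def runs_at_least(counts, seconds):
    """Hypothetical helper: runs whose bucket starts at or above `seconds` (a multiple of 10)."""
    return sum(n for bucket, n in counts.items() if bucket * 10 >= seconds)
```

Applied to the histogram above, the buckets from 12_ upwards add up to 183 runs of 120 s or more, while the slowest bucket (234_, about 39 minutes) stays below one hour.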
Comment 1 Stefan Gohmann univentionstaff 2014-09-17 08:57:22 CEST
I think it is difficult to determine how long such a test will run.
Comment 2 Philipp Hahn univentionstaff 2014-12-09 09:09:18 CET
I still think we should add a default maximum timeout of about 5 minutes per single test, which can be overridden on a case-by-case basis using a new tag.
If a long-running test is written, it should then use ## timeout: 99999.
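To illustrate how such a tag could be read, here is a simplified sketch that collects the `## key: value` lines at the top of a test script. It is only a stand-in: the later fix in r65269 shows that ucs-test parses these headers as YAML, and `parse_header`, `header_timeout`, and the example script (including its shebang line) are assumptions for illustration.

```python
import re

# Hypothetical simplified parser; ucs-test itself reads these header lines as YAML.
HEADER_RE = re.compile(r"^##\s*([\w-]+)\s*:\s*(.*?)\s*$")


def parse_header(text):
    """Return the '## key: value' pairs found before the first other line."""
    meta = {}
    for index, line in enumerate(text.splitlines()):
        if index == 0 and line.startswith("#!"):
            continue  # skip the interpreter line
        match = HEADER_RE.match(line)
        if match is None:
            break  # the header ends at the first line that is not '## key: value'
        meta[match.group(1)] = match.group(2)
    return meta


def header_timeout(text):
    """Per-test timeout in seconds, or None if the test has no timeout tag."""
    value = parse_header(text).get("timeout")
    return None if value is None else int(value)


# Hypothetical long-running test using the proposed tag.
EXAMPLE_TEST = """#!/usr/share/ucs-test/runner bash
## desc: Import a large number of users
## timeout: 99999
echo "running..."
"""
```

Here `header_timeout(EXAMPLE_TEST)` yields 99999, and a script without the tag yields None, leaving the choice of default to the runner.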
Comment 3 Philipp Hahn univentionstaff 2015-05-11 08:22:58 CEST
(In reply to Stefan Gohmann from comment #1)
> I think it is difficult to determine how long such a test will run.

Another case of a test that got stuck and has now been running for 2 days:

 9401 ?        R    3049:07  |                   |       \_ /usr/bin/python /usr/bin/ldapsearch-wrapper

# ps u 9316
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      9316  0.0  0.0  12656  3424 ?        S    Mai08   0:00 /bin/bash 20grouplist
# date
Mo 11. Mai 02:20:04 EDT 2015

I'm only asking for a default timeout of 1h, with the possibility to override that default with a ucs-test tag if the test is supposed to run for up to "a day/month/year/century".
Comment 4 Philipp Hahn univentionstaff 2015-07-22 09:48:13 CEST
# date
Mi 22. Jul 03:03:11 EDT 2015
# ps xfu
root      4854  0.0  1.1  90964 42744 ?        S    Jul20   0:17  |       \_ /usr/bin/python /usr/sbin/ucs-test -E d
root     25343  0.0  0.0  13176  3864 ?        S    Jul20   0:05  |           \_ /bin/bash 41samba-tool_user_passwor
root     18494  0.0  0.0  13176  2692 ?        S    03:02   0:00  |               \_ /bin/bash 41samba-tool_user_pas
root     18495  0.2  0.0  12640  3300 ?        S    03:02   0:00  |                   \_ /bin/bash /usr/bin/univenti
root     18535  0.0  0.0   7012   428 ?        S    03:03   0:00  |                   |   \_ sleep 1
root     18496  0.0  0.0  11352  1272 ?        S    03:02   0:00  |                   \_ sed -n s/^sambaPwdLastSet: 

slapd crashed because of OOM,
the Listener fails to start because of failed.ldif,
and the test has been hanging for 2 days.

I still recommend setting a default timeout of 1h for every single test and adding a test tag to specify a different timeout for a specific test, possibly with 0=infinite.
Comment 5 Stefan Gohmann univentionstaff 2015-07-22 10:35:32 CEST
(In reply to Philipp Hahn from comment #4)
> slapd crashed because of OOM,
> the Listener fails to start because of failed.ldif,
> and the test has been hanging for 2 days.
> 
> I still recommend setting a default timeout of 1h for every single test
> and adding a test tag to specify a different timeout for a specific test,
> possibly with 0=infinite.

→ Reopen
Comment 6 Philipp Hahn univentionstaff 2015-07-23 10:24:54 CEST
r62347 | Bug #35153 test: Add timeout mechanism
 Implement global --timeout parameter and test-tag '## timeout: '
 Default timeout 1h, 0=infinity

Package: ucs-test
Version: 5.0.160-1.1099.201507231018
Branch: ucs_4.0-0
Scope: errata4.0-2


Test:
echo /Td6WFoAAATm1rRGAgAhARYAAAB0L+Wj4AH/ALldAGOcPA/ijPqlg7acbYUIWGLStQ+W0/5wGJ+jRoE3qucyWuaHpnDwewQd1fJAXgbtNRvhHjBtgG3MxAcuzTeedsJUpv0zRSRBYf8FZrG8sGqiMrA1GyvX3YMEBqbzLwZF1keM+XZ0NWIzgjSUob4zs8Tdpl3xyfoUWYF3ZLNOwhVI0Hu3DohHuqVrX5Dx4zuFOdTlIeNOioMlDi29VtthtXgq8JaEnuu4Vg5NKMvG8pqQFQDkAouvDRoAAAAAAHI0DANdM3r4AAHVAYAEAAAWbpZWscRn+wIAAAAABFla | base64 -d | xz -d | cpio -i --make-directories
ucs-test -vv -s test -t 0 # 01 fails, 02 succeeds
ucs-test -vv -s test -t 2 # both fail
ucs-test -vv -s test # -t 3600 # 01 fails, 02 succeeds
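The global option from r62347 can be sketched with argparse. This assumes `-t` is the short form of `--timeout` (as the test commands above suggest) and that the value 0 has to be mapped to `None`, which is how Python's subprocess module spells "no limit". `build_parser`, `to_subprocess_timeout`, and the program name are hypothetical.

```python
import argparse

DEFAULT_TIMEOUT = 3600  # 1 h, the default from r62347


def build_parser():
    """Hypothetical command-line parser with a global per-test timeout option."""
    parser = argparse.ArgumentParser(prog="ucs-test-sketch")
    parser.add_argument(
        "-t", "--timeout", type=int, default=DEFAULT_TIMEOUT,
        help="per-test timeout in seconds (0 = no limit, default: %(default)s)")
    return parser


def to_subprocess_timeout(seconds):
    """Map the 0=infinity convention onto subprocess's timeout=None."""
    if seconds < 0:
        raise ValueError("timeout must not be negative")
    return None if seconds == 0 else seconds
```

Keeping 0 as the user-facing "infinite" value and converting it at the subprocess boundary avoids passing a literal 0, which subprocess would treat as an immediate timeout.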
Comment 7 Philipp Hahn univentionstaff 2015-07-24 09:45:03 CEST
FYI: When a test is killed, it has no chance to perform its clean-up. This leads to situations where subsequent tests fail because, for example, the licence limit is exceeded.
Comment 8 Stefan Gohmann univentionstaff 2015-11-05 20:48:36 CET
I had two long-running tests. These tests were killed after about five hours. The user import ended after five hours. I added
## timeout: 0
to the test cases (r65212), but they were killed again (signal 15) after about five hours.

I have now added '-t 0' to the test call (r65214), and it works.
Comment 9 Philipp Hahn univentionstaff 2015-11-06 10:58:50 CET
(In reply to Stefan Gohmann from comment #8)
> I had two long-running tests. These tests were killed after about five
> hours. The user import ended after five hours. I added
> ## timeout: 0
> to the test cases (r65212), but they were killed again (signal 15) after
> about five hours.
> 
> I have now added '-t 0' to the test call (r65214), and it works.

r65269 | Bug #35153 test: Fix timeout handling
 YAML converts '0' to 0, which is treated as False.

Package: ucs-test
Version: 5.0.174-1.1329.201511061053
Branch: ucs_4.0-0
Scope: ucs4.0-4

Package: ucs-test
Version: 6.0.16-1.1328.201511061053
Branch: ucs_4.1-0
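The pitfall behind r65269 can be reproduced in a few lines. This sketch assumes, as comment 8 implies, that a `## timeout` tag takes precedence over the global `-t`/`--timeout` value whenever the tag is present; both function names are hypothetical.

```python
DEFAULT_TIMEOUT = 3600  # global default in seconds


def effective_timeout_buggy(tag_value, global_timeout=DEFAULT_TIMEOUT):
    # Hypothetical buggy variant: YAML yields the integer 0 for "## timeout: 0",
    # 0 is falsy, so `or` silently falls back to the global timeout.
    return tag_value or global_timeout


def effective_timeout(tag_value, global_timeout=DEFAULT_TIMEOUT):
    # Hypothetical fixed variant: fall back only when the tag is really absent.
    return global_timeout if tag_value is None else tag_value
```

With the buggy variant, a test tagged `## timeout: 0` is still subject to the global limit, which matches what comment 8 observed; comparing against `None` keeps 0 as a deliberate "no limit".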
Comment 10 Philipp Hahn univentionstaff 2015-12-11 09:27:00 CET
r66268 | Bug #35153 test: Kill process group on timeout
 Also kill forked child processes and always terminate even when pipes are kept open

Package: ucs-test
Version: 6.0.27-13.1354.201512101751
Branch: ucs_4.1-0
Scope: errata4.1-0
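A sketch of the process-group technique named in r66268, written for Python 3 (Python 2 code would pass `preexec_fn=os.setsid` instead of `start_new_session=True`). It assumes a SIGTERM-then-SIGKILL escalation with a grace period; `run_in_group` and its `grace` parameter are hypothetical and not the ucs-test implementation.

```python
import os
import signal
import subprocess


def _signal_group(pgid, sig):
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # every process in the group has already exited


def run_in_group(argv, timeout, grace=5):
    """Hypothetical helper: run argv as leader of a new process group.

    Returns (returncode, output). On timeout the whole group gets SIGTERM,
    then SIGKILL after `grace` seconds; returncode is None in that case.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, start_new_session=True)
    try:
        output, _ = proc.communicate(timeout=timeout)
        return proc.returncode, output
    except subprocess.TimeoutExpired:
        pass
    # start_new_session=True runs setsid() in the child, so its PID is also the
    # process-group ID and killpg() reaches forked grandchildren as well.
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        output, _ = proc.communicate(timeout=grace)
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the direct child without waiting for the pipe
        proc.stdout.close()  # a process outside the group may still hold it open
        output = b""
    return None, output
```

With `["sh", "-c", "sleep 30 & sleep 30"]`, killing only `sh` would leave the background `sleep` holding the output pipe; signalling the group ends both, and closing the pipe after SIGKILL keeps the runner from blocking even if something escaped the group.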
Comment 11 Stefan Gohmann univentionstaff 2016-02-01 07:55:35 CET
The timeout works as expected.
Comment 12 Janek Walkenhorst univentionstaff 2016-02-12 12:29:07 CET
Released to unmaintained errata.