Bug 38174 - 60_umc-system.22_umc-service-proc-killing.test fails
60_umc-system.22_umc-service-proc-killing.test fails
Status: CLOSED FIXED
Product: UCS Test
Classification: Unclassified
Component: UMC
unspecified
Other Linux
: P5 normal (vote)
: ---
Assigned To: Dmitry Galkin
Philipp Hahn
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-03-31 16:08 CEST by Philipp Hahn
Modified: 2023-03-25 06:46 CET (History)
0 users

See Also:
What kind of report is it?: ---
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
Test program for WNOHANG (289 bytes, text/plain)
2015-03-31 16:08 CEST, Philipp Hahn
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Hahn univentionstaff 2015-03-31 16:08:37 CEST
Created attachment 6793 [details]
Test program for WNOHANG

The test fails some times: <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=slave/54/testReport/junit/60_umc-system/22_umc-service-proc-killing/test/>

*** BEGIN *** ['/usr/bin/python', '22_umc-service-proc-killing'] ***
*** 60_umc-system/22_umc-service-proc-killing *** Test the UMC backend process killing ***
*** START TIME: 2015-03-30 21:00:28 ***
Testing UMC process killing with signal 'SIGTERM'
Testing UMC process killing with signal 'SIGKILL'
### FAIL ###
Process exit status is 0x0 instead of SIGKILL(9).
###      ###
Created process with pid '11298' was not terminated, forcing kill
The exit status while force kill is 0x0 instead of 'SIGKILL'(9).
*** END TIME: 2015-03-30 21:00:33 ***
*** TEST DURATION (H:MM:SS.ms): 0:00:05.082807 ***

Please note that the test took ~5s total, while the child process does a sleep(30) before doing an exit(0).


This is caused by the os.wait4(..., NOHANG), which returns immediately when the process was not yet terminated. The attached test program shows this:
> pid=25867 _pid=0, ret=0x0
...
> pid=25867 _pid=25867, ret=0x0

The main process should loop for 30s while _pid==0



I also got the following Traceback during testing with
 while ./22_umc-service-proc-killing -f </dev/null;do :;done

> Testing UMC process killing with signal 'SIGTERM'
> Traceback (most recent call last):
>   File "22_umc-service-proc-killing", line 130, in <module>
>     sys.exit(TestUMC.main())
>   File "22_umc-service-proc-killing", line 119, in main
>     self.test('SIGTERM')
>   File "22_umc-service-proc-killing", line 108, in test
>     self.force_process_kill()
>   File "22_umc-service-proc-killing", line 65, in force_process_kill
>     if self.Proc.is_running():
> AttributeError: 'NoneType' object has no attribute 'is_running'

I had to Ctrl-C the run and then got this:

> ^CCreated process with pid '20068' was not terminated, forcing kill
> The exit status while force kill is 0x100 instead of 'SIGKILL'(9).
> Traceback (most recent call last):
>   File "22_umc-service-proc-killing", line 130, in <module>
>     sys.exit(TestUMC.main())
>   File "22_umc-service-proc-killing", line 119, in main
>     self.test('SIGTERM')
>   File "22_umc-service-proc-killing", line 96, in test
>     if self.query_process_exists(pid):
>   File "22_umc-service-proc-killing", line 49, in query_process_exists
>     request_result = self.make_top_query_request()
>   File "./TestUMCSystemModule.py", line 125, in make_top_query_request
>     return self.make_query_request('top')
>   File "./TestUMCSystemModule.py", line 114, in make_query_request
>     options, flavor)
>   File "/usr/lib/pymodules/python2.7/univention/lib/umc_connection.py", line 132, in request
>     response = con.getresponse()
>   File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
>     response.begin()
>   File "/usr/lib/python2.7/httplib.py", line 407, in begin
>     version, status, reason = self._read_status()
>   File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
>     line = self.fp.readline()
>   File "/usr/lib/python2.7/socket.py", line 430, in readline
>     data = recv(1)
>   File "/usr/lib/python2.7/ssl.py", line 241, in recv
>     return self.read(buflen)
>   File "/usr/lib/python2.7/ssl.py", line 160, in read
>     return self._sslobj.read(len)
> KeyboardInterrupt
Comment 1 Philipp Hahn univentionstaff 2015-03-31 18:29:05 CEST
First change commited:
 r59520 | Bug #38174 test: Wait for child exit
Comment 2 Philipp Hahn univentionstaff 2015-04-13 08:26:03 CEST
Failed again: <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=backup/67/testReport/junit/60_umc-system/22_umc-service-proc-killing/test/>:
> [  0.388]Testing UMC process killing with signal 'SIGTERM'
> [  3.810]Testing UMC process killing with signal 'SIGKILL'
> [  5.084]### FAIL ###
> [  5.084]Process exit status is 0x0 instead of SIGKILL(9).
> [  5.084]###      ###
> [  5.084]Created process with pid '25411' was not terminated, forcing kill

Can you please habe a look?
Maybe
- compare with "self.Proc.pid" with "child",
- print and delay if they are not equal,
- only exit the loop if they're equal or the max-count is reached

I have the nagging fealing that Python/UMC sometimes forkes a child process and the test then collects that process.
Comment 3 Dmitry Galkin univentionstaff 2015-04-17 16:57:22 CEST
(In reply to Philipp Hahn from comment #2)
> Failed again:
> <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/
> Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=backup/67/testReport/junit/
> 60_umc-system/22_umc-service-proc-killing/test/>:
> > [  0.388]Testing UMC process killing with signal 'SIGTERM'
> > [  3.810]Testing UMC process killing with signal 'SIGKILL'
> > [  5.084]### FAIL ###
> > [  5.084]Process exit status is 0x0 instead of SIGKILL(9).
> > [  5.084]###      ###
> > [  5.084]Created process with pid '25411' was not terminated, forcing kill
> 
> Can you please habe a look?

As I understand, this line in 'def test':
_pid, exit_status, _res_usage = wait4(pid, WNOHANG)

gives us (pid, 0, 0) due to WNOHANG option when "no child process status is available immediately".

this is confirmed by the fact, that test fails on comparison with 0 status and total execution time far less than 30 seconds, i.e. that's not the child's 0 exit status, but the result of 'wait4 with WNOHANG'.

It happens rarely, I assume sometimes the test executed faster than the UMC terminates the process actually (it however, reports 'success' in response).

To confirm:
r59928:
  * 60_umc-system/22_umc-service-proc-killing: wait a bit after UMC request
    to terminate process was done (Bug #38174).

If it will always work from now -> I think we can delete the WNOHANG option from wait4.


> Maybe
> - compare with "self.Proc.pid" with "child",
> - print and delay if they are not equal,
> - only exit the loop if they're equal or the max-count is reached

As of now, the test won't fail in the 'force_process_kill' function, so this would be executed due to 'finally' statement, but test would still be considered failed.
Comment 4 Philipp Hahn univentionstaff 2015-04-19 21:39:51 CEST
(In reply to Dmitry Galkin from comment #3)
> (In reply to Philipp Hahn from comment #2)
> > Failed again:
> > <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/
> > Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=backup/67/testReport/junit/
> > 60_umc-system/22_umc-service-proc-killing/test/>:
> > > [  0.388]Testing UMC process killing with signal 'SIGTERM'
> > > [  3.810]Testing UMC process killing with signal 'SIGKILL'
> > > [  5.084]### FAIL ###
> > > [  5.084]Process exit status is 0x0 instead of SIGKILL(9).
> > > [  5.084]###      ###
> > > [  5.084]Created process with pid '25411' was not terminated, forcing kill
> > 
> > Can you please habe a look?
> 
> As I understand, this line in 'def test':
> _pid, exit_status, _res_usage = wait4(pid, WNOHANG)
> 
> gives us (pid, 0, 0) due to WNOHANG option when "no child process status is
> available immediately".
> 
> this is confirmed by the fact, that test fails on comparison with 0 status
> and total execution time far less than 30 seconds, i.e. that's not the
> child's 0 exit status, but the result of 'wait4 with WNOHANG'.
> 
> It happens rarely, I assume sometimes the test executed faster than the UMC
> terminates the process actually (it however, reports 'success' in response).
> 
> To confirm:
> r59928:
>   * 60_umc-system/22_umc-service-proc-killing: wait a bit after UMC request
>     to terminate process was done (Bug #38174).

Please have a look at r59520 and perhaps adapt that change to the 2nd instnce where WNOHANG is used.
Bonus points for moving that to a separate method.

> If it will always work from now -> I think we can delete the WNOHANG option
> from wait4.

WONOHANG is used to not wait infinitely if something goes wrong with killing the test process: The test should always terminate after 30s.
Comment 5 Dmitry Galkin univentionstaff 2015-04-20 15:20:54 CEST
As discussed:

r59977:
  * 60_umc-system/22_umc-service-proc-killing updated:
    More debug output; do not wait too much when the process
    is killed during the clean-up; wait when terminated via UMC (Bug #38174).
Comment 6 Philipp Hahn univentionstaff 2015-04-21 17:47:23 CEST
OK: r59977
OK: ./22_umc-service-proc-killing  -vvf