Univention Bugzilla – Bug 38174
60_umc-system.22_umc-service-proc-killing.test fails
Last modified: 2023-03-25 06:46:29 CET
Created attachment 6793 [details] Test program for WNOHANG The test fails some times: <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=slave/54/testReport/junit/60_umc-system/22_umc-service-proc-killing/test/> *** BEGIN *** ['/usr/bin/python', '22_umc-service-proc-killing'] *** *** 60_umc-system/22_umc-service-proc-killing *** Test the UMC backend process killing *** *** START TIME: 2015-03-30 21:00:28 *** Testing UMC process killing with signal 'SIGTERM' Testing UMC process killing with signal 'SIGKILL' ### FAIL ### Process exit status is 0x0 instead of SIGKILL(9). ### ### Created process with pid '11298' was not terminated, forcing kill The exit status while force kill is 0x0 instead of 'SIGKILL'(9). *** END TIME: 2015-03-30 21:00:33 *** *** TEST DURATION (H:MM:SS.ms): 0:00:05.082807 *** Please note that the test took ~5s total, while the child process does a sleep(30) before doing an exit(0). This is caused by the os.wait4(..., NOHANG), which returns immediately when the process was not yet terminated. The attached test program shows this: > pid=25867 _pid=0, ret=0x0 ... > pid=25867 _pid=25867, ret=0x0 The main process should loop for 30s while _pid==0 I also got the following Traceback during testing with while ./22_umc-service-proc-killing -f </dev/null;do :;done > Testing UMC process killing with signal 'SIGTERM' > Traceback (most recent call last): > File "22_umc-service-proc-killing", line 130, in <module> > sys.exit(TestUMC.main()) > File "22_umc-service-proc-killing", line 119, in main > self.test('SIGTERM') > File "22_umc-service-proc-killing", line 108, in test > self.force_process_kill() > File "22_umc-service-proc-killing", line 65, in force_process_kill > if self.Proc.is_running(): > AttributeError: 'NoneType' object has no attribute 'is_running' I had to Ctrl-C the run and then got this: > ^CCreated process with pid '20068' was not terminated, forcing kill > The exit status while force kill is 0x100 instead of 'SIGKILL'(9). > Traceback (most recent call last): > File "22_umc-service-proc-killing", line 130, in <module> > sys.exit(TestUMC.main()) > File "22_umc-service-proc-killing", line 119, in main > self.test('SIGTERM') > File "22_umc-service-proc-killing", line 96, in test > if self.query_process_exists(pid): > File "22_umc-service-proc-killing", line 49, in query_process_exists > request_result = self.make_top_query_request() > File "./TestUMCSystemModule.py", line 125, in make_top_query_request > return self.make_query_request('top') > File "./TestUMCSystemModule.py", line 114, in make_query_request > options, flavor) > File "/usr/lib/pymodules/python2.7/univention/lib/umc_connection.py", line 132, in request > response = con.getresponse() > File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse > response.begin() > File "/usr/lib/python2.7/httplib.py", line 407, in begin > version, status, reason = self._read_status() > File "/usr/lib/python2.7/httplib.py", line 365, in _read_status > line = self.fp.readline() > File "/usr/lib/python2.7/socket.py", line 430, in readline > data = recv(1) > File "/usr/lib/python2.7/ssl.py", line 241, in recv > return self.read(buflen) > File "/usr/lib/python2.7/ssl.py", line 160, in read > return self._sslobj.read(len) > KeyboardInterrupt
First change commited: r59520 | Bug #38174 test: Wait for child exit
Failed again: <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=backup/67/testReport/junit/60_umc-system/22_umc-service-proc-killing/test/>: > [ 0.388]Testing UMC process killing with signal 'SIGTERM' > [ 3.810]Testing UMC process killing with signal 'SIGKILL' > [ 5.084]### FAIL ### > [ 5.084]Process exit status is 0x0 instead of SIGKILL(9). > [ 5.084]### ### > [ 5.084]Created process with pid '25411' was not terminated, forcing kill Can you please habe a look? Maybe - compare with "self.Proc.pid" with "child", - print and delay if they are not equal, - only exit the loop if they're equal or the max-count is reached I have the nagging fealing that Python/UMC sometimes forkes a child process and the test then collects that process.
(In reply to Philipp Hahn from comment #2) > Failed again: > <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/ > Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=backup/67/testReport/junit/ > 60_umc-system/22_umc-service-proc-killing/test/>: > > [ 0.388]Testing UMC process killing with signal 'SIGTERM' > > [ 3.810]Testing UMC process killing with signal 'SIGKILL' > > [ 5.084]### FAIL ### > > [ 5.084]Process exit status is 0x0 instead of SIGKILL(9). > > [ 5.084]### ### > > [ 5.084]Created process with pid '25411' was not terminated, forcing kill > > Can you please habe a look? As I understand, this line in 'def test': _pid, exit_status, _res_usage = wait4(pid, WNOHANG) gives us (pid, 0, 0) due to WNOHANG option when "no child process status is available immediately". this is confirmed by the fact, that test fails on comparison with 0 status and total execution time far less than 30 seconds, i.e. that's not the child's 0 exit status, but the result of 'wait4 with WNOHANG'. It happens rarely, I assume sometimes the test executed faster than the UMC terminates the process actually (it however, reports 'success' in response). To confirm: r59928: * 60_umc-system/22_umc-service-proc-killing: wait a bit after UMC request to terminate process was done (Bug #38174). If it will always work from now -> I think we can delete the WNOHANG option from wait4. > Maybe > - compare with "self.Proc.pid" with "child", > - print and delay if they are not equal, > - only exit the loop if they're equal or the max-count is reached As of now, the test won't fail in the 'force_process_kill' function, so this would be executed due to 'finally' statement, but test would still be considered failed.
(In reply to Dmitry Galkin from comment #3) > (In reply to Philipp Hahn from comment #2) > > Failed again: > > <http://jenkins.knut.univention.de:8080/job/UCS-4.0/job/UCS-4.0-1/job/ > > Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=backup/67/testReport/junit/ > > 60_umc-system/22_umc-service-proc-killing/test/>: > > > [ 0.388]Testing UMC process killing with signal 'SIGTERM' > > > [ 3.810]Testing UMC process killing with signal 'SIGKILL' > > > [ 5.084]### FAIL ### > > > [ 5.084]Process exit status is 0x0 instead of SIGKILL(9). > > > [ 5.084]### ### > > > [ 5.084]Created process with pid '25411' was not terminated, forcing kill > > > > Can you please habe a look? > > As I understand, this line in 'def test': > _pid, exit_status, _res_usage = wait4(pid, WNOHANG) > > gives us (pid, 0, 0) due to WNOHANG option when "no child process status is > available immediately". > > this is confirmed by the fact, that test fails on comparison with 0 status > and total execution time far less than 30 seconds, i.e. that's not the > child's 0 exit status, but the result of 'wait4 with WNOHANG'. > > It happens rarely, I assume sometimes the test executed faster than the UMC > terminates the process actually (it however, reports 'success' in response). > > To confirm: > r59928: > * 60_umc-system/22_umc-service-proc-killing: wait a bit after UMC request > to terminate process was done (Bug #38174). Please have a look at r59520 and perhaps adapt that change to the 2nd instnce where WNOHANG is used. Bonus points for moving that to a separate method. > If it will always work from now -> I think we can delete the WNOHANG option > from wait4. WONOHANG is used to not wait infinitely if something goes wrong with killing the test process: The test should always terminate after 30s.
As discussed: r59977: * 60_umc-system/22_umc-service-proc-killing updated: More debug output; do not wait too much when the process is killed during the clean-up; wait when terminated via UMC (Bug #38174).
OK: r59977 OK: ./22_umc-service-proc-killing -vvf