Bug 45254 - UMC-Server crashes with "IndexError: pop from empty list" in do_send()
UMC-Server crashes with "IndexError: pop from empty list" in do_send()
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.2
Other Linux
: P5 normal (vote)
: UCS 4.2-2-errata
Assigned To: Florian Best
Johannes Keiser
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2017-08-23 19:50 CEST by Stefan Gohmann
Modified: 2017-10-10 15:38 CEST (History)
2 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.360
Enterprise Customer affected?: Yes
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Ticket number:
Bug group (optional): Error handling, Workaround is available
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Gohmann univentionstaff 2017-08-23 19:50:04 CEST
23.08.17 19:22:48.323  LOCALE      ( WARN    ) : Could not find translation file: 'umc-core.mo'
23.08.17 19:22:49.083  MAIN        ( PROCESS ) : Connection timed out.
23.08.17 19:22:49.151  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 243, in <module>
    umc_daemon.do_action()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
    func(self)
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 210, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 273, in step
    not __sockets[ cond ][ fd ]( sock_obj ):
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 168, in _do_send
    id, first = state.resend_queue.pop(0)
IndexError: pop from empty list
Comment 1 Stefan Gohmann univentionstaff 2017-08-23 19:52:21 CEST
It worked after restarting umc web server and umc server:

root@master40:~# /etc/init.d/univention-management-console-web-server restart
[ ok ] Restarting univention-management-console-web-server (via systemctl): univention-management-console-web-server.service.
root@master40:~# /etc/init.d/univention-management-console-server restart
[ ok ] Restarting univention-management-console-server (via systemctl): univention-management-console-server.service.
root@master40:~# 

Previously, I restarted the UMC server only.
Comment 2 Florian Best univentionstaff 2017-08-24 11:27:58 CEST
I think this is more often triggered now by the fix of Bug #44965.
Comment 3 Florian Best univentionstaff 2017-08-24 11:44:46 CEST
Actually I don't understand how this can happen:

_do_send() is only called twice:
204 »   »   »   »   if not state.resend_queue:
205 »   »   »   »   »   notifier.socket_add(state.socket, self._do_send, notifier.IO_WRITE)
206 »   »   »   »   state.resend_queue.append((msg.id, data[ret:]))


212 »   »   »   notifier.socket_add(state.socket, self._do_send, notifier.IO_WRITE)
213 »   »   »   state.resend_queue.append(data)

in both of the cases state.resend_queue can't be empty.
There is only one place where something is removed from state.resend_queue, and that is the place where the exception happens - in do_send().

If do_send() returns True it is called again by the notifier step() function.
do_send() returns True in 2 places:
178 »   »   »   state.resend_queue.insert(0, (id, first))
179 »   »   »   return True
→ state_resend.queue is not empty!

185 »   »   return (len(state.resend_queue) > 0)
→ state_resend.queue is not empty!

So, what is going on here?
Comment 4 Sönke Schwardt-Krummrich univentionstaff 2017-09-04 10:18:28 CEST
(In reply to Florian Best from comment #2)
> I think this is more often triggered now by the fix of Bug #44965.

A school customer just called me and reported, that this issue happened several times since the fix of bug #44965. I saw the same traceback in the customers logfile.
Comment 5 Sönke Schwardt-Krummrich univentionstaff 2017-09-04 10:27:37 CEST
04.09.17 10:13:09.681  PARSER      ( INFO    ) : The message body is not complete: 2192556 of 2647979 bytes
04.09.17 10:13:09.684  PARSER      ( INFO    ) : The message body is not complete: 2258092 of 2647979 bytes
04.09.17 10:13:09.685  PARSER      ( INFO    ) : The message body is not complete: 2323628 of 2647979 bytes
04.09.17 10:13:09.687  PARSER      ( INFO    ) : The message body is not complete: 2389164 of 2647979 bytes
04.09.17 10:13:09.689  PARSER      ( INFO    ) : The message body is not complete: 2454700 of 2647979 bytes
04.09.17 10:13:09.690  PARSER      ( INFO    ) : The message body is not complete: 2520236 of 2647979 bytes
04.09.17 10:13:09.692  PARSER      ( INFO    ) : The message body is not complete: 2585772 of 2647979 bytes
04.09.17 10:13:09.709  PARSER      ( INFO    ) : UMCP RESPONSE 150451278574914-11775 parsed successfully
04.09.17 10:13:09.709  PROTOCOL    ( INFO    ) : Received UMCP RESPONSE 150451278574914-11775
04.09.17 10:13:10.344  MAIN        ( INFO    ) : The module join is inactive for too long. Sending EXIT request to module
04.09.17 10:13:10.344  MAIN        ( INFO    ) : There are unfinished requests. Waiting for 150450983357357-2035
04.09.17 10:13:10.350  MAIN        ( INFO    ) : The module schoolusers is inactive for too long. Sending EXIT request to module
04.09.17 10:13:10.350  MAIN        ( INFO    ) : There are unfinished requests. Waiting for 150426394903480-1864
04.09.17 10:13:10.507  MAIN        ( INFO    ) : The module udm is inactive for too long. Sending EXIT request to module
04.09.17 10:13:10.507  MAIN        ( INFO    ) : There are unfinished requests. Waiting for 150416396799105-1083
04.09.17 10:13:11.065  MAIN        ( INFO    ) : The module udm is inactive for too long. Sending EXIT request to module
04.09.17 10:13:11.065  MAIN        ( INFO    ) : There are unfinished requests. Waiting for 150408880216911-690
04.09.17 10:13:11.144  MAIN        ( PROCESS ) : Connection timed out.
04.09.17 10:13:11.144  MAIN        ( INFO    ) : The session is shutting down. Sending UMC modules an EXIT request (1 processes)
04.09.17 10:13:11.144  MAIN        ( INFO    ) : Ask module udm to shutdown gracefully
04.09.17 10:13:11.144  PROTOCOL    ( INFO    ) : Sending UMCP EXIT REQUEST 150451279114467-2112
04.09.17 10:13:11.145  MAIN        ( INFO    ) : The session is shutting down
04.09.17 10:13:11.145  MAIN        ( PROCESS ) : Processor: dying
04.09.17 10:13:11.145  MAIN        ( PROCESS ) : ModuleProcess: dying
04.09.17 10:13:11.150  MAIN        ( WARN    ) : Socket died (module=udm)
04.09.17 10:13:11.150  MAIN        ( PROCESS ) : ModuleProcess: child stopped
04.09.17 10:13:11.150  MAIN        ( PROCESS ) : Connection timed out.
04.09.17 10:13:11.150  MAIN        ( INFO    ) : The session is shutting down. Sending UMC modules an EXIT request (1 processes)
04.09.17 10:13:11.150  MAIN        ( INFO    ) : Ask module udm to shutdown gracefully
04.09.17 10:13:11.150  PROTOCOL    ( INFO    ) : Sending UMCP EXIT REQUEST 150451279115089-2113
04.09.17 10:13:11.151  MAIN        ( INFO    ) : The session is shutting down
04.09.17 10:13:11.151  MAIN        ( PROCESS ) : Processor: dying
04.09.17 10:13:11.151  MAIN        ( PROCESS ) : ModuleProcess: dying
04.09.17 10:13:11.156  MAIN        ( WARN    ) : Socket died (module=udm)
04.09.17 10:13:11.156  MAIN        ( PROCESS ) : ModuleProcess: child stopped
04.09.17 10:13:11.159  MAIN        ( INFO    ) : Incoming connection from ::ffff:127.0.0.1:57832
04.09.17 10:13:11.159  MAIN        ( INFO    ) : Established connection: ::ffff:127.0.0.1:57832
04.09.17 10:13:11.159  LOCALE      ( INFO    ) : Locale or domain missing. Stopped loading of translation
04.09.17 10:13:11.159  RESOURCES   ( INFO    ) : Retrieving list of permitted commands
04.09.17 10:13:11.159  RESOURCES   ( INFO    ) : mod='ucr' flavor=None deactivated=False hidden=False
04.09.17 10:13:11.160  RESOURCES   ( INFO    ) : mod='schoolrooms' flavor=None deactivated=False hidden=False
04.09.17 10:13:11.160  RESOURCES   ( INFO    ) : mod='ipchange' flavor=None deactivated=False hidden=False
[...]
04.09.17 10:13:11.185  RESOURCES   ( INFO    ) : mod='setup' flavor='languages' deactivated=False hidden=False
04.09.17 10:13:11.185  RESOURCES   ( INFO    ) : mod='setup' flavor='network' deactivated=False hidden=False
04.09.17 10:13:11.186  RESOURCES   ( INFO    ) : mod='setup' flavor='certificate' deactivated=False hidden=False
04.09.17 10:13:11.187  MAIN        ( PROCESS ) : Connection timed out.
04.09.17 10:13:11.187  MAIN        ( INFO    ) : The session is shutting down. Sending UMC modules an EXIT request (1 processes)
04.09.17 10:13:11.187  MAIN        ( INFO    ) : Ask module udm to shutdown gracefully
04.09.17 10:13:11.187  PROTOCOL    ( INFO    ) : Sending UMCP EXIT REQUEST 150451279118755-2114
04.09.17 10:13:11.187  MAIN        ( INFO    ) : The session is shutting down
04.09.17 10:13:11.187  MAIN        ( PROCESS ) : Processor: dying
04.09.17 10:13:11.187  MAIN        ( PROCESS ) : ModuleProcess: dying
04.09.17 10:13:11.187  MAIN        ( WARN    ) : Socket died (module=udm)
04.09.17 10:13:11.187  MAIN        ( PROCESS ) : ModuleProcess: child stopped
04.09.17 10:13:11.238  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 243, in <module>
    umc_daemon.do_action()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
    func(self)
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 210, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 273, in step
    not __sockets[ cond ][ fd ]( sock_obj ):
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 168, in _do_send
    id, first = state.resend_queue.pop(0)
IndexError: pop from empty list
Comment 6 Sönke Schwardt-Krummrich univentionstaff 2017-09-13 13:18:17 CEST
And again (this time with debug output):

13.09.17 12:48:54.927  MAIN        ( ERROR   ) : Exception in _do_send: state.resend_queue is empty - socket=<OpenSSL.SSL.Connection object at 0x7f030eae9410>; state=<State (client='::ffff:127.0.0.1:57138', buffer='', requests=0, time_remaining=29)>, states={<OpenSSL.SSL.Connection object at 0x7f030eb4b250>: <State (client='::ffff:127.0.0.1:57142', buffer='', requests=0, time_remaining=29)>, <OpenSSL.SSL.Connection object at 0x7f03107b1750>: <State (client='::ffff:127.0.0.1:56598', buffer='', requests=0, time_remaining=2)>, <OpenSSL.SSL.Connection object at 0x7f030eae9410>: <State (client='::ffff:127.0.0.1:57138', buffer='', requests=0, time_remaining=29)>, <OpenSSL.SSL.Connection object at 0x7f030eb2e450>: <State (client='::ffff:127.0.0.1:56560', buffer='', requests=0, time_remaining=1)>, <OpenSSL.SSL.Connection object at 0x7f0310347510>: <State (client='::ffff:127.0.0.1:56700', buffer='', requests=0, time_remaining=9)>, <OpenSSL.SSL.Connection object at 0x7f030ebdcf10>: <State (client='::ffff:127.0.0.1:56780', buffer='', requests=0, time_remaining=27)>, <OpenSSL.SSL.Connection object at 0x7f030f35f150>: <State (client='::ffff:127.0.0.1:57132', buffer='', requests=0, time_remaining=30)>}
13.09.17 12:48:54.928  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 243, in <module>
    umc_daemon.do_action()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
    func(self)
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 210, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 273, in step
    not __sockets[ cond ][ fd ]( sock_obj ):
  File "/usr/lib/pymodules/python2.7/univention/management/console/protocol/server.py", line 169, in _do_send
    id, first = state.resend_queue.pop(0)
IndexError: pop from empty list
Comment 7 Florian Best univentionstaff 2017-09-21 17:53:38 CEST
I added the following block on the customer system:
                try:
                        id, first = state.resend_queue.pop(0)
                except IndexError:
                        CORE.error('IndexError: pop from empty list in _do_send!')
                        return False
Comment 8 Florian Best univentionstaff 2017-09-27 12:34:30 CEST
I could not find the reason for the crash nor reproduce it but I can prevent the crash.

univention-management-console (9.0.80-74):
66494a43917b | Bug #45254: Merge branch 'fbest/45254-crash-in-do-send' into 4.2-2
f38d0b92d7b6 | Bug #45254: fix UMC server crash in do_send()

univention-management-console.yaml:
66494a43917b | Bug #45254: Merge branch 'fbest/45254-crash-in-do-send' into 4.2-2
44c902d3b3db | YAML Bug #45254
Comment 9 Florian Best univentionstaff 2017-10-05 12:37:01 CEST
Thank you, the KeyError should have been IndexError.

univention-management-console (9.0.80-75):
96d8cc6cba69 | Bug #45254: Merge branch 'fbest/45254-crash-in-do-send' into 4.2-2
9d4e04245ac5 | Bug #45254: fix s/KeyError/IndexError/
Comment 10 Johannes Keiser univentionstaff 2017-10-05 18:30:57 CEST
OK Pop from empty list is catched
YAML OK
-> verified
Comment 11 Arvid Requate univentionstaff 2017-10-10 15:38:45 CEST
<http://errata.software-univention.de/ucs/4.2/196.html>