Bug 44222 - UMC Server crash: OverflowError: Python int too large to convert to C int
UMC Server crash: OverflowError: Python int too large to convert to C int
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.3
Other Linux
: P5 normal (vote)
: UCS 4.3-1-errata
Assigned To: Jannik Ahlers
Arvid Requate
:
: 36808 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2017-04-03 12:31 CEST by Erik Damrose
Modified: 2019-11-06 19:11 CET (History)
5 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.240
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
1.patch (593 bytes, patch)
2018-03-13 14:10 CET, Arvid Requate
Details | Diff
1.patch (601 bytes, patch)
2018-03-19 13:22 CET, Arvid Requate
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Erik Damrose univentionstaff 2017-04-03 12:31:10 CEST
During system setup i tried to join into an AD domain. The domain controller had a vastly different time. We sync the time with the AD to ensure kerberos works. 

After the time sync, the UMC server crashed, and system setup could not continue. I got a popup that the UMC server is not running.

Logfiles show:

management-console-module-setup.log
03.04.17 12:25:07.324  MODULE      ( PROCESS ) : AD Info: {'Domain': 'ucs.local', 'LDAP Base': 'DC=ucs,DC=local', 'Forest': 'ucs.local', 'Client Site': 'Default-First-Site-Name', 'DC Netbios Name': 'WIN-DUA7EARHMN9', 'DC DNS Name': 'WIN-DUA7EARHMN9.ucs.local', 'Netbios Domain': 'UCS0', 'DC IP': '10.200.29.19', 'Server Site': 'Default-First-Site-Name'}
03.04.17 12:25:07.527  MODULE      ( PROCESS ) : Synchronizing time to WIN-DUA7EARHMN9.ucs.local
25.05.16 17:10:58.787  MODULE      ( PROCESS ) : Time difference is less than 180 seconds, skipping reset of local time
25.05.16 17:10:58.990  MODULE      ( PROCESS ) : Prepare Kerberos UCR settings
25.05.16 17:10:58.991  MODULE      ( PROCESS ) : Setting UCR variables: [u'kerberos/defaults/dns_lookup_kdc=true', u'kerberos/realm=UCS.LOCAL']
25.05.16 17:10:59.198  MODULE      ( PROCESS ) : Unsetting UCR variables: [u'kerberos/kdc', u'kerberos/kpasswdserver', u'kerberos/adminserver']
25.05.16 17:11:00.196  MODULE      ( PROCESS ) : User is default AD Administrator
25.05.16 17:11:00.205  MODULE      ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-module", line 121, in <module>
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step
    fds = __poll.poll( timeout )
OverflowError: Python int too large to convert to C int


==> /var/log/univention/management-console-server.log <==
25.05.16 17:11:00.486  MAIN        ( WARN    ) : Socket died (module=setup)
25.05.16 17:11:00.486  MAIN        ( WARN    ) : Module process setup died (pid: 1373, exit status: -1, signal: -1, status: -1)
25.05.16 17:11:00.486  MAIN        ( WARN    ) : Cleaning up requests
25.05.16 17:11:00.486  MAIN        ( WARN    ) : Remove inactivity timer
25.05.16 17:11:00.586  MAIN        ( PROCESS ) : ModuleProcess: child died
25.05.16 17:11:00.586  MAIN        ( WARN    ) : Module process setup died (pid: 1373, exit status: 1, signal: -1, status: 256)
25.05.16 17:11:00.588  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 243, in <module>
    umc_daemon.do_action()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
    func(self)
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 210, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step
    fds = __poll.poll( timeout )
OverflowError: Python int too large to convert to C int
Comment 2 Florian Best univentionstaff 2017-05-24 15:10:47 CEST
Happened again to me, I simply did:
service univention-management-console-server reload
and it crashed.
Comment 3 Florian Best univentionstaff 2017-05-24 15:11:18 CEST
23.05.12 08:17:17.031  MAIN        ( WARN    ) : Module process udm died (pid: 23968, exit status: -1, signal: -1, status: -1)
23.05.12 08:17:17.031  MAIN        ( INFO    ) : Checking for kill timer ()
23.05.12 08:17:17.031  MAIN        ( WARN    ) : Cleaning up requests
23.05.12 08:17:17.031  MAIN        ( INFO    ) : No pending requests found
23.05.12 08:17:17.031  MAIN        ( WARN    ) : Remove inactivity timer
23.05.12 08:17:17.032  MAIN        ( PROCESS ) : ModuleProcess: child died
23.05.12 08:17:17.032  MAIN        ( WARN    ) : Module process udm died (pid: 23968, exit status: 1, signal: -1, status: 256)
23.05.12 08:17:17.032  MAIN        ( INFO    ) : Checking for kill timer ()
23.05.12 08:17:17.039  MAIN        ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-server", line 243, in <module>
    umc_daemon.do_action()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action
    func(self)
  File "/usr/sbin/univention-management-console-server", line 167, in _crestart
    self._start()
  File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start
    self.app.run()
  File "/usr/sbin/univention-management-console-server", line 210, in run
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step
    fds = __poll.poll( timeout )
OverflowError: Python int too large to convert to C int

23.05.12 08:17:38.428  DEBUG_INIT
23.05.12 08:17:38.429  MAIN        ( PROCESS ) : Reload failed: server ist not running
Comment 4 Florian Best univentionstaff 2017-05-24 15:12:13 CEST
And again:

23.05.17 16:30:10.783  DEBUG_INIT
23.05.17 16:37:34.991  MODULE      ( ERROR   ) : Failed connection: ECONNRESET
23.05.17 16:37:35.015  MAIN        ( WARN    ) : Shutting down all open connections
24.05.17 13:29:28.035  DEBUG_INIT
24.05.17 13:39:31.947  MAIN        ( WARN    ) : Shutting down all open connections
24.05.17 15:06:31.708  DEBUG_INIT
23.05.12 08:17:16.310  MODULE      ( ERROR   ) : Traceback (most recent call last):
  File "/usr/sbin/univention-management-console-module", line 121, in <module>
    notifier.loop()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop
    step()
  File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step
    fds = __poll.poll( timeout )
OverflowError: Python int too large to convert to C int
Comment 5 Dirk Wiesenthal univentionstaff 2018-03-11 13:48:37 CET
*** Bug 36808 has been marked as a duplicate of this bug. ***
Comment 6 Dirk Wiesenthal univentionstaff 2018-03-11 13:50:21 CET
Experienced that, too. 64bit. I manually changed the time of the system with "date --set=...".
Comment 9 Arvid Requate univentionstaff 2018-03-13 14:10:18 CET
Created attachment 9466 [details]
1.patch

select.pipe().poll() only accepts 32bit (apparently also on 64bit systems). Maybe this patch is enough, behavior untested.
Comment 10 Dirk Wiesenthal univentionstaff 2018-03-13 14:22:42 CET
Patching python-notifier seems to be the only option. The project looks abandoned...

Anyway, we should have a look what the "normal" timeout values are. I assume they are between 10 and 5000? I am not sure how UMC behaves if we poll for a billion milliseconds. I also don't know what it really does. Does it wait for any action? For a specific registered one? The notifier code is unknown to me.

(Granted, it is most probably better to poll for any number of milliseconds instead of crashing the server...)
Comment 11 Arvid Requate univentionstaff 2018-03-13 14:37:15 CET
It's a timeout for poll to avoid crashing the server in this corner case, where the system time is changed massively while running the notifier loop. Maybe a lower maximum timeout would be better, but that's a GUI decision. My patch attempts to fix the crash.
Comment 12 Florian Best univentionstaff 2018-03-16 11:32:08 CET
(In reply to Arvid Requate from comment #9)
> Created attachment 9466 [details]
> 1.patch
> 
> select.pipe().poll() only accepts 32bit (apparently also on 64bit systems).
> Maybe this patch is enough, behavior untested.

The patch would cause that the timer-callback is executed in 2147483647 seconds.
That's not correct and is too much in the future and may cause side effects if the application depends on the behavior of the timer.
An alternative would be to execute the timer immediately, which might also be incorrect. 
A correct fix would be to detect the time change and adjust the timer / find out the correct time.
Comment 13 Arvid Requate univentionstaff 2018-03-16 12:08:44 CET
> The patch would cause that the timer-callback is executed in 2147483647 seconds.

"cause"? - nope, read the code, it's min, not max. The patch would *limit* the timeout used for poll and time_sleep to 2147483647 where now it is unlimted. Sure, probably there are other things to fix to gracefully handle massive time jumps.
Comment 14 Florian Best univentionstaff 2018-03-16 12:47:15 CET
(In reply to Arvid Requate from comment #13)
> > The patch would cause that the timer-callback is executed in 2147483647 seconds.
> 
> "cause"? - nope, read the code, it's min, not max.
Yes, but min(nextCall, __uint32_max) would be 2147483647 in the overflow scenario.
> The patch would *limit*
> the timeout used for poll and time_sleep to 2147483647 where now it is unlimited.
No, currently it is a crash, not "unlimited".
Yes, this patch add a maximum limit for the poll() call.

That's what I criticise:

Imagine we have a timer interval of every 20 minutes and the system clock was set to any value in the future.

1. If the application listens to sockets:
With the patch, we have poll.poll(2147483647)
So it blocks as long as nothing happens on the sockets, even more than 21 minutes and up to 24 days. Everything might be fine, if something happens on any socket. But even then, the timer-callback is not executed but in 24 days.

2. If the applications has no sockets:
Then it just does time_sleep(timeout / 1000.0 ):
→ which would be 24 days of time.sleep().

> Sure, probably there are other things to fix to gracefully handle
> massive time jumps.

→ The patch removes the crash but causes a 24-day blocking application.
*Correct me If I am wrong!?*

I think we should either recalculate the timeout (if timeout > __uint32_max: timestamp = time.time() + interval; timers[$ID] = (interval, timestamp, callback)) or execute the timer immediately (if timeout > __uint32_max: timeout = 0).
But we must assure that the timer callback is executed again (and not in 24 days).
Comment 15 Arvid Requate univentionstaff 2018-03-19 13:22:45 CET
Created attachment 9479 [details]
1.patch

Ok, following the timeout = 0 approach. I just want to suggest a solution instead of waiting another year.
Comment 16 Erik Damrose univentionstaff 2018-04-17 11:44:14 CEST
Not surprisingly still happening with UCS 4.3
Comment 17 Jannik Ahlers univentionstaff 2018-06-07 12:57:15 CEST
the second patch seems to work, so we're using it.

Successful build
Package: python-notifier
Version: 0.9.7-4A~4.3.0.201806071253
Branch: ucs_4.3-0
Scope: errata4.3-1
Comment 18 Arvid Requate univentionstaff 2018-06-26 13:45:12 CEST
QA Feedback:

1. Ok, the new patch

  4.3-0-0-ucs/0.9.7-4-errata4.3-1/10_bug44222_overflowerror.patch

has been applied at build time. Usually the patch suffix should be ".quilt" if it patches the source code and ".patch" if it patches debian/* files only. But it works in this case because debian/source/format is not yet "3.0 (quilt)" in this specific Debian source package.


2. The version string of the binary package is follows the new version scheme of UCS 4.2 ( see https://hutten.knut.univention.de/mediawiki/index.php/Entwicklungsumgebung#UCS_4.2 ), but since the source package version 0.9.7-4 has already been imported in errata4.1-0, the version of the binary package is now lower than the binary package version shipped in errata4.1-0:

===========================================================================
root@master10:~# apt-cache policy python-notifier 
python-notifier:
  Installiert:           0.9.7-4.30.201601271815
  Installationskandidat: 0.9.7-4.30.201601271815
  Versionstabelle:
 *** 0.9.7-4.30.201601271815 500
        500 http://updates.software-univention.de/4.1/maintained 4.1-1/all/ Packages
        100 /var/lib/dpkg/status
     0.9.7-4A~4.3.0.201806071253 500
        500 http://192.168.0.10/build2 ucs_4.3-0-errata4.3-1/all/ Packages
===========================================================================

You can check this also with this command:
===========================================================================
root@master10:~# dpkg --compare-versions \
  "0.9.7-4A~4.3.0.201806071253" gt "0.9.7-4.30.201601271815" \
  && echo ok || echo fail
fail
===========================================================================

The wiki page quoted above explains that this is a special case, where our UCS 4.2 automatic package version scheme fails and you have to build the package with a manual version override, like this:

build-package-ng -P ucs -r 4.3-0-0 -s errata4.3-1 --no-pbuilder-update \
  -p python-notifier -v "0.9.7-4.A~4.3.0.201806071253"

Please note the "." before the "A~", which fixes the version sorting:
===========================================================================
root@master10:~# dpkg --compare-versions \
  "0.9.7-4.A~4.3.0.201806071253" gt "0.9.7-4.30.201601271815" \
  && echo ok || echo fail
ok
===========================================================================

I've done this for you.

Package: python-notifier
Version: 0.9.7-4.A~4.3.0.201806071253
Branch: ucs_4.3-0
Scope: errata4.3-1

Now it looks better and the package is actually updatable:

===========================================================================
root@master10:~# apt-cache policy python-notifier 
python-notifier:
  Installiert:           0.9.7-4.30.201601271815
  Installationskandidat: 0.9.7-4.A~4.3.0.201806071253
  Versionstabelle:
     0.9.7-4.A~4.3.0.201806071253 500
        500 http://192.168.0.10/build2 ucs_4.3-0-errata4.3-1/all/ Packages
 *** 0.9.7-4.30.201601271815 500
        500 http://updates.software-univention.de/4.1/maintained 4.1-1/all/ Packages
===========================================================================


3. The advisory file is missing.

Please fix point 3.
Comment 19 Jannik Ahlers univentionstaff 2018-06-26 17:54:55 CEST
> Please fix point 3.
Done.
Comment 20 Arvid Requate univentionstaff 2018-06-26 20:40:15 CEST
Ok
Comment 21 Erik Damrose univentionstaff 2018-06-27 14:37:22 CEST
<http://errata.software-univention.de/ucs/4.3/118.html>
Comment 22 Florian Best univentionstaff 2019-11-05 23:24:01 CET
FYI: it is not necessary to commit this into the patches repository. the package lies in management/python-notifier.
Comment 23 Arvid Requate univentionstaff 2019-11-06 19:11:40 CET
That's outdated, shorty after it has been committed to git:

===========================================================
commit 6f8ee8e
Author: Jannik Ahlers <ahlers@univention.de>
Date:   Thu Jun 7 12:51:39 2018 +0200

    Bug #44222: Fix overflowerror when system time changes
===========================================================

arequate@dimma:~$ cat /var/univention/buildsystem2/logs/ucs_4.3-0-0-errata4.3-1/python-notifier_0.9.7-5A~4.3.0.201807041139.patch.log
dpkg-source: Warnung: --skip-patches ist keine gültige Option für Dpkg::Source::Package::V1.
dpkg-source: Warnung: unsigniertes Quellpaket wird extrahiert (/var/univention/buildsystem2/src/python-notifier/0.9.7-5/python-notifier_0.9.7-5.dsc)
dpkg-source: Information: python-notifier wird nach python-notifier-0.9.7 extrahiert
dpkg-source: Information: python-notifier_0.9.7-5.tar.gz wird entpackt
'/var/univention/buildsystem2/src/python-notifier/0.9.7-5/python-notifier_0.9.7-5.tar.gz' -> 'python-notifier_0.9.7-5.tar.gz'
'/var/univention/buildsystem2/src/python-notifier/0.9.7-5/python-notifier_0.9.7-5.dsc' -> 'python-notifier_0.9.7-5.dsc'
No handlers could be found for logger "univention.repong.vcs.svn"
No patches were found.