Univention Bugzilla – Bug 44222
UMC Server crash: OverflowError: Python int too large to convert to C int
Last modified: 2019-11-06 19:11:40 CET
During system setup i tried to join into an AD domain. The domain controller had a vastly different time. We sync the time with the AD to ensure kerberos works. After the time sync, the UMC server crashed, and system setup could not continue. I got a popup that the UMC server is not running. Logfiles show: management-console-module-setup.log 03.04.17 12:25:07.324 MODULE ( PROCESS ) : AD Info: {'Domain': 'ucs.local', 'LDAP Base': 'DC=ucs,DC=local', 'Forest': 'ucs.local', 'Client Site': 'Default-First-Site-Name', 'DC Netbios Name': 'WIN-DUA7EARHMN9', 'DC DNS Name': 'WIN-DUA7EARHMN9.ucs.local', 'Netbios Domain': 'UCS0', 'DC IP': '10.200.29.19', 'Server Site': 'Default-First-Site-Name'} 03.04.17 12:25:07.527 MODULE ( PROCESS ) : Synchronizing time to WIN-DUA7EARHMN9.ucs.local 25.05.16 17:10:58.787 MODULE ( PROCESS ) : Time difference is less than 180 seconds, skipping reset of local time 25.05.16 17:10:58.990 MODULE ( PROCESS ) : Prepare Kerberos UCR settings 25.05.16 17:10:58.991 MODULE ( PROCESS ) : Setting UCR variables: [u'kerberos/defaults/dns_lookup_kdc=true', u'kerberos/realm=UCS.LOCAL'] 25.05.16 17:10:59.198 MODULE ( PROCESS ) : Unsetting UCR variables: [u'kerberos/kdc', u'kerberos/kpasswdserver', u'kerberos/adminserver'] 25.05.16 17:11:00.196 MODULE ( PROCESS ) : User is default AD Administrator 25.05.16 17:11:00.205 MODULE ( ERROR ) : Traceback (most recent call last): File "/usr/sbin/univention-management-console-module", line 121, in <module> notifier.loop() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop step() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step fds = __poll.poll( timeout ) OverflowError: Python int too large to convert to C int ==> /var/log/univention/management-console-server.log <== 25.05.16 17:11:00.486 MAIN ( WARN ) : Socket died (module=setup) 25.05.16 17:11:00.486 MAIN ( WARN ) : Module process setup died (pid: 1373, exit status: -1, signal: -1, status: -1) 25.05.16 17:11:00.486 MAIN ( WARN ) : Cleaning up requests 25.05.16 17:11:00.486 MAIN ( WARN ) : Remove inactivity timer 25.05.16 17:11:00.586 MAIN ( PROCESS ) : ModuleProcess: child died 25.05.16 17:11:00.586 MAIN ( WARN ) : Module process setup died (pid: 1373, exit status: 1, signal: -1, status: 256) 25.05.16 17:11:00.588 MAIN ( ERROR ) : Traceback (most recent call last): File "/usr/sbin/univention-management-console-server", line 243, in <module> umc_daemon.do_action() File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action func(self) File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start self.app.run() File "/usr/sbin/univention-management-console-server", line 210, in run notifier.loop() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop step() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step fds = __poll.poll( timeout ) OverflowError: Python int too large to convert to C int
https://github.com/crunchy-github/python-notifier/issues/8
Happened again to me, I simply did: service univention-management-console-server reload and it crashed.
23.05.12 08:17:17.031 MAIN ( WARN ) : Module process udm died (pid: 23968, exit status: -1, signal: -1, status: -1) 23.05.12 08:17:17.031 MAIN ( INFO ) : Checking for kill timer () 23.05.12 08:17:17.031 MAIN ( WARN ) : Cleaning up requests 23.05.12 08:17:17.031 MAIN ( INFO ) : No pending requests found 23.05.12 08:17:17.031 MAIN ( WARN ) : Remove inactivity timer 23.05.12 08:17:17.032 MAIN ( PROCESS ) : ModuleProcess: child died 23.05.12 08:17:17.032 MAIN ( WARN ) : Module process udm died (pid: 23968, exit status: 1, signal: -1, status: 256) 23.05.12 08:17:17.032 MAIN ( INFO ) : Checking for kill timer () 23.05.12 08:17:17.039 MAIN ( ERROR ) : Traceback (most recent call last): File "/usr/sbin/univention-management-console-server", line 243, in <module> umc_daemon.do_action() File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 186, in do_action func(self) File "/usr/sbin/univention-management-console-server", line 167, in _crestart self._start() File "/usr/lib/pymodules/python2.7/daemon/runner.py", line 131, in _start self.app.run() File "/usr/sbin/univention-management-console-server", line 210, in run notifier.loop() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop step() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step fds = __poll.poll( timeout ) OverflowError: Python int too large to convert to C int 23.05.12 08:17:38.428 DEBUG_INIT 23.05.12 08:17:38.429 MAIN ( PROCESS ) : Reload failed: server ist not running
And again: 23.05.17 16:30:10.783 DEBUG_INIT 23.05.17 16:37:34.991 MODULE ( ERROR ) : Failed connection: ECONNRESET 23.05.17 16:37:35.015 MAIN ( WARN ) : Shutting down all open connections 24.05.17 13:29:28.035 DEBUG_INIT 24.05.17 13:39:31.947 MAIN ( WARN ) : Shutting down all open connections 24.05.17 15:06:31.708 DEBUG_INIT 23.05.12 08:17:16.310 MODULE ( ERROR ) : Traceback (most recent call last): File "/usr/sbin/univention-management-console-module", line 121, in <module> notifier.loop() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 286, in loop step() File "/usr/lib/pymodules/python2.7/notifier/nf_generic.py", line 219, in step fds = __poll.poll( timeout ) OverflowError: Python int too large to convert to C int
*** Bug 36808 has been marked as a duplicate of this bug. ***
Experienced that, too. 64bit. I manually changed the time of the system with "date --set=...".
Created attachment 9466 [details] 1.patch select.pipe().poll() only accepts 32bit (apparently also on 64bit systems). Maybe this patch is enough, behavior untested.
Patching python-notifier seems to be the only option. The project looks abandoned... Anyway, we should have a look what the "normal" timeout values are. I assume they are between 10 and 5000? I am not sure how UMC behaves if we poll for a billion milliseconds. I also don't know what it really does. Does it wait for any action? For a specific registered one? The notifier code is unknown to me. (Granted, it is most probably better to poll for any number of milliseconds instead of crashing the server...)
It's a timeout for poll to avoid crashing the server in this corner case, where the system time is changed massively while running the notifier loop. Maybe a lower maximum timeout would be better, but that's a GUI decision. My patch attempts to fix the crash.
(In reply to Arvid Requate from comment #9) > Created attachment 9466 [details] > 1.patch > > select.pipe().poll() only accepts 32bit (apparently also on 64bit systems). > Maybe this patch is enough, behavior untested. The patch would cause that the timer-callback is executed in 2147483647 seconds. That's not correct and is too much in the future and may cause side effects if the application depends on the behavior of the timer. An alternative would be to execute the timer immediately, which might also be incorrect. A correct fix would be to detect the time change and adjust the timer / find out the correct time.
> The patch would cause that the timer-callback is executed in 2147483647 seconds. "cause"? - nope, read the code, it's min, not max. The patch would *limit* the timeout used for poll and time_sleep to 2147483647 where now it is unlimted. Sure, probably there are other things to fix to gracefully handle massive time jumps.
(In reply to Arvid Requate from comment #13) > > The patch would cause that the timer-callback is executed in 2147483647 seconds. > > "cause"? - nope, read the code, it's min, not max. Yes, but min(nextCall, __uint32_max) would be 2147483647 in the overflow scenario. > The patch would *limit* > the timeout used for poll and time_sleep to 2147483647 where now it is unlimited. No, currently it is a crash, not "unlimited". Yes, this patch add a maximum limit for the poll() call. That's what I criticise: Imagine we have a timer interval of every 20 minutes and the system clock was set to any value in the future. 1. If the application listens to sockets: With the patch, we have poll.poll(2147483647) So it blocks as long as nothing happens on the sockets, even more than 21 minutes and up to 24 days. Everything might be fine, if something happens on any socket. But even then, the timer-callback is not executed but in 24 days. 2. If the applications has no sockets: Then it just does time_sleep(timeout / 1000.0 ): → which would be 24 days of time.sleep(). > Sure, probably there are other things to fix to gracefully handle > massive time jumps. → The patch removes the crash but causes a 24-day blocking application. *Correct me If I am wrong!?* I think we should either recalculate the timeout (if timeout > __uint32_max: timestamp = time.time() + interval; timers[$ID] = (interval, timestamp, callback)) or execute the timer immediately (if timeout > __uint32_max: timeout = 0). But we must assure that the timer callback is executed again (and not in 24 days).
Created attachment 9479 [details] 1.patch Ok, following the timeout = 0 approach. I just want to suggest a solution instead of waiting another year.
Not surprisingly still happening with UCS 4.3
the second patch seems to work, so we're using it. Successful build Package: python-notifier Version: 0.9.7-4A~4.3.0.201806071253 Branch: ucs_4.3-0 Scope: errata4.3-1
QA Feedback: 1. Ok, the new patch 4.3-0-0-ucs/0.9.7-4-errata4.3-1/10_bug44222_overflowerror.patch has been applied at build time. Usually the patch suffix should be ".quilt" if it patches the source code and ".patch" if it patches debian/* files only. But it works in this case because debian/source/format is not yet "3.0 (quilt)" in this specific Debian source package. 2. The version string of the binary package is follows the new version scheme of UCS 4.2 ( see https://hutten.knut.univention.de/mediawiki/index.php/Entwicklungsumgebung#UCS_4.2 ), but since the source package version 0.9.7-4 has already been imported in errata4.1-0, the version of the binary package is now lower than the binary package version shipped in errata4.1-0: =========================================================================== root@master10:~# apt-cache policy python-notifier python-notifier: Installiert: 0.9.7-4.30.201601271815 Installationskandidat: 0.9.7-4.30.201601271815 Versionstabelle: *** 0.9.7-4.30.201601271815 500 500 http://updates.software-univention.de/4.1/maintained 4.1-1/all/ Packages 100 /var/lib/dpkg/status 0.9.7-4A~4.3.0.201806071253 500 500 http://192.168.0.10/build2 ucs_4.3-0-errata4.3-1/all/ Packages =========================================================================== You can check this also with this command: =========================================================================== root@master10:~# dpkg --compare-versions \ "0.9.7-4A~4.3.0.201806071253" gt "0.9.7-4.30.201601271815" \ && echo ok || echo fail fail =========================================================================== The wiki page quoted above explains that this is a special case, where our UCS 4.2 automatic package version scheme fails and you have to build the package with a manual version override, like this: build-package-ng -P ucs -r 4.3-0-0 -s errata4.3-1 --no-pbuilder-update \ -p python-notifier -v "0.9.7-4.A~4.3.0.201806071253" Please note the "." before the "A~", which fixes the version sorting: =========================================================================== root@master10:~# dpkg --compare-versions \ "0.9.7-4.A~4.3.0.201806071253" gt "0.9.7-4.30.201601271815" \ && echo ok || echo fail ok =========================================================================== I've done this for you. Package: python-notifier Version: 0.9.7-4.A~4.3.0.201806071253 Branch: ucs_4.3-0 Scope: errata4.3-1 Now it looks better and the package is actually updatable: =========================================================================== root@master10:~# apt-cache policy python-notifier python-notifier: Installiert: 0.9.7-4.30.201601271815 Installationskandidat: 0.9.7-4.A~4.3.0.201806071253 Versionstabelle: 0.9.7-4.A~4.3.0.201806071253 500 500 http://192.168.0.10/build2 ucs_4.3-0-errata4.3-1/all/ Packages *** 0.9.7-4.30.201601271815 500 500 http://updates.software-univention.de/4.1/maintained 4.1-1/all/ Packages =========================================================================== 3. The advisory file is missing. Please fix point 3.
> Please fix point 3. Done.
Ok
<http://errata.software-univention.de/ucs/4.3/118.html>
FYI: it is not necessary to commit this into the patches repository. the package lies in management/python-notifier.
That's outdated, shorty after it has been committed to git: =========================================================== commit 6f8ee8e Author: Jannik Ahlers <ahlers@univention.de> Date: Thu Jun 7 12:51:39 2018 +0200 Bug #44222: Fix overflowerror when system time changes =========================================================== arequate@dimma:~$ cat /var/univention/buildsystem2/logs/ucs_4.3-0-0-errata4.3-1/python-notifier_0.9.7-5A~4.3.0.201807041139.patch.log dpkg-source: Warnung: --skip-patches ist keine gültige Option für Dpkg::Source::Package::V1. dpkg-source: Warnung: unsigniertes Quellpaket wird extrahiert (/var/univention/buildsystem2/src/python-notifier/0.9.7-5/python-notifier_0.9.7-5.dsc) dpkg-source: Information: python-notifier wird nach python-notifier-0.9.7 extrahiert dpkg-source: Information: python-notifier_0.9.7-5.tar.gz wird entpackt '/var/univention/buildsystem2/src/python-notifier/0.9.7-5/python-notifier_0.9.7-5.tar.gz' -> 'python-notifier_0.9.7-5.tar.gz' '/var/univention/buildsystem2/src/python-notifier/0.9.7-5/python-notifier_0.9.7-5.dsc' -> 'python-notifier_0.9.7-5.dsc' No handlers could be found for logger "univention.repong.vcs.svn" No patches were found.