Bug 47114 - UMC Memory Leaks with UVMM
UMC Memory Leaks with UVMM
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Virtualization - UVMM
UCS 4.2
Other Linux
: P5 normal (vote)
: UCS 4.3-1-errata
Assigned To: Philipp Hahn
Jürn Brodersen
:
Depends on:
Blocks: 47569 47857
  Show dependency treegraph
 
Reported: 2018-06-01 13:35 CEST by Christian Völker
Modified: 2019-12-02 17:31 CET (History)
8 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 4: A User would return the product
User Pain: 0.229
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support: Yes
Ticket number: 2018053121000468, 2018062321000016
Bug group (optional):
Max CVSS v3 score:


Attachments
collectd VirtualMemorySize graph of UVMM-UMC (11.13 KB, image/png)
2018-07-04 09:58 CEST, Philipp Hahn
Details
reproduce memory leak (531 bytes, text/x-python)
2018-07-11 14:00 CEST, Jürn Brodersen
Details
patch (1.92 KB, patch)
2018-08-07 12:15 CEST, Florian Best
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Christian Völker univentionstaff 2018-06-01 13:35:28 CEST
Customer's UMC uses an increasing amount of memory.

During one hour the usage of "univention-management-console-server" grew from 600MB to 2.3GB!

At this stage users trying to open UVMM get the error:
=======================================
UCS Version: 4.2-3 errata310 (Lesum)

Konnte das Modul nicht öffnen. Auf dem Server ist nicht genügend freier Arbeitsspeicher verfügbar. Bitte später erneut versuchen.

None

Request: handle_request_command
Konnte das Modul nicht öffnen. Auf dem Server ist nicht genügend freier Arbeitsspeicher verfügbar. Bitte später erneut versuchen.
========================================================
A restart of the service cleans up and the system works as expected.

It happens approximately once a day.

Here output of "top":
===================================
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18857 root      20   0 1885552 1,322g  25096 S   0,0 34,2   0:51.19 /usr/bin/python2.7 /usr/sbin/univention-management-console-module -m uvmm -s /var/run/univention-management-console/4415-1527726624735.socket -d 2 -l de_DE.UTF-8
4415 root      20   0 2771332 678012  13928 S   0,3 16,7 224:42.58 /usr/bin/python2.7 /usr/sbin/univention-management-console-server start
13075 root      20   0  862764 170908  25176 S   0,3  4,2   0:10.46 /usr/bin/python2.7 /usr/sbin/univention-management-console-module -m uvmm -s /var/run/univention-management-console/4415-1527744643097.socket -d 2 -l de_DE.UTF-8
28211 root      20   0  755972 116332  24704 S   0,0  2,9   0:05.89 /usr/bin/python2.7 /usr/sbin/univention-management-console-module -m uvmm -s /var/run/univention-management-console/4415-1527747733269.socket -d 2 -l de_DE.UTF-8
32734 root      20   0 2008344  49632      0 S   7,3  1,2  27:58.19 /usr/bin/python2.7 /usr/sbin/univention-virtual-machine-manager-daemon -l /var/log/univention/virtual-machine-manager-daemon-errors.log
1062 root      20   0 3690100  24192   5272 S   0,3  0,6   3890:32 /usr/sbin/slapd -h ldapi:/// ldap://:7389/ ldap://:389/ ldaps://:7636/ ldaps://:636/
 ===================================

Appears it is not limited to UVMM module.
Comment 2 Christian Völker univentionstaff 2018-06-01 13:38:57 CEST
Possibly related:

See also https://forge.univention.org/bugzilla/show_bug.cgi?id=36640
Comment 3 Nico Gulden univentionstaff 2018-06-26 16:31:40 CEST
Probably related Ticket#2018062321000016.
Comment 4 Philipp Hahn univentionstaff 2018-07-03 14:51:08 CEST
This is not caused by pickle.

The leak is easy to see with collectd.

I can re-produce the leak with:
 from univention.lib.umc import Client
 from pprint import pprint
 umc = Client('dc0', 'Administrator', 'univention')
 while True:
     pprint(umc.umc_command('uvmm/query', dict(nodePattern='*', type='node')).data)

'service/query' on the other hand does not show that leak, so it's UVMM-UMC only.

Using the technique from <http://tech.labs.oliverwyman.com/blog/2008/11/14/tracing-python-memory-leaks/> I found this:
 UVMM_ConnectionThread      14480                                          
 Message                    14480                                          

The problem seems to be a dependency loop involving UVMM_ConnectionThread, which is a sub-class of Python-Notifier, which implements a __del__ method, which is very bad: <https://docs.python.org/2/library/gc.html#gc.garbage>

Combined with <https://docs.python.org/2/library/sys.html#sys.exc_info> this leads to this leak.
Comment 5 Philipp Hahn univentionstaff 2018-07-04 09:58:47 CEST
Created attachment 9587 [details]
collectd VirtualMemorySize graph of UVMM-UMC
Comment 6 Philipp Hahn univentionstaff 2018-07-04 12:09:33 CEST
As we have our own fork of python-notifier <https://git.knut.univention.de/univention/python-notifier/tree/ucs>, the workflow was as following:
1. git clone -b ucs -o gitlab git.knut.univention.de:univention/python-notifier
2. apply patches - please note that the fix for Bug #44222 was applied as a repo-ng Patch; I merged the patch into that repository and told repo-ng to skip copying the patch.
3. git push gitlab
4. switch to UCS repo ./management/python-notifier/
5. git submodule iniit python-notifier
6. git submodule update python-notifier
7. ( cd python-notifier && git pull --rebase )
8. git add python-notifier
9. dch -i 'Bug #47114: Fix memory leak'
10. git add debian/changelog
11. git commit -m 'Bug #47114: Fix memory leak'
12. git pull --rebase
13. git push ...

Package: python-notifier
Version: 0.9.7-5A~4.3.0.201807041038
Version: 0.9.7-5A~4.3.0.201807041139
Branch: ucs_4.3-0
Scope: errata4.3-1

[4.3-1] 3a266ced6c Bug #47114: python-notifier 0.9.7-5A~4.3.0.201807041139
 doc/errata/staging/python-notifier.yaml | 11 +++++++++++
 1 file changed, 11 insertions(+)

OK: zless /usr/share/doc/python-notifier/changelog.Debian.gz

FYI: <https://github.com/crunchy-github/python-notifier/pull/14>

TODO: After QA clone the bug for UCS-4.2-4 and apply the same logic.
Comment 7 Philipp Hahn univentionstaff 2018-07-04 12:13:20 CEST
(In reply to Philipp Hahn from comment #6)
> TODO: After QA clone the bug for UCS-4.2-4 and apply the same logic.

As an alternative announce the same binary also for that older release as it contains also the fix for Bug #44222 .

The test is running on xen1 and the VM size no longer increases: <http://xen1.knut.univention.de/collectd/bin/index.cgi?hostname=dc0.phahn.dev&plugin=processes&timespan=86400&action=show_selection&ok_button=OK>
Comment 8 Florian Best univentionstaff 2018-07-04 12:25:51 CEST
I saw the pull request in python-notifier. The changes there will cause that some tracebacks are broken and not displayed in UMC anymore. See:

management/univention-management-console/src/univention/management/console/auth.py:
»   »   »   AUTH.error(''.join(traceback.format_exception(*thread.exc_info)))

management/univention-management-console/src/univention/management/console/base.py:
»   »   self.__error_handling(request, method, *thread.exc_info)
Comment 9 Philipp Hahn univentionstaff 2018-07-04 12:33:23 CEST
(In reply to Florian Best from comment #8)
> I saw the pull request in python-notifier. The changes there will cause that
> some tracebacks are broken and not displayed in UMC anymore. See:
> 
> management/univention-management-console/src/univention/management/console/
> auth.py:
> »   »   »   AUTH.error(''.join(traceback.format_exception(*thread.exc_info)))
> 
> management/univention-management-console/src/univention/management/console/
> base.py:
> »   »   self.__error_handling(request, method, *thread.exc_info)

Better then a crash due OOM; UMC should be fixed: <https://git.knut.univention.de/univention/ucs/commit/15621de896819a686d593c360b309eedef5475e2>
Comment 10 Quality Assurance univentionstaff 2018-07-04 16:02:34 CEST
--- mirror/ftp/4.1/unmaintained/4.1-1/source/python-notifier_0.9.7-4.30.201601271815.dsc
+++ apt/ucs_4.3-0-errata4.3-1/source/python-notifier_0.9.7-5A~4.3.0.201807041139.dsc
@@ -1,6 +1,10 @@
-0.9.7-4.30.201601271815 [Wed, 27 Jan 2016 18:15:13 +0100] Univention builddaemon <buildd@univention.de>:
+0.9.7-5A~4.3.0.201807041139 [Wed, 04 Jul 2018 11:39:26 +0200] Univention builddaemon <buildd@univention.de>:
 
   * UCS auto build. No patches were applied to the original source package
+
+0.9.7-5 [Wed, 04 Jul 2018 10:36:25 +0200] Philipp Hahn <hahn@univention.de>:
+
+  * Bug #47114: Fix memory leak
 
 0.9.7-4 [Wed, 27 Jan 2016 18:12:51 +0100] Florian Best <best@univention.de>:
 

<http://10.200.17.11/4.2-4/#7474818989236411566>
Comment 11 Quality Assurance univentionstaff 2018-07-04 16:05:05 CEST
--- mirror/ftp/4.3/unmaintained/component/4.3-1-errata/source/python-notifier_0.9.7-4.A~4.3.0.201806071253.dsc
+++ apt/ucs_4.3-0-errata4.3-1/source/python-notifier_0.9.7-5A~4.3.0.201807041139.dsc
@@ -1,7 +1,10 @@
-0.9.7-4.A~4.3.0.201806071253 [Tue, 26 Jun 2018 13:25:55 +0200] Univention builddaemon <buildd@univention.de>:
+0.9.7-5A~4.3.0.201807041139 [Wed, 04 Jul 2018 11:39:26 +0200] Univention builddaemon <buildd@univention.de>:
 
-  * UCS auto build. The following patches have been applied to the original source package
-    10_bug44222_overflowerror
+  * UCS auto build. No patches were applied to the original source package
+
+0.9.7-5 [Wed, 04 Jul 2018 10:36:25 +0200] Philipp Hahn <hahn@univention.de>:
+
+  * Bug #47114: Fix memory leak
 
 0.9.7-4 [Wed, 27 Jan 2016 18:12:51 +0100] Florian Best <best@univention.de>:
 

<http://10.200.17.11/4.3-1/#7474818989236411566>
Comment 12 Jürn Brodersen univentionstaff 2018-07-11 14:00:59 CEST
Created attachment 9589 [details]
reproduce memory leak

How to reproduce:
Download mem_eater.py
Add:
'''
def _big_exception():
  raise Exception(['foo'] * 1000000)
return lambda _1, _2: _big_exception()
'''
to univention/management/console/modules/top/__init__.py:55 (inside at the top of the query function)
python mem_eater.py
Watch the memory consumption
Comment 13 Jürn Brodersen univentionstaff 2018-07-11 14:07:36 CEST
As an alternative to changing the api, it seems adding a manual garbage collection then the thread is removed from the global _threads variable has an comparable effect.
As a drawback that wouldn't help if some other code holds on to the thread object.
Comment 14 Arvid Requate univentionstaff 2018-07-12 14:47:52 CEST
I think Comment 8 / Comment 9 need to be addressed before releasing this.
Comment 15 Philipp Hahn univentionstaff 2018-07-31 17:29:15 CEST
(In reply to Arvid Requate from comment #14)
> I think Comment 8 / Comment 9 need to be addressed before releasing this.

As discussed lets fix UMC and Co:
[4.3-1] 4d9287adf2 Bug #47114: Fix notifyer traceback memory leaks YAML
[4.3-1] dc3023661c Bug #47114: Fix notifyer traceback memory leaks
[4.3-1] fe1c5308cd Bug #47114 USS: Fix notifier traceback memory leak
[4.3-1] 1d3e78dc87 Bug #47114 UAC: Fix notifier traceback memory leak
[4.3-1] 96965b711c Bug #47114 join: Fix notifier traceback memory leak
[4.3-1] d381fa1933 Bug #47114 UMC: Fix notifier traceback memory leak

After successful ucs-test and QA cherry-pick to 4.2-4 is needed.
Comment 16 Philipp Hahn univentionstaff 2018-07-31 17:35:50 CEST
Package: univention-system-setup
Version: 11.0.5-11A~4.3.0.201807311709
Branch: ucs_4.3-0
Scope: errata4.3-1

Package: univention-join
Version: 10.0.0-18A~4.3.0.201807311721
Branch: ucs_4.3-0
Scope: errata4.3-1

Package: univention-appcenter
Version: 7.0.2-18A~4.3.0.201807311723
Branch: ucs_4.3-0
Scope: errata4.3-1

Package: univention-management-console
Version: 10.0.6-9A~4.3.0.201807311732
Branch: ucs_4.3-0
Scope: errata4.3-1

[4.3-1] 90c1cd94e8 Bug #47114: Fix notifyer traceback memory leaks YAML 2
Comment 17 Jürn Brodersen univentionstaff 2018-08-07 09:59:03 CEST
Tracebacks from inside threads don't print the context from inside the thread anymore but instead the context during which they were reraised.

__error_handling
management/univention-management-console/src/univention/management/console/base.py:307

L292 calls __error_handling if a traceback occurred inside a threads
L258 calls __error_handling for "normal" tracebacks

I suggest instead of reraising the exception we raise a more generic exception like "UmcOperationsError" with the already formatted traceback inside the message.
Comment 18 Florian Best univentionstaff 2018-08-07 11:29:59 CEST
(In reply to Jürn Brodersen from comment #17)
> I suggest instead of reraising the exception we raise a more generic
> exception like "UmcOperationsError" with the already formatted traceback
> inside the message.

No please not, it was hard work to get rid of such an exception handling.
Comment 19 Jürn Brodersen univentionstaff 2018-08-07 11:35:46 CEST
(In reply to Florian Best from comment #18)
> (In reply to Jürn Brodersen from comment #17)
> > I suggest instead of reraising the exception we raise a more generic
> > exception like "UmcOperationsError" with the already formatted traceback
> > inside the message.
> 
> No please not, it was hard work to get rid of such an exception handling.

Any Ideas? :)

The traceback object is gone at that point and I would really like to see the context from inside the thread.
Comment 20 Florian Best univentionstaff 2018-08-07 12:15:21 CEST
Created attachment 9622 [details]
patch

Maybe this?
Comment 21 Philipp Hahn univentionstaff 2018-08-07 15:52:55 CEST
(In reply to Florian Best from comment #20)
> Created attachment 9622 [details]
> Maybe this?

Yes, which one small fix. Thanks:
[4.3-1] 51e79b6b3e Bug #47114 UMC: Fix traceback after fixing memory leak
 .../univention-management-console/debian/changelog |  6 ++++
 .../src/univention/management/console/base.py      | 32 ++++++++++++++++++++--
 2 files changed, 36 insertions(+), 2 deletions(-)

Package: univention-management-console
Version: 10.0.6-10A~4.3.0.201808071548
Branch: ucs_4.3-0
Scope: errata4.3-1

[4.3-1] 2c4274e7fa Bug #47114 UMC: Fix traceback after fixing memory leak YAML
 doc/errata/staging/univention-management-console.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

OK: errata-announce -V --only univention-management-console.yaml
Comment 22 Jürn Brodersen univentionstaff 2018-08-10 13:41:24 CEST
Looks good now :)

I will check on Tuesday if no other packages block the release and set this to verified if possible.
Comment 23 Jürn Brodersen univentionstaff 2018-08-10 14:12:34 CEST
(In reply to Jürn Brodersen from comment #22)
> Looks good now :)
> 
> I will check on Tuesday if no other packages block the release and set this
> to verified if possible.

OK jenkins will check this -> Verified