Bug 33458 - Increasing CPU usage over time
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Virtualization - UVMM
Version: UCS 3.0
Hardware: Other Linux
Importance: P3 normal
Target Milestone: UCS 3.2-0-errata
Assigned To: Erik Damrose
QA Contact: Philipp Hahn
Duplicates: 28548 31370
Depends on:
Blocks:
 
Reported: 2013-11-18 15:55 CET by Janis Meybohm
Modified: 2014-03-12 14:43 CET

Bug group (optional): Large environments, UCS Performance, Usability


Attachments
Threded event loop UVMM performance problem (769 bytes, text/plain)
2014-01-22 10:49 CET, Philipp Hahn
Threded event loop UVMM performance problem v2 (1.20 KB, text/plain)
2014-01-28 22:45 CET, Philipp Hahn

Description Janis Meybohm univentionstaff 2013-11-18 15:55:19 CET
The CPU usage of univention-virtual-machine-manager-daemon increases over time. In a customer's environment with 95 Xen hosts and ~240 virtual machines, the UVMM (running as a virtual machine with 2 GB RAM and 2 vCPUs) becomes unusably slow after 2 days (even over the weekend, without heavy frontend usage).
Comment 1 Stefan Gohmann univentionstaff 2013-11-22 07:48:34 CET
We should also check a backport to 3.1-1.
Comment 2 Philipp Hahn univentionstaff 2014-01-21 17:33:53 CET
Further debugging with "vmstat 1" running showed that after 1 h the maximum number of open files (1024) was reached and UVMMd crashed, because it could no longer write its (cache) files:
in      cs      us      sy      id
6426    7984    5       2       93
7230    8795    43      19      38
35532   36021   41      18      40

/v/l/u/vmm.log shows failed TLS connection errors for 27 of the 93 servers, so after the initial ½, 1, 2 s retries, three new file descriptors are added every 5 minutes for each failing server.
That is, after 50 minutes the maximum of 1024 is reached:
echo $((27*3*10 + 66*3)) # 27 failing * 3 FD/try * (50 min / 5 min) + 66 working * 3 FD = 1008
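
The same estimate as a small Python sketch. The function name and parameters are hypothetical, and the model is deliberately simplified: it ignores the initial fast retries and any descriptors the daemon holds for logs, cache files, or its listening socket, which would account for the gap between 1008 and 1024.

```python
def leaked_fds_after(minutes, failing=27, working=66,
                     fds_per_conn=3, retry_minutes=5):
    """Estimate open connection FDs after `minutes` of runtime.

    Assumes each working host holds `fds_per_conn` descriptors, and
    each failing host leaks `fds_per_conn` more on every retry.
    """
    retries = minutes // retry_minutes
    return working * fds_per_conn + failing * fds_per_conn * retries


if __name__ == "__main__":
    for minutes in (0, 25, 50):
        print(minutes, leaked_fds_after(minutes))
```

With the numbers from the log (27 failing, 66 working hosts), the estimate grows by 81 descriptors per 5-minute retry interval.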

I can reproduce it locally by adding other libvirtd instances that use a different SSL CA.

  uvmm add qemu://xen16.knut.univention.de/system
  uvmm add qemu://xen14.knut.univention.de/system
  uvmm add qemu://skepp.knut.univention.de/system
  uvmm add qemu://krus.knut.univention.de/system
  uvmm add qemu://isalla.knut.univention.de/system
  uvmm add qemu://boksel.knut.univention.de/system
  uvmm add qemu://utby.knut.univention.de/system
  uvmm add qemu://xen2.knut.univention.de/system


# lsof -p $(pgrep -f /usr/sbin/univention-virtual-machine-manager-daemon) |
 awk -F ' ' '/TCP/ {print gensub(".*:[0-9]+->(.+):[0-9]+","\\1","g",$9),$10;}' | 
 sort |
 uniq -c
     19 192.168.0.109 (CLOSE_WAIT)
     19 192.168.0.203 (CLOSE_WAIT)
     19 192.168.0.205 (CLOSE_WAIT)
     19 192.168.0.238 (CLOSE_WAIT)
     19 192.168.0.87 (CLOSE_WAIT)
      1 *:2106 (LISTEN)
      1 xen12.phahn.dev (ESTABLISHED)
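
For repeated checks, the awk/sort/uniq pipeline above can be approximated in Python. This is only a sketch: it assumes the default lsof column layout (NAME in field 9, TCP state in field 10, as the awk script does), and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches "local:port->remote:port" and captures the remote host.
_CONN = re.compile(r".*:\d+->(.+):\d+")


def count_tcp_states(lsof_lines):
    """Count TCP sockets per (peer, state) from lsof output lines."""
    counts = Counter()
    for line in lsof_lines:
        if "TCP" not in line:
            continue
        fields = line.split()
        if len(fields) < 10:
            continue
        name, state = fields[8], fields[9]
        match = _CONN.fullmatch(name)
        # Listening sockets have no "->" part; keep the name unchanged.
        peer = match.group(1) if match else name
        counts[(peer, state)] += 1
    return counts
```

A steadily growing count of `(CLOSE_WAIT)` entries for the same peer means the remote side has closed the connection but the local process never closed its socket.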
Comment 3 Philipp Hahn univentionstaff 2014-01-22 10:47:50 CET
For debugging libvirt: LIBVIRT_GNUTLS_DEBUG=1

My UVMMd failed last night, reproducing the high CPU load.

I've been unable to reproduce the FD leak by using "virsh", but the attached code clearly shows the problem to be related to the event loop implementation. Note that UVMMd is still using the pure-Python implementation and not the default C implementation (→ Bug #31371).

A very similar problem, where TLS does not work and the connection is not closed properly, is already tracked as Bug #31370.
Bug #20296 and Bug #20476 also look like event-loop problems.
Comment 4 Philipp Hahn univentionstaff 2014-01-22 10:49:12 CET
Created attachment 5741 [details]
Threded event loop UVMM performance problem

Will register the default event-loop and start 5 connections in parallel.
Number of open file descriptors is increasing each round.
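
A minimal sketch of how the per-round descriptor count can be observed from inside a Python process. This is not the attached script; the helper names are hypothetical, and it assumes a system that exposes the process's descriptors under /proc/self/fd (Linux) or /dev/fd.

```python
import os


def open_fd_count():
    """Return the number of file descriptors open in this process."""
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # The listing itself uses one temporary descriptor, so the absolute
    # value is off by a constant; differences between calls are exact.
    return len(os.listdir(fd_dir))


def fd_counts_per_round(action, rounds):
    """Run `action` once per round and record the FD count afterwards."""
    counts = []
    for _ in range(rounds):
        action()
        counts.append(open_fd_count())
    return counts
```

If `action` opens and properly closes its connections, the recorded counts stay flat; a leak shows up as a count that rises every round.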
Comment 5 Philipp Hahn univentionstaff 2014-01-28 22:45:43 CET
Created attachment 5759 [details]
Threded event loop UVMM performance problem v2

Start thread to run event loop.
Enable debug output.
Comment 6 Erik Damrose univentionstaff 2014-01-30 11:28:26 CET
The problem was caused by using the outdated Python event loop code from libvirt. Switching to libvirt's internal implementation fixed the problem with open file handles.
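
For illustration, one way to use libvirt's built-in event loop from Python is via `virEventRegisterDefaultImpl()` and `virEventRunDefaultImpl()` from the libvirt Python bindings. Whether r47521 does exactly this is an assumption; the helper below is a hypothetical sketch (Python 3 syntax) and takes the libvirt module as a parameter so that it can also be exercised without libvirt installed.

```python
import threading


def start_default_event_loop(libvirt_module):
    """Register libvirt's default event loop and run it in a thread.

    Must be called before any connection is opened.
    Returns the thread and an Event that asks the loop to stop.
    """
    libvirt_module.virEventRegisterDefaultImpl()
    stop = threading.Event()

    def _run():
        # Each call handles one iteration of the event loop. It blocks
        # until an event or timer fires, so a stop request only takes
        # effect after the next iteration returns.
        while not stop.is_set():
            libvirt_module.virEventRunDefaultImpl()

    thread = threading.Thread(target=_run, name="libvirt-events", daemon=True)
    thread.start()
    return thread, stop
```

Typical use would be `import libvirt; thread, stop = start_default_event_loop(libvirt)` at daemon start-up, followed by opening the connections.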

In my tests I found no regressions while using UVMM: Start, Stop, Destroy, Snapshot-{Create|Revert} from UVMM and virsh.

r47521 univention-virtual-machine-manager-daemon 3.0.17-4.483.201401301119
r47522 update copyright
r47524 2014-01-30-univention-virtual-machine-manager-daemon.yaml
Comment 7 Philipp Hahn univentionstaff 2014-01-31 09:42:06 CET
OK: r47521,r47522,r47524
OK: aptitude install '?source-package(univention-virtual-machine-manager)?installed'
OK: announce_errata -V 2014-01-30-univention-virtual-machine-manager-daemon.yaml

OK: uvmm add qemu+tls://*.knut.univention.de/system # rejected → no new connections
OK: uvmm add qemu+tls://$(hostname -f)/systen # 3 new FDs
OK: uvmm remove qemu+tls://$(hostname -f)/systen # 3 FDs closed

OK: uvmm add xen://xen14.knut.univention.de/
OK: uvmm remove xen://xen14.knut.univention.de/

OK: lsof -p `pgrep -f /usr/sbin/univention-virtual-machine-manager-daemon`
OK: less /var/log/univention/virtual-machine-manager-daemon.log

OK: 3.0.17-4.483.201401301119
Comment 8 Moritz Muehlenhoff univentionstaff 2014-02-06 13:37:02 CET
http://errata.univention.de/ucs/3.2/51.html
Comment 9 Philipp Hahn univentionstaff 2014-02-28 08:16:11 CET
*** Bug 28548 has been marked as a duplicate of this bug. ***
Comment 10 Philipp Hahn univentionstaff 2014-03-12 14:43:30 CET
*** Bug 31370 has been marked as a duplicate of this bug. ***