Univention Bugzilla – Bug 48098
Check RAM over-commitment before live migration
Last modified: 2019-07-31 08:26:37 CEST
UVMM should detect CPU/RAM/... over-commitment when a live migration is triggered and issue a warning. A customer has regular problems with over-commitment because they have quite a lot of KVM servers and always checking resources of both migration partners quickly becomes a tedious task.
RAM is a hard limit: starting too many VMs might crash the host system, which loses the runtime state of all running VMs. CPU is a soft limit: over-commitment is less of a problem there, but all VMs are penalized by getting less CPU time.
A suggestion for the implementation:
- If we reach the RAM over-commitment limit while creating an instance, we should show a warning.
- If we reach the RAM over-commitment limit while starting an instance, we should abort with an error message.
- If we reach the RAM over-commitment limit while migrating a stopped instance, we should show a warning.
- If we reach the RAM over-commitment limit while performing a live migration, we should abort with an error message before starting the migration.
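The four rules suggested above can be sketched as a small decision function. This is only an illustration of the proposal, not the code that went into UVMM; the names `Operation`, `Action` and `overcommit_action` are hypothetical, and the shipped behaviour may differ from the proposal.

```python
from enum import Enum


class Operation(Enum):
    CREATE = "create"
    START = "start"
    MIGRATE_OFFLINE = "migrate_offline"  # migrating a stopped instance
    MIGRATE_LIVE = "migrate_live"


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    ABORT = "abort"


# Per the suggestion: starting and live migration abort, the rest only warn.
_ABORTING = {Operation.START, Operation.MIGRATE_LIVE}


def overcommit_action(operation, required_bytes, available_bytes):
    """Return what to do when `required_bytes` are requested on a host.

    `available_bytes` is whatever the caller considers usable RAM on the
    target host (an assumption of this sketch).
    """
    if required_bytes <= available_bytes:
        return Action.ALLOW
    return Action.ABORT if operation in _ABORTING else Action.WARN
```

For example, `overcommit_action(Operation.MIGRATE_LIVE, 8, 4)` yields `Action.ABORT`, while the same numbers with `Operation.CREATE` yield `Action.WARN`.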
Having a simple warning will not prevent high I/O on the KVM servers during migration, or at least not without including a HUGE buffer... When a VM with a large amount of virtual memory is migrated to a KVM server, all of its memory pages appear to be "active" to the target KVM server. In practice this resulted in heavy swapping on the target host even though memory overcommitment was only around +1% of RAM. The only way to prevent such conditions seems to be to disallow memory overcommitment in the Linux kernel (see https://www.kernel.org/doc/Documentation/vm/overcommit-accounting), to additionally disallow swap usage for all VMs by locking their memory (see https://libvirt.org/formatdomain.html#elementsMemoryBacking), and to set a hard_limit through virsh (see ftp://libvirt.org/libvirt/virshcmdref/html/sect-memtune.html).
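As a rough illustration of the libvirt part of that suggestion, the following sketch adds `<memoryBacking><locked/></memoryBacking>` and a `<memtune><hard_limit>` element to a domain XML string. The helper name `lock_domain_memory` and the example limit are assumptions made for this sketch; choosing a suitable hard limit and disabling kernel overcommit (`vm.overcommit_memory=2`) are separate steps not covered here.

```python
import xml.etree.ElementTree as ET


def lock_domain_memory(domain_xml, hard_limit_kib):
    """Return domain XML with locked memory and a memtune hard limit.

    Hypothetical helper: it only edits the XML text and does not talk
    to libvirt; the result would still have to be defined on the host.
    """
    root = ET.fromstring(domain_xml)

    # <memoryBacking><locked/></memoryBacking> keeps guest pages out of swap.
    backing = root.find("memoryBacking")
    if backing is None:
        backing = ET.SubElement(root, "memoryBacking")
    if backing.find("locked") is None:
        ET.SubElement(backing, "locked")

    # <memtune><hard_limit unit='KiB'>N</hard_limit></memtune>
    memtune = root.find("memtune")
    if memtune is None:
        memtune = ET.SubElement(root, "memtune")
    limit = memtune.find("hard_limit")
    if limit is None:
        limit = ET.SubElement(memtune, "hard_limit")
    limit.set("unit", "KiB")
    limit.text = str(hard_limit_kib)

    return ET.tostring(root, encoding="unicode")
```

Usage: `lock_domain_memory("<domain type='kvm'><name>demo</name></domain>", 4194304)` returns the same domain with both elements added; existing `memoryBacking` or `memtune` children are reused rather than duplicated.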
[4.3-3] a1ab5d999d Bug #48098 UVMM: RAM overcommitment

Package: univention-virtual-machine-manager-daemon
Version: 7.0.0-20A~4.3.0.201903061506
Branch: ucs_4.3-0
Scope: errata4.3-3

[4.3-3] 3948ec7d05 Bug #48098 UVMM: RAM overcommitment YAML
 .../staging/univention-virtual-machine-manager-daemon.yaml | 11 +++++++++++
 1 file changed, 11 insertions(+)

TODO: Merge to 4.4-0 after QA

FYI: UCRV 'uvmm/overcommit/reserved' is a *global* limit, which applies to *all* hosts and only needs to be defined on the host where UVMMd runs. By default it is unset, so the hard limit on start/migration is not enforced. To enforce it, set the UCRV to at least "1" byte. For simplicity that amount is subtracted from the node's physical memory.

FYI: Some useful commands for QA:
 uvmm query "qemu://$(hostname -f)/system" | grep Mem
  curMem := sum_{running VMs}(currently configured memory)¹
  maxMem := sum_{all VMs}(maximum configured memory)
  phyMem := physical memory of host - reserve
  ¹: think memory ballooning
 virsh nodememstats

QA: Feel free to use xen1(=dc0) and xen16
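The three quantities defined above can be sketched as follows. This is a minimal illustration, not the UVMM code: the dictionary keys, the function names, and the comparison `curMem + extra > phyMem` are assumptions. The handling of an unset reserve follows the note above that the limit is not enforced by default.

```python
def memory_summary(domains, physical_bytes, reserved_bytes=0):
    """Compute (curMem, maxMem, phyMem) for one host.

    `domains` is a list of dicts with the hypothetical keys
    "running", "cur_mem" and "max_mem" (all sizes in bytes).
    """
    cur_mem = sum(d["cur_mem"] for d in domains if d["running"])
    max_mem = sum(d["max_mem"] for d in domains)
    phy_mem = physical_bytes - reserved_bytes
    return cur_mem, max_mem, phy_mem


def would_overcommit(domains, physical_bytes, reserved_bytes, extra_bytes):
    """True if adding `extra_bytes` of running memory exceeds phyMem.

    `reserved_bytes=None` models an unset uvmm/overcommit/reserved,
    in which case nothing is enforced.
    """
    if reserved_bytes is None:
        return False
    cur_mem, _, phy_mem = memory_summary(domains, physical_bytes, reserved_bytes)
    return cur_mem + extra_bytes > phy_mem


GiB = 1024 ** 3
example = [
    {"running": True, "cur_mem": 2 * GiB, "max_mem": 4 * GiB},
    {"running": False, "cur_mem": 1 * GiB, "max_mem": 2 * GiB},
]
```

With the example data, 8 GiB of physical RAM and a 1 GiB reserve, curMem is 2 GiB, maxMem is 6 GiB and phyMem is 7 GiB, so a further 5 GiB VM still fits while a 6 GiB VM does not.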
[4.3-3] 8d17cf6118 Bug #48098 UVMM: Add RAM overcommit protection - spelling fixes
 .../univention-virtual-machine-manager-daemon/debian/changelog | 6 ++++++
 .../univention-virtual-machine-manager-daemon/src/de.po | 8 ++++----
 .../univention-virtual-machine-manager-daemon/umc/js/de.po | 8 ++++----
 3 files changed, 14 insertions(+), 8 deletions(-)

Package: univention-virtual-machine-manager-daemon
Version: 7.0.0-21A~4.3.0.201903061839
Branch: ucs_4.3-0
Scope: errata4.3-3

[4.3-3] 836f614a3e Bug #48098: univention-virtual-machine-manager-daemon 7.0.0-21A~4.3.0.201903061839
 doc/errata/staging/univention-virtual-machine-manager-daemon.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Offline migrated host never appear on the target host.

2019-03-07 11:43:25,158 - uvmmd.node - INFO - Domain backuped to 8cc06646-245f-4a26-a662-05a2041a42ed..xml.save.
2019-03-07 11:43:25,160 - uvmmd.node - INFO - Starting migration of domain "8cc06646-245f-4a26-a662-05a2041a42ed" to host "qemu://slave3.univention.intranet/system" with flags 618
2019-03-07 11:43:25,284 - uvmmd.node.livecycle - ERROR - qemu://slave3.univention.intranet/system: Exception handling callback
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 925, in livecycle_event
    domStat = Domain(dom, node=self)
  File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 316, in __init__
    self.pd.os_type = domain.OSType()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 455, in OSType
    if ret is None: raise libvirtError ('virDomainGetOSType() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '8cc06646-245f-4a26-a662-05a2041a42ed' (ucs4-64-foo)
2019-03-07 11:43:25,285 - uvmmd.node - INFO - Finished migration of domain "8cc06646-245f-4a26-a662-05a2041a42ed" to host "qemu://slave3.univention.intranet/system" with flags 618
(In reply to Jürn Brodersen from comment #6)
> Offline migrated host never appear on the target host.
...
> 2019-03-07 11:43:25,160 - uvmmd.node - INFO - Starting migration of domain "XXX" to host "XXX" with flags 618
> 2019-03-07 11:43:25,284 - uvmmd.node.livecycle - ERROR - ...
> libvirtError: Domain not found: no domain with matching uuid 'XXX' (XXX)

Your slave3 has an old version of the libvirt* packages, which is still affected by Bug #47617 comment 5.
OK:

uvmm/overcommit/reserved=0
 Only warning for new VMs with more RAM than physically available is shown -> OK

uvmm/overcommit/reserved=1073741824
 Reserved RAM is subtracted from total RAM in tree -> OK
 No overcommit error if enough RAM is available -> OK
 Overcommit error before live migration -> OK
 Overcommit error before offline migration -> OK
 Overcommit error before VM start -> OK

Profiles work -> OK
Wizard works -> OK
YAML -> OK
<http://errata.software-univention.de/ucs/4.3/452.html>