Univention Bugzilla – Bug 49940
Check RAM over-commitment before live migration by only using available memory
Last modified: 2023-06-28 10:46:09 CEST
+++ This bug was initially created as a clone of Bug #48098 +++

UVMM should detect CPU/RAM/... over-commitment when a live migration is triggered and issue a warning. A customer has regular problems with over-commitment because they have quite a lot of KVM servers, and always checking the resources of both migration partners quickly becomes a tedious task.

In addition to the fix from the original bug, the migration check needs to consider only the available memory. In the customer environment, the host (and all VMs) became unusable after a live migration due to heavy swapping in/out or low memory resources. The check should consider the possibility of a memory leak and therefore not use the total amount of memory but the available amount of memory (which is free + buff/cache).
I think the best/safest solution would be: Free/usable RAM for new/migrated VMs = "Total RAM (e.g. as reported by free)" - "Max usable RAM of ALL VMs" - "RAM used by host system excl. VMs" - "Buffer from uvmm/overcommit/reserved, e.g. for system qemu processes"
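The formula proposed in this comment can be written down as a minimal sketch. The function and parameter names are hypothetical and not part of UVMM. All values are assumed to be in MiB, and the caller is assumed to have collected them already.

```python
def usable_ram_for_new_vms(total_ram, vm_max_ram, host_used_excl_vms, reserved):
    """Return the RAM (MiB) left for a new or migrated VM.

    total_ram          -- total RAM of the host, e.g. as reported by `free`
    vm_max_ram         -- iterable with the maximum RAM of every VM on the host
    host_used_excl_vms -- RAM used by the host system, excluding the VMs
    reserved           -- buffer, e.g. the value of uvmm/overcommit/reserved
    (all names are illustrative assumptions, not UVMM identifiers)
    """
    return total_ram - sum(vm_max_ram) - host_used_excl_vms - reserved


# Example: 64 GiB host, three VMs, 2 GiB host usage, 4 GiB reserved buffer.
remaining = usable_ram_for_new_vms(65536, [8192, 16384, 4096], 2048, 4096)
vm_to_migrate = 16384
print(remaining, vm_to_migrate <= remaining)
```

A negative result would mean the host is already over-committed before the new VM is added.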
We should add an additional check to test whether the configured RAM of a virtual machine that is about to be migrated or started fits into the currently available free memory. We cannot check whether some process might have a memory leak now or in the future. This needs to be monitored with other available tools (Nagios, UCS Dashboard), as it can occur at any time, not only while a virtual machine is started or migrated. The "uvmm/overcommit/reserved" configuration is meant to include both the RAM needed for standard services on the node and the RAM needed to manage virtual machines and for other QEMU-related needs.
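A rough sketch of such a check, assuming the target node exposes the Linux `MemAvailable` field in `/proc/meminfo` (values in kB). The helper names `parse_meminfo` and `vm_fits` are hypothetical, and whether the reserved buffer should additionally be subtracted here is an assumption, so it defaults to 0.

```python
def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def vm_fits(configured_kib, meminfo_text, reserved_kib=0):
    """Return True if the VM's configured RAM fits into available memory.

    reserved_kib is an optional buffer (assumption: something like the
    value of uvmm/overcommit/reserved, converted to KiB by the caller).
    """
    available = parse_meminfo(meminfo_text)["MemAvailable"]
    return configured_kib <= available - reserved_kib


# Sample data for illustration only; on a real node one would read
# the text from /proc/meminfo instead.
SAMPLE = """MemTotal:       65536000 kB
MemFree:         2048000 kB
MemAvailable:   10240000 kB
"""

print(vm_fits(8 * 1024 * 1024, SAMPLE))   # 8 GiB VM
print(vm_fits(16 * 1024 * 1024, SAMPLE))  # 16 GiB VM
```

This only reflects the state at the moment of the check; as noted above, later memory growth on the node still has to be caught by monitoring.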
From my understanding the support case related to this Bug Report is closed, so I unset the "Waiting for Support" flag.
This issue has been filed against UCS 4.3. UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed. If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.