Bug 48858 - Print Progress During Live Migration
Status: CLOSED WONTFIX
Product: UCS
Classification: Unclassified
Component: Virtualization - UVMM
Version: UCS 4.3
Hardware: Other Linux
Importance: P5 enhancement
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
URL:
Depends on:
Blocks:
Reported: 2019-03-05 12:58 CET by Christian Völker
Modified: 2023-06-28 10:46 CEST

See Also:
What kind of report is it?: Feature Request
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019030521000438
Bug group (optional):
Max CVSS v3 score:


Description Christian Völker univentionstaff 2019-03-05 12:58:24 CET
When migrating a VM through UVMM, something like a progress bar would be helpful, or at least some information about the estimated time remaining.
Comment 1 Philipp Hahn univentionstaff 2019-03-05 13:45:26 CET
(In reply to Christian Völker from comment #0)
> When migrating a VM through UVMM something like a progress bar would be
> helpful. Or at least an information about the estimated time remaining.

Already implemented in UVMM since <http://errata.software-univention.de/ucs/4.3/382.html> as a side effect of Bug #47617:
In the UVMM domain overview, hover the mouse over the VM name; you will see a tooltip with:

> Status: running
> Server: dc0.phahn.dev
> VNC-Port: 5900
> Migration started 21.912 sec ago, Iteration 1

You can only see that a (live) migration is in progress and how long it has taken so far.
You will NOT get a remaining time, as that time is UNKNOWN: depending on what happens inside your VM, other VMs running on the same host, the network bandwidth utilization, and many more factors, a migration may actually NEVER complete successfully.

A progress bar would count down to "0 bytes/seconds remaining" just to jump up again, and would continue that loop ad infinitum.

We did not make the action modal, so admins can do other things in parallel; for example, pause the VM by hand to make the migration finally converge.

PS: The RAM content of the VM needs to be transferred over the network. That takes time. During that time the VM continues running and dirties its memory, which then needs to be re-transferred. If the VM dirties its memory faster than the network can transfer it, the migration never converges. You have 4 options:
- abort the migration
- allow longer down-time for the final round of memory transfer
- throttle the CPUs to slow down their dirtying - pausing the VM is the extreme
- force post-copy-migration which immediately transfers execution to the target host; the remaining dirty memory is pulled "as needed" which then causes stalls or in the worst case might crash the VM if the network goes down before all memory could have been fetched.
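
The convergence argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions (constant bandwidth, constant dirty rate, a fixed allowed downtime); the function name and all parameters are hypothetical and do not reflect how QEMU, libvirt or UVMM actually decide when to switch over.

```python
def simulate_precopy(ram_mib, bandwidth_mib_s, dirty_mib_s,
                     max_downtime_s=0.3, max_iterations=30):
    """Toy model of iterative pre-copy migration (illustration only).

    Returns a tuple (converged, iterations, elapsed_seconds).
    """
    remaining = float(ram_mib)  # the first round has to send all RAM
    elapsed = 0.0
    for iteration in range(1, max_iterations + 1):
        round_time = remaining / bandwidth_mib_s
        # Once the rest fits into the allowed downtime, the VM can be
        # paused and the final round sent: the migration converges.
        if round_time <= max_downtime_s:
            return True, iteration, elapsed + round_time
        elapsed += round_time
        # Memory dirtied while this round was on the wire must be sent
        # again in the next round (at most the whole RAM).
        remaining = min(float(ram_mib), dirty_mib_s * round_time)
    return False, max_iterations, elapsed


if __name__ == "__main__":
    # Dirtying slower than the network: each round shrinks.
    print(simulate_precopy(4096, bandwidth_mib_s=100, dirty_mib_s=50))
    # Dirtying faster than the network: the rounds never shrink.
    print(simulate_precopy(4096, bandwidth_mib_s=100, dirty_mib_s=120))
```

In this model, raising `max_downtime_s` corresponds to the "allow longer down-time" option, and lowering `dirty_mib_s` corresponds to throttling the CPUs.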
Comment 2 Oliver Bohlen 2019-03-06 08:30:56 CET
I think the output of "virsh migrate" with the --verbose option, which displays the migration progress as a percentage, shown in some form in the UVMM GUI would help to see/identify whether there is such a problem with dirtied VM RAM being re-transferred again and again.

Another good reason for showing that something is going on in the UVMM GUI is to show the user that the work is actually in progress. If nothing happens for a longer time, it looks to the user as if the process is hanging or in some undefined state.

Btw., the VMware GUI also has a progress view for live migration.
Comment 3 Philipp Hahn univentionstaff 2019-03-06 09:40:57 CET
(In reply to Oliver Bohlen from comment #2)
> I think a "virsh migrate" with the --verbose option which displays the
> percentage of migration progress viewed.

No, it does not show any expected or remaining time directly; you have to do some math yourself:
> # domjobinfo ucs43
> Job type:         Unbounded   
This indicates that there is no upper bound.

> Time elapsed:     91744        ms
This is shown in UMC.

> Data processed:   9,762 GiB
> Data remaining:   717,875 MiB
> Data total:       4,016 GiB
This is the sum of RAM, disk and other data. The VM only has 4 GiB (total), but more than twice as much has already been transferred.

> Memory processed: 9,762 GiB
> Memory remaining: 717,875 MiB
> Memory total:     4,016 GiB
This is only RAM - "processed" and "remaining" will increase with each iteration.

> Dirty rate:       29196        pages/s
The rate at which pages are dirtied. Calculated only after each new iteration.

> Iteration:        5
This is shown in UMC.

> Constant pages:   428760
This is the number of 4K pages, which were NOT changed during the last iteration. All other pages were changed and need a re-transfer.

> Expected downtime: 1023         ms
The expected time the VM needs to be paused to transfer the last state. During that time the VM will not respond. You can increase the allowed downtime, but that depends on your definition of "live" migration.
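
To do that math in a script rather than by hand, the fields walked through above could be read into a dictionary. The following is a minimal sketch that assumes the output looks exactly like the excerpt quoted in this comment ("Key: number unit" lines, with a comma as decimal separator); `parse_domjobinfo` and `UNITS` are hypothetical names, and other locales or libvirt versions may format the output differently.

```python
# Binary size units as they appear in the quoted excerpt.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def parse_domjobinfo(text):
    """Parse 'Key: value [unit]' lines into a dict.

    Sizes are converted to bytes; other numbers (ms, pages/s, counters)
    are kept as plain floats; non-numeric values stay strings.
    """
    info = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # drop quoting markers
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        try:
            # Assumption: the comma is a decimal separator ("9,762 GiB").
            number = float(parts[0].replace(",", "."))
        except ValueError:
            info[key.strip()] = rest.strip()
            continue
        unit = parts[1] if len(parts) > 1 else ""
        info[key.strip()] = number * UNITS.get(unit, 1)
    return info


sample = """\
> Job type:         Unbounded
> Time elapsed:     91744        ms
> Data processed:   9,762 GiB
> Data remaining:   717,875 MiB
> Dirty rate:       29196        pages/s
> Iteration:        5
"""

if __name__ == "__main__":
    for key, value in parse_domjobinfo(sample).items():
        print(key, "=", value)
```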



> how ever in the UVMM GUI will help
> to see/identify if there is such a problem with re-transferring
> RAM-VM-dirties again and again.
> 
> Another good reason for showing that there is something going on in the UVMM
> GUI is to show the user that the work is even in progress. If there is
> nothing happening for a longer time it looks for the user like the progress
> ins hanging or in any undefined state.

As stated in comment 1, UVMM already indicates that a migration is in progress:
- a different icon is shown in UMC for the VM
- the tool-tip of the VM shows some data
- this includes the "iteration count": if it keeps (slowly) increasing, you have the mentioned problem.

I'm not against improving the displayed information, just stating that we already display some information. When we implemented the current solution, we discussed the problem internally and decided to go for the current minimal information, as it was not our main concern in Bug #47617 back then.

We can do something like this:
- calculate the currently used/available network bandwidth:
  network_rate := data_processed / time_elapsed
- if (dirty_rate * 4KiB) > network_rate:
    "show warning that migration will not converge."
  else
    estimated_time := data_remaining / network_rate

The data is already available to the front-end; no change to UVMMd should be necessary.
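
As a rough illustration, the check could look like the following Python sketch. It is not UVMM code: `estimate_migration` and its parameters are hypothetical, and it takes "Data processed" (the bytes actually sent so far) divided by the elapsed time as the throughput sample, which is an assumption of this sketch. The sample numbers are the ones from the domjobinfo output quoted earlier in this comment.

```python
PAGE_SIZE = 4 * 1024  # bytes; 4 KiB pages as stated above
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_migration(data_processed, data_remaining,
                       time_elapsed_ms, dirty_pages_per_s):
    """Return (will_converge, estimated_seconds or None).

    All sizes are in bytes. The estimate ignores memory dirtied while
    the remaining data is sent, so it is a lower bound at best.
    """
    if time_elapsed_ms <= 0 or data_processed <= 0:
        raise ValueError("no throughput sample available yet")
    network_rate = data_processed / (time_elapsed_ms / 1000.0)  # bytes/s
    dirty_rate = dirty_pages_per_s * PAGE_SIZE                  # bytes/s
    if dirty_rate > network_rate:
        # Memory is dirtied faster than it can be sent: warn the user.
        return False, None
    return True, data_remaining / network_rate


if __name__ == "__main__":
    converges, eta = estimate_migration(
        data_processed=9.762 * GIB,
        data_remaining=717.875 * MIB,
        time_elapsed_ms=91744,
        dirty_pages_per_s=29196,
    )
    if converges:
        print("estimated time remaining: %.1f s" % eta)
    else:
        print("warning: migration will probably not converge")
```

With the quoted numbers the dirty rate (29196 pages/s of 4 KiB each) is slightly above the measured throughput, so this sketch would show the warning rather than a remaining time.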

Again: this is only an estimation, but if it helps, I'm all for doing this.
Comment 4 Philipp Hahn univentionstaff 2019-07-23 12:47:23 CEST
*** Bug 49909 has been marked as a duplicate of this bug. ***
Comment 5 Ingo Steuwer univentionstaff 2021-05-14 13:46:10 CEST
This issue has been filed against UCS 4.3.

UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.