Bug 30504 - Second snapshot takes a long time
Status: CLOSED WORKSFORME
Product: UCS
Classification: Unclassified
Component: Virtualization - KVM
Version: UCS 4.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact:
Depends on: 22231
Blocks:
Reported: 2013-02-20 07:36 CET by Stefan Gohmann
Modified: 2023-06-28 10:46 CEST
CC List: 1 user

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 2: Improvement: Would be a product improvement
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.069
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Stefan Gohmann univentionstaff 2013-02-20 07:36:43 CET
The first snapshot of a default Windows 7 VM takes about 4 seconds. The second snapshot takes about 6 minutes. I use paravirtual devices and qcow2 with cache=none.

I can reproduce this with 3.1-0errata0 and with the current errata versions that are in QA.
Comment 1 Stefan Gohmann univentionstaff 2013-02-20 07:41:36 CET
I didn't see this issue in our own virtualization environment. Maybe it depends on the client hardware. My test hardware is desktop hardware (xen13).
Comment 2 Philipp Hahn univentionstaff 2013-02-20 08:39:16 CET
I had a look at it yesterday and also got widely varying results: from 30 seconds to 6 minutes. The cache setting (except unsafe) does not affect the times very much, so I also conclude that there is an I/O problem with the test hardware: fdatasync() hurts performance badly when each 32 KiB data block is physically flushed to the hard disk. On server hardware with battery-backed caching this is fast; on consumer-grade hardware it is abysmally slow.

In newer qemu versions (AFAIK >= 1.2 or 1.3), the block layer was rewritten so that snapshots no longer use the same code path as running VMs: for running VMs, sync() should only return when the data is really stable, but for snapshots a single sync() after all writes is sufficient, as that process is single-threaded.
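To make the difference between the two sync strategies concrete, here is a minimal Python sketch. It is not qemu code; the function name `write_blocks`, the block count and the temporary file are hypothetical. It writes the same 32 KiB blocks once with an fdatasync() after every block (the pattern that is slow on consumer disks) and once with a single fdatasync() after all writes (the pattern that suffices for a single-threaded snapshot).

```python
import os
import tempfile
import time

# Block size taken from the comment above (32 KiB).
BLOCK_SIZE = 32 * 1024

# os.fdatasync() is not available on every platform (e.g. macOS);
# fall back to os.fsync(), which additionally flushes file metadata.
_datasync = getattr(os, "fdatasync", os.fsync)


def write_blocks(path: str, n_blocks: int, sync_each_block: bool) -> float:
    """Write n_blocks zero-filled blocks of BLOCK_SIZE bytes to path.

    sync_each_block=True  -> flush after every block (slow on consumer disks)
    sync_each_block=False -> one flush after all writes (batched)

    Returns the elapsed wall-clock time in seconds.
    """
    block = bytes(BLOCK_SIZE)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_blocks):
            written = os.write(fd, block)
            if written != BLOCK_SIZE:
                raise OSError("short write")
            if sync_each_block:
                _datasync(fd)  # wait until this block is on stable storage
        if not sync_each_block:
            _datasync(fd)  # single flush covering all blocks written above
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    n = 64  # hypothetical test size: 64 * 32 KiB = 2 MiB
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "blocks.bin")
        per_block = write_blocks(path, n, sync_each_block=True)
        batched = write_blocks(path, n, sync_each_block=False)
    print(f"fdatasync() after every block: {per_block:.3f} s")
    print(f"single fdatasync() at the end: {batched:.3f} s")
```

The absolute numbers depend heavily on the disk and its write cache. If the temporary directory lives on a tmpfs or on a disk with battery-backed cache, the two variants may take about the same time; to observe the effect described above, the path would have to point at the consumer-grade disk in question.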

I've added Bug #22231 as a dependency of this bug, because there I already did some work on performance improvements. It also contains some timing data from previous tests.
Comment 3 Philipp Hahn univentionstaff 2013-02-21 08:27:25 CET
There is a patch series for qemu-1.4 which reduces the time for qcow2 internal snapshot creation significantly, from more than 3 minutes to under 1 second.

<http://lists.nongnu.org/archive/html/qemu-devel/2013-02/msg02987.html>
Comment 4 Philipp Hahn univentionstaff 2013-10-10 11:16:31 CEST
Patch for qemu-1.6 <http://lists.gnu.org/archive/html/qemu-devel/2013-09/msg01186.html>
Comment 5 Philipp Hahn univentionstaff 2013-10-25 09:07:54 CEST
There's also <http://git.qemu-project.org/?p=qemu.git;a=commit;h=211ea74022f51164a7729030b28eec90b6c99a08>, which fixes a performance drop after a VM was migrated/suspended.
The patch from qemu-1.6 does not apply to qemu-1.1.2 as-is and needs to be backported.
Comment 6 Philipp Hahn univentionstaff 2017-04-21 16:38:23 CEST
UCS-4.2 has Qemu-2.8, which has seen many improvements to Qcow2.