Bug 34984 - image decompression with python-lzma destroys "sparse" feature in ucc images
Status: CLOSED WONTFIX
Product: Z_Univention Corporate Client (UCC)
Classification: Unclassified
Component: Image management
Version: unspecified
Hardware: Other Linux
Importance: P5 enhancement
Target Milestone: UCC 3.x
Assigned To: UCC maintainers
Reported: 2014-05-27 10:52 CEST by Felix Botner
Modified: 2023-06-28 10:33 CEST (History)

dormann: Patch_Available+


Attachments
Patch for ucc-image-toolkit so images will still be sparse files after decompressing (534 bytes, patch)
2014-06-25 11:31 CEST, Drees Dormann
New approach: split uncompressed data into 4k blocks and check them separately, so we can keep the progress functionality of the original code (1.30 KB, patch)
2014-06-26 12:07 CEST, Drees Dormann
Newer version of the patch, changed according to Alexander's proposals (1.42 KB, patch)
2014-07-03 09:05 CEST, Drees Dormann

Description Felix Botner univentionstaff 2014-05-27 10:52:10 CEST
After downloading and decompressing it (python-lzma), the UCC image is no longer a sparse file:

16G     /var/lib/univention-client-boot/ucc-2.0-desktop-image.img

When I decompress the xz file with unxz, the actual size of the image on disk is much smaller:

3,8G    /opt/ucc-2.0-desktop-image.img

We are losing the "sparse" feature when decompressing with python-lzma. This is not tragic at the moment, but maybe we can keep the image a sparse file for future optimizations (e.g. transferring the sparse file to the UCC client instead of the whole 16 GB image).
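The gap between the two sizes quoted above (apparent size versus space actually allocated) can also be checked from Python. The following is a minimal sketch, not part of ucc-image-toolkit; the helper names `disk_usage` and `is_sparse` are hypothetical, and it assumes a POSIX system where `st_blocks` is reported in 512-byte units.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    Hypothetical helper. Assumes POSIX semantics, where st_blocks
    counts 512-byte units independent of the filesystem block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """True if fewer bytes are allocated on disk than the file's length."""
    apparent, allocated = disk_usage(path)
    return allocated < apparent
```

Calling `disk_usage()` on the two image paths from the description should show roughly the same apparent size for both, with a much smaller allocated size for the file produced by unxz.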
Comment 1 Moritz Muehlenhoff univentionstaff 2014-05-27 11:10:59 CEST
Three possible options:
- Check whether Pylzma can be instructed to write the decompressed file in sparse mode
- Wrap the image file in a tar archive (which preserves the sparseness)
- Re-sparse the downloaded file locally using "cp --sparse=always"
Comment 2 Alexander Kläser univentionstaff 2014-06-17 12:00:25 CEST
(In reply to Moritz Muehlenhoff from comment #1)
> Three possible options:
> - Check whether Pylzma can be instructed to write the decompressed file in
> sparse mode
> - Wrap the image file in a tar archive (which preserves the sparseness)
> - Re-sparse the downloaded file locally using "cp --sparse=always"

AFAICS, in our usage this is not a Pylzma issue, as we write the file data to the hard disk ourselves:

> [...]
> decompressor = lzma.LZMADecompressor()
> [...]
> with contextlib.nested(open(infile, 'rb'), open(outfile, 'wb')) as (fin, fout):
>   [...]
>   while True:
>     [...]
>     compressed_data = fin.read(DEFAULT_CHUNK_SIZE)
>     [...]
>     uncompressed_data = decompressor.decompress(compressed_data)
>     fout.write(uncompressed_data)
>     [...]

If I see it correctly, sparse files could be created by checking whether the current uncompressed_data chunk (8 KiB in size) contains only zero bytes. If so, the chunk can be skipped via fout.seek() instead of being written.

Some example code that I found: http://blogs.tulsalabs.com/?p=166
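The chunk-skipping idea described in this comment can be sketched as follows. This is an illustration only, not the attached patch: the function name `write_sparse` is hypothetical, and the code assumes Python 3 and an output file opened in binary write mode on a filesystem that supports holes. One detail the sketch adds on its own: a seek at the very end of the data only moves the file position, so the file is extended explicitly with `truncate()`; otherwise a run of trailing zeros would be missing from the output.

```python
import io


def write_sparse(chunks, fout):
    """Write byte chunks to fout, seeking over chunks that are all zeros.

    Hypothetical helper; `chunks` is any iterable of bytes objects and
    `fout` is a seekable binary file opened for writing.
    """
    for chunk in chunks:
        if chunk and chunk == bytes(len(chunk)):
            # Only zero bytes: move the position forward and leave a hole.
            fout.seek(len(chunk), io.SEEK_CUR)
        else:
            fout.write(chunk)
    # Extend the file to the current position in case it ends in a hole.
    fout.truncate()
```

Reading the result back yields the same bytes as a plain write, because holes read as zeros; only the allocated space differs.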
Comment 3 Alexander Kläser univentionstaff 2014-06-17 12:02:24 CEST
Drees, could you please provide a patch for this issue (as an attachment to this bug)?
Comment 4 Drees Dormann univentionstaff 2014-06-25 11:31:19 CEST
Created attachment 5970 [details]
Patch for ucc-image-toolkit so images will still be sparse files after decompressing
Comment 5 Drees Dormann univentionstaff 2014-06-25 11:32:30 CEST
Created a patch according to Alex's proposed method:
if a decompressed chunk consists only of 0-bytes, it will be skipped in the output file.
Comment 6 Moritz Muehlenhoff univentionstaff 2014-06-25 11:51:30 CEST
Was this already built in a scope targeted for release? If not, please don't mark this as RESOLVED yet.
Comment 7 Drees Dormann univentionstaff 2014-06-26 12:07:54 CEST
Created attachment 5971 [details]
New approach: split uncompressed data into 4k blocks and check them separately, so we can keep the progress functionality of the original code

The data is decompressed as in the original code and then split into 4 KiB blocks.
Each block is checked for whether it consists solely of 0-bytes; such blocks are not written.
This way the original progress function can be preserved.
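A rough sketch of the block-wise approach described in this comment, for illustration only; it is not the attached patch. Assumptions: Python 3 with the standard-library `lzma` module (the original code uses python-lzma on Python 2), a single-stream xz input, and the hypothetical names `decompress_sparse`, `BLOCK_SIZE`, `CHUNK_SIZE` and `progress`.

```python
import io
import lzma

BLOCK_SIZE = 4096  # hypothetical: granularity of the zero check
CHUNK_SIZE = 8192  # hypothetical stand-in for DEFAULT_CHUNK_SIZE


def decompress_sparse(fin, fout, progress=None):
    """Decompress an xz stream from fin to fout, skipping all-zero blocks.

    `progress`, if given, is called with the number of compressed bytes
    consumed per read, so a progress display can stay chunk-based.
    """
    decompressor = lzma.LZMADecompressor()
    zero_block = bytes(BLOCK_SIZE)
    while not decompressor.eof:
        compressed = fin.read(CHUNK_SIZE)
        if not compressed:
            break
        data = decompressor.decompress(compressed)
        # Split the decompressed data into blocks and check each one.
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            if block == zero_block[:len(block)]:
                fout.seek(len(block), io.SEEK_CUR)  # leave a hole
            else:
                fout.write(block)
        if progress is not None:
            progress(len(compressed))
    # Extend the file to the current position in case it ends in a hole.
    fout.truncate()
```

Note that the blocks here are cut relative to each decompressed chunk, not to absolute file offsets, so how much space is actually saved depends on how the holes line up with the filesystem's block size.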
Comment 8 Drees Dormann univentionstaff 2014-07-03 09:05:58 CEST
Created attachment 5985 [details]
Newer version of the patch, changed according to Alexander's proposals
Comment 9 Felix Botner univentionstaff 2016-06-27 10:48:36 CEST
Patch does not work (can't mount the image). This is not an option for now.
Comment 10 Philipp Hahn univentionstaff 2023-06-28 10:30:37 CEST
UCC is EoL