Univention Bugzilla – Bug 34984
image decompression with python-lzma destroys "sparse" feature in ucc images
Last modified: 2023-06-28 10:33:04 CEST
After downloading and decompressing (python-lzma), the ucc image is no longer a sparse file: 16G /var/lib/univention-client-boot/ucc-2.0-desktop-image.img When I decompress the xz file with unxz, the actual image size is much smaller: 3,8G /opt/ucc-2.0-desktop-image.img We are losing the "sparse" feature when decompressing with python-lzma. Not tragic at the moment, but maybe we can keep the image a sparse file for future optimizations (e.g. transferring the sparse file to the ucc client instead of the whole 16GB image).
Three possible options:
- Check whether Pylzma can be instructed to write the decompressed file in sparse mode
- Wrap the image file in a tar archive (which preserves the sparseness)
- Re-sparse the downloaded file locally using "cp --sparse=always"
(In reply to Moritz Muehlenhoff from comment #1)
> Three possible options:
> - Check whether Pylzma can be instructed to write the decompressed file in sparse mode
> - Wrap the image file in a tar archive (which preserves the sparseness)
> - Re-sparse the downloaded file locally using "cp --sparse=always"

AFAICS this is, in our usage, not a Pylzma issue, as we are writing the file data to the hard disk ourselves:

> [...]
> decompressor = lzma.LZMADecompressor()
> [...]
> with contextlib.nested(open(infile, 'rb'), open(outfile, 'wb')) as (fin, fout):
>     [...]
>     while True:
>         [...]
>         compressed_data = fin.read(DEFAULT_CHUNK_SIZE)
>         [...]
>         uncompressed_data = decompressor.decompress(compressed_data)
>         fout.write(uncompressed_data)
>         [...]

If I see it correctly, sparse files could be created by checking whether the current uncompressed_data chunk (8 KiB in size) contains only zero bytes. If so, that chunk can be skipped via fout.seek(). Some example code that I found: http://blogs.tulsalabs.com/?p=166
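A rough sketch of this idea (function name and chunk size are illustrative, not the actual ucc-image-toolkit code; it assumes Python 3's built-in lzma module rather than Pylzma):

```python
import lzma

DEFAULT_CHUNK_SIZE = 8192  # 8 KiB, matching the read size in the original loop

def decompress_sparse(infile, outfile, chunk_size=DEFAULT_CHUNK_SIZE):
    """Decompress an .xz file, seeking over all-zero chunks so the
    output stays sparse on filesystems that support holes."""
    decompressor = lzma.LZMADecompressor()
    with open(infile, 'rb') as fin, open(outfile, 'wb') as fout:
        while True:
            compressed_data = fin.read(chunk_size)
            if not compressed_data:
                break
            uncompressed_data = decompressor.decompress(compressed_data)
            if not uncompressed_data:
                continue
            if uncompressed_data.strip(b'\x00'):
                fout.write(uncompressed_data)
            else:
                # All zeros: leave a hole instead of writing the chunk.
                fout.seek(len(uncompressed_data), 1)
        # Make sure a trailing hole still counts toward the file size.
        fout.truncate()
```

The final truncate() is needed because a hole created by seek() alone does not extend the file size if the image ends in zeros.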
Drees, could you please provide a patch for this issue (as an attachment to this bug)?
Created attachment 5970 [details] Patch for ucc-image-toolkit so images will still be sparse files after decompressing
Created a patch according to Alex's proposed method: if a decompressed chunk consists only of 0-bytes, it will be skipped in the output file.
Was this already built in a scope targeted for release? If not, please don't mark this as RESOLVED yet.
Created attachment 5971 [details] New approach: split uncompressed data into 4 KiB blocks and check them separately, so we can keep the progress functionality of the original code. Data will be decompressed as in the original code, then split into 4 KiB blocks. Each block is checked whether it consists solely of 0-bytes (in which case it will not be written). This way the original progress function can be preserved.
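The approach described in this attachment can be sketched as follows; write_sparse is a hypothetical helper, not the attached patch itself. Because the full uncompressed chunk is still available to the caller, the original progress reporting can stay unchanged:

```python
BLOCK_SIZE = 4096  # 4 KiB blocks, as in the patch description

def write_sparse(fout, data, block_size=BLOCK_SIZE):
    """Write `data` to `fout` in 4 KiB blocks, replacing all-zero
    blocks with a seek so the output file stays sparse."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if block.strip(b'\x00'):
            fout.write(block)
        else:
            # Zero block: skip forward, leaving a hole in the file.
            fout.seek(len(block), 1)
```

After the decompression loop finishes, the file should still be truncated to the expected size (fout.truncate()) so a trailing hole is not lost.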
Created attachment 5985 [details] Newer version of the patch, changed according to Alexander's proposals
Patch does not work (can't mount the image). So this is no option for now.
UCC is EoL