ddr_lzo − Data de/compression plugin for dd_rescue
-L lzo[=option[:option[:...]]]
or
-L /path/to/libddr_lzo.so[=option[:option[:...]]]
About
LZO is a data de/compression algorithm. It is tuned for
speed (especially decompression speed) and trades away some
compression ratio for it. Variants with slow compression (yet
still very fast decompression) are available, though; see the
algo parameter below.
This plugin has been written for dd_rescue and uses the plugin interface from it. See the dd_rescue(1) man page for more information on dd_rescue.
Options are passed using dd_rescue’s option passing syntax: the name of the plugin (lzo) is optionally followed by an equal sign (=) and the options, which are separated by colons (:). The lzo plugin also allows most options to be abbreviated to five or six letters. See the EXAMPLES section below.
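The colon-separated syntax with abbreviations can be illustrated with a small C sketch. It splits an option string at colons and accepts any prefix of a known option name that is at least as long as the part outside the brackets (e.g. compr for compr[ess]). The option table, the minimum lengths assumed for options this page shows without brackets, and the function names are hypothetical; this is not ddr_lzo’s actual parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option table. Full names are taken from this man page;
 * min_len is the shortest accepted abbreviation. */
struct opt_def {
    const char *name;
    size_t min_len;
};

static const struct opt_def opts[] = {
    { "compress",   5 },  /* compr[ess]   */
    { "decompress", 5 },  /* decom[press] */
    { "optimize",   3 },  /* opt[imize]   */
    { "benchmark",  5 },  /* bench[mark]  */
    { "nodiscard",  6 },  /* nodisc[ard]  */
    { "search",     6 },  /* assumed: full name only */
    { "debug",      5 },  /* assumed: full name only */
    { "crc32",      5 },  /* assumed: full name only */
    { "algo",       4 },  /* takes a value: algo=XXX   */
    { "flags",      5 },  /* takes a value: flags=XXXX */
};

/* Map a token such as "compr" or "algo=lzo1x_999" to the full option
 * name, ignoring any "=value" part. Returns NULL for unknown tokens. */
static const char *match_option(const char *tok)
{
    size_t len = strcspn(tok, "=");
    for (size_t i = 0; i < sizeof opts / sizeof opts[0]; ++i) {
        if (len >= opts[i].min_len && len <= strlen(opts[i].name)
            && strncmp(tok, opts[i].name, len) == 0)
            return opts[i].name;
    }
    return NULL;
}

/* Split a colon-separated option string (modified in place by strtok)
 * and return the number of recognized options. */
static int parse_options(char *optstr)
{
    int known = 0;
    for (char *tok = strtok(optstr, ":"); tok != NULL; tok = strtok(NULL, ":")) {
        if (match_option(tok) != NULL)
            ++known;
        else
            fprintf(stderr, "sketch: unknown option '%s'\n", tok);
    }
    return known;
}
```

For instance, a writable copy of "algo=lzo1x_1_15:compr:bench" yields three recognized options, while "comp" is rejected as too short.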
Compression or decompression
The lzo dd_rescue plugin (subsequently referred to as just
ddr_lzo, which reflects the variable part of the filename
libddr_lzo.so) chooses compression or decompression mode
automatically if one of the input/output files has an [lt]zo
suffix; otherwise you may specify the compr[ess] or
decom[press] parameter on the command line.
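Automatic mode selection from the file suffix could look like the following C sketch. It assumes that an [lt]zo suffix means .lzo or .tzo, that a suffixed input file implies decompression and a suffixed output file implies compression, and that ambiguous cases fall back to the explicit compr[ess]/decom[press] parameters. All names are hypothetical.

```c
#include <string.h>

enum lzo_mode { MODE_UNKNOWN, MODE_COMPRESS, MODE_DECOMPRESS };

/* True if name ends in ".lzo" or ".tzo" (the "[lt]zo" suffix). */
static int has_lzo_suffix(const char *name)
{
    size_t n = strlen(name);
    if (n < 4)
        return 0;
    const char *s = name + n - 4;
    return s[0] == '.' && (s[1] == 'l' || s[1] == 't') && s[2] == 'z' && s[3] == 'o';
}

/* Pick the mode from the input/output names; MODE_UNKNOWN means the
 * caller needs an explicit compr[ess] or decom[press] parameter. */
static enum lzo_mode pick_mode(const char *in, const char *out)
{
    int in_lzo = has_lzo_suffix(in), out_lzo = has_lzo_suffix(out);
    if (in_lzo && !out_lzo)
        return MODE_DECOMPRESS;
    if (out_lzo && !in_lzo)
        return MODE_COMPRESS;
    return MODE_UNKNOWN;
}
```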
The parameter opt[imize] will tell ddr_lzo to do an
optimization pass after compression. This might speed up
decompression by a few percent when creating compressed data
with high compression levels and large block sizes.
The plugin also supports the parameter bench[mark] ; if it’s specified, it will output some information about CPU usage and resulting compression or decompression bandwidth. (For small files, the numbers become meaningless due to jitter and limited time resolution -- ddr_lzo will skip the output if the numbers are very tiny.)
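The bandwidth figure is essentially the data volume divided by the elapsed time. The C sketch below computes it and, like the behaviour described above, stays silent when the measurement is too short; the 50 ms threshold, the decimal MB/s unit and the function name are illustrative assumptions, not ddr_lzo’s actual values.

```c
#include <stdio.h>

/* Illustrative threshold, not ddr_lzo's real cut-off. */
#define MIN_SECONDS 0.05

/* Format the throughput for `bytes` processed in `seconds` into out.
 * Returns 0 (and writes nothing) if the measurement is too short to be
 * meaningful, 1 otherwise. */
static int bench_report(unsigned long long bytes, double seconds,
                        char *out, size_t outlen)
{
    if (seconds < MIN_SECONDS)
        return 0;
    double mb_per_s = (double)bytes / seconds / 1e6;  /* decimal MB/s */
    snprintf(out, outlen, "%.1f MB/s", mb_per_s);
    return 1;
}
```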
De/compression algorithm
The lzo plugin supports a number of the (de)compression
algorithms from liblzo2. You can specify which one you want
to use by passing algo=XXX , where XXX can be
lzo1x_1, lzo1x_1_15, lzo1x_999, lzo1x_1_11, lzo1x_1_12,
lzo1y_1, lzo1y_999, lzo1f_1, lzo1f_999, lzo1b_1 ... lzo1b_9,
lzo1b_99, lzo1b_999, lzo2a_999. Pass algo=help to get
a list of available algorithms. Consult the liblzo
documentation for more information on the algorithms. Note
that only the first three are supported by lzop (it can
decompress the first five though, as they’re all
handled by the same decompression routine).
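The lzop compatibility rules stated above can be captured in a small lookup. This C sketch encodes only what this page says (the first three algorithms are fully lzop compatible; lzo1x_1_11 and lzo1x_1_12 can still be decompressed by lzop); the enum and function names are hypothetical.

```c
#include <string.h>

enum lzop_compat { LZOP_NONE = 0, LZOP_DECOMPRESS_ONLY = 1, LZOP_FULL = 2 };

/* lzop can create and read these ... */
static const char *const lzop_full[] = { "lzo1x_1", "lzo1x_1_15", "lzo1x_999" };
/* ... and can still decompress these, as they share lzo1x's decompressor. */
static const char *const lzop_decomp[] = { "lzo1x_1_11", "lzo1x_1_12" };

static enum lzop_compat lzop_compat_of(const char *algo)
{
    for (size_t i = 0; i < sizeof lzop_full / sizeof lzop_full[0]; ++i)
        if (strcmp(algo, lzop_full[i]) == 0)
            return LZOP_FULL;
    for (size_t i = 0; i < sizeof lzop_decomp / sizeof lzop_decomp[0]; ++i)
        if (strcmp(algo, lzop_decomp[i]) == 0)
            return LZOP_DECOMPRESS_ONLY;
    return LZOP_NONE;  /* lzo1y_*, lzo1f_*, lzo1b_*, lzo2a_999 */
}
```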
The default (lzo1x_1) is a good choice for fast compression
and very fast decompression and ensures compatibility with
lzop. For higher compression you might want to choose
lzo1x_999, which is very slow but lzop compatible, or
lzo2a_999, which is twice as fast but not compatible with
lzop.
Debugging
The debug flag will cause ddr_lzo to output
information about blocks and other internal data. It’s
meant for debugging purposes.
Finally, there is also a flags=XXXX parameter. This sets the flags field in the header (default is 0x03000403) and is used for testing only. It is not sanity checked, and you can easily set values that will break decompression or cause ddr_lzo to abort. Really only use it for development purposes when you know the meaning of the various bits.
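To see what the default value means, the following C sketch decodes a flags word. The bit values are taken from lzop’s source (conf.h) and should be verified against the lzop version at hand; the function name is hypothetical. Under these definitions, 0x03000403 sets the adler32 bits for both uncompressed and compressed data, the multipart bit (which this page uses for hole encoding) and OS code 3 (Unix).

```c
#include <stdint.h>
#include <stdio.h>

/* Selected lzop header flag bits (from lzop's conf.h; verify locally). */
#define F_ADLER32_D  0x00000001u  /* adler32 of uncompressed data */
#define F_ADLER32_C  0x00000002u  /* adler32 of compressed data   */
#define F_CRC32_D    0x00000100u  /* crc32 of uncompressed data   */
#define F_CRC32_C    0x00000200u  /* crc32 of compressed data     */
#define F_MULTIPART  0x00000400u
#define F_OS_MASK    0xff000000u
#define F_OS_SHIFT   24
#define F_OS_UNIX    3u

/* Write a human-readable description of `flags` into buf. */
static void describe_flags(uint32_t flags, char *buf, size_t n)
{
    snprintf(buf, n, "%s%s%s%s%sos=%u",
             (flags & F_ADLER32_D) ? "adler32_d " : "",
             (flags & F_ADLER32_C) ? "adler32_c " : "",
             (flags & F_CRC32_D)   ? "crc32_d "   : "",
             (flags & F_CRC32_C)   ? "crc32_c "   : "",
             (flags & F_MULTIPART) ? "multipart " : "",
             (unsigned)((flags & F_OS_MASK) >> F_OS_SHIFT));
}
```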
Error recovery
On compression, when input bytes can’t be read,
ddr_lzo will encode holes in the compressed output file --
these will be skipped over on decompression.
On decompression, erroneous blocks can be detected by the
checksums (most often) or by the decompressor. The lzo
plugin tries to continue in that case if the block header
that specifies the de/compressed lengths is intact. The
block is then skipped over (leaving a hole) and
decompression continues with the next block. This
prevents corrupt data from ending up in the output file (and
pre-existing, potentially good data there from being overwritten).
The behaviour can be modified by specifying the
nodisc[ard] option. When given, the
decompressor’s output (filled up with zeros if too
short for the block) will be written to the output file.
Even if we know that the data is incorrect, with some luck,
parts of the block may actually be valid.
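The per-block policy described in the two paragraphs above (skip by default; with nodisc[ard], write the decompressor output padded with zeros) can be sketched in C. The enum, the function and its parameters are hypothetical; the sketch assumes the block header is intact, so the expected uncompressed length is known, and that out has room for that many bytes.

```c
#include <stddef.h>
#include <string.h>

enum block_action { WRITE_BLOCK, SKIP_BLOCK };

/* ok:        checksum verified and decompressor reported success
 * out:       decompressor output buffer, at least `expected` bytes large
 * out_len:   number of bytes the decompressor actually produced
 * expected:  uncompressed length from the block header
 * nodiscard: the nodisc[ard] option */
static enum block_action handle_block(int ok, unsigned char *out, size_t out_len,
                                      size_t expected, int nodiscard)
{
    if (ok)
        return WRITE_BLOCK;
    if (!nodiscard)
        return SKIP_BLOCK;   /* leave a hole; existing data stays untouched */
    if (out_len < expected)  /* too short: fill the rest with zeros */
        memset(out + out_len, 0, expected - out_len);
    return WRITE_BLOCK;      /* write possibly-corrupt data, hoping parts are valid */
}
```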
When the block headers are corrupt, your situation is desperate, as you will have lost the remainder of the file. To recover pieces after such a block header corruption, ddr_lzo supports the search option. With it, the plugin will search the input file (starting from the position given to dd_rescue with -s) for data that looks like a block header, and if a valid-looking header is found, it will start decompressing from that position. (If you can’t find the data you are looking for, you may want to study the output generated with the debug flag.)
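A header search can be sketched in C. An lzop block header starts with two 32-bit big-endian words, the uncompressed and the compressed length. The sketch accepts an offset when both are nonzero, the compressed length does not exceed the uncompressed one, the uncompressed length stays within lzop’s 64MiB maximum, and the compressed data would fit in the buffer. This plausibility test and the names are assumptions for illustration; ddr_lzo’s real heuristic may differ, and checking the checksums that follow the lengths would reject many more false positives.

```c
#include <stddef.h>
#include <stdint.h>

/* lzop's maximum block size (64 MiB), used as an upper bound. */
#define LZOP_MAX_BLOCK (64u * 1024u * 1024u)

/* Read a 32-bit big-endian value. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Scan buf[start..len) for the first offset whose two length words look
 * like a plausible lzop block header. Returns the offset or -1. */
static long find_block_header(const unsigned char *buf, size_t len, size_t start)
{
    for (size_t off = start; off + 8 <= len; ++off) {
        uint32_t ulen = get_be32(buf + off);      /* uncompressed length */
        uint32_t clen = get_be32(buf + off + 4);  /* compressed length   */
        if (ulen == 0 || ulen > LZOP_MAX_BLOCK)
            continue;                 /* 0 would be the EOF marker */
        if (clen == 0 || clen > ulen)
            continue;                 /* lzop never stores expanded blocks */
        if (clen > len - off - 8)
            continue;                 /* compressed data must fit in buf */
        return (long)off;
    }
    return -1;
}
```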
dd_rescue supports appending to files with the -x/--extend option. If ddr_lzo is loaded and the output file is an existing .lzo file, the new data will be appended in the format specified by the existing LZOP header. If the header does not indicate a multipart (archive) file, the EOF marker will be overwritten, so that a valid .lzo file is created. Otherwise a new part will be appended.
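Choosing where appended data starts can be sketched as follows. The sketch assumes that the lzop EOF marker is a 32-bit zero length word at the very end of the file; the function name is hypothetical, and writing the header of the new part in the multipart case is left out.

```c
/* Offset at which appended data should start, given the last four bytes
 * of an existing .lzo file, its size, and whether its header has the
 * multipart flag set. */
static long long append_offset(const unsigned char tail[4], long long file_size,
                               int multipart)
{
    int ends_with_eof = tail[0] == 0 && tail[1] == 0 && tail[2] == 0 && tail[3] == 0;
    if (!multipart && ends_with_eof)
        return file_size - 4;  /* overwrite the EOF marker, continue the stream */
    return file_size;          /* multipart: a new part (own header) follows */
}
```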
When dd_rescue
can’t read data or a sizable amount of zero-filled
data is found and the -a/--sparse option is active, then
dd_rescue will create sparse files (files with holes
inside). This is an optimization to save space -- the holes
are interpreted as zeroes again on normal reads, so this is
transparent. The holes also can be useful to ensure that
good data is not overwritten with zeroes when data
couldn’t be read.
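Detecting a zero-filled block, the precondition for leaving a hole, takes only a few lines of C. The memcmp trick below (a block is all zeros if its first byte is zero and it equals itself shifted by one byte) is one possible implementation, not necessarily the one dd_rescue uses; the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the block consists only of zero bytes, 0 otherwise. */
static int is_zero_block(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return 1;
    /* buf[0] == 0 and buf[i] == buf[i+1] for all i imply all bytes are 0 */
    return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}
```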
When the lzo module gets fed holes in compression mode, it
will encode them in the compressed output file in a special
way (using the lzop multipart feature, as lzop unfortunately
chokes on blocks with 0 compressed length). On
decompression, the holes are skipped over again
(creating a hole in the output file, if no
data preexists at the location).
The plugin uses
the lzo1x_1 algorithm by default (just like lzop does by
default) and generates adler32 checksums to allow detecting
data corruption. The compressed files are compatible with
lzop and ddr_lzo should handle files generated by lzop.
Multipart (archive) files from lzop are decompressed to ONE
output file in the order they are stored.
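For reference, adler32 as used by zlib and liblzo keeps two 16-bit sums modulo 65521 and starts from the value 1. A straightforward, unoptimized C version follows; the function name is hypothetical, and the library implementations defer the modulo operation for speed.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

/* Update an adler32 checksum with len bytes; start with adler = 1.
 * The low 16 bits hold sum A, the high 16 bits hold sum B. */
static uint32_t adler32_update(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffffu, b = adler >> 16;
    for (size_t i = 0; i < len; ++i) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

Because the state is just the two sums, the checksum can be computed incrementally over consecutive chunks of a block.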
Multipart files created by the lzo plugin to encode holes
will be extracted into several files by lzop. The holes are
encoded in the filenames (with a sequence number and the
hole size up to 1TB; the timestamp is used for huge holes), so a
proper assembly of the fragments is possible even without
ddr_lzo.
lzop only
supports the lzo1x_ family of algorithms. If you choose
another algorithm to compress data with ddr_lzo, it will set
the needed_version_to_extract field in the resulting lzop
file to ddr_lzo’s own version (1.789) to indicate
incompatibility with lzop (as of 1.03).
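The effect of the needed_version_to_extract field can be sketched as a simple comparison. lzop stores version numbers as hex digits (1.03 as 0x1030); that ddr_lzo’s 1.789 is stored as 0x1789 by the same convention is an assumption here, as are the names.

```c
#include <stdint.h>

/* Versions as hex digits, lzop style: 1.03 -> 0x1030. */
#define LZOP_1_03_VERSION  0x1030u
#define DDR_LZO_VERSION    0x1789u  /* assumption: "1.789" encoded the same way */

/* A reader may extract a file only if its own version is at least the
 * needed_version_to_extract value stored in the file's header. */
static int can_extract(uint16_t reader_version, uint16_t needed_to_extract)
{
    return reader_version >= needed_to_extract;
}
```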
lzop by default uses block sizes of 256kiB (on Unix
systems), but supports de/compression with smaller block
sizes as well. It needs to be recompiled to support block
sizes up to a possible maximum of 64MiB. Thus staying below
or at 256kiB is recommended; even when lzop compatibility is
no concern, blocks larger than 16MiB are not recommended,
see below.
Blocksize considerations
When decompressing, the (soft) block size chosen in
dd_rescue must be sufficient (at least half the
block size used when compressing); if you choose blocks
that are too small, ddr_lzo will warn and exit.
For compression, the chosen (soft) block size in dd_rescue
determines the size of the blocks fed to the
lzo??_?_compress() routines. Larger block sizes
typically result in slightly better compression ratios,
though the returns on increasing the block size
diminish quickly beyond 64kiB.
The default from dd_rescue (128kiB) is a good choice. It is
NOT recommended to increase the block size too much -- when
an lzo file gets corrupted, at least one block will be lost;
larger blocks result in larger damage. Furthermore, blocks larger
than 16MiB will not work well with the error tolerance
features of ddr_lzo. Note also that blocks larger than
256kiB require a recompiled lzop if you want to be able to
use lzop to process the .lzo files; blocks larger than 64MiB
prevent decompression even with a recompiled lzop.
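The block size limits from this section can be collected into two small checks. This C sketch encodes only the numbers given above (256kiB for a stock lzop, 16MiB for ddr_lzo’s error tolerance, 64MiB as lzop’s hard limit, and the half-block-size rule for decompression); the enum and function names are hypothetical.

```c
#include <stddef.h>

#define KiB ((size_t)1024)
#define MiB (1024 * KiB)

enum bs_verdict {
    BS_OK,                     /* <= 256kiB: works with a stock lzop */
    BS_NEEDS_RECOMPILED_LZOP,  /* > 256kiB */
    BS_NOT_RECOMMENDED,        /* > 16MiB: weak error tolerance */
    BS_LZOP_INCOMPATIBLE       /* > 64MiB: even a recompiled lzop fails */
};

/* Classify a compression block size, reporting the most severe issue. */
static enum bs_verdict check_compress_blocksize(size_t bs)
{
    if (bs > 64 * MiB)
        return BS_LZOP_INCOMPATIBLE;
    if (bs > 16 * MiB)
        return BS_NOT_RECOMMENDED;
    if (bs > 256 * KiB)
        return BS_NEEDS_RECOMPILED_LZOP;
    return BS_OK;
}

/* Decompression: the soft block size must be at least half the block
 * size that was used for compression. */
static int decompress_blocksize_ok(size_t softbs, size_t compress_bs)
{
    return 2 * softbs >= compress_bs;
}
```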
Maturity
The plugin is new as of dd_rescue 1.43. Do not yet rely on
data saved with ddr_lzo as the only backup for valuable
data. Also expect some changes to ddr_lzo in the not too
distant future. (This should not break the file format, as
we’re following lzop ....)
Compressed data is more sensitive to data corruption than
plain data. Note that the checksums (adler32 or crc32) in
the lzop file format do NOT allow correcting errors;
they only allow a somewhat reliable detection of data
corruption. (Ideally, a 32bit checksum misses just 1 out of
2^32 corruptions; on small changes, crc32 comes a bit closer
to the ideal than adler32. You may pass the crc32
option to use crc32 instead of adler32 checksums at the
expense of some speed -- unfortunately the crc32 polynomial
for lzop/gzip/... is not the crc32c polynomial that has
hardware support on many CPUs these days.) Also note that
the checksums are NOT cryptographic hashes; a malicious
attacker can easily find modifications of data that do not
alter the checksums. Use MD5 or better SHA-256/SHA-512 for
ensuring integrity against attackers. Use par2 or similar
software to create error correcting codes (Reed-Solomon /
Erasure Codes) if you want to be able to recover data in
face of corruption.
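The difference between the two CRC variants mentioned above lies only in the polynomial. This bitwise C sketch computes either one: 0xEDB88320 is the bit-reversed CRC-32 polynomial used by gzip and lzop, 0x82F63B78 the bit-reversed CRC-32C (Castagnoli) polynomial with hardware support on many CPUs. It is written for clarity rather than speed, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY_CRC32   0xEDB88320u  /* gzip/lzop CRC-32, bit-reversed */
#define POLY_CRC32C  0x82F63B78u  /* CRC-32C (Castagnoli), bit-reversed */

/* Reflected CRC with initial value and final XOR of 0xFFFFFFFF. */
static uint32_t crc32_generic(uint32_t poly, const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u)
                crc = (crc >> 1) ^ poly;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For the standard test input "123456789", the two polynomials give the well-known check values 0xCBF43926 (CRC-32) and 0xE3069283 (CRC-32C), so a crc32c hardware instruction cannot stand in for the checksum lzop expects.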
Security
While care has been applied to check the result of memory
allocations ..., the decompressor code has not been audited
and only limited fuzzing has been applied to ensure
it’s not vulnerable to malicious data -- be careful
when you process data from untrusted sources.
EXAMPLES
dd_rescue -ptAL lzo=algo=lzo1x_1_15:compress,hash=alg=sha256 infile outfile
compresses data from infile into outfile using the algorithm lzo1x_1_15 and calculates the sha256 hash value of outfile. outfile will have its time stamp and access rights copied over from infile, and it will be truncated first (if it already exists). The output file won’t have encoded holes; errors in infile will result in zeros.
dd_rescue -aL MD5,lzo=compr:bench,MD5,lzo=decompress,MD5 infile infile2
will copy infile to infile2, compressing the data and decompressing it again on the fly. It will output MD5 hashes for the compressed data as well (though it’s not stored) and for infile and infile2 -- these two hashes should be identical, obviously. This command is rather artificial and is used for testing. The -a flag makes dd_rescue detect zero blocks and create holes, thus also testing hole encoding (sparse files) and decoding if infile has sizable regions filled with zeros.
dd_rescue -s1M -S0 -L lzo=search:nodiscard infile.lzo outfile
will search for an lzop block header in infile.lzo, starting at position 1MiB into the file, and decompress the remainder of the file. On finding corrupted blocks, it will still write the decompressor’s output to outfile.
dd_rescue(1) liblzo2 documentation lzop(1)
Kurt Garloff <kurt@garloff.de>
The liblzo2
library and algorithm has been written by Markus Oberhumer.
http://www.oberhumer.com/opensource/lzo/
This plugin is under the same license as dd_rescue: The GNU General Public License (GPL) v2 or v3 - at your option.
ddr_lzo plugin was first introduced with dd_rescue 1.43 (May 2014).
Some additional
information can be found on
http://garloff.de/kurt/linux/ddrescue/