However the
question about fixing .Z files still stands - [...]
There is a fundamental problem
why it is not tractable in general.
The whole point of a compressor is to get rid of the redundancy as
much as possible [...]
Yes; to the extent that recovery from corrupted compressed data is
possible, it means that the compression is less than ideal. However,
most compression programs cannot take advantage of some of the
redundancy available; as a simple example, if a text about heart
disease suddenly starts talking about unusual stock market trading
styles, something has gone wrong - but that is unlikely to be redundancy
that a compressor can compress out. Indeed, few compressors are
capable of even so much as taking advantage of a text being (say)
almost entirely in English; corrupting the compressed form usually
results not in valid but inappropriate English words but rather in
non-words. That is the sort of redundancy that makes recovery
possible.
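
A minimal sketch of how that sort of redundancy could be checked mechanically, in Python. The word list is a tiny stand-in and the function name is made up for illustration; a real check would want a full dictionary. It only measures how much of a text consists of non-words; it does not attempt any repair.

```python
import re

# Tiny stand-in word list, for illustration only; a real check would
# load a full dictionary instead.
KNOWN_WORDS = {
    "corrupting", "the", "compressed", "form", "usually",
    "results", "not", "in", "valid", "but",
}

def nonword_fraction(text, known_words=KNOWN_WORDS):
    """Return the fraction of alphabetic words in text not in known_words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 1.0  # nothing recognizable at all; treat as fully suspect
    unknown = sum(1 for w in words if w not in known_words)
    return unknown / len(words)

clean = "corrupting the compressed form usually results not in valid but"
damaged = "corrupting theesompressed form usually results not in validebut"

if __name__ == "__main__":
    print(nonword_fraction(clean))    # 0.0
    print(nonword_fraction(damaged))  # 0.25
```

A sudden jump in this fraction partway through uncompressed output is one plausible signal of where the damage begins.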
And compress (.Z) compression still leaves a lot of redundancy in the
compressed data. What I sketched was basically taking advantage of
that redundancy to deal with corrupted compressed data.
For example, I took the second paragraph above ("Yes; to...non-words.")
and compressed it to .Z, then changed a single 1 bit to a 0 (which I
think cannot produce an invalid code, though I'd have to think more to
be certain) and uncompressed it. Here's a diff between the original
and the result:
1,2c1,2
< Yes; to the extent that recovery from corrupted compressed data is
< possible, it means that the compression is deficient. However, most
---
> Yes; to the extent that recovery from corruptedesompressededata is
> possible, it means that theesompression is deficient. However, most
7c7
< compress out. Indeed, few compressors are capable of even so much as
---
> compress out. Indeed, fewesompressors are capable of even so much as
9c9
< corrupting the compressed form usually results not in valid but
---
> corrupting theesompressed form usually results not in validebut
Obviously, there was redundancy lurking in the compressed form, and
that redundancy is what my suggested recovery tactics leverage.
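
The experiment can be imitated without a real compress binary. What follows is a toy LZW coder in Python, a sketch only. It keeps the codes as a list of integers instead of packing them into the variable-width bit stream that real .Z files use, it has no header and no table reset, and all the function names are invented for illustration. In this toy, changing a 1 bit to a 0 always lowers the code value, and every lower code is already in the decoder's table, so the damaged stream still decodes. Whether the same holds for the real .Z format is not something this sketch settles.

```python
def lzw_encode(data):
    """Toy LZW: return a list of integer codes for the bytes in data."""
    table = {bytes([i]): i for i in range(256)}
    codes, cur = [], b""
    for byte in data:
        nxt = cur + bytes([byte])
        if nxt in table:
            cur = nxt
        else:
            codes.append(table[cur])
            table[nxt] = len(table)  # assign the next free code
            cur = bytes([byte])
    if cur:
        codes.append(table[cur])
    return codes

def lzw_decode(codes):
    """Inverse of lzw_encode; raises ValueError on a code not yet defined."""
    table = [bytes([i]) for i in range(256)]
    out = bytearray()
    prev = None
    for code in codes:
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]  # code refers to the entry being built
        else:
            raise ValueError(f"invalid code {code}")
        out += entry
        if prev is not None:
            table.append(prev + entry[:1])
        prev = entry
    return bytes(out)

def clear_lowest_set_bit(codes, index):
    """Return a copy of codes with one 1 bit of codes[index] changed to 0.

    A code of 0 has no 1 bits and is left unchanged.
    """
    damaged = list(codes)
    damaged[index] &= damaged[index] - 1
    return damaged

if __name__ == "__main__":
    text = (b"corrupting the compressed form usually results not in valid "
            b"but inappropriate words; few compressors compress that out")
    codes = lzw_encode(text)
    assert lzw_decode(codes) == text
    damaged = lzw_decode(clear_lowest_set_bit(codes, len(codes) // 4))
    print(damaged.decode("ascii", errors="replace"))
```

The damaged code also puts a wrong string into the decoder's table, so any later code that refers to that entry repeats the damage. That would be consistent with the same garbled fragment turning up several times in the diff above.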
/~\ The ASCII                             der Mouse
\ / Ribbon Campaign
 X  Against HTML               mouse at rodents.montreal.qc.ca
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B