On 02/09/11 11:41 AM, Colin Eby wrote:
> Toby,
>
>> One thing to consider is how the format deals with damage....
>
> Good thought. I wonder if that integrity functional role could be
> delegated to the container format rather than the payload element.
I don't believe that trying to do integrity measures at the wrapper
level will work well. Even if you checksum blocks, and can recover at a
block level, you don't know what the recoverable boundaries are. If the
payload is a compressed bitstream (that is not designed for
recoverability), then no matter what chunk-level recovery is done by the
wrapper or transport, you can't recover anything beyond the first
damaged bit.
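To make that concrete, here's a toy sketch in Python (zlib standing in
for whatever compressor the payload might use; all names invented):
flip one byte in the middle of the compressed stream and see how much
still decodes.

    import zlib

    original = b"card image data " * 4096        # stand-in payload
    packed = zlib.compress(original)

    damaged = bytearray(packed)
    damaged[len(damaged) // 2] ^= 0xFF            # one corrupted byte

    d = zlib.decompressobj()
    recovered = b""
    try:
        # Feed the stream in small chunks so we keep whatever decoded
        # before the decoder hits the damage.
        for i in range(0, len(damaged), 64):
            recovered += d.decompress(bytes(damaged[i:i + 64]))
    except zlib.error as exc:
        print("decoder gave up:", exc)

    print("recovered %d of %d bytes" % (len(recovered), len(original)))

Typically the decoder errors out at (or soon after) the flipped byte,
and even if it limps on, everything from that point is garbage. No
amount of chunk-level checksumming in the wrapper gets the rest back.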
HOWEVER, if the file never leaves a high-integrity environment (like
ZFS) during its lifetime, the data format itself can be as fragile as
you like. All the error detection and healing is done at a layer UNDER
the file (in ZFS' case, redundant vdev). That's a fine approach but
clearly only possible for tightly controlled situations (and not much
good for remote file servers using ordinary protocols, because the
network is relatively unreliable).
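Reduced to a toy in Python, the shape of that underlying layer is just
checksums plus redundancy sitting below an arbitrarily fragile file
(this is not ZFS, only the idea):

    import hashlib

    def store(block: bytes) -> dict:
        # Keep two copies plus a checksum, below whatever file format
        # sits on top (very roughly what a redundant vdev buys you).
        return {"sum": hashlib.sha256(block).digest(),
                "copies": [bytearray(block), bytearray(block)]}

    def read(rec: dict) -> bytes:
        for copy in rec["copies"]:
            if hashlib.sha256(bytes(copy)).digest() == rec["sum"]:
                # Heal any sibling copy that no longer checks out.
                rec["copies"] = [bytearray(copy) for _ in rec["copies"]]
                return bytes(copy)
        raise IOError("all copies damaged")

    rec = store(b"some fragile payload")
    rec["copies"][0][5] ^= 0xFF       # silent corruption of one copy
    assert read(rec) == b"some fragile payload"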
> The payload doesn't get internally marked up with checksum blocks, but
> we rely on LZW/LZW2/other as the guarantor of file integrity. The rest
> of the scavenging is done by the metadata/descriptor element (like
> card 1 = byte [xxx]-byte[yyy]). Any damage or ambiguity is noted in
> this external metadata rather than the actual capture blob. The
> container ensures that the contents retain their integrity, i.e. that
> they are the same as when they were created. The descriptor describes
> what was known at creation time. Workable?
I'm not sure I understand what this adds to recoverability. Can you
explain another way?
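If I'm reading it right, the external descriptor would be something
like this (field names and numbers are my own invention, just to check
my understanding):

    # Hypothetical descriptor; toy numbers assume 80 bytes per card image.
    descriptor = {
        "payload": "deck0042.bin.z",          # the compressed capture blob
        "payload_checksum": "sha256:<hex>",   # container-level integrity
        "cards": [
            {"card": 1, "bytes": [0, 79]},    # i.e. card 1 = byte[xxx]-byte[yyy]
            {"card": 2, "bytes": [80, 159]},
        ],
        "damage": [
            {"card": 2, "note": "column 80 unreadable on the original"},
        ],
    }

If that's the shape of it, the descriptor records where the damage is,
but it can't give the damaged bytes back.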
I guess my point is that eventually you have to design the payload for
recoverability. On the other hand... backups. :)
--Toby
> Regards,
> Colin Eby