On Wed, 11 Aug 2004, Jules Richardson wrote:

> > I would encode binary data as hex to keep everything ASCII. Data size
> > would expand, but the data would also be compressible, so things could
> > be kept in ZIP files or whatever a person would want for their
> > archiving purposes.
>
> "could be kept" in zip files, yes - but then that's no use in 50 years
> time if someone stumbles across a compressed file and has no idea how to
> decompress it in order to read it and see what it is :-)
>
> Hence keeping the archive small would seem sensible so that it can be
> left as-is without any compression. My wild guesstimate on archive size
> would be to aim for 110-120% of the raw data size if possible...
I agree, but we'd definitely have to include compression features if we
are to meet this goal. Using a floppy disk as an example, a worst-case
scenario is that the image would be maybe 205% of the size of the original
media (200% because you are now using two bytes to store one, and roughly
5% for all the markup tags).

Keeping the archive small should be a major goal, since that would
encourage people to keep the images stored uncompressed. Hard drives are
getting larger and all that, and my guess is that at some point this issue
will be moot, but we can't know that for certain, so we should always
assume a worst-case scenario (i.e. pessimism will be useful when designing
this specification :)
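To put rough numbers on that overhead, here is a minimal sketch (in Python, using made-up sector contents - any real disk image would differ) of how hex encoding doubles the raw size while the resulting ASCII text remains highly compressible:

```python
import zlib

# Made-up track contents for illustration only.
raw = bytes(range(256)) * 18          # 4608 bytes of sample data

hex_text = raw.hex()                  # two ASCII characters per byte
packed = zlib.compress(hex_text.encode("ascii"))

print(len(hex_text) / len(raw))       # 2.0 -- the "200%" figure
print(len(packed) < len(raw))         # True: the hex text compresses well
```

The exact compression ratio depends on the data, but hex text drawn from a 16-character alphabet is exactly the sort of input DEFLATE-style compressors handle well.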
> > Certainly more cpu cycles are needed for conversion and image file
> > size is larger, but we need a readable format
>
> But the data describing all aspects of the disk image would be readable
> by a human; it's only the raw data itself that wouldn't be - for both
> efficiency and for ease of use.

This is a point that needs to be highlighted. These images are meant to
be human readable, first and foremost. Machine readable is a secondary
concern. We know there will definitely be humans in the future (and if
not, then who cares about this anyway). There will probably be machines.
Said machines may not be useful to the task of decoding these images, so
the format must be designed with human readability in mind.

> The driving force for having human-readable data in the archive is so
> that it can be reconstructed at a later date, possibly without any
> reference to any spec, is it not?

Indeed.

> If it was guaranteed that a spec was *always* going to be available,
> having human-readable data at all wouldn't make much sense as it just
> introduces bloat; a pure binary format would be better.

Correct. So even if the spec was lost, people (who could read English at
least) would be able to figure out how to reconstruct the image from the
archive.
> I'm not quite sure what having binary data represented as hex for the
> original disk data gives you over having the raw binary data itself -
> all it seems to do is make the resultant file bigger and add an extra
> conversion step into the decode process.

But it also makes it human readable, and readable in any standard text
editor. Mixing binary data in with human-readable data in a format that's
meant, first and foremost, to be human readable is antithetical to the
idea.
> As for file size, encoding as hex at least doubles the size of your
> archive file compared to the original media (whatever it may be).
> That's assuming no padding between hex characters. Seems like a big
> waste to me :-(

Nope. Not a waste. Essential.
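As a concrete illustration of that trade-off (the sector bytes below are hypothetical, not from any real disk): the hex form is pure ASCII that any text editor can display, and the "extra conversion step" back to raw binary is a single, losslessly reversible operation.

```python
sector = b"\x01\x04\x00\x10HELLO"      # hypothetical sector fragment

encoded = sector.hex()                 # "0104001048454c4c4f" -- plain ASCII
assert all(c in "0123456789abcdef" for c in encoded)

decoded = bytes.fromhex(encoded)       # the extra decode step Jules mentions
assert decoded == sector               # the round trip is lossless
```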
> Yep, I'm with you there. CRC's are a nice idea. Question: does it make
> sense to make CRC info a compulsory section in the archive file?

Yes. It's only one or two added bytes at the end of each data segment.

> Does it make sense to have it always present, given that it's *likely*
> that these archive files will only ever be transferred from place to
> place using modern hardware? I'm not sure. If you're spitting data
> across a buggy serial link, then the CRC info is nice to have - but
> maybe it should be an optional inclusion rather than mandatory, so that
> in a lot of cases archive size can be kept down? (and the assumption
> made that there exists source code / spec for a utility to add CRC info
> to an existing archive file if desired)

It doesn't hurt. It only adds negligible overhead. Certainly something
to discuss more.
I would make the specification unassuming about anything like this. For
example, say there is an optional CRC feature. I would make the default
for the image be that no CRC is added to the data segments unless a meta
tag is included in the header explicitly specifying that CRCs are added.
This makes it ever so slightly easier for someone who knows nothing of
the spec to decode the image data. No assumptions are made regarding what
people in the future will know about these images.
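That default-off, explicitly-tagged behaviour could look something like this sketch. The tag name, the hex layout, and the choice of CRC-32 are all assumptions for illustration, not part of any actual spec (a shorter CRC-16 would match the "one or two added bytes" figure above):

```python
import zlib

def encode_segment(data: bytes, header: dict) -> str:
    """Hex-encode one data segment; append a CRC only if the header's
    (hypothetical) "crc" meta tag explicitly asks for one."""
    text = data.hex()
    if header.get("crc") == "crc32":
        text += format(zlib.crc32(data), "08x")  # 4 bytes -> 8 hex chars
    return text

plain = encode_segment(b"\x00\xff\x10", header={})
tagged = encode_segment(b"\x00\xff\x10", header={"crc": "crc32"})

print(plain)                      # "00ff10" -- no CRC unless the tag says so
print(len(tagged) - len(plain))   # 8 extra hex characters for the CRC
```

A decoder that has never seen the spec can still recover the data from the untagged form, which is exactly the point of making no CRC the default.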
--
Sellam Ismail                                     Vintage Computer Festival
------------------------------------------------------------------------------
International Man of Intrigue and Danger              http://www.vintage.org

[ Old computing resources for business || Buy/Sell/Trade Vintage Computers ]
[ and academia at www.VintageTech.com  || at http://marketplace.vintage.org ]