On Wed, 11 Aug 2004, Jules Richardson wrote:
So, what difference does it make to a human analyst whether the data is
stored as hex pairs or binary data? Both need decoding by some process
to make them usable. A human eye is no better off viewing a stream of
hex digits than it is a stream of arbitrary ASCII data.
I disagree. I can see "1A". I might not be able to see a CTRL-Z.
Actually, binary data could possibly be more useful to the human eye in
a browsing scenario, as at least the eye can quickly pick out meaningful
strings - such as filenames on the original media - from a sea of binary
data, without needing to do any decoding. At least viewing a file
containing binary data could give a clue as to what it contained if the
archive metadata (eg. description) wasn't up to much.
In that case, I would suggest someone develop a browser program to
interpret the archive data on the fly if they want to run a "strings" on
it.
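
A minimal sketch of what such a browser could do, assuming the archive
payload is stored as hex pairs: decode the pairs on the fly, then pull out
runs of printable ASCII the way "strings" does. The function name and the
minimum run length are hypothetical, not part of any spec.

```python
import re


def strings_from_hex(hex_text, min_len=4):
    """Decode hex pairs and return the printable ASCII runs found in them."""
    # bytes.fromhex ignores whitespace between pairs, so "1A 00" and "1A00"
    # decode to the same bytes.
    raw = bytes.fromhex(hex_text)
    # Printable ASCII is 0x20 through 0x7E; keep runs of at least min_len.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, raw)]


if __name__ == "__main__":
    # A CTRL-Z (1A), a filename, and some trailing binary noise.
    sample = "00 1A 52 45 41 44 4D 45 2E 54 58 54 00 FF"
    print(strings_from_hex(sample))
```

The CTRL-Z stays visible as "1A" in the stored form, and the filename is
still recoverable without reading the hex by eye.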
As raised earlier though, I do wonder if it's an idea to define several
possible encoding methods as part of the spec. Maximum flexibility
always seems the key to long-lived data formats, so it perhaps makes a
lot of sense to do so anyway. Who's to tell what use such archives might
be put to in the future - but if the spec covers a reasonable base for
now (with extensibility in mind such that others can be added if needs be
in future versions) then everyone's happy, and future generations can
always convert between formats as they see fit.
I would be perfectly fine with allowing binary data to be embedded in
the archive, with a tag to mark it as such. But I would strongly
discourage its use.
Oh, next random thought (which I expect someone's already raised) - the
addressing format for the data on the original media needs to be
flexible enough to cope with different classes of data. Or rather, I'd
expect different addressing classes. For hard disk and floppy archives,
head/sector/track seems a logical addressing scheme. But for, say, a ROM
image, there's no concept of head/sector/track; maybe just an index to
the data and a length. Maybe someone will want to add scans of
documentation pages to an archive, in which case chapter / page
addressing is logical.
Right, and there will be appropriate tags for each type of medium.
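
As a rough illustration of per-medium addressing, here is a sketch with one
small record type per class of medium. All of the class names, field names,
and the text form produced by describe() are assumptions made up for this
example; the actual tags have not been defined yet.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DiskAddress:
    """Hard disk or floppy: located by track, head, and sector."""
    track: int
    head: int
    sector: int


@dataclass(frozen=True)
class RomAddress:
    """ROM image: just an index into the data and a length."""
    offset: int
    length: int


@dataclass(frozen=True)
class PageAddress:
    """Scanned documentation: located by chapter and page."""
    chapter: int
    page: int


Address = Union[DiskAddress, RomAddress, PageAddress]


def describe(addr: Address) -> str:
    """Render an address as human-readable text (hypothetical layout)."""
    if isinstance(addr, DiskAddress):
        return f"disk track={addr.track} head={addr.head} sector={addr.sector}"
    if isinstance(addr, RomAddress):
        return f"rom offset={addr.offset} length={addr.length}"
    if isinstance(addr, PageAddress):
        return f"document chapter={addr.chapter} page={addr.page}"
    raise TypeError(f"unsupported address type: {type(addr).__name__}")


if __name__ == "__main__":
    print(describe(DiskAddress(track=0, head=1, sector=5)))
    print(describe(RomAddress(offset=0, length=8192)))
```

A new class of medium would then only need a new record type and one more
branch, without disturbing the existing ones.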
I'd say that important field values like the compression type should
always be human-readable rather than a numeric id, just rigidly defined
by the spec (e.g. 'none', 'base64', 'uuencode' etc.). That makes life a
lot easier for someone potentially looking at this in the future if they
don't happen to have a copy of the spec handy!
Totally agreed. There should not be anything cryptic in the tags, and to
the extent possible, they should make sense to a smart human.
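
To show how readable names might work in practice, here is a sketch that
looks up a decoder by the name found in the tag. The table, the function
name, and the choice of 'hex' as a name are assumptions for illustration;
'none' is assumed to mean the text is the data, one character per byte.
Further names such as 'uuencode' could be added to the same table.

```python
import base64

# Decoder table keyed by the human-readable name stored in the archive.
# The names and their meanings here are illustrative, not from any spec.
DECODERS = {
    "none": lambda text: text.encode("latin-1"),  # one character per byte
    "hex": bytes.fromhex,                          # pairs such as "1A"
    "base64": base64.b64decode,
}


def decode_payload(encoding, text):
    """Decode text according to the named encoding, or fail loudly."""
    try:
        decoder = DECODERS[encoding]
    except KeyError:
        raise ValueError(f"unknown encoding name: {encoding!r}") from None
    return decoder(text)


if __name__ == "__main__":
    print(decode_payload("hex", "1A 41"))
    print(decode_payload("base64", "GkE="))
```

Someone reading the archive without the spec can still guess what 'base64'
means, which would not be true of a bare numeric id.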
Hmm, I miss the old days of everyone chucking ideas around like this :-)
Well, they're back! ;)
--
Sellam Ismail                                        Vintage Computer Festival
------------------------------------------------------------------------------
International Man of Intrigue and Danger                http://www.vintage.org

[ Old computing resources for business || Buy/Sell/Trade Vintage Computers   ]
[ and academia at www.VintageTech.com  || at http://marketplace.vintage.org  ]