On Thu, 19 May 2005, Jules Richardson wrote:
On Thu, 2005-05-19 at 10:09 -0700, Dwight K. Elvey wrote:
I'm debating even keeping things in ASCII for long term. Binary
is close to the original but lacks the ability to add format type
information. I still like to keep it human readable in something
like ASCII. ASCII has a relatively long history in the computer
industry.
Well something like XML supports different character encodings and so
should take care of that aspect.
I still don't like munging native binary data into some other format
though for the sake of preservation; I'd rather treat that as binary and
provide metadata / indexing alongside it, or in a separate section or
whatever.
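
To make the "binary plus metadata alongside it" idea concrete, here is a minimal sketch in Python. The function name `describe_image` and the field names are hypothetical, not any agreed-upon archive format; the point is only that the binary stays untouched while a small human-readable record describes it.

```python
import hashlib
import json


def describe_image(data: bytes, name: str, notes: str) -> str:
    """Build a JSON metadata record for a binary blob without altering the blob."""
    record = {
        # Field names are illustrative assumptions, not a standard.
        "file": name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # lets a reader verify the binary later
        "notes": notes,
    }
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # The record would be stored next to the untouched binary, e.g. as disk.img.json.
    print(describe_image(b"\x00\x01\x02", "disk.img", "example only"))
```

Because the record is plain text, it stays readable and searchable even if nothing can interpret the binary itself.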
Here's a probably lame idea: since Unicode characters are basically
16-bit, and since a hex-as-ASCII-encoded 8-bit byte is 16 bits, would it
be reasonable to store the actual binary data as the low order byte in a
Unicode character and have the upper byte be something like 0xFF?
That being said, I still prefer the simplicity of straight-up ASCII.
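
For what it's worth, the Unicode idea can be sketched in a few lines of Python. The names `pack_bytes` and `unpack_bytes` are made up for illustration, and 0xFF is just the high byte floated above. One caveat worth knowing: code points U+FF00 through U+FFFF fall in the Halfwidth and Fullwidth Forms and Specials blocks, and U+FFFE and U+FFFF are noncharacters, so the result is no more human readable than raw binary.

```python
def pack_bytes(data: bytes, high: int = 0xFF) -> str:
    """Map each byte to one 16-bit code point: `high` in the upper byte, the data in the lower."""
    return "".join(chr((high << 8) | b) for b in data)


def unpack_bytes(text: str) -> bytes:
    """Recover the original bytes by keeping only the low-order byte of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


if __name__ == "__main__":
    original = bytes(range(256))
    packed = pack_bytes(original)
    # Same storage cost as two hex digits per byte: 16 bits for every 8 bits of data.
    assert len(packed.encode("utf-16-be")) == 2 * len(original)
    assert len(original.hex()) == 2 * len(original)
    assert unpack_bytes(packed) == original
```

The size check shows this costs exactly what hex-as-ASCII costs, which is part of why plain ASCII hex still looks like the simpler choice.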
In any case, these are all academic in comparison to the problems
of indexing. I don't even have the beginnings of how to deal
with that problem.
Well, I don't think anyone's really thought about it yet - but including
as much info as possible in the metadata for a particular archive at
least gives some confidence that it *can* be indexed...
With adequate metadata, a search engine like Google is all the indexing
you really need as it's simple, effective, and takes place without you
knowing or needing to know about it.
--
Sellam Ismail Vintage Computer Festival
------------------------------------------------------------------------------
International Man of Intrigue and Danger
http://www.vintage.org
[ Old computing resources for business || Buy/Sell/Trade Vintage Computers ]
[ and academia at www.VintageTech.com || at http://marketplace.vintage.org ]