From: Colin Eby
Sent: Friday, September 02, 2011 2:02 AM
My thought was to avoid the problem by using a non-format. Stay with me
on this... basically persist the data in as raw a format as possible,
with an externalised, self-defining descriptor, all wrapped up in an open
archive format to form a single file. The concept being that the long
term storage format is just not the viewer format. You maintain a
converter from the storage format to a current viewer format, but you
don't actually store the data in the current viewer format. By current
viewer, I mean Photoshop, Acrobat, etc. Here's the basic file layout:
xxxxx.zip :
    raw.bin (a simple sequential byte copy)
    descriptor.xml (instructions on how to carve it into sectors, raster lines etc.)
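
A minimal sketch of that layout in Python, for illustration only. The
descriptor schema here (the "descriptor", "source" and "layout" elements
and their attributes) and the function names build_archive and carve are
assumptions made up for this example; the original message does not
define any schema. It only shows the idea of a raw byte copy plus an
external descriptor that tells a converter how to carve the bytes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_archive(raw, sector_size):
    """Pack a raw byte copy and an XML descriptor into one zip (as bytes)."""
    # Hypothetical descriptor schema: these element and attribute names
    # are invented for the example, not taken from any standard.
    root = ET.Element("descriptor")
    ET.SubElement(root, "source", file="raw.bin", length=str(len(raw)))
    ET.SubElement(root, "layout", unit="sector", size=str(sector_size))
    descriptor = ET.tostring(root, encoding="utf-8")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("raw.bin", raw)
        zf.writestr("descriptor.xml", descriptor)
    return buf.getvalue()


def carve(archive):
    """Read the descriptor and split raw.bin into sectors as it instructs."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        raw = zf.read("raw.bin")
        root = ET.fromstring(zf.read("descriptor.xml"))
    size = int(root.find("layout").get("size"))
    return [raw[i:i + size] for i in range(0, len(raw), size)]


archive = build_archive(bytes(range(16)), sector_size=4)
sectors = carve(archive)
```

The converter to a "current viewer format" would sit on top of something
like carve(); the stored bytes themselves are never rewritten.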
http://www.boogles.com/local/papers/tcfs-thesis/thesis.html
This is Brian Zuzga's 1995 undergraduate thesis at MIT on a project to
archive the backups done at the MIT AI Lab using what they named "the
Time Capsule File System". Nihil novi sub sole (Ecc. 1:9-10).
Where this falls down is, say, for instance, an MFM diskette scanned
with a sampling board like a Catweasel.
[snip]
I like deterministic outcomes, especially in archival work.
You could force an 8-bit boundary on the resulting data, but things
like sector headers are sometimes deliberately encoded in
fluctuation sequences that don't conform to the rest of the data encoding.
That's hardly deterministic, and would certainly not work on, for
example, a disk written by a PDP-10 (36 bit words represented as pairs
of 18 bits + parity), to take a popular example. There *are* no
deterministic outcomes, especially in archival work. There is only
interpretation.
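
To make the 36-bit point concrete, here is a small Python sketch. The two
packing schemes and the names pack_padded and pack_dense are illustrative
assumptions, not a description of any particular tool or tape format. The
same two 36-bit words come out as different byte streams depending on
which 8-bit convention you pick, so the "raw" bytes already embody a
choice.

```python
MASK36 = (1 << 36) - 1


def pack_padded(words):
    """Store each 36-bit word left-justified in 5 bytes (4 pad bits, zero)."""
    out = bytearray()
    for w in words:
        out += ((w & MASK36) << 4).to_bytes(5, "big")
    return bytes(out)


def pack_dense(words):
    """Concatenate all bits end to end; pad only the tail to a byte boundary."""
    bits = 0
    nbits = 0
    for w in words:
        bits = (bits << 36) | (w & MASK36)
        nbits += 36
    pad = (-nbits) % 8
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


words = [0o123456765432, 0o777777000000]
padded = pack_padded(words)   # 10 bytes
dense = pack_dense(words)     # 9 bytes
```

Neither result is more correct than the other; a reader needs to know
which convention was used before the bytes mean anything.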
Rich Alderson
Vintage Computing Sr. Server Engineer
Vulcan, Inc.
505 5th Avenue S, Suite 900
Seattle, WA 98104
mailto:RichA at vulcan.com
mailto:RichA at LivingComputerMuseum.org
http://www.LivingComputerMuseum.org/