On Tue, 2005-05-17 at 17:06 -0500, Brian Wheeler wrote:
As coincidence would have it, I work at Indiana University's Digital
Library Program, and there was a lecture on archiving audio that touched
on many of the same issues that have come up here. The conclusions they
reached for the project included:
* There's no such thing as an eternal medium: the data must be
transportable to the latest generation of storage
Yep, it's recognised that data needs to be periodically refreshed onto
whatever the current favourite media type is. The nice thing about a
structured and essentially human-readable metadata format is that there's
a good chance it can be transferred as-is to a new type of media without
any reprocessing.
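Copying "as-is" onto new media is only safe if you can show nothing changed
in transit. A minimal sketch, assuming each archived item is a plain file
and that a SHA-256 digest is an acceptable fixity check (the function names
are hypothetical, not from any existing archiving tool):

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large images don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh_copy(src: Path, dst: Path) -> str:
    """Copy src to dst byte-for-byte and confirm the digests match.

    Returns the digest so it can be recorded alongside the metadata.
    """
    before = sha256_of(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise OSError(f"fixity check failed for {dst}")
    return after
```

Storing the returned digest next to the image means the next migration can
check against it, not just against whatever copy happens to be to hand.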
* Metadata should be bundled with the content
Just to clarify: do you mean bundled alongside the content, or interspersed
with it? (From the rest of your message I believe you mean the former,
which happens to be my view too...)
* Act like you get one chance to read the media :(
Yep, although sometimes reading the media several times and combining the
results can actually improve the chances of reconstructing it :)
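One simple way to combine several reads is a per-byte majority vote. This
is a sketch under strong assumptions: every read is already aligned and the
same length (real recovery tools working at flux level have to do far more),
and `combine_reads` is a hypothetical helper, not anything from the lecture:

```python
from collections import Counter
from typing import List, Tuple


def combine_reads(reads: List[bytes]) -> Tuple[bytes, List[int]]:
    """Combine several aligned reads of the same sector by majority vote.

    Returns the reconstructed bytes plus the offsets where the reads did
    not all agree, so those spots can be flagged in the metadata.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must be the same length")
    out = bytearray()
    disputed = []
    for i in range(length):
        votes = Counter(r[i] for r in reads)
        # On a tie, most_common() keeps first-encountered order,
        # so the earliest read wins.
        value, _count = votes.most_common(1)[0]
        out.append(value)
        if len(votes) > 1:
            disputed.append(i)
    return bytes(out), disputed
```

Three reads where only the middle one has a bad byte at offset 2 come back
as the good data with offset 2 reported as disputed.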
I think the optimum format for doing this isn't a single file, but a
collection of files bundled into a single package. Someone mentioned
tar, I think, and zip would work just as well.
The only danger there is that the two become separated over time, but in
my mind it's an acceptable risk. It's sort of like a librarian losing a
few volumes from a set of encyclopedias, I suppose - something that you'd
have to be really careless to do.
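Keeping the raw image and its metadata in one zip means they travel as a
single object. A minimal sketch using Python's standard `zipfile` and
`json` modules; the member names `image.raw` and `metadata.json` are
hypothetical choices, not an agreed layout:

```python
import json
import zipfile
from pathlib import Path


def write_bundle(bundle: Path, image: bytes, metadata: dict) -> None:
    """Store the raw image and its metadata side by side in one zip."""
    with zipfile.ZipFile(bundle, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("image.raw", image)
        # Human-readable, stable key order so the file diffs cleanly.
        zf.writestr("metadata.json", json.dumps(metadata, indent=2, sort_keys=True))


def read_bundle(bundle: Path) -> tuple:
    """Return (image_bytes, metadata_dict) from a bundle."""
    with zipfile.ZipFile(bundle) as zf:
        image = zf.read("image.raw")
        metadata = json.loads(zf.read("metadata.json").decode("utf-8"))
    return image, metadata
```

The same idea works with `tar`; zip has the small advantage that individual
members can be read without unpacking the whole archive.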
I don't think there's any real need to document the physical properties
of the media for EVERY disk archived -- there should probably be a
repository of 'standard' media types (1541's different-sectors-per-track
info, FM vs MFM per-track information, etc.) plus overrides in the media
metadata (uses fat tracks, 40 tracks vs 35, etc.).
Now, risk of separation might well be a problem there if a single copy of
some metadata is shared by more than one disk image. I'd say that each
'bundle' forming a disk image (raw data + metadata) needs to describe that
disk completely...
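One way to get both a shared repository of standard types and fully
self-describing bundles is to resolve the standard profile plus the disk's
overrides at archive time and store the result in the bundle. A sketch,
assuming a hypothetical `STANDARD_MEDIA` table and `describe_disk` helper;
the 1541 zone layout (21/19/18/17 sectors per track) is the standard one:

```python
import copy
from typing import Any, Dict


def c1541_sectors(track: int) -> int:
    """Sectors per track for a Commodore 1541 disk (speed-zone layout)."""
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 40:  # tracks 36-40 only exist on extended 40-track disks
        return 17
    raise ValueError(f"track {track} out of range")


# Hypothetical repository of 'standard' media types.
STANDARD_MEDIA: Dict[str, Dict[str, Any]] = {
    "c1541": {"tracks": 35, "encoding": "GCR", "bytes_per_sector": 256},
}


def describe_disk(media_type: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve a standard profile plus per-disk overrides into a complete
    description, meant to be copied into that disk's own bundle."""
    desc = copy.deepcopy(STANDARD_MEDIA[media_type])
    desc.update(overrides)
    desc["media_type"] = media_type
    if media_type == "c1541":
        desc["sectors_per_track"] = [
            c1541_sectors(t) for t in range(1, desc["tracks"] + 1)
        ]
    return desc
```

The shared table is then only a convenience for whoever creates the
archive; once `describe_disk("c1541", {"tracks": 40})` has been written into
a bundle, that bundle no longer depends on the table surviving.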
cheers
Jules