On Tue, 2005-05-17 at 22:34 +0000, Jules Richardson wrote:
> On Tue, 2005-05-17 at 17:06 -0500, Brian Wheeler wrote:
> > As coincidence would have it, I work at Indiana University's
> > Digital Library Program, and there was a lecture on archiving audio
> > which hit many of the same issues that have come up here. The
> > conclusions they came up with for the project included:
> >
> > * There's no such thing as an eternal medium: the data must be
> >   transportable to the latest generation of storage
> Yep, there's a recognised need to periodically refresh data onto
> whatever the current favourite media type is. The nice thing about a
> structured and essentially human-readable metadata format is that
> there's a good chance it can be transferred as-is to a new type of
> media without any reprocessing.
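
For illustration, the sort of human-readable header I'd picture, with
field names just made up for the example:

    format: raw-disk-image/1.0
    media-type: 5.25in-floppy
    encoding: MFM
    tracks: 40
    sides: 1
    imaged-on: 2005-05-17
    checksum-md5: ...

Something like that survives a media refresh untouched, since any text
editor can read it.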
> > * Metadata should be bundled with the content
> Just to clarify: do you mean bundled alongside the content or
> interspersed with it? (From the rest of your message I believe you
> mean the former, which happens to be my view too...)
Bundled alongside so the 'raw' data (metadata or content) can be
manipulated with standard tools.
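
Concretely, a bundle might look like this (file names invented for the
example):

    disk0042.zip
        metadata.txt    - the human-readable description above
        image.raw       - the raw sector data

unzip (or tar) gets you the pieces back, and nothing more exotic than
a text editor is needed to inspect the metadata.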
> > * Act like you get one chance to read the media :(
> Yep. Although sometimes multiple reads of media and a combination of
> the resulting data can actually improve the ability to reconstruct
> it :)
True. During this lecture they were talking about recording the
stop/starts required to get the actual audio into the system.
Apparently there's some standard for doing that for audio. There were
several horror stories as well.
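
On the multiple-reads point: one crude but effective way to combine
passes is a byte-wise majority vote across several dumps of the same
disk. A quick sketch in Python (file handling and names are just
illustrative, and it assumes all passes read the same number of bytes):

    import sys
    from collections import Counter

    def majority_merge(dumps):
        """Byte-wise majority vote across equal-length read passes."""
        assert len(set(len(d) for d in dumps)) == 1, "passes differ in length"
        merged = bytearray(len(dumps[0]))
        for i in range(len(merged)):
            votes = Counter(d[i] for d in dumps)
            merged[i] = votes.most_common(1)[0][0]  # most frequent byte wins
        return bytes(merged)

    # usage: python merge.py pass1.img pass2.img pass3.img > merged.img
    dumps = [open(name, 'rb').read() for name in sys.argv[1:]]
    sys.stdout.buffer.write(majority_merge(dumps))

With an odd number of passes, a bad byte has to show up in most of
them before it makes it into the merged image.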
> > I think the optimum format for doing this isn't a single file, but
> > a collection of files bundled into a single package. Someone
> > mentioned tar, I think, and zip would work just as well.
> The only danger there is that the two become separated over time,
> but in my mind it's an acceptable risk. It's sort of like a librarian
> losing a few volumes from an encyclopedia set, I suppose - something
> that you'd have to be really careless to do.
Yeah, but if the package is treated as a single archival unit, then the
risk of separation should be fairly low. As to others' comments about
zip vs tar, I only suggested zip because it is more common today. Just
don't use ar or cpio! :)
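
For what it's worth, building such a bundle is a few lines with
Python's standard zipfile module (file names invented):

    import zipfile

    # One disk image = one self-describing archival unit.
    with zipfile.ZipFile('disk0042.zip', 'w', zipfile.ZIP_DEFLATED) as bundle:
        bundle.write('metadata.txt')  # human-readable description
        bundle.write('image.raw')     # raw sector dump

And the reverse trip works with any stock unzip, which is really the
whole point of picking a common format.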
> > I don't think there's any real need to document the physical
> > properties of the media for EVERY disk archived -- there should
> > probably be a repository of 'standard' media types (the 1541's
> > different-sectors-per-track info, FM vs MFM per-track information,
> > etc.) plus overrides in the media metadata (uses fat tracks, is 40
> > track vs 35, etc.)
> Now, risk of separation might well be a problem there if there's a
> single copy of some metadata shared by more than one disk image. I'd
> say that each 'bundle' forming a disk image (raw data + metadata)
> needs to totally describe that disk...
Well, the on-disk-structure metadata is the only kind that would
benefit from having a separate repository of definitions. I don't see
any reason not to allow the data to be fully defined if the archivist
feels the desire to do so, but a list of standard types (as well as a
copy of the full definitions stored somewhere) would take some of the
tedium out of it.
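
The lookup could be as simple as merging per-disk overrides over a
standard definition. A sketch in Python (the definitions and field
names are invented for the example):

    # Shared repository of 'standard' media-type definitions.
    STANDARD_TYPES = {
        '1541': {'tracks': 35, 'encoding': 'GCR',
                 'sectors_per_track': 'zoned'},  # varies by track zone
    }

    def resolve_media(media_type, overrides):
        """Start from the standard definition, apply per-disk overrides."""
        definition = dict(STANDARD_TYPES[media_type])
        definition.update(overrides)
        return definition

    # e.g. a 40-track modified drive overriding the 35-track default
    print(resolve_media('1541', {'tracks': 40}))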
Brian