On Mar 17, 2007, at 8:32 PM, Dave Dunfield wrote:
[...]
= How to store the archive
I am a strong believer in preservation of the physical media as
historic artifacts, however I believe it is also vital to preserve
the data separately in modern formats, for several reasons:
- It allows easy replication and mirroring in multiple locations.
This will help ensure that the material is not lost in the
future through any single failure point (fire, flood, death,
loss-of-interest - all of these things and more can wipe out
a single physical archive).
- It removes dependence on specific (and usually obsolete)
physical media. No need to put wear and tear on the original
artifacts, and allows for contingencies in the event that the
original devices become inoperative.
- Allows for easy sharing and movement of the data.
- Allows everything to be tracked in a central repository
(appropriately mirrored of course).
- Allows anyone who wants to set up the required equipment
to have complete access to the repository content.
[...]
Digital preservation is actually my day job. I work for the LOCKSS
project at Stanford University (http://www.lockss.org/). LOCKSS is a
distributed peer-to-peer preservation system for electronic journals
used by libraries around the world. Basically, each library runs one
or more boxes that collect e-journals, and all the boxes participate
in audits with each other to make sure that content is not lost or
damaged. They establish and maintain an "Authoritative Copy" of the
content that all the boxes keep locally. If the original e-journal's
publisher has disappeared, the boxes will repair missing content from
each other. Best of all, it's entirely open source using the BSD
license and some LGPL components.
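To make the audit-and-repair idea concrete, here is a rough sketch in Python. It is not the actual LOCKSS polling protocol, just a simplified majority vote over content hashes. The function names and the in-memory "boxes" dictionary are made up for illustration; a real system would compare hashes over the network rather than share a dictionary.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(boxes: Dict[str, Dict[str, bytes]], item: str) -> List[str]:
    """Hypothetical audit: compare each box's copy of `item` by hash and
    overwrite missing or disagreeing copies with the majority copy.
    Returns the names of the boxes that were repaired."""
    # Each box that holds the item "votes" with the hash of its copy.
    votes = Counter(
        digest(store[item]) for store in boxes.values() if item in store
    )
    if not votes:
        return []  # nobody has the item, so there is nothing to repair from

    winner, count = votes.most_common(1)[0]
    # Assumption: only repair when a strict majority of all boxes agree.
    if count * 2 <= len(boxes):
        return []

    # Pick any copy whose hash matches the winning hash.
    good = next(
        store[item]
        for store in boxes.values()
        if item in store and digest(store[item]) == winner
    )

    repaired = []
    for name, store in boxes.items():
        if item not in store or digest(store[item]) != winner:
            store[item] = good
            repaired.append(name)
    return repaired


if __name__ == "__main__":
    boxes = {
        "a": {"vol1": b"journal text"},
        "b": {"vol1": b"journal text"},
        "c": {"vol1": b"journal text"},
        "d": {"vol1": b"journal tex?"},  # damaged copy
        "e": {},                         # missing copy
    }
    print(audit_and_repair(boxes, "vol1"))
```

With five boxes, three of which agree, the damaged and missing copies are replaced from the majority. If no strict majority exists, the sketch does nothing rather than guess which copy is right.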
I've thought about using LOCKSS for software and documentation
preservation, but unfortunately it's really not ideal for the job.
As I mentioned, it was designed with the fairly narrow goal of
providing libraries and institutions an easy way to preserve
electronic journals, so it would require a lot of hacking to make it
useful for something like a software archive. Still, I think the
general principle should apply here. LOCKSS stands for "Lots Of
Copies Keep Stuff Safe", and that should certainly be the goal of any
digital archiving system.
Anyway, I just wanted to throw that out there as part of the
discussion. I'd be glad to help out on this: I have a server, disk
space, and bandwidth, and I'd like to see it used for this kind of
thing.
At a minimum, I should probably start mirroring http://bitsavers.org/ !
-Seth