Dave Dunfield wrote:
OK, a more considered reply now I've got a few minutes...
= How to go about building it
A tricky question - Who's going to do all this work? The good
news is that it can be shared, but it will require a bit of a
commitment. Not everyone is going to want to put up the
investment to acquire the means to process their disk collection.
What I am thinking of is several key volunteers positioned so as
to cover the major geographic areas who would be willing to set
up the necessary software and equipment to handle as many
different media types as possible, and provide a service to
nearby collectors to turn media into images, and to turn images
into media. Working with others in the project, the images
could be transmitted and shared so that any particular system
disk can be accessed from anywhere.
That much makes sense to me. I've been steadily collecting various relevant
odds and ends for the past couple of years with a view to doing just that -
various drives, controllers, tape units etc.
In order to spread this around as much as possible, we could probably do with
some "style guide" documents produced by some of the list gurus as to what
hardware works with what etc. (e.g. summarising things like the responses to
my recent question about hooking an 8" floppy drive up to a PC). That will
hopefully encourage more people to participate.
(There's certainly been a lot of knowledge shared on this list in the past
about recovering old tapes too - it'd be nice to see that written up somewhere
"central")
Hmmm, it's not really a FAQ, but there's perhaps scope for something like a
"classiccmp.org knowledge base" containing info like this?
= How to make the archive available
Another tricky question, which has two major components, legality
and accessibility.
I still think the key here is distribution. Allow people who have specialist
knowledge in a particular area to run their own little corner of the web as
they see fit - but they also "publish" what they've made available through
some defined mechanism. In addition, they present some unified way of
searching across *all* participating sites for content.
To extend that concept a little further (thinking on my feet here), how about
we say that each "published" bit of data can be marked with one of several
availability levels:
"Offline" (say pending copyright issue resolution!)
Free for download (i.e. available to all users)
Available for mirror (to specific peers)
Available for mirror (to anyone wishing to make a mirror of the item)
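
To make those levels a bit more concrete, here's a rough Python sketch. The names (PublishLevel, can_download, can_mirror) and the site names are made up for illustration, and I'm assuming the levels are ordered so that anything available for mirroring is also free for download; none of that is settled.

```python
from enum import IntEnum


class PublishLevel(IntEnum):
    """Hypothetical encoding of the four levels listed above."""
    OFFLINE = 0        # e.g. pending copyright issue resolution
    DOWNLOAD = 1       # free for download by all users
    MIRROR_PEERS = 2   # may be mirrored by specific peers only
    MIRROR_ANY = 3     # may be mirrored by anyone


def can_download(level):
    # Assumption: every level above OFFLINE also allows plain download.
    return level >= PublishLevel.DOWNLOAD


def can_mirror(level, requester, peers=frozenset()):
    # 'peers' is the set of sites the publisher has agreed to let mirror.
    if level == PublishLevel.MIRROR_ANY:
        return True
    if level == PublishLevel.MIRROR_PEERS:
        return requester in peers
    return False


# Usage with made-up site names:
trusted = {"site-a.example.org"}
print(can_mirror(PublishLevel.MIRROR_PEERS, "site-a.example.org", trusted))
print(can_mirror(PublishLevel.MIRROR_PEERS, "site-b.example.org", trusted))
```

The point is just that each site only has to store one small value per item, and the rules for download and mirroring fall out of it.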
Obvious benefits:
1) The "specialists" in any given area retain control over what they have
available, the format that it's in, and the "look and feel" of whatever
frills surround the actual data.
2) There's no unwieldy central repository, with corresponding high cost of
maintenance.
3) Mirroring is more controllable.
4) Searching (from a user POV) can be flexible and tailored to certain
pre-defined "classes" of content.
5) Copyright is at least a little less of an issue; if someone publishes
something that violates copyright, it's far more likely to come down on their
own head than to jeopardise the distributed archive as a whole.
6) Dictating a "common archival format" that everyone agrees on probably
isn't possible, for various reasons. Look at what happened with Sellam's
efforts. However, dictating a common set of different content types (disk
image, documentation scan etc.), and the metadata fields that they can be
searched for with, is a heck of a lot easier!
7) Admin is a lot easier, as every participating site owner is an admin of
the stuff that *they* make available, rather than having one person (or a
team) trying to look after everything - Al can probably comment on what a
headache this is!
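
As a sketch of what benefit 6 might look like in practice: agree on a small set of content types, each with its own required metadata fields, and let every site check its records before publishing them. The two type names come from the examples above; the field names are purely my invention and would need to be argued over.

```python
# Hypothetical content types and the metadata fields each must carry.
REQUIRED_FIELDS = {
    "disk-image": {"title", "system", "media"},
    "documentation-scan": {"title", "system", "pages"},
}


def missing_fields(record):
    """Return the sorted list of required fields absent from a record."""
    content_type = record.get("type")
    if content_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown content type: {content_type!r}")
    return sorted(REQUIRED_FIELDS[content_type] - set(record))


# Usage: a record that forgot to say what media the image came from.
record = {"type": "disk-image", "title": "Example boot disk", "system": "ExampleSys"}
print(missing_fields(record))
```

Note that nothing here says how the image itself is stored; that stays up to whoever runs the site.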
Downsides:
1) Probably needs a web interface at least for the searching (but that's
not to stop someone making the actual data available over FTP or whatever, and
the search interface code could be produced in a variety of languages - PHP,
ASP etc. - to provide flexibility)
2) The actual database of "what's where" probably still needs to be
central, because distributing that across multiple sites (a la DNS) is likely
impractical. Also, the majority of websites probably do have some server-side
scripting ability, but not necessarily any kind of coherent database support.
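
For what it's worth, the central "what's where" database needn't be complicated. Here's a minimal in-memory Python sketch, assuming each site pushes a list of metadata records and the index only remembers which site holds what. The class name, the field names and the example.org addresses are all hypothetical; a real one would sit on a proper database.

```python
class CentralIndex:
    """Toy 'what's where' index: stores metadata only, never the data itself."""

    def __init__(self):
        self._entries = []  # list of (site, record) pairs

    def publish(self, site, records):
        # A fresh publish from a site replaces whatever it listed before.
        self._entries = [(s, r) for s, r in self._entries if s != site]
        self._entries.extend((site, dict(r)) for r in records)

    def search(self, content_type=None, **criteria):
        # Case-insensitive substring match on each requested field.
        hits = []
        for site, record in self._entries:
            if content_type and record.get("type") != content_type:
                continue
            if all(str(want).lower() in str(record.get(field, "")).lower()
                   for field, want in criteria.items()):
                hits.append((site, record))
        return hits


# Usage with made-up sites and records:
index = CentralIndex()
index.publish("http://site-a.example.org",
              [{"type": "disk-image", "title": "Example boot disk", "system": "ExampleSys"}])
index.publish("http://site-b.example.org",
              [{"type": "documentation-scan", "title": "ExampleSys user manual", "system": "ExampleSys"}])
for site, record in index.search(content_type="disk-image", system="examplesys"):
    print(site, record["title"])
```

The participating sites would then only need enough scripting to send their record list to the index, which fits with most of them not having database support of their own.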
cheers
Jules