New subject: Don Maslin/Archiving system software (was: ftparchives disappearing?)

19 Mar 2007

...
From: Jules Richardson <julesrichardsonuk at yahoo.co.uk>
Dave Dunfield wrote:

OK, a more considered reply now I've got a few minutes...
 = How to go about building it
A tricky question - Who's going to do all this work? The good
news is that it can be shared, but it will require a bit of a
commitment. Not everyone is going to want to put up the
investment to acquire the means to process their disk collection.
What I am thinking of is several key volunteers positioned so as
to cover the major geographic areas who would be willing to set
up the necessary software and equipment to handle as many
different media types as possible, and provide a service to
nearby collectors to turn media into images, and to turn images
into media. Working with others in the project, the images
could be transmitted and shared so that any particular system
disk can be accessed from anywhere.
That much makes sense to me. I've been steadily collecting various relevant
odds and ends for the past couple of years with a view to doing just that -
various drives, controllers, tape units etc.
In order to spread this around as much as possible, we could probably do
with some "style guide" documents produced by some of the list gurus as to
what hardware works with what etc. (e.g. summarising things like the
responses to my recent question about hooking an 8" floppy drive up to a
PC). That will hopefully encourage more people to participate.
(There's certainly been a lot of knowledge shared on this list in the past
about recovering old tapes too - it'd be nice to see that written up
somewhere "central")
Hmmm, it's not really a FAQ, but there's perhaps scope for something like a
"classiccmp.org knowledge base" containing info like this?
 = How to make the archive available
Another tricky question, which has two major components, legality
and accessibility.
I still think the key here is distribution. Allow people who have
specialist knowledge in a particular area to run their own little corner of
the web as they see fit - but they also "publish" what they've made
available through some defined mechanism. In addition, they present some
unified way of searching across *all* participating sites for content.
To extend that concept a little further (thinking on my feet here), how's
about we say that each "published" bit of data can be marked with different
levels:
   "Offline" (say pending copyright issue resolution!)
   Free for download (i.e. available to all users)
   Available for mirror (to specific peers)
   Available for mirror (to anyone wishing to make a mirror of the item)
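As a rough illustration of the four levels above, here is a minimal Python
sketch. The names (Availability, may_mirror, trusted_peers) are made up for
the example and are not part of any agreed scheme. Whether the two "mirror"
levels also imply free download is left open, since the list above does not
say.

```python
from enum import Enum


class Availability(Enum):
    """The four publication levels from the list above (names are hypothetical)."""
    OFFLINE = "offline"              # e.g. pending copyright issue resolution
    DOWNLOAD = "download"            # free for download by all users
    MIRROR_PEERS = "mirror-peers"    # may be mirrored by specific peers only
    MIRROR_ANYONE = "mirror-anyone"  # may be mirrored by anyone


def may_mirror(level, requester, trusted_peers):
    """Return True if `requester` is allowed to mirror an item at `level`.

    `trusted_peers` is an assumed per-site set of peer names.
    """
    if level is Availability.MIRROR_ANYONE:
        return True
    if level is Availability.MIRROR_PEERS:
        return requester in trusted_peers
    # OFFLINE and DOWNLOAD items are not offered for mirroring.
    return False
```

A site owner would tag each published item with one of these values, and a
mirroring script would call may_mirror() before copying anything.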
Obvious benefits:
   1) The "specialists" in any given area retain control over what they
have available, the format that it's in, and the "look and feel" of
whatever frills surround the actual data.
   2) There's no unwieldy central repository, with corresponding high cost
of maintenance.
   3) Mirroring is more controllable.
   4) Searching (from a user POV) can be flexible and tailored to certain
pre-defined "classes" of content.
   5) Copyright is at least a little less of an issue; if someone publishes
something that violates copyright, it's far more likely to come down on
their own head than to jeopardise the distributed archive as a whole.
   6) Dictating a "common archival format" that everyone agrees on probably
isn't possible, for various reasons. Look at what happened with Sellam's
efforts. However, dictating a common set of different content types (disk
image, documentation scan etc.), and the metadata fields that they can be
searched for with, is a heck of a lot easier!
   7) Admin is a lot easier, as every participating site owner is an admin
of the stuff that *they* make available, rather than having one person (or
a team) trying to look after everything - Al can probably comment on what a
headache this is!
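To make point 6 a bit more concrete, here is a minimal Python sketch of a
shared metadata record that leaves the actual file format up to each site.
The two content types come from the text above; every field name (system,
image_format, url) and the matches() helper are assumptions made for the
example, not an agreed schema.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    DISK_IMAGE = "disk image"
    DOC_SCAN = "documentation scan"


@dataclass
class Record:
    """One published item; field names are hypothetical."""
    content_type: ContentType
    title: str
    system: str        # machine the item belongs to
    image_format: str  # free-form: each site keeps its own format
    url: str           # where the owning site serves the item


def matches(record, content_type=None, **text_fields):
    """Case-insensitive substring match on any of the text fields."""
    if content_type is not None and record.content_type is not content_type:
        return False
    for name, wanted in text_fields.items():
        if wanted.lower() not in getattr(record, name).lower():
            return False
    return True
```

The point of the design is that image_format is just a description: the
search only depends on the common fields, not on a common archival format.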
Downsides:
   1) Probably needs a web interface at least for the searching (but that's
not to stop someone making the actual data available over FTP or whatever,
and the search interface code could be produced in a variety of languages -
PHP, ASP etc. - to provide flexibility)
   2) The actual database of "what's where" probably still needs to be
central, because distributing that across multiple sites (a la DNS) is
likely impractical. (The majority of websites probably do have some
server-side scripting ability, but not necessarily any kind of coherent
database support.)
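One way the central "what's where" database might look, sketched in Python
with the standard sqlite3 module. The table layout, the column names and
the publish()/search() functions are all assumptions for illustration; the
data itself stays on the participating sites and only the listing is held
centrally.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the central index. Schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "site TEXT, content_type TEXT, system TEXT, title TEXT, url TEXT)"
    )
    return db


def publish(db, site, items):
    """Replace everything `site` previously published with `items`."""
    db.execute("DELETE FROM items WHERE site = ?", (site,))
    db.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
        [(site, i["content_type"], i["system"], i["title"], i["url"])
         for i in items],
    )
    db.commit()


def search(db, system=None, content_type=None):
    """Search across all participating sites; returns (site, title, url)."""
    sql = "SELECT site, title, url FROM items WHERE 1=1"
    args = []
    if system:
        sql += " AND system LIKE ?"   # LIKE ignores ASCII case in SQLite
        args.append(f"%{system}%")
    if content_type:
        sql += " AND content_type = ?"
        args.append(content_type)
    return db.execute(sql + " ORDER BY site, title", args).fetchall()
```

Each site owner would call publish() with their own listing whenever it
changes, so nobody but the owner has to administer that site's entries.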
Hi
I thought I'd add a little here. One thing is missing: we'd need
some way to keep it running beyond when we are gone. There should
be a trust set up. It should have at least two or three people maintaining
it and willing to deal with the tax issues.
I tend to agree that it shouldn't force a format. Dave's open format is
very well thought out and his tool is good but there is legacy data
that would need to be converted. As an example, the disk image
data I have for the Polymorphic 8813 is in the format used by the
Polymorphic software to move disk images through modems to other
Polymorphic machines. It does currently lack a method to restore it
to a machine that has no boot floppy (over time I hope to fix that).
Other images are not even floppy images. As an example, all the
data that I have for my Nicolet floppy disk system is stored as
paper tape images (in files on my PC).
And of course, I also have cassette data images.
Some people like to store cassette images as audio files. I like to store
the actual data. I have done this for my Poly88 stuff and have a bootstrap
method to get things started without a first tape.
I also have images in a format that I'm just too lazy to convert. As an
example, the images I made for my H89. These also have some legacy
issues, as my tool was used to create the images for the hard sectored
images on SEBHC. I've included a method to bootstrap without having a
first disk.
Of course, there should always be some form of readme with each
group of files explaining how to use the files. In the past, this has been
one of the hardest things to figure out on random archives. Many times,
I've seen images with no information on what to do with them. This
has happened because the directory was copied from someplace without
the support. Each directory should have at least a simple readme describing
how the images were made and what tool should be used to recover
them. Just a readme at the top directory is not enough. The top directory
may get separated. A readme is minimum.
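A check like this is easy to automate. The Python sketch below walks an
archive tree and lists every directory that holds files but no readme. The
accepted file names in README_NAMES are an assumption; adjust them to
whatever convention the archive settles on.

```python
import os

# Assumed set of acceptable readme file names (compared in lower case).
README_NAMES = {"readme", "readme.txt", "readme.md"}


def dirs_missing_readme(root):
    """Return the directories under `root` that contain files but no readme."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not filenames:
            continue  # directories holding only subdirectories are skipped
        if not any(name.lower() in README_NAMES for name in filenames):
            missing.append(dirpath)
    return sorted(missing)
```

Running it over a mirror before publishing would catch the case where a
directory was copied from someplace without its supporting notes.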
These readmes may need updating as time goes by. Things like the fact
that some tool needs to run under a particular OS, and not some future OS
that may be more common, need to be passed on. It is hard to tell what bit
of information that is assumed today may be key for extracting useful
data in the future.
It must be a living archive.
Just my thoughts
Dwight