Don Maslin/Archiving system software (was: ftp archives disappearing?)

19 Mar 2007

dave06a at dunfield.com wrote:
...
 In order to spread this around as much as possible, we could probably do with
 some "style guide" documents produced by some of the list gurus as to what
 hardware works with what etc. (e.g. summarising things like the responses to
 my recent question about hooking an 8" floppy drive up to a PC). That will
 hopefully encourage more people to participate.
 I agree - however, although platform-specific hardware will be a necessity for
 some data types, I would also like to encourage platform independence where
 possible - it doesn't work too well if you are the key guy in your area, and you
 can't make a disk someone needs - ImageDisk goes part way to solving this,
 but there's still lots of things it can't do - we might want to include the
 Catweasel in our list of standard tools (or a similar device we might design
 ourselves).
I don't think anyone nominated as a local area's "data handling specialist"
can hope to cover everything though - it's not only a question of hardware
availability, but also the sheer size of the hardware needed to handle some
media types.
So long as there's someone on the same continent who can handle xyz media, and
xyz media itself isn't the size of a bus, we're probably OK :-)
...
 Hmmm, it's not really a FAQ, but there's perhaps scope for something like a
 "classiccmp.org knowledge base" containing info like this?
 wiki? 
I wondered that, but wikis are often unstructured beasts it seems - and from a
user POV gleaning information can be a little time-consuming. It could
probably work though, providing a broad range of titles and document scopes
were worked out in advance.
...
 Agreed to some extent, however I want to see the content distributed - perhaps
 everyone directly involved in the project should maintain a private mirror of the
 other parts.
Maybe - although if any given original file in the central database has some
sort of unique ID number, then there's probably a mechanism that can be put in
place when another archive maintainer mirrors content such that the database
"knows" what content isn't being mirrored by anyone. From there humans can see
what content needs at least one mirror and act in whatever way is appropriate.
That way the mirroring aspect is probably taken care of, but not at the
expense of participating site owners' disk space (because they don't *have* to
mirror anything if they don't have the space to do so).
We might want to set ourselves some sort of target level - like any given
content should ideally be mirrored two times before it's considered
"reasonably safe" or whatever.
...
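To make the unique-ID and mirror-count idea above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `MirrorRegistry` class, the `F0001`-style IDs and the `site-a.example` names are made up, and the target of two mirrors is just the figure floated above.

```python
TARGET_MIRRORS = 2  # assumed "reasonably safe" level, as suggested above


class MirrorRegistry:
    """Hypothetical central record of which sites mirror which file IDs."""

    def __init__(self, file_ids):
        # Every known file starts out with no mirrors recorded.
        self.mirrors = {file_id: set() for file_id in file_ids}

    def report_mirror(self, site, file_id):
        """Record that a participating site now holds a copy of file_id."""
        if file_id not in self.mirrors:
            raise KeyError(f"unknown file ID: {file_id}")
        self.mirrors[file_id].add(site)

    def under_mirrored(self, target=TARGET_MIRRORS):
        """Return {file_id: mirror_count} for files below the target."""
        return {
            file_id: len(sites)
            for file_id, sites in sorted(self.mirrors.items())
            if len(sites) < target
        }


# Made-up sample data: three files, two participating sites.
registry = MirrorRegistry(["F0001", "F0002", "F0003"])
registry.report_mirror("site-a.example", "F0001")
registry.report_mirror("site-b.example", "F0001")
registry.report_mirror("site-a.example", "F0002")

needs_mirrors = registry.under_mirrored()
print(needs_mirrors)
```

The list that comes back is the bit humans would look at; nobody is obliged to mirror anything, the registry only shows where the gaps are.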
 There's also the issue that someone who is great at collecting and archiving
 material doesn't have the ability to host a site (dial-up etc.) - so I can see that
 one site might host multiple people's contributions. But these are things that
 would be worked out by the individuals involved.
Yep. e.g. bitsavers would be one participating site even though the data might
come from all over. I think it all still hangs together in such a scenario.
...
 2) There's no unwieldy central repository, with corresponding high cost of
 maintenance.
 By "central repository", I don't (necessarily) mean a huge website and/or FTP
 with everything contained on it
aha ok, misunderstood there! I thought you were talking about one primary site
with a few large mirrors, rather than smaller participating sites each doing
what they could.
...
  - but I would like to see a central resource where
 people can begin a search for a specific need. 
That's perhaps where we differ; I see the data as to "who has what" stored
centrally, but the searching isn't necessarily done on that site but could be
via a search interface at each participating site (which behind the scenes
will go and query the central repository).
My thought for doing that is that it helps keep the smaller participating
sites alive in the same way that people put links and banners on their
websites already. It's purely a marketing thing. (The alternative is that
participating sites are just a file dump area for content - which carries much
more of a risk that the site owners will suddenly ask the question of why
they're bothering and pull the plug)
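As a rough sketch of the split described above (the "who has what" data held centrally, the search box living on each participating site), something like the following Python would do. The class names, titles and locations are all invented for illustration, and the central lookup is an in-process call here, standing in for whatever network query would really be used.

```python
class CentralIndex:
    """Hypothetical central "who has what" store: title -> list of locations."""

    def __init__(self):
        self.entries = {}

    def add(self, title, location):
        self.entries.setdefault(title, []).append(location)

    def query(self, term):
        # Case-insensitive substring match on the title.
        term = term.lower()
        return {t: locs for t, locs in self.entries.items() if term in t.lower()}


class SiteSearch:
    """Search front end hosted by one participating site."""

    def __init__(self, site_name, index):
        self.site_name = site_name
        self.index = index

    def search(self, term):
        # The user searches on the local site; behind the scenes the
        # lookup goes to the central index.
        hits = self.index.query(term)
        return [(title, loc) for title, locs in sorted(hits.items()) for loc in locs]


# Made-up sample entries; a location need not be a web URL.
index = CentralIndex()
index.add("CP/M 2.2 boot disk", "ftp://site-a.example/cpm22.imd")
index.add("CP/M 2.2 boot disk", "postal: site-b (CD by mail)")
index.add("RT-11 distribution", "http://site-c.example/rt11/")

results = SiteSearch("site-c.example", index).search("cp/m")
for title, location in results:
    print(title, "->", location)
```

Any participating site could host the same front end, so the search results look the same wherever the user starts from.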
...
 It would be advantageous to agree on formats where it is appropriate, but
 only when so, and it may make sense to use multiple formats in some
 cases - for example, having Catweasel images of disks that can only be
 made on a specific platform might avoid the chicken-and-egg thing where
 you want to help someone, but you don't have the system, and he doesn't
 have a bootable disk to launch the resident client.
Agreed.
...
 1) Probably needs a web interface at least for the searching (but that's
 not to stop someone making the actual data available over FTP or whatever, and
 the search interface code could be produced in a variety of languages - PHP,
 ASP etc. - to provide flexibility)
 The only real complaint I've heard about a web interface is that it loses the
 file dates - IMHO an OS file date is a fragile information vessel at best - much
 better to record relevant dates in the metadata associated with the archive.
For sure. But besides that, I see the web aspect purely as the most convenient
method to allow users to do the searching. There's nothing to stop a search
result pointing to an FTP site, or to a physical mailing address to which to send
your pre-paid envelope in order to receive a CD of data back etc. :)
In other words, the actual data retrieval side doesn't *have* to be web-based
at all (although realistically in most cases it probably would be). As you
say, recording the date in the metadata is the way to go anyway!
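For what it's worth, a metadata record carrying its own dates might look something like this minimal Python sketch. The field names, the ID and the dates are all made up for illustration; nothing here is an agreed format.

```python
import json


def make_record(file_id, title, image_format, dates):
    """Build a hypothetical metadata record that travels with the archive."""
    return {
        "id": file_id,
        "title": title,
        "format": image_format,
        # Dates live in the record itself, so they survive any transfer
        # method that resets the OS file date.
        "dates": dates,
    }


# Made-up sample values.
record = make_record(
    "F0001",
    "Example system disk",
    "ImageDisk",
    {"media_written": "1983-06-01", "imaged": "2007-03-19"},
)

text = json.dumps(record, sort_keys=True, indent=2)
restored = json.loads(text)
print(text)
```

Since the record is plain text, it comes through web, FTP or a CD in the post unchanged, whatever happens to the file timestamps along the way.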
