dwight elvey wrote:
I thought I'd add a little here. One thing that is missing: we'd need
some way to keep it running beyond when we are gone. There should
be a trust set up.
Hmmm, I wonder if it can be engineered such that mirroring the information
will do that job? (i.e. sites are free to come and go - providing there's a
mirror active *somewhere*, it might not be a problem.) Worrying about what
happens over time does seem sensible, though - but maybe that's a task for
site owners to sort out individually, rather than there being some
"central policy"?
It should have at least two or three people maintaining it
and willing to deal with the tax issues.
Tax issues are probably largely a US issue :-) In other words outside the
scope of any archive/collection...
I tend to agree that it shouldn't force a format. Dave's open format is
very well thought out and his tool is good, but there is legacy data
that would need to be converted.
I'd disagree on the *need* to convert. There may be cases where migration's
desirable as/when new formats are available / become popular - but it
shouldn't be a given. If the format does the job and the tools are still
widely available [1] then there's little reason to convert formats.
[1] Actually "archival tool" is probably a class of data that should be made
available / distributed / mirrored in its own right. After all, the people
"owning" the data at participating sites are most likely to be the ones who
recognise the need to make the tools available too.
As an example, the disk image
data I have for the Polymorphic 8813 is in the format used by the
Polymorphic software to move disk images through modems to other
Polymorphic machines. It does currently lack a method to restore it
to a machine that has no boot floppy (over time I hope to fix that).
See, that probably is a good candidate for archive format migration - *but*
it's still very useful to have the data out there and available now, even if
it's in a slightly sub-optimal format. Something's better than nothing...
Of course, there should always be some form of readme with each
group of files explaining how to use the files. In the past, this has been
one of the hardest things to figure out on random archives.
I hear ya. I'm not quite sure how to tackle that one given a distributed
archive approach. Maybe a user should be expected to be able to type in
something like "reading foobar archives" into the search facility and to get a
document describing the format back.
But I guess for any class of data like "floppy disk image" or "magtape image",
part of the metadata associated with that data (entered when the data is
"published") should be the name / version of the tool used to create that
image. That way at least every file gets tagged, with the assumption that the
documentation for that format is also reachable via the search engine.
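Purely as a sketch of what that per-file tagging might look like (nothing
here is an agreed schema - the field names, the filename and the version
string are all made up for illustration), a published record could be as
small as this:

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ImageRecord:
    """Minimal metadata entered when an image is "published" (hypothetical schema)."""
    filename: str       # name of the image file on the participating site
    data_class: str     # e.g. "floppy disk image" or "magtape image"
    tool_name: str      # tool used to create the image
    tool_version: str   # version of that tool

    def format_query(self) -> str:
        # The kind of search string a user might type to find format docs.
        return f"reading {self.tool_name} archives"


# Example values are invented; only the tool name comes from the discussion.
record = ImageRecord(
    filename="superbrain_boot.imd",
    data_class="floppy disk image",
    tool_name="Imagedisk",
    tool_version="1.18",
)

print(json.dumps(asdict(record), indent=2))
print(record.format_query())
```

Four fields is deliberately little to ask of an archive owner; anything
beyond that could be optional.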
These readmes may need updating as time goes by. Things like the fact
that some tool needs to run under a particular OS, and not some future OS
that may be more common, need to be passed on.
Is that not a problem for whoever's maintaining the tools? They're the ones
who dictate what platforms their tools will run on - it's not necessarily the
job of the archive maintainers to know this; they simply record what tool (and
hence what file format) was used to create their images. The burden's on the
individual users to search for something that will understand that format, and
then filter out results to get something that will work with their particular
hardware combination.
e.g. I might have a site with a bunch of Superbrain floppies on, all in
Imagedisk format. It's my job as maintainer of that archive (and participating
in the "global" distributed archive) to say what tool I used to create those
images when I publish them - but it's Dave's job as Imagedisk maintainer to
dictate what other platforms Imagedisk might run on now or in the future.
Sure, I might be helpful and (with Dave's permission) put a few different
copies of Imagedisk on the site too and publish those so people can find them
- but I don't need to do any maintenance myself every time a new version of
Imagedisk comes out.
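To illustrate that split of responsibilities, here's a rough sketch of the
user's side of it: search for tools that understand a given format, then
filter by platform. The catalogue entries, version numbers and platform
names below are invented for the example - in practice that information
would come from whatever the tool maintainers publish, not from the archive
owners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRelease:
    """One published release of an archival tool (hypothetical structure)."""
    name: str
    version: str
    reads_formats: frozenset  # formats this release can read
    platforms: frozenset      # platforms this release runs on


def find_tools(releases, image_format, platform):
    """Return releases that read image_format and run on platform."""
    return [
        r for r in releases
        if image_format in r.reads_formats and platform in r.platforms
    ]


# Invented catalogue data, maintained by tool authors rather than archive owners.
catalogue = [
    ToolRelease("Imagedisk", "1.18", frozenset({"imagedisk"}), frozenset({"dos"})),
    ToolRelease("sometool", "0.3", frozenset({"imagedisk", "raw"}), frozenset({"linux"})),
    ToolRelease("othertool", "2.0", frozenset({"raw"}), frozenset({"linux", "dos"})),
]

matches = find_tools(catalogue, "imagedisk", "linux")
print([f"{r.name} {r.version}" for r in matches])
```

The archive record only ever needs to name the format; a new tool release
changes the catalogue, not the archive.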
It is hard to tell what bit of information that is assumed today
may be key for extracting useful data in the future.
Indeed - and it's easy to say "record as much as possible". Problem then is
that it discourages archive owners from publishing content simply because it's
time-consuming to enter the metadata for the items that they're making
available. Getting the balance right is probably going to be tricky.
cheers
Jules