Archiving information, was Re: ADM-3A question

16 Aug 2019

One of the problems with archiving is what to do with items that are not popular. Some
things might be more valued ten or twenty years in the future but not now. Is the fact
that the item has relatively low interest now a possible reason to not archive it in a
searchable form for future reference?
What about things that are scattered on other personal sites currently that may be gone
next week? So much information is already lost.
Who determines what should be saved? Say you come across a rare document, but the copy
was poorly done, at a lower resolution than you would like. Do you refuse to post it because
it doesn't meet your standards, or do you post it with a note that it is the best to date?
Judging such things can be arbitrary and be the reason for lost information.
At least when you publish a book, there is a chance that some copy may be saved. Now, with
information sitting on someone's disk drives, it could be deleted with one mistake.
This is a really complicated issue. I'm getting older and know I'm on the tail end
of my life. Still, I have no way to begin to pass on what I have. I doubt my heirs would
care much unless it had significant monetary value.
Dwight

________________________________
From: cctalk <cctalk-bounces at classiccmp.org> on behalf of Seth J. Morabito via
cctalk <cctalk at classiccmp.org>
Sent: Friday, August 16, 2019 8:31 AM
To: General Discussion: On-Topic and Off-Topic Posts <cctalk at classiccmp.org>
Subject: Re: Archiving information, was Re: ADM-3A question

Paul Koning via cctalk writes:

> ...
> Anything worth having around deserves backup.  Which makes me wonder
> -- how is Wikipedia backed up?  I guess it has a fork, which isn't
> quite the same thing.  I know Bitsavers is replicated in a number of
> places.  And one argument in favor of GIT is that every workspace is a
> full backup of the original, history and all.
>
> One should worry for smaller scale efforts, though.

This is a problem I think about a lot.

In the early 2000s I worked on the LOCKSS program at Stanford
University. LOCKSS stands for "Lots Of Copies Keep Stuff Safe", and is a
distributed network of servers that replicate backup copies of
electronic academic journals. It stemmed from a research project that
looked at how to design an attack-resistant peer-to-peer digital
archival network.  Each node in the network keeps a copy of the original
journal content, does a cryptographic hash of each resource (HTML page,
image, PDF, etc.), and participates in a steady stream of polls with all
the other nodes where they vote on the hashes. If a minority of nodes
loses a poll, their content is assumed to be damaged, missing, or bad,
and they replicate the content from the winners of the poll.
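
As a rough illustration of the hash-and-poll idea described above, here is a
minimal Python sketch. It is not the actual LOCKSS protocol, which is
considerably more involved; the names `content_hash` and `run_poll`, the use of
SHA-256, and the simple-majority rule are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest of one archived resource."""
    return hashlib.sha256(data).hexdigest()


def run_poll(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Hold one poll over a single resource.

    `copies` maps a (hypothetical) node name to that node's copy.
    Nodes whose hash disagrees with the majority hash are treated as
    damaged and are repaired from a node that won the poll.
    """
    votes = {node: content_hash(data) for node, data in copies.items()}
    winning_hash, count = Counter(votes.values()).most_common(1)[0]
    if count <= len(copies) / 2:
        raise ValueError("no majority; cannot decide which copy is good")
    # Any node on the winning side can supply the repair copy.
    good_copy = next(data for node, data in copies.items()
                     if votes[node] == winning_hash)
    return {node: (data if votes[node] == winning_hash else good_copy)
            for node, data in copies.items()}


nodes = {
    "node-a": b"<html>Journal issue 1</html>",
    "node-b": b"<html>Journal issue 1</html>",
    "node-c": b"<html>Journal iss\x00e 1</html>",  # simulated damage
}
repaired = run_poll(nodes)
```

After the poll, `repaired["node-c"]` holds the same bytes as the other two
nodes. A real network would run such polls continuously and per resource, and
would have to defend against nodes that lie about their hashes.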

It's designed as a "Dark" archive, meaning the data is there, but nobody
tries to access it unless the original web content disappears. Then, the
servers act as transparent web proxies, so when you hit the original URL
or URI, they serve up the content that's now missing from the real
public Internet.
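
The "dark archive" behaviour can be sketched the same way. This is only a toy
model under stated assumptions: `serve` is a hypothetical function, the live
web and the archive are stood in for by plain dictionaries, and no real
proxying or networking is involved.

```python
from typing import Callable, Optional


def serve(url: str,
          fetch_origin: Callable[[str], Optional[bytes]],
          archive: dict[str, bytes]) -> Optional[bytes]:
    """Return the live content if the publisher still serves it,
    otherwise fall back to the archived copy for the same URL."""
    live = fetch_origin(url)
    if live is not None:
        return live              # original still exists; archive stays dark
    return archive.get(url)      # original is gone; serve the preserved copy


# Hypothetical stand-ins for the live web and one node's archive.
live_web = {"http://example.org/a": b"live A"}
archive = {"http://example.org/a": b"archived A",
           "http://example.org/b": b"archived B"}

result_a = serve("http://example.org/a", live_web.get, archive)
result_b = serve("http://example.org/b", live_web.get, archive)
```

Here `result_a` comes from the live site, while `result_b` comes from the
archive because the original for that URL no longer exists.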

It's a neat idea. It's also open source, and unencumbered with
patents. I've always thought a similar model could be used to archive
and replicate just about anything, but it's just one of those things
that nobody's ever gotten around to doing.

> ...
>          paul

-Seth

--
  Seth Morabito
  Poulsbo, WA, USA
  web at loomcom.com
