I've always thought that the content robots.txt blocks would be the
interesting stuff that should be archived; perhaps it could be kept behind
a paywall. There's no law against archiving it. The only obstacle is
subnets being blocked, which is easily bypassed: Matt Cutts wrote a blog
post on silently spidering content. You can also use the cloud proxies
that call themselves SD-WAN. YaCy and other P2P web crawlers are another
way to go.
On Sat, May 22, 2021, 11:28 PM Chuck Guzis via cctalk <cctalk at classiccmp.org>
wrote:
On 5/22/21 7:41 PM, Adrian Stoness via cctalk wrote:
link rot is weird in what disappears vs. what still works
On Sat, May 22, 2021 at 6:45 PM Ali via cctalk <cctalk at classiccmp.org>
wrote:
> Interesting article on link rot and its prevalence. According to the
> article, even sources referenced as recently as 2018 show about 60%
> rot. I think all of us in this hobby can relate, not only to the loss of
> articles but also of sites, drivers, file repositories, etc.
>
https://www.theverge.com/2021/5/21/22447690/link-rot-research-new-york-time…
I've said it before--putting information on the web is like writing in
sand. Thank heavens for the Wayback Machine (which is why I support
Brewster's efforts).
However, it's far from perfect--in particular, FTP content has apparently
never been archived, and many vendors' support pages have had robots.txt
files preventing them from being archived.
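
As a rough illustration of how a robots.txt file keeps a well-behaved
archive crawler out, here is a minimal Python sketch using the standard
library's urllib.robotparser. The robots.txt contents, the host
support.example.com, and the helper may_fetch are made up for the example,
and "ia_archiver" is only assumed to be the user-agent name the archive's
crawler responds to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a vendor support site might serve.
# "ia_archiver" is an assumed name for the archive crawler's user agent.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    url = "https://support.example.com/drivers/old-model.zip"
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, url))
```

The check is purely advisory: it only reports what a polite crawler would
do, and nothing in the protocol stops a crawler that ignores the file.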
Still, it's better than nothing and I appreciate it. Were it more
complete, I might not have to spend so much time reverse-engineering
software.
Try searching for some of the older, say, HP support pages. I'm pretty
sure that some "executive" made the decision to pull all of the support
material for old systems, as that doesn't contribute to the bottom line.
The New HP Way.
A nasty trend is adware sites that simply quote text from a large
number of now-defunct pages; follow the link and you get the
"CONGRATULATIONS! YOU ARE THE ONE BILLIONTH VISITOR!" page. Run, do not
walk, away.
A more disturbing popular trend is information being placed in long-ish
YouTube videos that could have been summarized concisely in a page of text.
--Chuck