Internet Archive and robots.txt
Antonio Carlini
a.carlini at ntlworld.com
Fri Jul 3 07:27:15 CDT 2020
When I try
http://www.openvms.digital.com/openvms/os/openvms-release-history.html
in archive.org I get the usual:
"Sorry.
This URL has been excluded from the Wayback Machine."
As I understand it, that happens when the site's robots.txt forbids
spidering: the Internet Archive then hides the pages (even ones that
were previously available, it seems).
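
For anyone unfamiliar with the mechanism, here is a minimal sketch of how a robots.txt rule can shut out one crawler while leaving others alone. The robots.txt content below is hypothetical (it is not the file digital.com actually serves), and "ia_archiver" is assumed to be the user-agent name the archive's crawler has gone by. It only shows how such a rule is evaluated using Python's standard urllib.robotparser, not how the Wayback Machine itself applies exclusions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only;
# this is NOT the actual file served by digital.com.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

# The URL from the message above.
URL = "http://www.openvms.digital.com/openvms/os/openvms-release-history.html"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ia_archiver" is an assumed crawler name; "SomeOtherBot" is made up.
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, is_allowed(HYPOTHETICAL_ROBOTS_TXT, agent, URL))
```

With rules like these, the named crawler is refused everything under "/" while any other agent falls through to the permissive "*" entry.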
But digital.com is back and if you go far enough down
https://digital.com/about/ you'll see that they know where the domain
came from.
So if whoever now controls digital.com could be persuaded to ask, would
the Internet Archive allow those digital.com pages back out into the
open again?
(I'm asking here because I think there's at least one person on this
list who might be able to provide a reasonably authoritative answer.)
I did happen to notice that dec.com is back too ...
Antonio
--
Antonio Carlini
antonio at acarlini.com