Internet Archive and robots.txt

Antonio Carlini a.carlini at ntlworld.com
Fri Jul 3 07:27:15 CDT 2020


When I try 
http://www.openvms.digital.com/openvms/os/openvms-release-history.html 
in archive.org I get the usual:


"Sorry.
This URL has been excluded from the Wayback Machine."


That's supposedly because the site's robots.txt disallows spidering, so 
the Internet Archive stops serving the pages (even ones that were 
previously available, it seems).
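
As a rough illustration of the robots.txt mechanism mentioned above, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents are hypothetical (the real file is not quoted in this message), and "ia_archiver" is assumed to be the user-agent name the rule targets. It only shows how a Disallow rule is evaluated by a crawler; it does not claim to reproduce how the Wayback Machine decides what to exclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
# "ia_archiver" is an assumed user-agent name for the archive crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "http://www.openvms.digital.com/openvms/os/openvms-release-history.html"

# The archive crawler is blocked from the whole site in this sketch,
# while any other crawler falls through to the permissive "*" entry.
archive_ok = is_allowed(ROBOTS_TXT, "ia_archiver", URL)
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", URL)
print(archive_ok, other_ok)
```

With rules like these, only the named crawler is shut out; an empty Disallow line under "*" leaves every other crawler free to fetch the page.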

But digital.com is back and if you go far enough down 
https://digital.com/about/ you'll see that they know where the domain 
came from.

So if whoever now controls digital.com could be persuaded to ask, would 
the Internet Archive allow those digital.com pages back out into the 
open again?

(I'm asking here because I think there's at least one person on this 
list who might be able to provide a reasonably authoritative answer.)

I did happen to notice that dec.com is back too ...


Antonio


-- 
Antonio Carlini
antonio at acarlini.com


