Internet Archive and robots.txt
a.carlini at ntlworld.com
Fri Jul 3 07:27:15 CDT 2020
When I try
in archive.org I get the usual:
"This URL has been excluded from the Wayback Machine."
That's supposed to be because the site's robots.txt prevents spidering, so
the Internet Archive takes down the pages (even ones that were previously
available, it seems).
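
For anyone who wants to check that mechanism for themselves, here is a rough sketch of how a robots.txt rule shuts out a crawler, using Python's standard urllib.robotparser. The sample rules and the is_blocked helper are made up for illustration, and "ia_archiver" is assumed to be the user-agent name the Wayback Machine has historically honoured. This is not the actual robots.txt served by digital.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, NOT the real file from digital.com.
SAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /
"""

def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "https://digital.com/about/"
    print("ia_archiver blocked:", is_blocked(SAMPLE_ROBOTS, "ia_archiver", url))
    print("other crawler blocked:", is_blocked(SAMPLE_ROBOTS, "SomeOtherBot", url))
```

To test a live site, RobotFileParser also has set_url() and read() to fetch the real file, though that only shows what the site serves today, not what the Internet Archive saw when it applied the exclusion.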
But digital.com is back and if you go far enough down
https://digital.com/about/ you'll see that they know where the domain
So if whoever now controls digital.com could be persuaded to ask, would
the Internet Archive allow those digital.com pages back out into the
(I'm asking here because I think there's at least one person on this
list who might be able to provide a reasonably authoritative answer).
I did happen to notice that dec.com is back too ...
antonio at acarlini.com