>>>> "Patrick" == Patrick Finnegan <pat(a)computer-refuge.org> writes:
Patrick> On Tuesday 29 June 2004 12:16, Paul Williams wrote:
> Al Kossow wrote:
> > As soon as bitsavers came on line again, google crawlers started
> > downloading EVERYTHING from multiple IP adrs.
>
> Put this in your robots.txt:
>
> User-agent: Googlebot
> Disallow: /*.pdf$
Patrick> Grr. Don't do this. I really hate it when people disallow
Patrick> Google from indexing content. It always makes it harder to find
Patrick> stuff. The only time I'd consider doing it is if the
Patrick> "webserver" is on a dialup connection or something that
Patrick> won't stay at the same IP address.
There's a second reason, which applies here -- there IS no content
that Google can index, because those files are all bitmap -- no text.
If there were text files or OCRed PDF files, that would be different,
but as it is, Google will find absolutely nothing. So Al might as
well tell it not to try.
paul
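
For anyone unsure what the quoted Disallow rule would actually catch, here is a rough Python sketch of how a pattern like /*.pdf$ gets matched against URL paths. It assumes Googlebot's wildcard extensions, where '*' matches any run of characters and a trailing '$' anchors the end of the path. The helper names and the sample paths are made up for illustration, and this is not a full robots.txt parser.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a regex.

    Assumes the wildcard extensions Googlebot honors: '*' matches any
    run of characters, and a trailing '$' anchors the end of the path.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape each literal piece, then rejoin the pieces with ".*" where '*' stood.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path: str, rule: str = "/*.pdf$") -> bool:
    # Rules are matched from the start of the URL path, hence match().
    return rule_to_regex(rule).match(path) is not None


# Hypothetical paths, not real files on any particular server.
print(is_disallowed("/pdf/dec/pdp11/handbook.pdf"))  # True
print(is_disallowed("/pdf/dec/pdp11/index.html"))    # False
print(is_disallowed("/pdf/handbook.pdf?page=2"))     # False
```

So under these assumptions the rule keeps the crawler away from paths ending in .pdf and leaves everything else (directory listings, HTML pages) crawlable.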