[cctalk] Re: Large language model (LLM) Web Scrapers

17 Sep 2025

A web crawler that does not obey robots.txt is not a law-abiding outfit.  Best would be to
block it entirely.  If they are that dismissive of honesty, they are also unlikely to pay
attention to such matters as copyright and intellectual property ownership.
        paul
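Before blocking a crawler, you first have to find out which ones ignore robots.txt. The trap-file idea raised in the thread is one way to do that. The Python sketch below is only an illustration and rests on several assumptions. The trap directory `/trap/`, the bot name `ExampleBot/1.0`, and the function `robots_violators` are all hypothetical. The server is assumed to write Apache/Nginx "combined" format access logs. The trap is assumed to be linked only where a human visitor would not see it. The sketch uses the standard library's `urllib.robotparser` to confirm that the fetched path really is disallowed for that user agent.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the trap directory is disallowed for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

# Hypothetical honeypot directory, linked only where human visitors won't see it.
TRAP_PREFIX = "/trap/"

# Apache/Nginx "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_violators(log_lines, robots_txt=ROBOTS_TXT, trap_prefix=TRAP_PREFIX):
    """Return the set of (host, user_agent) pairs that fetched a disallowed trap URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    offenders = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        path, agent = m.group("path"), m.group("agent")
        # Count a hit only if it is on the trap AND robots.txt forbids it for this agent.
        if path.startswith(trap_prefix) and not rp.can_fetch(agent, path):
            offenders.add((m.group("host"), agent))
    return offenders


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [16/Sep/2025:17:02:11 -0400] '
        '"GET /trap/ludicrous.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
        '198.51.100.4 - - [16/Sep/2025:17:03:40 -0400] '
        '"GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    ]
    print(robots_violators(sample))  # {('203.0.113.7', 'ExampleBot/1.0')}
```

A crawler that honours robots.txt never requests anything under `/trap/`, so every host the sketch reports is a candidate for blocking. Blocking by IP address is usually more reliable than blocking by User-Agent, because a misbehaving bot can change that string at will.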
...
  On Sep 16, 2025, at 8:55 PM, Wayne S via cctalk <cctalk(a)classiccmp.org> wrote:
 They do not observe robots.txt.
 Sent from my iPhone
> On Sep 16, 2025, at 17:53, Wayne S <wayne.sudol(a)hotmail.com> wrote:
>
> I did notice the scraping.
> I toyed with the idea of putting up ludicrous text files that a normal user would not
> see, and then seeing which bot fetched them.
>
> Sent from my iPhone
>
>> On Sep 16, 2025, at 17:02, Bill Degnan via cctalk <cctalk(a)classiccmp.org> wrote:
>>
>> For those of you who run vintage computing-related info sites, have you
>> noticed all of the LLM scraper activity?  AI services are using the LLM
>> scrapers to populate their knowledge bases.
>>
>> At any given moment 5-10 of them are active on vintagecomputer.net.  It’s
>> funny, when I ask an AI about something vintage computing-related,
>> something obscure, I can trick it into giving me an answer from my own site.
>>
>> I have actually had to modify the site code to manage the traffic, to
>> improve efficiency.
>>
>> But they’re not going after just my site; these scrapers are absorbing
>> copies of the entire WWW.
>>
>> I wonder how long the WWW will remain open; it would be a bummer if I found
>> copies of my site elsewhere.
>>
>> Bill
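Bill does not say what he changed in the site code. A common way to keep aggressive scrapers from swamping a small site is per-client rate limiting. The following is a minimal token-bucket sketch in Python. It assumes clients are keyed by IP address and uses made-up limits (`rate`, `capacity`). The class name `TokenBucketLimiter` is hypothetical, and this is not the change actually made to vintagecomputer.net.

```python
import time


class TokenBucketLimiter:
    """Per-client token bucket (hypothetical limits).

    Each client may burst up to `capacity` requests; its bucket then
    refills at `rate` tokens per second.
    """

    def __init__(self, rate=1.0, capacity=10, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be tested
        # client key (e.g. IP address) -> (tokens remaining, time of last update)
        self._buckets = {}

    def allow(self, client):
        """Return True if `client` may make a request now, consuming one token."""
        now = self.clock()
        tokens, last = self._buckets.get(client, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[client] = (tokens - 1, now)
            return True
        self._buckets[client] = (tokens, now)
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = TokenBucketLimiter(rate=1.0, capacity=3, clock=lambda: fake_now[0])
    print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
    fake_now[0] += 2.0  # two seconds later: two tokens have been refilled
    print(limiter.allow("203.0.113.7"))  # True
```

A web application would call `allow()` once per request and answer any request that gets `False` with HTTP 429 Too Many Requests. Because the buckets live in a plain dict, memory grows with the number of distinct clients. A long-running server would therefore also need to evict idle entries.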
