Anyone have contacts at the Amazon or OpenAI web spiders?

Patrick Clochesy patrick at mach.net
Wed Feb 14 03:38:43 UTC 2024


Both robots respect robots.txt; of course they’re not going to answer.
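If both crawlers really do honour robots.txt, the site owner can opt out without reaching anyone. Here is a minimal sketch, assuming a hypothetical robots.txt (not the one actually served at www.web.sp.am). It uses Python's standard `urllib.robotparser` to check how a compliant crawler would read the file. "GPTBot" and "Amazonbot" are the user-agent tokens the two operators publish. Crawl-delay is a non-standard extension, so whether a given crawler obeys it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse GPTBot and Amazonbot outright and ask
# every other crawler to wait 10 seconds between requests. Crawl-delay is
# a non-standard extension that not every crawler honours.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: *
Crawl-delay: 10
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://www.web.sp.am/some/page"  # illustrative path
    for agent in ("GPTBot", "Amazonbot", "Googlebot"):
        # can_fetch: may this agent crawl the URL?
        # crawl_delay: requested seconds between hits (None if unset).
        print(agent, rp.can_fetch(agent, url), rp.crawl_delay(agent))
```

The specific GPTBot and Amazonbot groups match first. Any other agent falls through to the `*` group, which allows everything but requests a 10-second delay.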

On Feb 13, 2024, at 8:35 PM, John Levine <johnl at iecc.com> wrote:
> 
> One day I set up the world's lamest content farm. You can see it here:
> 
> https://www.web.sp.am/
> 
> While humans tend not to find its six billion pages very interesting,
> some web spiders are entranced. In the past week or so, Amazon's
> amazonbot has visited it 6 million times, and OpenAI's gptbot 2.6
> million. (If you were wondering what they use to train ChatGPT, now
> you know.) I don't care that googlebot comes by every 5 or 10 minutes,
> but gptbot comes every few seconds, and amazonbot as fast as the server
> will respond.
> 
> They both come from predictable IPs, so I can set packet filters, but
> they're still hammering pretty hard. Each has a URL in the user agent
> string. Amazon's page has an address to write to, but OpenAI's doesn't.
> I wrote to the Amazon address, no response.
> 
> If anyone has contacts at either I would appreciate it. A few years
> ago the bingbot got trapped but fortunately I knew someone at
> Microsoft who could pass the word. He reported back that while he
> could not go into detail, there was a great deal of animated
> conversation at the other end of the hall, and shortly after that it
> stopped.
> 
> R's,
> John
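For the packet-filter approach John mentions, the first step is deciding whether a client address belongs to one of the crawlers. Below is a minimal sketch using Python's standard `ipaddress` module. The ranges are placeholders (RFC 5737 and RFC 3849 documentation blocks), not the crawlers' real addresses. The `CRAWLER_NETS` table and the `crawler_for` helper are hypothetical names for illustration.

```python
import ipaddress
from typing import Optional

# Placeholder ranges only: documentation blocks, NOT the crawlers' real
# addresses. Substitute the ranges each operator actually publishes.
CRAWLER_NETS = {
    "gptbot": [ipaddress.ip_network("192.0.2.0/24")],
    "amazonbot": [
        ipaddress.ip_network("198.51.100.0/25"),
        ipaddress.ip_network("2001:db8::/32"),
    ],
}


def crawler_for(addr: str) -> Optional[str]:
    """Return the crawler whose (hypothetical) range contains addr, else None."""
    ip = ipaddress.ip_address(addr)
    for name, nets in CRAWLER_NETS.items():
        # `ip in net` is simply False when IPv4/IPv6 versions differ.
        if any(ip in net for net in nets):
            return name
    return None


if __name__ == "__main__":
    for a in ("192.0.2.17", "2001:db8::1", "203.0.113.5"):
        print(a, crawler_for(a))
```

The same range list could feed a firewall rule set or a per-source rate limit, which would slow a crawler down rather than cut it off entirely.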

