broke Inktomi floods?

Suresh Ramasubramanian ops.lists at gmail.com
Thu Jan 20 12:43:23 UTC 2005


On Thu, 20 Jan 2005 14:30:04 +0200, Gadi Evron <gadi at tehila.gov.il> wrote:
> 
> Inktomi (now Yahoo!) sends its spiders all over the Internet. Lately
> some of our systems are reporting that they open many HTTP connections
> to our web sites, without ever sending any data and immediately
> disconnecting. This is getting to a level where it disturbs us.
> 

I have heard earlier reports of Inktomi ignoring robots.txt (I have not
seen this myself, though), and there are threads like this one:

Quoting from http://www.webmasterworld.com/forum11/1968-1-15.htm

> I've got Scooter allowed in, but I've also got it lumped in with a
> number of agents that are not allowed to get non-HTML files. This is
> especially important at my site as it includes a number of very large
> binary datasets in numerous locations and the robots have proven too
> stupid to understand that downloading them is a waste of bandwidth.
> 
> RewriteCond %{HTTP_USER_AGENT} .*Ask.Jeeves.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*FAST.WebCrawl.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*ia_archiver.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*InfoSeek.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*inktomi.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Scooter.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Slurp.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Teoma.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*VoilaBot.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Google.*
> RewriteRule !.*(html|htm|txt|/)$ /www/msgs/badagent.html [F]
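The quoted rules combine two tests: the user agent must match at least one of the `[OR]`-chained conditions, and the requested path must not end in `html`, `htm`, `txt` or `/` (the leading `!` negates the RewriteRule pattern). Below is a minimal Python model of that logic, assuming the `¦` characters in the archived copy were originally `|` and that `RewriteEngine On` is in effect. `is_forbidden`, `CRAWLER_PATTERNS` and `ALLOWED_PATH` are hypothetical names, and this only approximates mod_rewrite's matching rather than reproducing it.

```python
import re

# Agent patterns copied from the RewriteCond lines above. Apache matches
# RewriteCond patterns case-sensitively unless [NC] is given, so no
# re.IGNORECASE flag is used here.
CRAWLER_PATTERNS = [
    r".*Ask.Jeeves.*",
    r".*FAST.WebCrawl.*",
    r".*ia_archiver.*",
    r".*InfoSeek.*",
    r".*inktomi.*",
    r".*Scooter.*",
    r".*Slurp.*",
    r".*Teoma.*",
    r".*VoilaBot.*",
    r".*Google.*",
]

# Paths the rule lets through: anything ending in html, htm, txt or "/".
ALLOWED_PATH = re.compile(r".*(html|htm|txt|/)$")


def is_forbidden(user_agent: str, path: str) -> bool:
    """Return True if the quoted rules would answer 403 Forbidden.

    The conditions are chained with [OR], so one matching pattern is
    enough. The RewriteRule pattern is negated with "!", so the rule
    fires only when the path does NOT end in an allowed suffix.
    The substitution path is ignored here, since [F] ends processing
    with a 403.
    """
    agent_matches = any(re.search(p, user_agent) for p in CRAWLER_PATTERNS)
    return agent_matches and not ALLOWED_PATH.search(path)


if __name__ == "__main__":
    # A crawler fetching a large binary dataset is refused...
    print(is_forbidden("Mozilla/5.0 (compatible; Yahoo! Slurp)", "/data/big.tar.gz"))
    # ...but may still fetch HTML pages and directory indexes.
    print(is_forbidden("Mozilla/5.0 (compatible; Yahoo! Slurp)", "/index.html"))
```

Because the patterns are case-sensitive and carry no `[NC]` flag, an agent string such as `INKTOMI-BOT` would slip past these rules; the model keeps that behaviour. Note also that the suffixes have no leading `\.`, so any path merely ending in the letters `txt` or `htm` is allowed as well.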


