yahoo crawlers hammering us
leslie at craigslist.org
Tue Sep 7 20:42:42 UTC 2010
That speed doesn't seem too bad to me - robots.txt is our friend when
one has bandwidth limitations.
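
A minimal sketch of what that could look like, checked with Python's standard urllib.robotparser. The paths and the delay value are made up for illustration, and this assumes the crawler identifies itself as "Slurp" and honors Crawl-delay, which is a non-standard directive that not every crawler respects:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the delay value are illustrative only.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /downloads/
Disallow: /videos/

User-agent: *
Disallow: /downloads/
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Large media is off limits to Slurp; ordinary pages stay crawlable.
print(rules.can_fetch("Slurp", "/videos/training01.avi"))
print(rules.can_fetch("Slurp", "/index.html"))

# Requested delay in seconds between fetches for this user agent, if any.
print(rules.crawl_delay("Slurp"))
```

Parsing the file locally like this is only a way to confirm the rules say what you meant before deploying them. Whether a given crawler obeys them is up to the crawler.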
On 9/7/10 1:19 PM, Ken Chase wrote:
> So I guess I'm new at internets, as my colleagues told me, because I haven't gone
> around to 30-40 systems I control (minus customer self-managed gear) and
> installed a restrictive robots.txt everywhere to make the web less useful to
> Does that really mean that a big outfit like yahoo should be expected to
> download stuff at high speed off my customers servers? For varying values of
> 'high speed', ~500K/s (4Mbps+) for a 3 gig file is kinda... a bit harsh.
> Especially for an exe a user left exposed in a webdir, that's possibly (C)
> software and shouldn't have been there (now removed by customer, some kinda OS boot
> cd/toolset thingy).
> This makes it look like Yahoo is actually trafficking in pirated software, but
> that's kinda too funny to expect to be true, unless some yahoo tech decided to
> use that IP/server @yahoo for his nefarious activity, but there are better sites
> than my customer's box to get his 'juarez'.
> At any rate:
> From Address       To Address            Proto  Bytes     CPS
> 67.196.xx.xx..80   188.8.131.52..44507   tcp    14872000  523000
> $ host 184.108.40.206 220.127.116.11
> 18.104.22.168.in-addr.arpa domain name pointer b3091122.crawl.yahoo.net.
> CIDR: 22.214.171.124/16
> NetName: A-YAHOO-US8
> so that's yahoo, or really well spoofed.
> Is this expected/my own fault or what?
> A number of years ago, there were 1000s of videos on a customer site (training
> for elderly care, extremely exciting stuff for someone into -1-day movies to
> post on torrent sites). Customer called me to say his bw was gone, and I
> checked and found 12 yahoo crawlers hitting the site at 300K/s each (~30Mbps
> +) downloading all the videos. This was all the more injurious as it was only
> 2004 and bandwidth was more than $1/mbps back then. I did the really crass
> thing and nullrouted the whole /20 or whatever they were on per ARIN. It was
> the new-at-the-time video.yahoo.com search engine coming to index the whole
> site. I suppose they can't be too slow about it, or they'll never index a whole
> webfull of videos this century, but still, 12x 300K/s in 2004? (At the time
> Rasmus thought it was kinda funny. I do too, now.)
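
As a rough sanity check on the rates quoted above, the arithmetic can be sketched as follows. It assumes "K" means 1000 bytes and "3 gig" means 3 * 10^9 bytes, neither of which the original post states:

```python
def to_mbps(bytes_per_sec):
    """Convert bytes per second to megabits per second (decimal units assumed)."""
    return bytes_per_sec * 8 / 1_000_000


# Single crawler at the 523000 CPS shown in the flow record.
single_mbps = to_mbps(523_000)

# Twelve crawlers at roughly 300K/s each, as in the 2004 incident.
fleet_mbps = to_mbps(12 * 300_000)

# Time to pull an assumed 3 * 10^9 byte file at the single-crawler rate.
hours_for_file = 3_000_000_000 / 523_000 / 3600

print(f"single crawler: {single_mbps:.2f} Mbps")
print(f"twelve crawlers: {fleet_mbps:.1f} Mbps")
print(f"3 GB file: {hours_for_file:.2f} hours")
```

Under those assumptions the figures land close to the "4Mbps+" and "~30Mbps" quoted in the thread.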