yahoo crawlers hammering us

Leslie leslie at craigslist.org
Tue Sep 7 20:42:42 UTC 2010


That speed doesn't seem too bad to me - robots.txt is our friend when 
one has bandwidth limitations.
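
As a rough illustration of the robots.txt approach, here is a minimal
sketch that checks a set of rules with Python's standard-library parser
before deploying them. The rules, the /downloads/ path and the
www.example.com host are hypothetical; "Slurp" is the user-agent token
Yahoo's crawler has used. Crawl-delay is a non-standard directive, so
whether any given crawler honours it is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: slow Slurp down and keep it out of a directory of
# large downloads, while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /downloads/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# What a well-behaved crawler would conclude from these rules.
slurp_may_fetch_exe = parser.can_fetch(
    "Slurp", "http://www.example.com/downloads/bootcd.exe")
slurp_may_fetch_home = parser.can_fetch(
    "Slurp", "http://www.example.com/index.html")
slurp_delay = parser.crawl_delay("Slurp")
other_may_fetch_exe = parser.can_fetch(
    "SomeOtherBot", "http://www.example.com/downloads/bootcd.exe")

print(slurp_may_fetch_exe, slurp_may_fetch_home, slurp_delay,
      other_may_fetch_exe)
```

This only models what a compliant crawler would do; robots.txt is
advisory and does not enforce anything on the server side.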

Leslie



On 9/7/10 1:19 PM, Ken Chase wrote:
> So I guess I'm new at internets, as my colleagues told me, because I haven't
> gone around to the 30-40 systems I control (minus customer self-managed gear)
> and installed a restrictive robots.txt everywhere to make the web less useful
> to everyone.
>
> Does that really mean that a big outfit like yahoo should be expected to
> download stuff at high speed off my customers' servers? For varying values of
> 'high speed', ~500K/s (4Mbps+) for a 3 gig file is kinda... a bit harsh.
> Especially for an exe a user left exposed in a webdir, that's possibly (C)
> software and shouldn't have been there (now removed by customer, some kinda
> OS boot cd/toolset thingy).
>
> This makes it look like Yahoo is actually trafficking in pirated software, but
> that's kinda too funny to expect to be true, unless some yahoo tech decided to
> use that IP/server @yahoo for his nefarious activity, but there are better sites
> than my customer's box to get his 'juarez'.
>
> At any rate:
>
> From Address           To Address                Proto    Bytes    CPS
> ======================================================================
> 67.196.xx.xx..80       67.195.112.151..44507     tcp    14872000 523000
>
> $ host 67.195.112.151 8.8.8.8
>
> 151.112.195.67.in-addr.arpa domain name pointer b3091122.crawl.yahoo.net.
>
> CIDR:           67.195.0.0/16
> NetName:        A-YAHOO-US8
>
> so that's yahoo, or really well spoofed.
>
> Is this expected/my own fault or what?
>
> A number of years ago, there were 1000s of videos on a customer site (training
> for elderly care, extremely exciting stuff for someone into -1-day movies to
> post on torrent sites). Customer called me to say his bw was gone, and I
> checked and found 12 yahoo crawlers hitting the site at 300K/s each (~30Mbps
> +) downloading all the videos. This was all the more injurious as it was only
> 2004 and bandwidth was more than $1/mbps back then. I did the really crass
> thing and nullrouted the whole /20 or whatever they were on per ARIN. It was
> the new-at-the-time video.yahoo.com search engine coming to index the whole
> site. I suppose they can't be too slow about it, or they'll never index a whole
> webfull of videos this century, but still, 12x 300K/s in 2004? (At the time
> Rasmus thought it was kinda funny. I do too, now.)
>
> /kc
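
On the "yahoo, or really well spoofed" point: a source address on an
established TCP connection is hard to spoof, but a PTR record alone is
controlled by whoever owns the reverse zone. The usual cross-check is
forward-confirmed reverse DNS: look up the PTR, then resolve that name
forward and confirm it maps back to the same IP. A minimal sketch follows.
The function name and the injectable resolver parameters are hypothetical,
and the .crawl.yahoo.net suffix is taken from the lookup quoted above
rather than from any published list, so treat it as an assumption.

```python
import socket


def is_verified_crawler(ip, allowed_suffixes=(".crawl.yahoo.net",),
                        reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for an IPv4 address.

    `reverse` and `forward` are optional resolver callables (hypothetical
    hooks, useful for offline testing); by default the system resolver
    is used.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])

    try:
        # Step 1: PTR lookup, normalised (no trailing dot, lower case).
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False

    # Step 2: the name must sit under a domain we expect the crawler to use.
    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False

    try:
        # Step 3: the forward lookup must include the original address.
        return ip in forward(hostname)
    except OSError:
        return False
```

Called as is_verified_crawler("67.195.112.151"), it performs live DNS
lookups, so the result depends on what the zones contain at the time.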



