yahoo crawlers hammering us
ken at sizone.org
Tue Sep 7 15:19:58 CDT 2010
So I guess I'm new at internets, as my colleagues told me, because I haven't gone
around to the 30-40 systems I control (minus customer self-managed gear) and
installed a restrictive robots.txt everywhere to make the web less useful to
Does that really mean that a big outfit like Yahoo should be expected to
download stuff at high speed off my customers' servers? For varying values of
'high speed', ~500K/s (4Mbps+) for a 3 gig file is kinda... a bit harsh.
Especially for an exe a user left exposed in a webdir, that's possibly (C)
software and shouldn't have been there (now removed by customer, some kinda OS boot
This makes it look like Yahoo is actually trafficking in pirated software, but
that's kinda too funny to expect to be true, unless some yahoo tech decided to
use that IP/server @yahoo for his nefarious activity, but there are better sites
than my customer's box to get his 'juarez'.
At any rate:
From Address        To Address               Proto  Bytes     CPS
67.196.xx.xx..80    184.108.40.206..44507    tcp    14872000  523000
$ host 220.127.116.11 18.104.22.168
22.214.171.124.in-addr.arpa domain name pointer b3091122.crawl.yahoo.net.
so that's yahoo, or really well spoofed.
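
On the "or really well spoofed" point: a PTR record alone only proves what the owner of the reverse zone chose to publish, so the usual cross-check is a forward-confirmed reverse DNS lookup (resolve the PTR name back to an address and see that it matches). A minimal Python sketch, assuming the `.crawl.yahoo.net` suffix seen in the PTR above is the one to trust; the function name and its parameters are hypothetical, and the forward lookup used here is IPv4-only:

```python
import socket


def is_verified_crawler(ip, allowed_suffixes=(".crawl.yahoo.net",),
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    The suffix list is an assumption taken from the PTR record in this
    post. The resolver functions are parameters only so they can be
    swapped out for testing.
    """
    try:
        hostname = reverse(ip)[0]            # PTR lookup: IP -> name
    except OSError:                          # herror/gaierror are OSError
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False                         # name is outside the trusted zone
    try:
        addresses = forward(hostname)[2]     # A lookup: name -> IPv4 list
    except OSError:
        return False
    return ip in addresses                   # must round-trip to the same IP
```

Calling `is_verified_crawler(addr)` with the peer address from the flow table does two live DNS queries, so it is better suited to an occasional log check than to per-request use.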
Is this expected/my own fault or what?
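
For the robots.txt angle mentioned at the top, here is a rough sketch of what a less-than-total restriction might look like, checked with Python's standard `urllib.robotparser`. The paths are made up for illustration, `Slurp` is the robots.txt token Yahoo's crawler has used, and `Crawl-delay` is a nonstandard extension, so whether any given crawler honors it is up to that crawler:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the paths are illustrative, not from any real site.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /downloads/

User-agent: *
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Is the crawler allowed to fetch the big binary? What delay is requested?
print(rules.can_fetch("Slurp", "/downloads/boot.exe"))
print(rules.crawl_delay("Slurp"))
print(rules.can_fetch("Slurp", "/index.html"))
```

This only shows how a well-behaved client would read the file; it does nothing to stop a crawler that ignores robots.txt.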
A number of years ago, there were 1000s of videos on a customer site (training
for elderly care, extremely exciting stuff for someone into -1-day movies to
post on torrent sites). Customer called me to say his bw was gone, and I
checked and found 12 yahoo crawlers hitting the site at 300K/s each (~30Mbps
+) downloading all the videos. This was all the more injurious as it was only
2004 and bandwidth was more than $1/Mbps back then. I did the really crass
thing and nullrouted the whole /20 or whatever they were on per ARIN. It was
the new-at-the-time video.yahoo.com search engine coming to index the whole
site. I suppose they can't be too slow about it, or they'll never index a whole
webfull of videos this century, but still, 12x 300K/s in 2004? (At the time
Rasmus thought it was kinda funny. I do too, now.)
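
For anyone checking the numbers quoted in this message, the conversions can be sketched in a few lines of Python. This assumes decimal units (1 K = 1000 bytes, 1 Mbps = 1,000,000 bits/s); with 1024-based units the results come out a few percent higher, which fits the "+" on the figures above. The function names are just for illustration:

```python
def mbps(bytes_per_sec, connections=1):
    """Aggregate rate in megabits per second (decimal units assumed)."""
    return bytes_per_sec * 8 * connections / 1_000_000


def transfer_minutes(size_bytes, bytes_per_sec):
    """Minutes needed to move size_bytes at a steady bytes_per_sec."""
    return size_bytes / bytes_per_sec / 60


print(mbps(500_000))                             # one fetch at ~500K/s
print(mbps(523_000))                             # the CPS column in the table
print(mbps(300_000, connections=12))             # 12 crawlers at 300K/s each
print(transfer_minutes(3_000_000_000, 500_000))  # 3 gig file at 500K/s
```

Under those assumptions the single fetch works out to 4 Mbps, the twelve crawlers to just under 29 Mbps, and the 3 gig file to roughly 100 minutes of sustained transfer.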
Ken Chase - ken at heavycomputing.ca - +1 416 897 6284 - Toronto CANADA
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.