Yahoo! clue (Slightly OT: Spiders)

Matthew Petach mpetach at netflight.com
Tue Jun 5 20:57:23 UTC 2007


On 3/30/07, Zach White <zwhite at darkstar.frop.org> wrote:
> On Thu, Mar 29, 2007 at 10:17:50AM -0400, Kradorex Xeron wrote:
> > Another problem is that the Yahoo/Inktomi search robots do not stop if no site
> > is present at that address. Thus, someone could register a DNS name and host
> > a site on it temporarily, just long enough for Yahoo/Inktomi's bots to
> > notice it, then redirect it to any Internet host's address. The bots would
> > then proceed to that host and access it over and over in succession,
> > wasting bandwidth on the user's end (which in most cases is monitored
> > and limited, sometimes severely, by the ISP) and wasting time on the bots'
> > end that could have been used spidering other sites.
>
> It's not limited to that. I bought this domain which had previously been
> in use. I've owned the domain for over 5 years, but I still get requests
> for pages that I've never had up.
>
> <zwhite at leet:/var/www/logs:8>$ grep ' 404 ' access_log | grep darkstar.frop.org |
>     awk '/Yahoo/ { print $8 }' | wc -l
>      830
> <zwhite at leet:/var/www/logs:9>$ grep ' 404 ' access_log | grep darkstar.frop.org |
>     awk '/Yahoo/ { print $8 }' | sort -u | wc -l
>       82
>
> That's 82 unique URLs that have been returning a 404 for over 5 years.
> That log file was last rotated 2006 Sep 26, so it covers roughly six
> months: 830 requests works out to about 138 per month for pages that
> don't exist, on that one domain alone.
> How many bogus requests are they sending each month, and what can
> we do to stop them? (The first person to say something involving
> robots.txt gets a cookie made with pickle juice.)
>
> Sure, on my domain alone that's not a big deal. It hasn't cost me any
> money that I'm aware of, and it hasn't caused any trouble. However, it
> is annoying, and at some point it becomes a little ridiculous.
>
> Can anyone that runs a large web server farm weigh in on these sorts of
> requests? Has this annoyance multiplied over thousands of domains and
> IPs caused you problems? Increased bandwidth costs?
>
> -Zach
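
As an aside on the pipeline quoted above: a per-URL breakdown can be
easier to act on than a bare count. What follows is only a minimal
Python sketch, not Zach's method. The names count_404s and
monthly_rate, the regular expression, and the default agent and host
strings are assumptions made for illustration. It matches the request
line and status code by pattern rather than by field position, since
the thread does not say which log format the server writes.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it,
# e.g. "GET /old/page.html HTTP/1.0" 404 (assumed Apache-style layout).
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)[^"]*" (\d{3}) ')


def count_404s(lines, agent="Yahoo", host="darkstar.frop.org"):
    """Count 404 responses per requested path.

    Only lines containing both `agent` and `host` are considered,
    mirroring the grep and awk filters in the quoted pipeline.
    """
    counts = Counter()
    for line in lines:
        if agent not in line or host not in line:
            continue
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts


def monthly_rate(total_requests, months):
    """Average requests per month over the period the log covers."""
    return total_requests / months
```

Calling count_404s(open("access_log")) and then most_common(10) on the
result would list the ten most requested missing paths. With the
figures quoted above, monthly_rate(830, 6) comes to about 138, taking
Sep 26 to Mar 30 as roughly six months.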


Speaking purely for myself, and not for any other organization, I
wonder what response you got from the abuse address listed for the
requesting netblock:

mpetach at netops:/home/mrtg/archive> whois -h whois.ra.net 74.6.0.0/16
route:      74.6.0.0/16
descr:      YST
origin:     AS14778
remarks:    Send abuse mail to slurp at inktomi.com
mnt-by:     MAINT-AS7280
source:     RADB
mpetach at netops:/home/mrtg/archive>

My first line of inquiry would be to mail the slurp@ address,
and work my way along from there.
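
For anyone doing this across many netblocks, pulling the contact out of
the route object by hand gets tedious. Below is a rough Python sketch
of one way to do it, under stated assumptions: the function name
abuse_contacts is made up for illustration, the contact is assumed to
sit in a "remarks:" line that mentions abuse (as in the output above,
though not every registry object is laid out that way), and the
"user at host" form is treated as an obfuscated "user@host".

```python
import re

# Accepts plain user@host as well as the "user at host" obfuscation.
ADDRESS_RE = re.compile(r"([\w.+-]+)(?:@| at )([\w-]+(?:\.[\w-]+)+)")


def abuse_contacts(whois_text):
    """Return addresses found in 'remarks:' lines that mention abuse."""
    contacts = []
    for line in whois_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() != "remarks":
            continue
        if "abuse" not in value.lower():
            continue
        for user, host in ADDRESS_RE.findall(value):
            contacts.append(f"{user}@{host}")
    return contacts
```

The text to parse could be captured with something like
subprocess.run(["whois", "-h", "whois.ra.net", "74.6.0.0/16"],
capture_output=True, text=True).stdout, assuming a whois client is
installed locally.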

Matt


