Sorry! Here's the URL content (re. Paging Google...)
Matthew Elvey
matthew at elvey.com
Mon Nov 14 21:56:30 UTC 2005
Doh! I had no idea my thread would require login/be hidden from general
view! (A robots.txt info site had directed me there...) It seems I
fell for an SEO scam... how ironic. I guess that's why I haven't heard
from google...
Anyway, here's the page content (with some editing and paraphrasing):
Subject: paging google! robots.txt being ignored!
Hi. My robots.txt was put in place in August!
But google still has tons of results that violate the file.
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
doesn't complain (other than about the use of google's nonstandard
extensions described at
http://www.google.com/webmasters/remove.html )
The above page indicates that it's OK for
#per [[AdminRequests]]
User-agent: Googlebot
Disallow: /*?*
to come last (after the User-agent: * block),
and seems to suggest that the syntax is OK.
I also tried
User-agent: Googlebot
Disallow: /*?
but it hasn't helped.
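For reference, Google's wildcard extension (as described on the remove.html page above) treats * as matching any sequence of characters, anchored at the start of the path, so either rule should block any URL containing a query string. A minimal sketch of that matching logic in Python (my own approximation of the documented semantics, not Googlebot's actual code):

```python
import re

def wildcard_disallow(pattern, url_path):
    """Return True if url_path is blocked by a Disallow pattern under
    Google's wildcard extension: '*' matches any run of characters,
    and the pattern is matched from the start of the path."""
    # Escape regex metacharacters (including the literal '?'),
    # then turn the escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, url_path) is not None

# Both forms from this message block any path with a query string:
print(wildcard_disallow("/*?*", "/wiki/Page?action=edit"))  # True
print(wildcard_disallow("/*?", "/wiki/Page?action=edit"))   # True
print(wildcard_disallow("/*?", "/wiki/Page"))               # False
```

So by their own published syntax, /*? and /*?* should behave identically here.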
I asked google to review it via the automatic URL removal system
(http://services.google.com/urlconsole/controller).
Result:
URLs cannot have wild cards in them (e.g. "*"). The following line
contains a wild card:
DISALLOW: /*?
How insane is that?
Oh, and while /*?* wasn't shown in their examples, it is legal per their
syntax, just like /*? !
The site has around 35,000 pages, and I don't think a small robots.txt that
does what I want is possible without using the wildcard extension.
More information about the NANOG
mailing list