The *.com/robots.txt
Guy Coslado (GC0111)
guy at coslado.com
Wed Sep 24 18:52:24 UTC 2003
I've found inconsistencies in search engines, mainly with domain names
that have a transient status. Such domain names inherit a new IP, the
*.com wildcard IP (the sitefinder IP).
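To check whether a given name currently lands on the wildcard, one rough approach is to compare its address with the address returned for a random, almost certainly unregistered .com name. This is only a sketch under assumptions: the helper names `resolve` and `points_at_wildcard` are mine, the probe label is random, and it presumes the wildcard answers every unregistered name with the same address.

```python
import socket
import uuid
from typing import Callable, Optional


def resolve(host: str) -> Optional[str]:
    """Return the IPv4 address for host, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None


def points_at_wildcard(domain: str,
                       resolver: Callable[[str], Optional[str]] = resolve) -> bool:
    """Guess whether domain is answered by the *.com wildcard.

    Assumption: a random 32-character label under .com is not registered,
    so whatever address it resolves to is the wildcard address.
    """
    probe = f"{uuid.uuid4().hex}.com"
    wildcard_ip = resolver(probe)
    if wildcard_ip is None:
        # The probe did not resolve, so no wildcard is visible from here.
        return False
    return resolver(domain) == wildcard_ip


if __name__ == "__main__":
    name = "www.pallet-containers-unlimited.com"
    print(name, points_at_wildcard(name))
```

The resolver is passed in as a parameter so the comparison can be exercised without touching the network.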
And sitefinder itself has its own inconsistency.
Here is an example using Netscape or Mozilla (my IE6 config gives
other results):
http://www.pallet-containers-unlimited.com/bizdc.html
http://sitefinder.verisign.com/lpc?url=pallet-containers-unlimited.com/bizdc.html&host=pallet-containers-unlimited.com
That page gives a link under:
Did You Mean ?
We did find these similar Web addresses.
http://www.pallet-containers-unlimited.com/bizdc.html
And now, searching with sitefinder:
http://sitefinder.verisign.com/spc?sb=pallet-containers-unlimited.com&searchboxtype=1&op=landing&search=Search
If VeriSign's sitefinder doesn't handle this case, what can we
expect from other search engines?
The query:
http://www.pallet-containers-unlimited.com/robots.txt
gives
User-agent: *
Disallow: /
This is also a false answer that can confuse a lot of HTTP agents.
=>
As a simple example, sites whose domain name is in REDEMPTIONPERIOD can be
dropped or blacklisted from search engine indexes for a while.
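To illustrate how a well-behaved crawler would read that answer, here is a minimal sketch using Python's standard `urllib.robotparser`. The agent names and the helper `allowed` are made up for the example, and real search engine crawlers may of course apply their own logic on top of this.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body quoted above, as served for the wildcarded name.
WILDCARD_ROBOTS = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.pallet-containers-unlimited.com/bizdc.html"
    # Hypothetical agent names; any agent falls under the "*" record.
    for agent in ("ExampleBot", "AnotherBot"):
        print(agent, allowed(WILDCARD_ROBOTS, agent, url))
```

Every agent is refused for every path, so a crawler that revisits the site during the transient period sees a blanket exclusion where the original site may have had none.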
Because nobody knows all the side effects yet,
I'm not sure having a robots.txt here is the best choice.
On the other hand, without the *.com/robots.txt, search engine indexes
could keep sites that no longer exist indefinitely.
Possibly the *.com redirect will give us other surprises with search engines.
Guy Coslado.
http://www.coslado.com Bots & Smart Agents
For the Guilde des metiers du logiciel: admin at fr.scguild.org
http://www.fr.scguild.com