The *.com/robots.txt

Guy Coslado (GC0111) guy at coslado.com
Wed Sep 24 18:52:24 UTC 2003


I've found inconsistencies in search engines, mainly with domain names
that have a transient status. Such domain names inherit a new IP: the
*.com IP (the sitefinder IP).
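
One way to check whether a name has picked up the *.com IP is to compare
its answer with the answer for a random label that almost certainly is not
registered. Here is a minimal Python sketch of that idea. The function
names and the injectable `resolve` parameter are my own assumptions for
illustration; by default it uses the system resolver.

```python
import socket
import uuid


def wildcard_ip(tld="com", resolve=socket.gethostbyname):
    """Return the IP that a random, presumably unregistered name resolves to.

    Returns None when the lookup fails, which is the expected result
    when the zone has no wildcard record.
    """
    # Random 32-character label: a hypothetical probe name, not a real site.
    probe = "%s.%s" % (uuid.uuid4().hex, tld)
    try:
        return resolve(probe)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def looks_wildcarded(hostname, tld="com", resolve=socket.gethostbyname):
    """True if hostname resolves to the same IP as a random name in the TLD."""
    catch_all = wildcard_ip(tld, resolve)
    if catch_all is None:
        return False
    try:
        return resolve(hostname) == catch_all
    except OSError:
        return False
```

A call such as `looks_wildcarded("www.pallet-containers-unlimited.com")`
would then report whether the name currently shares the catch-all address.
The result depends on the resolver and on the state of the zone when it is
run.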

And sitefinder itself has its own inconsistency:

Here is an example using Netscape or Mozilla (my IE6 config gives
other results).

http://www.pallet-containers-unlimited.com/bizdc.html

http://sitefinder.verisign.com/lpc?url=pallet-containers-unlimited.com/bizdc.html&host=pallet-containers-unlimited.com

That gives a link under the heading:
Did You Mean?
We did find these similar Web addresses.
http://www.pallet-containers-unlimited.com/bizdc.html

And now searching with sitefinder:
http://sitefinder.verisign.com/spc?sb=pallet-containers-unlimited.com&searchboxtype=1&op=landing&search=Search

If VeriSign sitefinder doesn't take care of this case, what can we
expect from other search engines?

The query:
http://www.pallet-containers-unlimited.com/robots.txt
gives

User-agent: *
Disallow: /

This is also a false answer that can confuse a lot of HTTP agents.
As a simple example, sites whose domain name is in REDEMPTIONPERIOD can be
removed or blacklisted from search engine indexes for a while.
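
To show how a well-behaved agent reads that answer, here is a small Python
sketch using the standard library's `urllib.robotparser`. The agent name
"ExampleBot" and the helper `allowed` are hypothetical, chosen only for
illustration. The robots.txt body is the two lines quoted above.

```python
from urllib.robotparser import RobotFileParser

# The two lines served for /robots.txt, as quoted above.
WILDCARD_ROBOTS = """User-agent: *
Disallow: /
"""


def allowed(robots_body, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.pallet-containers-unlimited.com/bizdc.html"
    print(allowed(WILDCARD_ROBOTS, page))
```

Run as a script, this prints False: a crawler that honors robots.txt is told
to stay away from every path on the host, even though the real site never
published such a rule.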

Because nobody knows all the side effects yet,
I'm not sure having a robots.txt here is the best choice.

On the other hand, without the *.com/robots.txt, search engine indexes
could keep sites that no longer exist indefinitely.

The *.com redirect will possibly give us other surprises with search engines.



Guy Coslado.

http://www.coslado.com  Bots & Smart Agents
For the Guilde des metiers du logiciel: admin at fr.scguild.org
http://www.fr.scguild.com

 


