news from Google

JC Dill jcdill.lists at gmail.com
Sat Dec 12 03:38:18 UTC 2009


Seth Mattinen wrote:
> JC Dill wrote:
>> Seth Mattinen wrote:
>>>  Hell, all you gmail users on this list right now are feeding the 
>>> machine with all our data.
>>>
>>> The part that gets me: everyone seems happy with this. 
>>
>> This list has public archives that are already crawled and archived 
>> by Google.  For example:
>>
>> http://www.merit.edu/mail.archives/nanog/threads.html
>> http://seclists.org/nanog/2009/Dec/434
>>
>> Subscribing to the list with a gmail account doesn't change anything 
>> about what Google knows about the list or list members.
>>
>
> Those URLs don't seem to include "google.com" in them. Maybe I'm 
> misreading them.

I *found* them by searching with Google.  I found the second link by 
searching for a unique phrase from your email:

http://www.google.com/search?q=nanog+%22feeding+the+machine

A mere 1 hour after you emailed it to the NANOG list, Google web search 
has that email archived from the website on seclists.org.

> Crawlers can be excluded with robots.txt if so chosen by the site 
> owner so long as google respects said file. 

Google does respect that file, but you are counting on other subscribers 
respecting the site owner's wishes regarding web archives.  In my 
experience, this has become a futile fight.  If the list doesn't have a 
web-accessible archive, it's likely that one of the list's subscribers 
will start their own archive or have it archived by one of the many 
archive sites, e.g. gmane.
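
For reference, here is a minimal sketch of what "respecting robots.txt" means for a well-behaved crawler, using Python's standard urllib.robotparser. The robots.txt contents, the example.org URLs, and the crawler_may_fetch helper are assumptions made up for illustration; they are not taken from any of the archive sites mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a list-archive site: keep all crawlers out of /archive/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/archive/2009/msg.html",
                "http://example.org/about.html"):
        print(url, crawler_may_fetch(ROBOTS_TXT, "Googlebot", url))
```

Note that this only governs crawlers visiting that one site. It does nothing about a copy of the same message held in a subscriber's own archive or on a third-party archive site.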

> Some lists also respect a "no archive" header that some people choose 
> to include with their messages.

If you are emailing a publicly archived mailing list that you know is 
web archived and likely spidered by Google, a "no archive" header is 
mostly useless.  When someone replies to your email (as I'm doing now) 
your quoted text in the reply will be archived, preserving what you 
posted to the list.  At best, the "no archive" header merely messes up 
threading.  The "no archive" header idea never really worked in the 
first place - witness all the old usenet server posts that ended up on 
dejagoogle even when the posts had "no archive" headers.
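
The quoting problem can be sketched in a few lines of Python with the standard email package. This assumes the "no archive" header is the Usenet-style "X-No-Archive: yes" convention, and the make_post, make_reply, and archive functions are hypothetical stand-ins for a poster, a replier, and an archiver that honors the header; no real archive is claimed to work exactly this way.

```python
from email.message import EmailMessage


def make_post(body: str) -> EmailMessage:
    """Original post whose author asks not to be archived."""
    msg = EmailMessage()
    msg["Subject"] = "news from Google"
    msg["X-No-Archive"] = "yes"  # assumed header name (Usenet-era convention)
    msg.set_content(body)
    return msg


def make_reply(original: EmailMessage) -> EmailMessage:
    """Reply that quotes the original text but carries no such header."""
    quoted = "\n".join("> " + line for line in original.get_content().splitlines())
    reply = EmailMessage()
    reply["Subject"] = "Re: " + original["Subject"]
    reply.set_content(quoted + "\n\nMy reply.\n")
    return reply


def archive(messages):
    """Hypothetical archiver: drops any message marked X-No-Archive: yes."""
    return [m for m in messages
            if m.get("X-No-Archive", "").strip().lower() != "yes"]


if __name__ == "__main__":
    post = make_post("feeding the machine")
    kept = archive([post, make_reply(post)])
    print(len(kept), "message(s) archived")
    print(kept[0].get_content())
```

The original post is dropped, but the reply is kept, and the quoted text goes into the archive with it.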
>
> Preventing my email to gmail from entering their vast database of 
> whatever they track doesn't have any such control features that I'm 
> aware of.

Preventing any email you send to anyone from being leaked out to the 
public is something you have no control over.  Witness the CRU hacked 
email controversy.  If you don't want what you write to be posted on or 
archived on the internet and findable with web searches, don't use the 
internet to write or transmit it.  Even then, you are at risk of someone 
scanning and posting what you write.  As a NANOG subscriber you should 
be clueful enough to know all of this already.  So what's the big issue 
here?

jc




