How to get a list of research and academic ISP ?

Tom Vest tvest at pch.net
Tue Nov 21 04:33:25 UTC 2006


You might have a look at:

http://www.caida.org/publications/papers/2006/revealingas/revealingas.pdf

The algorithm produces a lot of false negatives for non-English-speaking
countries that don't use .edu uniformly, but is otherwise an
excellent place to start...

TV

On Nov 20, 2006, at 3:59 PM, Marshall Eubanks wrote:

>
> Hello;
>
> On Nov 20, 2006, at 3:13 PM, Maciej Kurant wrote:
>
>> Dear All,
>>
>>
>>
>>
>>
>> Thank you very much for the numerous and quick replies to my email. I
>> must say that the NANOG list is really highly responsive.
>>
>>
>>
>> I needed some time to digest your comments and try some new ideas.  
>> I share the preliminary results with you now, begging for further  
>> comments.
>>
>>
>>
>> The problem was (and still is) to find a good heuristic to  
>> distinguish between commercial (COM) and educational/research/ 
>> academic (EDU) ASes.
>>
>>
>
> I would suggest you need to think a little about what exactly you want
>
> - a list of _all_ academic ASN ?  (that will be tough, and you will  
> have to deal with corner cases, and you will not fully automate it)
> - a list of _some_ academic ASN ? (you have that now - so are you  
> worried about completeness or size or ... ?)
> - a list of _no_ academic ASN ? (again, this will be tough)
> or something else ?
>
> Note, too, that these lists will change with time.
>
>> *EDU_Abilene*
>>
>> My first approach (see my original email) was to extract a list of  
>> all destinations announced by Abilene. (The assumption is that  
>> Abilene generally does not announce commercial prefixes.) This  
>> results in a list, call it “EDU_Abilene”, of 1333 ASes.
>>
>>
>>
>>
>> *EDU_description*
>>
>> Some of you suggested looking at the names and descriptions of  
>> ASes. I used the AS list available at:
>>
>> http://www.multicasttech.com/status/asn_expand.txt
>>
>> and searched the last column ("Organization") for the following  
>> strings:
>>
>> "Universit|Univerz|Universida|research|education|science|scientif|academic|college|institut|laborator|school|ecole|edu|R&D|library|academy|Etudes"
>>
>> This approach finds 1796 "educational" ASes, call this set  
>> “EDU_description”.
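
The keyword search described above could be sketched roughly as follows. The layout of asn_expand.txt is not shown in this thread, so the input format here (an AS number, whitespace, then the organization text) is an assumption, as are the function name, the case-insensitive matching, and the made-up sample records.

```python
import re

# Keyword pattern quoted in the message above. Case-insensitive matching
# is an assumption; the message does not say how case was handled.
EDU_PATTERN = re.compile(
    r"Universit|Univerz|Universida|research|education|science|scientif|"
    r"academic|college|institut|laborator|school|ecole|"
    r"edu|R&D|library|academy|Etudes",
    re.IGNORECASE,
)


def edu_by_description(lines):
    """Return the set of AS numbers whose organization text matches."""
    matched = set()
    for line in lines:
        parts = line.split(None, 1)  # assumed layout: "<ASN> <organization>"
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        asn, organization = int(parts[0]), parts[1]
        if EDU_PATTERN.search(organization):
            matched.add(asn)
    return matched


# Made-up sample records, for illustration only.
sample = [
    "64496 Example University Network",
    "64497 Example Hosting Ltd",
    "64498 Medusa Hosting Ltd",
    "64499 Institut Exemple de Recherche",
]
print(sorted(edu_by_description(sample)))
```

In this toy input the third record is matched only because "Medusa" contains the substring "edu", which illustrates how short keywords can produce false positives.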
>>
>>
>>
>> Of course, these two lists overlap, but less than I expected. In  
>> particular:
>>
>> len(EDU_Abilene)=1333
>>
>> len(EDU_description)=1796
>>
>> union(EDU_Abilene, EDU_description)=2269
>>
>> intersection(EDU_Abilene, EDU_description)=860
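
The four figures above are consistent with inclusion-exclusion (1333 + 1796 - 860 = 2269), which also implies 473 ASes found only in EDU_Abilene and 936 found only in EDU_description. A minimal sketch of the same bookkeeping with Python sets, using a hypothetical helper name and made-up AS numbers:

```python
def overlap_summary(edu_abilene, edu_description):
    """Summarize how two sets of AS numbers overlap."""
    return {
        "len_abilene": len(edu_abilene),
        "len_description": len(edu_description),
        "union": len(edu_abilene | edu_description),
        "intersection": len(edu_abilene & edu_description),
        "only_abilene": len(edu_abilene - edu_description),
        "only_description": len(edu_description - edu_abilene),
    }


# Inclusion-exclusion check on the figures reported in the message.
assert 1333 + 1796 - 860 == 2269

# Toy sets with made-up AS numbers, for illustration only.
toy_abilene = {64496, 64497, 64498}
toy_description = {64498, 64499}
print(overlap_summary(toy_abilene, toy_description))
```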
>>
>>
>>
>>
>>
>> For many reasons, these lists are far from precise. For instance,
>> EDU_Abilene contains AS 7132 (AT&T) and AS 8075 (Microsoft).
>> Therefore I need further data sets or a filtering methodology.
>> This raises some questions:
>>
>>
>>
>> 1) What other EDU networks (preferably with BGP tables available
>> on the web) can I take as examples of ASes that (generally) do not
>> announce commercial prefixes? Based on them I could construct
>> lists similar in spirit to EDU_Abilene. I guess the more, the better.
>>
>>
>
> There are lots - look at the ones that Abilene peers with
>
> http://international.internet2.edu/partners/
> http://abilene.internet2.edu/peernetworks/international.html
>
>
>
>> 2) Do you know of other lists similar to
>> http://www.multicasttech.com/status/asn_expand.txt? Maybe a longer
>> description or a website related to an AS would help the method I
>> use to create EDU_description. Do you think the strings I use in my
>> search are appropriate?
>>
>>
> Try
> http://bgp.potaroo.net/as1221/asnames.txt
>
> Note that there are errors all over the place here; these lists
> will not agree perfectly.
> My lists come from the rwhois data, but I correct for obvious
> errors (some of which I have sent back to the list maintainers).
> There are others, I am sure, that I have not caught, and my
> corrections are undoubtedly not perfect. I am sure that the other
> maintainers of such lists could tell similar tales.
>
> You could start polling rwhois yourself, and I would in doubtful  
> cases.
>
>>
>>
>> *AS relationships*
>>
>> Another approach is to exploit the AS relationships. Most of you  
>> agree that usually EDU ASes are not providers for COM customers.  
>> This suggests a way to detect false positives in EDU_Abilene and  
>> EDU_description (or in their union). For every EDU node check how  
>> many COM customers it has, i.e., EDU provider --- COM customer  
>> relationship. I used the AS graphs with inferred relationships  
>> provided by CAIDA (http://as-rank.caida.org/data/2006/). This  
>> method works well to find good candidates for false positives, but
>> they should not be blindly accepted. For instance AS 7132 (AT&T)  
>> has the highest number of COM customers (615) and should obviously  
>> belong to COM (it is a member of EDU_Abilene). In contrast, a big  
>> component of the EDU backbone, AS 11537 (Abilene) has 66 COM  
>> customers! In general there are about 50 EDU nodes with more than  
>> 10 COM customers each.
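
The customer-counting check described above might look like the sketch below. It assumes the inferred relationships have already been parsed into (provider, customer) pairs; the file format of the CAIDA data is not described in this thread, so no parser is shown. Treating every AS outside the EDU set as COM is also an assumption, and the function names and sample pairs are hypothetical.

```python
from collections import Counter


def com_customer_counts(provider_customer_pairs, edu_ases):
    """Count, for each EDU AS, its customers that are not in the EDU set."""
    counts = Counter()
    # set() removes duplicate pairs so no relationship is counted twice
    for provider, customer in set(provider_customer_pairs):
        if provider in edu_ases and customer not in edu_ases:
            counts[provider] += 1
    return counts


def suspects(counts, threshold=10):
    """EDU ASes with more than `threshold` COM customers, largest first."""
    return [(asn, n) for asn, n in counts.most_common() if n > threshold]


# Made-up (provider, customer) pairs; not real data.
pairs = [(64496, 64500), (64496, 64501), (64496, 64497), (64497, 64502)]
edu = {64496, 64497}
counts = com_customer_counts(pairs, edu)
print(suspects(counts, threshold=1))
```

As the message notes, the output is a list of candidates for manual review, not a verdict: an AS such as Abilene can legitimately appear near the top.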
>>
>>
>
> Not a bad approach.
>>
>>
>> 3) What other “automatic” or “manual” approaches would you  
>> suggest? Or improvements of the ones just described?
>
>
> Again, I don't know what you are trying to do. What I have found  
> useful is what you are doing - make lots of lists, and cross  
> reference, and
> see what passes multiple tests.
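
One way to read "see what passes multiple tests" is as a simple vote across candidate lists. This is only a sketch of that idea under assumed names; the list labels, the threshold, and the AS numbers are made up.

```python
from collections import Counter


def vote(candidate_lists):
    """Count in how many candidate lists each AS appears."""
    votes = Counter()
    for ases in candidate_lists.values():
        votes.update(set(ases))  # set() so a list votes at most once per AS
    return votes


def passing(votes, min_votes):
    """ASes that appear in at least `min_votes` of the lists."""
    return {asn for asn, n in votes.items() if n >= min_votes}


# Made-up candidate lists keyed by a hypothetical label for each heuristic.
lists = {
    "abilene": [64496, 64497, 64498],
    "description": [64497, 64498, 64499],
    "other_source": [64498, 64498, 64500],
}
votes = vote(lists)
print(sorted(passing(votes, min_votes=2)))
```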
>>
>>
>>
>>
>> I will appreciate even the briefest comments and suggestions,
>>
>> Maciej Kurant
>>
>>
>>
>>
>
> Hope this helps.
>
> Regards
> Marshall
>
>>
>>
>> From: Maciej Kurant [mailto:maciej.kurant at epfl.ch]
>> Sent: Wednesday, 15 November 2006 18:46
>> To: 'nanog at merit.edu'
>> Subject: How to get a list of research and academic ISP ?
>>
>>
>>
>> Dear all,
>>
>>
>>
>> I am a PhD student at EPFL, Switzerland. My recent research  
>> interest is in large scale differences between the commercial and  
>> academic parts of the Internet.
>>
>>
>>
>> Of course, in order to perform this kind of study I need a way
>> to distinguish between these two worlds. I’ve learnt that Abilene  
>> does not provide commercial connectivity. This means that BGP  
>> prefixes and AS paths announced by Abilene BGP routers should lead  
>> only to research and academic destinations. I have extracted (from  
>> the BGP tables at http://abilene.internet2.edu/observatory) a list  
>> of all such destinations and obtained 1333 ASes (for data from
>> July 2006). The number looks reasonable, but I would like to be  
>> sure that I am not making a mistake. Therefore I would be grateful  
>> if you could answer the following questions:
>>
>>
>>
>> 1)       Is this approach to obtain a list of research and  
>> academic ISPs correct?
>>
>> 2)       Do you know of any such lists compiled previously?
>>
>> 3)       If I keep not only the destination ASes, but also all  
>> ASes on the AS paths towards these destinations, I obtain a list of
>> about 1400 ASes. How should I understand this? Does it mean that  
>> some research and academic destinations are reachable from Abilene  
>> only by traversing the commercial Internet?
>>
>> 4)       Of course, research and academic ASes are often well  
>> connected to the commercial Internet. My guess is that in most  
>> cases their peering relationship is “customer-provider”, where  
>> commercial ASes are providers. Is it possible that an academic AS  
>> is a provider for some commercial ASes? If so, does it happen often?
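
The difference between the two counts in question 3 (1333 destination ASes versus about 1400 ASes on all paths, so roughly 70 extra) corresponds to ASes that appear on some path but never as the last hop. A minimal sketch of that split, assuming each AS path is available as a space-separated string of AS numbers with the destination (origin) AS last; paths ending in anything other than a plain number are skipped, and the function name and sample paths are made up.

```python
def split_origin_and_transit(as_paths):
    """Return (origin ASes, ASes seen on paths but never as origin)."""
    origins, on_path = set(), set()
    for path in as_paths:
        tokens = path.split()
        if not tokens or not tokens[-1].isdigit():
            continue  # empty path, or path ending in a non-numeric token
        hops = [int(tok) for tok in tokens if tok.isdigit()]
        origins.add(hops[-1])  # assumed: last hop is the destination AS
        on_path.update(hops)   # sets absorb repeated (prepended) ASes
    return origins, on_path - origins


# Made-up AS paths, for illustration only.
paths = [
    "64510 64496",
    "64510 64497 64498",
    "64510 64497 64497 64499",
]
origins, transit_only = split_origin_and_transit(paths)
print(sorted(origins), sorted(transit_only))
```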
>>
>>
>>
>> Thank you in advance for your comments.
>>
>> Maciej Kurant
>>
>>
>>
>>
>>
>>
>>
>> =============================================
>>
>>
>>
>> EPFL IC ISC LCA3
>>
>> Maciej Kurant
>>
>> PhD Student
>>
>> CH-1015 Lausanne, Switzerland
>>
>>
>>
>> web site:  http://lcawww.epfl.ch/kurant
>>
>>
>>
>> =============================================
>>
>>
>>
>>
>



