How to get a list of research and academic ISP ?

Maciej Kurant maciej.kurant at epfl.ch
Mon Nov 20 20:13:39 UTC 2006


Dear All,

 

 

Thank you very much for the numerous and quick replies to my email. I must
say that the NANOG list is highly responsive.

 

I needed some time to digest your comments and try some new ideas. I share
the preliminary results with you now, begging for further comments.

 

The problem was (and still is) to find a good heuristic to distinguish
between commercial (COM) and educational/research/academic (EDU) ASes. 

 

*EDU_Abilene*

My first approach (see my original email) was to extract a list of all
destinations announced by Abilene. (The assumption is that Abilene generally
does not announce commercial prefixes.) This results in a list, call it
"EDU_Abilene", of 1333 ASes. 
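A minimal sketch of this extraction step, assuming the AS paths have already been pulled out of the BGP table dump as space-separated strings and that "destination" means the origin AS, i.e. the last AS on each path. The function name and the sample AS numbers (taken from the documentation range) are hypothetical, not the actual script used:

```python
def origin_ases(as_paths):
    """Return the set of origin (destination) ASes: the last AS on each path."""
    origins = set()
    for path in as_paths:
        hops = path.split()
        # Skip empty paths and paths ending in an AS_SET such as "{64500,64501}",
        # where no single origin AS can be read off.
        if not hops or hops[-1].startswith("{"):
            continue
        origins.add(int(hops[-1]))
    return origins


# Hypothetical paths as seen from Abilene (AS 11537).
sample_paths = ["11537 64500", "11537 64501 64502", "11537 64500"]
print(sorted(origin_ases(sample_paths)))
```

Collecting every hop instead of only the last one would give the larger set that also includes transit ASes.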

 

 

*EDU_description*

Some of you suggested looking at the names and descriptions of ASes. I used
the AS list available at: 

http://www.multicasttech.com/status/asn_expand.txt

and searched the last column ("Organization") for the following strings:

"Universit|Univerz|Universida|research|education|science|scientif|academic|
college|institut|laborator|school|ecole|edu|R&D|library|academy|Etudes"

This approach finds 1796 "educational" ASes, call this set
"EDU_description".
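The description search can be sketched as follows. This is only an illustration: it assumes the (AS number, organization) pairs have already been parsed out of the list, and that matching is case-insensitive, which is not stated above. The function name and sample entries are hypothetical.

```python
import re

# The search strings listed above, joined into one alternation.
EDU_PATTERN = re.compile(
    r"Universit|Univerz|Universida|research|education|science|scientif|"
    r"academic|college|institut|laborator|school|ecole|edu|R&D|library|"
    r"academy|Etudes",
    re.IGNORECASE,  # assumption: case-insensitive matching
)


def edu_by_description(as_orgs):
    """as_orgs: iterable of (asn, organization) pairs; returns the matching ASNs."""
    return {asn for asn, org in as_orgs if EDU_PATTERN.search(org)}


sample = [
    (64496, "Example University"),
    (64497, "Example Telecom Inc."),
    (64498, "Acme Reducers Ltd."),  # false positive: "Reducers" contains "edu"
]
print(sorted(edu_by_description(sample)))
```

The third sample entry shows why short substrings such as "edu" can pull in unrelated organizations. Anchoring them on word boundaries would probably reduce such matches.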

 

Of course, these two lists overlap, but less than I expected. In particular:

len(EDU_Abilene)=1333

len(EDU_description)=1796

union(EDU_Abilene, EDU_description)=2269

intersection(EDU_Abilene, EDU_description)=860
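As a quick consistency check, these four numbers satisfy inclusion-exclusion (1333 + 1796 - 860 = 2269). A small sketch, with a hypothetical helper name, that also derives how many ASes appear in only one of the lists:

```python
def overlap_summary(n_a, n_b, n_both):
    """Derive union size and per-list exclusives from two set sizes and their overlap."""
    n_union = n_a + n_b - n_both  # inclusion-exclusion
    return {
        "union": n_union,
        "only_a": n_a - n_both,
        "only_b": n_b - n_both,
        "jaccard": n_both / n_union,  # overlap as a fraction of the union
    }


stats = overlap_summary(1333, 1796, 860)
assert stats["union"] == 2269  # agrees with the union size reported above
print(stats)
```

So 473 ASes appear only in EDU_Abilene and 936 only in EDU_description; the overlap is about 38% of the union.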

 

 

For many reasons, these lists are far from precise. For instance,
EDU_Abilene contains AS 7132 (AT&T) and AS 8075 (Microsoft). I therefore
need further data sets or a better filtering methodology. This raises some
questions:

 

1) What other EDU networks (preferably with BGP tables available on the web)
can I take as examples of ASes that (generally) do not announce commercial
prefixes? Based on them I could construct lists similar in spirit to
EDU_Abilene. I guess the more, the better.

 

2) Do you know of other lists similar to
http://www.multicasttech.com/status/asn_expand.txt ? Maybe a longer
description, or a website related to each AS, would help the method I use to
create EDU_description. Do you think the strings I use in my search are
appropriate?

 

 

*AS relationships*

Another approach is to exploit the AS relationships. Most of you agree that
usually EDU ASes are not providers for COM customers. This suggests a way to
detect false positives in EDU_Abilene and EDU_description (or in their
union). For every EDU node, count how many COM customers it has, i.e., how
many EDU provider --- COM customer relationships it appears in. I used the AS
graphs with inferred relationships provided by CAIDA
(http://as-rank.caida.org/data/2006/). This method works well for finding good
candidates for false positives, but they should not be blindly accepted. For
instance AS 7132 (AT&T) has the highest number of COM customers (615) and
should obviously belong to COM (it is a member of EDU_Abilene). In contrast,
a big component of the EDU backbone, AS 11537 (Abilene) has 66 COM
customers! In general there are about 50 EDU nodes with more than 10 COM
customers each. 
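The counting step can be sketched like this, assuming the inferred relationships have already been reduced to (provider, customer) pairs and treating every AS outside the EDU set as COM. Both are simplifications, and the function names are hypothetical:

```python
from collections import Counter


def com_customer_counts(provider_customer_pairs, edu):
    """Count, for each EDU provider, its customers that are not in the EDU set."""
    counts = Counter()
    for provider, customer in provider_customer_pairs:
        if provider in edu and customer not in edu:
            counts[provider] += 1
    return counts


def suspects(counts, threshold=10):
    """EDU ASes with more than `threshold` COM customers, largest count first."""
    flagged = [asn for asn, n in counts.items() if n > threshold]
    return sorted(flagged, key=lambda asn: -counts[asn])


# Hypothetical data: 64496 and 64497 are labelled EDU, 64500 and 64501 are not.
edu = {64496, 64497}
pairs = [(64496, 64500), (64496, 64501), (64496, 64497), (64500, 64496)]
counts = com_customer_counts(pairs, edu)
print(counts[64496], suspects(counts, threshold=1))
```

The output is only a list of candidates for manual review, since a genuine EDU backbone such as Abilene is flagged as well.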

 

 

3) What other "automatic" or "manual" approaches would you suggest? Or
improvements of the ones just described? 

 

 

I will appreciate even the briefest comments and suggestions,

Maciej Kurant

 

 

 

  _____  

From: Maciej Kurant [mailto:maciej.kurant at epfl.ch] 
Sent: Wednesday, 15 November 2006 18:46
To: 'nanog at merit.edu'
Subject: How to get a list of research and academic ISP ?

 

Dear all,

 

I am a PhD student at EPFL, Switzerland. My recent research interest is in
large scale differences between the commercial and academic parts of the
Internet. 

 

Of course, in order to perform this kind of study I need a way to
distinguish between these two worlds. I've learnt that Abilene does not
provide commercial connectivity. This means that BGP prefixes and AS paths
announced by Abilene BGP routers should lead only to research and academic
destinations. I have extracted (from the BGP tables at
http://abilene.internet2.edu/observatory) a list of all such destinations
and obtained 1333 ASes (for data from July 2006). The number looks
reasonable, but I would like to be sure that I am not making a mistake.
Therefore I would be grateful if you could answer the following questions: 

 

1) Is this approach to obtaining a list of research and academic ISPs
correct?

2) Do you know of any such lists compiled before?

3) If I keep not only the destination ASes, but also all ASes on the
AS paths towards these destinations, I obtain a list of about 1400 ASes. How
should I understand this? Does it mean that some research and academic
destinations are reachable from Abilene only by traversing the commercial
Internet?

4) Of course, research and academic ASes are often well connected to
the commercial Internet. My guess is that in most cases their peering
relationship is "customer-provider", where commercial ASes are providers. Is
it possible that an academic AS is a provider for some commercial ASes? If
so, does it happen often?

 

Thank you in advance for your comments.

Maciej Kurant

 

 

 

=============================================

 

EPFL IC ISC LCA3

Maciej Kurant

PhD Student

CH-1015 Lausanne, Switzerland

 

web site:   http://lcawww.epfl.ch/kurant

 

=============================================

 
