Rocketfuel vs. topology (was Re: Risk of Internet collapse grows)

Tue Dec 3 00:52:43 UTC 2002

On Monday, Dec 2, 2002, at 11:07 Europe/London,
Michael.Dillon at radianz.com wrote:

> I had a look at your map of Ebone Europe through the browse button on
> your website. This displayed a messy meshy network that connected all
> the major cities of Europe. However, in fact, Ebone's network was a
> nice clean ringed network connecting all the major cities of Europe.

"subtended ring architecture" is the expression you are looking for -:)

The graph displayed on the Rocketfuel page is extremely plausible
given their methodology.  One key thing to note is: "we find roughly
seven times more routers and links in our area of focus than Skitter",
which is symptomatic of the true problem of this kind of topology
discovery: the Internet is fundamentally anisotropic.

*NO* set of measurements short of brute-force any-to-any will ever
discover all the possible paths even in a 100% static Internet, simply
because of the nature of aggregation (which hides information) and
hop-by-hop forwarding (which conceals secondary paths).   The two
combine in surprising ways.
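
A toy illustration of the hop-by-hop point, as a minimal sketch only:
assume a six-router ring in which every router forwards along a single
shortest path (ties broken alphabetically), and assume a probe from a
vantage point can only ever see links on its own forwarding tree.  The
ring, the router names and both helper functions are made up for this
example; none of it is Rocketfuel's code.

```python
from collections import deque

def forwarding_tree_links(graph, source):
    """Links on the single shortest-path tree rooted at source (BFS,
    ties broken by visiting neighbours in alphabetical order)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return {frozenset((child, par))
            for child, par in parent.items() if par is not None}

def discovered_links(graph, vantage_points):
    """Union of the links seen from every vantage point."""
    links = set()
    for vp in vantage_points:
        links |= forwarding_tree_links(graph, vp)
    return links

# Hypothetical ring A-B-C-D-E-F-A.
RING = {
    "A": {"B", "F"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E", "A"},
}
ALL_LINKS = {frozenset((u, v)) for u in RING for v in RING[u]}

missed_from_a = ALL_LINKS - discovered_links(RING, ["A"])
missed_from_a_and_d = ALL_LINKS - discovered_links(RING, ["A", "D"])
```

In this toy, probing from A alone never exercises the D-E link, since
D is always reached via C; adding D as a second vantage point closes
the ring.  Real networks add aggregation on top of this, which the
sketch does not model.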

Dynamic routing changes will also reveal false paths
(thanks to TTL processing during transient loop/blackhole behaviours,
which are UNAVOIDABLE with vector-based routing protocols like BGP).
In other words, more measurements can mean more paths, which
seems good, except that some of those paths may be the result of
chronic route flutter, which happens (just ask yer box about flapping).
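
To make the false-path mechanism concrete, here is a minimal sketch
under stated assumptions: a traceroute sends one probe per TTL value,
each probe follows whatever path is installed at the moment it is
sent, and the route flips between two disjoint paths partway through
the run.  The router names, the two paths and the flip time are all
invented for illustration.

```python
def traceroute(path_at_time, max_ttl):
    """Simulate one traceroute run: the probe with TTL=t is sent at
    time t and reports the t-th router of the path in use right then."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        path = path_at_time(ttl)
        if ttl > len(path):
            break
        hops.append(path[ttl - 1])
    return hops

def links_of(hops):
    """Adjacent pairs in a hop list, i.e. the links a mapper would infer."""
    return list(zip(hops, hops[1:]))

# Two hypothetical real paths between the same endpoints.
OLD_PATH = ["A", "B", "C", "D"]
NEW_PATH = ["A", "E", "F", "D"]
REAL_LINKS = set(links_of(OLD_PATH)) | set(links_of(NEW_PATH))

def flapping_route(t):
    # The route changes after the second probe has been sent.
    return OLD_PATH if t <= 2 else NEW_PATH

observed_hops = traceroute(flapping_route, max_ttl=4)
spurious = [l for l in links_of(observed_hops) if l not in REAL_LINKS]
```

The run reports hops A, B, F, D, so a mapper infers a B-F link that
exists in neither real path.  Nothing misbehaved; the measurement
simply straddled a routing change.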

Hash-based load-balancing can further obscure connectivity,
although the Rocketfuel people embrace the reduction of
fully-equivalent paths, anyway.  However, there are "equal-cost"
L3 paths which are widely separated at lower layers.

However, far from naively expecting isotropy and a full discovery
of information, their very first footnote admits traceroute's
shortcomings, and their methodology somewhat resembles X-Ray
crystallography, which has developed some techniques for analysing
complex anisotropic structures (like proteins).

While this is a clever approach, they did miss a chance to
try to eliminate spurious links apparently introduced by
path asymmetry, at least to the extent that the LSRR IP header
option is allowed across networks and handled by hops between
their measurement vantage points at the edge.  (I believe path
asymmetry is extremely commonplace, particularly around ring-shaped
structures with traffic-direction bias; e.g., U.S.->Europe is much
larger than Europe->U.S. by bps and pps.)

A larger set of observation points in Central/Eastern/Southern
Europe might also have revealed some of these biases, and heuristics
at least could weed out spurious links.

Clearly, though, their approach to reducing the set of observations
among a group of vantage points is novel, and they did deliberately
seek out a much larger group of vantage points than other studies;
they admit that they are "scratching the surface" of automated
map construction, and they do not claim to have produced the most
accurate automatically-generated map possible. What they have done,
however, is much better than their well-known predecessors, at least
when I compare their results with what I know about the L0/L1
constructions of a couple of their targets.   This is good science.

>
> I just don't see how an outside probe can determine the true topology 
> of a network.
>

What *is* the true topology of an IP network?

If you accept that the topology is a graph comprising vertices (routers)
at which a packet is forwarded to other vertices (with a line drawn
between any pair of vertices where the forwarding is possible), then it
is possible to describe a close match with observations taken from
beyond the edges of such a network.   Rocketfuel is a neat approach, but
finds too many false paths, again probably because of transient routing
changes and asymmetries of varying duration and severity.

Attempting to filter out the dynamic noise is probably possible; Vern
Paxson's massive traceroute work some years ago illustrated its
existence, and if you can see it, maybe you can filter it out. :-)
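
One naive way such a filter might look, purely as a sketch: repeat the
same measurement several times and keep only the links that show up in
at least some fraction of the runs.  This is a simple persistence
heuristic of my own for illustration, not Paxson's analysis and not
anything Rocketfuel does; the function name, the 0.5 threshold and the
sample runs are all assumptions.

```python
from collections import Counter

def stable_links(runs, min_fraction=0.5):
    """Keep links (adjacent hop pairs) seen in at least min_fraction
    of the traceroute runs; each run is a list of hop names."""
    counts = Counter()
    for hops in runs:
        # Count each link at most once per run.
        counts.update(set(zip(hops, hops[1:])))
    threshold = min_fraction * len(runs)
    return {link for link, seen in counts.items() if seen >= threshold}

# Hypothetical data: four clean runs, plus one that straddled a
# routing change and reported a phantom B-F adjacency.
runs = [["A", "B", "C", "D"]] * 4 + [["A", "B", "F", "D"]]
kept = stable_links(runs, min_fraction=0.5)
```

The obvious cost is that a legitimate but rarely used backup path is
thrown away along with the phantoms, so the threshold trades false
links against missed ones.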

However, if your target for "topology" requires the lines of the graph
to be a complete set of viable L2 links, or even worse, the lower-layer
components across which some networks build single L2 paths (POS, PPP)
and some networks build large numbers of them (FR, ATM, MPLS), then the
work gets much harder, mostly because in some routers there is no
practical way of filtering out internal delay at the responding hop from
the RTT measurement using traceroute.   (NTP synchronization with the
routers in the path might help in correlating very stable
(L3 path, delay) tuples to the correct L0 path.)

In other words, MPLSD lets you hallucinate lots of direct
router-to-router links which are phantoms.  The packets still transit
routers (sorry, "LSRs"), but detecting them seems hard.

> If a researcher wants to do analysis of real network topologies they
> either need to get the real maps from the ISPs in question or else they
> need to ally themselves with someone like Telegeography who have this
> information for at least some ISPs.

I think that trying to treat the Internet as something (nearly fully)
self-documenting, if you only write the correct "man page formatter",
is extremely neat.

	Sean.