non-provider aggregation, was: IPv6 news

Wed Oct 19 21:40:52 UTC 2005

On 17-Oct-2005, at 14:18, Jeroen Massar wrote:

>>> Another alternative is to force-align allocation and topology in some
>>> way /other/ than by "Providers" (geographical allocation in whatever
>>> hierarchy, IX allocation, whatever), such that networks were easily
>>> aggregatable. Lots of objections though (the "providers and geography
>>> don't align" one though is ultimately slightly bogus, because with
>>> non-provider-aligned allocation policies in place it would be in
>>> providers' interests to align their peering to match the allocation
>>> policy).

The current assumption is that all aggregation happens on ISP.  
Replacing that with the assumption that all aggregation will happen  
on geography isn't all that useful. The important thing here is that  
you can aggregate on pretty much anything: hair color, router vendor,  
market capitalization, you name it. In the end, you always aggregate  
on the way the addresses are given out, which may or may not be  
meaningful.

Aggregating on provider is the most powerful because the aggregate  
leads you fairly directly to the place where you need to go as long  
as the destination is single homed.

But suppose at some point we end up with a routing table consisting  
of 10 million PI blocks from multihomers and some unimportant stuff  
that disappears in the error margin (i.e., those 5000 IPv6 /20s for  
huge ISPs). Also suppose that it's possible to build a reasonably  
cost effective router that handles 1M routes, but this router  
technology doesn't scale to the next order of magnitude.

The simple solution is to build a big router that actually consists  
of 11 small ones: 10 sub-routers that each hold one tenth of the  
global routing table, and an 11th sub-router that distributes packets  
to the sub-router that holds the right part of the global routing table.

So sub-router 1 has the part of the global IPv6 routing table that  
falls within 2000::/6, sub-router 2 has 2400::/6, sub-router 3  
2800::/6 and so on.
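
As a rough illustration of what the 11th sub-router does, here is a minimal sketch. It assumes the /6 split from the example above; the table SUB_ROUTER_BLOCKS and the function pick_sub_router are made-up names, not anything a real router exposes, and only the first three blocks are filled in.

```python
import ipaddress

# Hypothetical partition: each sub-router owns one /6, as in the example.
SUB_ROUTER_BLOCKS = {
    1: ipaddress.ip_network("2000::/6"),
    2: ipaddress.ip_network("2400::/6"),
    3: ipaddress.ip_network("2800::/6"),
}


def pick_sub_router(destination):
    """Return the number of the sub-router whose /6 covers the destination.

    Returns None when no configured block matches.
    """
    addr = ipaddress.ip_address(destination)
    for number, block in SUB_ROUTER_BLOCKS.items():
        if addr in block:
            return number
    return None
```

The distributing sub-router only needs this handful of entries. The full longest-prefix lookup then happens on the sub-router that holds that slice of the table.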

So we're aggregating here, but not really "on" something. This has  
the unpleasant side effect that we now have to spend 11 times more  
money to keep a 10 times larger routing table.

Alternatively, we can trade hardware costs for bandwidth, by having  
10 routers that are present in the network anyway each handle part of  
the global routing table. So a router in Boston would handle  
everything under 2000::/6, a router in Chicago 2400::/6, one in  
Seattle 2800::/6 and so on. Obviously this isn't great if you're in  
Boston and your address is 2800::1, but it doesn't require additional  
hardware.

This scheme can be optimized by aligning addressing and geography to  
a reasonable degree. So if you're in Boston, you'd get 2000::1 rather  
than 2800::1. But that doesn't magically shrink the routing table to  
one route per city. In the case of Boston, it's likely that the  
source and destination ISPs for a certain packet don't interconnect  
within the city itself. So someone sitting in New York probably won't  
see much difference: he or she still has to carry all the routes for  
multihomers in Boston. Some of these will point to his or her own  
customers in Boston, some to peers in New York, others to peers in DC,  
and so on.

However, as distance increases the difference between "this packet  
needs to go to a customer in Boston", "this packet needs to go to a  
peer in New York" and "this packet needs to go to a peer in DC"  
becomes meaningless, so it's possible to replace a large number of  
individual routes by a single city or region aggregate.
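
To make the distance effect concrete, here is a small sketch under stated assumptions: the function build_table, the per-city aggregate blocks, and the next-hop labels are all invented for illustration. A router keeps the individual multihomer routes for cities it considers nearby and a single aggregate for every other city.

```python
import ipaddress


def build_table(routes, city_aggregates, nearby_cities):
    """Build a routing table as a dict of network -> next hop.

    routes:          list of (prefix, city, next_hop) for multihomers
    city_aggregates: dict of city -> (aggregate_prefix, next_hop)
    nearby_cities:   cities for which individual routes are kept
    """
    table = {}
    # Nearby cities: keep every individual multihomer route.
    for prefix, city, next_hop in routes:
        if city in nearby_cities:
            table[ipaddress.ip_network(prefix)] = next_hop
    # Distant cities: one aggregate replaces all the individual routes.
    for city, (aggregate, next_hop) in city_aggregates.items():
        if city not in nearby_cities:
            table[ipaddress.ip_network(aggregate)] = next_hop
    return table
```

With three Boston multihomers and a hypothetical Boston block, a New York router that treats Boston as nearby ends up with three routes, while a router far away ends up with one.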

So even without magic interconnection dust, aggregation based on  
geographical addressing can have benefits. However, it has several  
limitations. An important one is that early exit routing is replaced  
by late exit routing. Also, when someone multihomes by connecting to  
ISPs in Miami and Tokyo, there is no single region to aggregate on.  
But in the worst case you just don't get to aggregate, whether because  
people multihome in weird ways, for traffic engineering reasons, or  
because of a lack of interconnection (although as interconnects become  
really sparse the savings go up again), so you're no worse off than  
today. But if and  
when the routing tables explode and routers can't keep up, having  
geographical addressing in place for multihoming allows for a plan B  
that we don't have today.

>> I think we need a researcher to sit down and
>> figure out exactly what this would look like
>> in a sample city and a sample national provider.

> There has been quite some research on it, there were ideas, there was
> even talk of a vendor going to implement it, but it never happened. It
> won't work because of cash reasons (read: telco/transit don't want it)

I'm not familiar with that... Do you have a reference?

> For your 'city data' check:
> http://unstats.un.org/unsd/demographic/default.htm

> or for pre-processed files:
> http://arneill-py.sacramento.ca.us/ipv6mh/ under "Geographical data".

Note that this page hasn't been updated in more than two years. When  
Michel started this initiative the IETF multihoming in IPv6 (multi6)  
working group was pretty much dead and it certainly wasn't  
considering any input. However, our efforts resulted in the wg coming  
back to life again, considering input, rejecting most of it, and  
starting work on a solution in a new wg: shim6.

(Paul Jakma wrote something to the effect that I am involved with  
shim6 so that says something about other options. It doesn't, as far  
as I'm concerned. But shim6 is a worthy pursuit in its own right.)

For anyone who wants to read the latest version of all of this (still  
two years old, though):
http://www.muada.com/drafts/draft-van-beijnum-multi6-isp-int-aggr-01.txt

> especially:
> http://arneill-py.sacramento.ca.us/ipv6mh/geov6.txt

> Which indeed seems quite reasonable. The problem with this is:
>  'who is paying for which traffic and to whom'

Just because I press the "up" button for the elevator doesn't mean  
I'm going to the top floor. Still, having one button for "down" and  
one for "up" rather than having a different one for each floor seems  
to work well for this initial part. Once you get inside the elevator  
you still have to pick a floor, of course.

Iljitsch