Internet core scale and market-based address allocation

Sean M. Doran smd at cesium.clock.org
Mon May 5 16:18:33 UTC 2003


On Friday, May 2, 2003, at 15:40 Europe/London, Daniel Golding wrote:

> - Processor speeds have increased dramatically
> - Memory is dirt cheap in a way almost unthinkable in 1996.
>   [but see smd's p.s.]

Incredible scale in one fewer dimension is *good*.

Divide and conquer, within router systems, has been
enormously successful.   Moreover, putting relatively
less work onto the general computing part of a modern
large router has given us one less thing to worry about
compared to the general computing industry.  Given
the amount of work it takes to build IP-forwarding hardware
and nowadays specialized RAMs, this is a good thing.

The amount of memory in large router systems has
increased substantially, but not as substantially as
the pps switching power of the same platforms.

Rough numbers, not taking into account buffering,
code bloat, architectural issues, bank holiday laziness,
the hangover from faak.subnet.dk, and so on -

1996: 8-slot x ~50 Mbps, ~256 MBytes RAM per system
...
2003: 16-slot x 40-Gbps, ~16 GBytes RAM per system

Fewer doublings.

Bandwidth scaling, system: 500 Mbps vs. 500 Gbps
    -> 3 orders of magnitude, base ten (10 in base 2)
Memory scaling, system: 320 MBytes vs. 16 GBytes
    -> 2 orders (base 10) (6 in base 2)
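
As a rough check on the scaling figures above, the doublings can be
computed directly.  This is a minimal sketch using only the quoted
numbers; the helper names are illustrative, and the memory figure uses
the ~256 MBytes from the 1996 line (the 320 MBytes quoted in the scaling
line gives about 5.7 doublings, which also rounds to 6).

```python
import math

def doublings(before, after):
    """Base-2 orders of magnitude between two values in the same unit."""
    return math.log2(after / before)

def decades(before, after):
    """Base-10 orders of magnitude between two values in the same unit."""
    return math.log10(after / before)

# Rough per-system figures as quoted in the text (1996 value, 2003 value).
figures = {
    "bandwidth (Mbps)": (500, 500_000),     # 500 Mbps -> 500 Gbps
    "memory (MBytes)": (256, 16 * 1024),    # ~256 MBytes -> ~16 GBytes
}

for name, (old, new) in figures.items():
    print(f"{name}: x{new / old:.0f}, "
          f"{doublings(old, new):.1f} doublings, "
          f"{decades(old, new):.1f} decades")
```

Bandwidth comes out at roughly 10 doublings and memory at 6, which is
the "fewer doublings" gap the figures are meant to show.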

System memory has scaled less than typical city-to-city core bandwidth
builds:
1996: ~OC3 x 1
2003: ~OC192 x 8 (aka OC3 x 512 - 9 orders of magnitude, base two)
      Yes, this *does* require interesting POP geometry, but so did
      ~155 without POS.
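
The link-build comparison can be checked the same way.  A sketch,
assuming the standard SONET rate of 51.84 Mbps per OC-1 (so OC3 is the
~155 Mbps mentioned above); the function name is illustrative.

```python
import math

OC1_MBPS = 51.84  # SONET OC-1 line rate; OC-n runs at n times this

def build_mbps(oc_level, links=1):
    """Aggregate line rate in Mbps of `links` parallel OC-<oc_level> circuits."""
    return oc_level * OC1_MBPS * links

# Compare in OC-1 units so the ratio stays an exact integer.
oc3_equivalents = (192 * 8) // 3          # 8 x OC192 expressed in OC3s
link_doublings = math.log2(oc3_equivalents)

print(f"1996 build: {build_mbps(3):.2f} Mbps")
print(f"2003 build: {build_mbps(192, 8):.2f} Mbps")
print(f"OC3 equivalents: {oc3_equivalents}, doublings: {link_doublings:.0f}")
```

That is 9 doublings for the city-to-city build against about 6 for
system memory over the same period.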

Memory has also needed to become much faster, of course,
to handle the arrival, storage, manipulation, and departure of all
the bits through various parts of a system (often with an internal
speedup over the max slot/interface bandwidths).

In-system memory these days is used more by specialized pps
hardware, not by general-purpose CPUs.   Most development
work has therefore gone into these chips
to allow for more operations-per-packet (mmm, pipelining) and
more packets per second, while the expectation is that
cooler, stabler general purpose CPUs and ordinary RAMs
will suffice for constructing the data structures these forwarding
engines operate upon.

More important than keeping the "routing brain" underpowered
compared to run-of-the-mill PeeCees, constraining the amount
of state in the network gives us some system slack in making
time-space tradeoffs within a routing system (and parts thereof).
As an industry, we can cope with memory speed increases
levelling off, OR with specialized chip production slowing down.
Increasing the amount of state in the network destroys
a very important belts-and-braces approach to growth.

> So, a little additional routing table bloat hurts no one. Yes, yes,
> this is heresy.

If all systems scaled in proportion to the "routing brain" ONLY,
with the "routing brain" needing to be essentially just a PeeCee
or Mac, you would be right.

Unfortunately, this is not the case in the core, and the core
is an expensive contribution to your monthly bills.   However,
it's nowhere near as large a contribution as what you likely
are paying your "recovering monopolist" local access provider.

Core costs get constrained because, as an industry, they must.
If you stop the LAP being more than 50% of the price of connecting
to the Internet, then yep, we can afford to be much more lax
with respect to the routing table, even if it results in much more
expensive core routing devices.

> - Carriers have no incentive to change their filters

Except the bloody noses one gets in the press from time to time.

> - The current length filters work quite nicely.

Thanks.   I think so too, still.

> If, by a routing prefix market, you mean that folks with lots of
> prefixes get to pay folks to carry their data, then it's a DOA idea.
> Current Settlement Free Peering arrangements work fine - no one is
> looking to upset the apple cart.

Except at least ITU-T study group 3 (google on SG3 D.50 and
*be concerned*).

I personally would like a time-machine to go back and implement a web
form that would allow one to use a credit card to blow holes in the
prefix-length filter for a reasonable monthly fee, as opposed to
effectively $INFINITY, and open-source the software.
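
For the motivated, the filtering half of that idea might look something
like the sketch below.  Everything in it is hypothetical: the address
blocks, the length limits, the paid-exception set, and the function
name are made up for illustration and are not any carrier's real
filter.  Payment handling is not modelled at all.

```python
import ipaddress

# Hypothetical policy table: (covering block, longest prefix accepted),
# ordered most specific first.  Values are illustrative only.
LENGTH_LIMITS = [
    (ipaddress.ip_network("198.18.0.0/15"), 19),
    (ipaddress.ip_network("0.0.0.0/0"), 24),
]

# Hypothetical paid exceptions: exact prefixes whose holders pay the fee.
PAID_HOLES = {ipaddress.ip_network("198.18.64.0/22")}

def accepts(prefix, limits=LENGTH_LIMITS, paid=PAID_HOLES):
    """Return True if an announced IPv4 prefix passes the filter."""
    net = ipaddress.ip_network(prefix)
    if net in paid:
        return True  # a paid hole overrides the length limit
    for block, max_len in limits:
        if net.subnet_of(block):
            # First (most specific) covering block decides.
            return net.prefixlen <= max_len
    return False

for p in ("198.18.0.0/19", "198.18.32.0/22", "198.18.64.0/22"):
    print(p, "accepted" if accepts(p) else "filtered")
```

The two /22s are equally long; only the one in the paid set gets
through, which is the whole point of the exercise.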

Note for motivated coders: it's not too late for this...  As drc wrote:

>> If someone can figure out how to get the ISPs of the world to
>> participate in a routing prefix market, then it might be worth
>> revisiting this idea.  Note that there is nothing stopping
>> establishing a routing prefix market now, so it could be done prior
>> to changing address allocation policies

There are certainly other approaches than this, but I believe this
would have solved most of the problems in CIDR's childhood, and
eliminated quite a few we are starting to see as it approaches puberty.

	Sean.

P.S.: I guess I have a different view of "unthinkable price reductions".
         Do you pine for 1990's $6/hour 9600bps dialup IP too?

         In mid 1997 the price per Mbps per month ("non-overbooked") in
         Europe was about USD 17000.   Pricing is now in many cases
         around USD 170. (I don't take into account inflation and
         exchange rate...)
         Trans-Atlantic *traffic* has gone from about a Gbps to perhaps
         a hundred.
         The Internet industry's ability to deal with the market with
         only a few mostly parent-company-inflicted organizational
         collapses is pretty astonishing.

         This, however, does not mean we should make it HARDER over the
         next few years.  The "boring lull" some tech-focused people
         have complained about is a cyclical thing (T1 -> nxT1;
         T3 -> nxT3; OC3-nxOC12; OC48-nxOC192 periods for example),
         and is probably approaching its end.
         Remember the fun of SSPs, CIDR collapse, POS, gated vs
         "everyone else's" BGP, FIB switching, BFR/GSR, JunOS, packet
         over WDM?   Step changes take a bit of advance planning and
         development; shifting the target in the wrong direction just
         before one is publicly unleashed is, well, something I'd
         rather you not do.



