too many routes

Sean M. Doran smd at clock.org
Thu Sep 11 00:33:09 UTC 1997


"Joseph T. Klein" <jtk at titania.net> writes:

> Having hopelessly screwed up my facts ... I was trying to make a point here.
> So the router was worse than I thought. Retaining policies that exclude
> new players because of AGS+'s inability to handle large routing flaps
> just does not cut it.
> 
> Sprint imposed this at a time when 7000s with 64M of memory were available.
> Will /19 remain policy when the majors are running with Cisco 12000 and GRFs?

I am not sure what point you are trying to make here.

If you feel like grinding this axe again, I am more than
willing to play in my fleeting spare moments.

An AGS+ with a CSC/4 has exactly the same CPU as a 7000
with an RP.  In fact, AGS+ performance is slightly higher
in some cases because of some interesting design features
of the 7000 and the other AGS+ downgrade path routers
(notably the 7500 series).

I did not put in a filter on /18s (yes it was /18s
initially, and got changed to /19s after much discussion
with the registries, especially Daniel Karrenberg at RIPE,
in an attempt to harmonize Sprint's filters with
slow-start allocation policies) because of the AGS+
difficulties; all the routers that were carrying full
routing at the time had 64Mbytes of RAM, and the two
remaining AGS+es were there to implement historical things
done principally for ICM (like a STUN connection and the
PANAMSAT router).

What triggered the filter was the observation that
the blocks freshly allocated by all three registries were
very poorly aggregated.  More annoyingly, those allocated
to Sprint's principal peers (most notably Internet MCI)
demonstrated the worst aggregation; in one case a /14 was
announced almost exclusively as prefixes no shorter than
19 bits.
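
As a rough illustration of the arithmetic, covering a /14
entirely with /19s takes 2^(19-14) = 32 announcements where
one would do. The sketch below uses Python's standard
ipaddress module; the 10.0.0.0/14 block is a hypothetical
stand-in, not the block referred to above.

```python
import ipaddress

# Hypothetical /14 chosen purely for illustration.
block = ipaddress.ip_network("10.0.0.0/14")

# Every /19 needed to cover the whole /14: 2 ** (19 - 14) of them.
slash19s = list(block.subnets(new_prefix=19))

print(len(slash19s))   # number of /19 announcements
print(slash19s[0])     # first /19 in the block
print(slash19s[-1])    # last /19 in the block
```

Announcing anything longer than a /19 only makes the count
larger.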

After spending some time trying to chase this down -- with
some success, as in the case of PSI's then newest blocks,
but not in the case of Internet MCI, who did nothing -- I
decided to issue a warning that once the /8 that the
InterNIC was using had filled up (and after some
discussion, once RIPE and APNIC proceeded to allocate from
new /8s), I would begin filtering all new unicast
addresses to ignore Sprintward announcements of any prefix
longer than 18 bits.  Moreover, I also announced that I
would filter out any subnets of historically classful As
and Bs.
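
The policy described above can be sketched as a simple
accept/reject function. This is only an illustration in
Python, not the router configuration that was deployed. The
name `accept`, the helper `classful_len` and the
208.0.0.0/8 "new allocation" block are all assumptions made
for the example.

```python
import ipaddress

MAX_PREFIX_LEN = 18  # the limit first announced; later relaxed to 19

# Arbitrary placeholder for "newly allocated" space. The real
# boundaries depended on which /8s each registry was allocating from.
NEW_ALLOCATION_BLOCKS = [ipaddress.ip_network("208.0.0.0/8")]


def classful_len(net):
    """Natural mask length for historical class A or B space, else None."""
    first_octet = int(net.network_address) >> 24
    if first_octet < 128:
        return 8    # historical class A
    if first_octet < 192:
        return 16   # historical class B
    return None


def accept(prefix):
    """Return True if an inbound announcement passes the sketched filter."""
    net = ipaddress.ip_network(prefix)
    natural = classful_len(net)
    if natural is not None:
        # Reject subnets of historically classful As and Bs.
        return net.prefixlen <= natural
    if any(net.subnet_of(block) for block in NEW_ALLOCATION_BLOCKS):
        # In new space, ignore anything longer than the limit.
        return net.prefixlen <= MAX_PREFIX_LEN
    return True  # older class C space is left alone in this sketch


for p in ("208.1.0.0/18", "208.1.32.0/19", "130.5.1.0/24", "192.5.5.0/24"):
    print(p, accept(p))
```

Changing MAX_PREFIX_LEN to 19 gives the later form of the
filter.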

The warning was several months old when people started
noticing that they couldn't reach things behind
Sprintlink, and a lot of time was spent explaining to
people that this shouldn't have surprised them at all.

Some changes happened: notably, I dropped down to 19 bits,
and the registries began to explain to people that anything
longer than that almost certainly would not be routable,
and that allocation != routing.

This measurably flattened the growth curve of the number
of prefixes seen by default-free routers, changing it from
a nearly exponential function to a linear one, with the
slope below that of Moore's law.

In other words, it probably did as much as the initial
introduction of supernetting as a concept to keep
the Internet scalable while it continued to use the
current set of routing protocols.

> If aggregation is the goal then mechanisms should be developed
> for exchanging CIDR blocks so the address space can be
> re-packed.

It is time for everyone to learn a term that unfortunately
I did not invent: IPv4ever.

NAT and other clever gatewaying effectively provide a
mechanism to extend the address lifetime expectancy not
only of the IPv4 unicast address space in general, but of
any given host in particular.

That is, there are now mechanisms which can hide address
changes from hosts that deal with address changes badly,
while at the same time there is increasingly good software
to assist with renumbering hosts.

There are mechanisms evolving which ultimately should lead
to nearly any given unicast subnet of 0/0 being perceived by
everything else as having a different number than things
within that subnet believe.  Moreover, there are also
mechanisms evolving which will cause nearly any given unicast
subnet of 0/0 to renumber itself so that all the numbered
entities under that subnet renumber into a different
unicast subnet of 0/0.

This alone should give rise to maximal aggregation, and if
combined with schemes which overload some addresses or
which simply compress sparsely populated large subnets
into densely populated smaller ones, should eliminate a
large percentage of address waste.
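
A small sketch of why renumbering into contiguous space
helps, using Python's standard ipaddress module. The
prefixes are hypothetical. Once four sites sit in adjacent
/21s, their announcements collapse into a single /19, while
a stray prefix elsewhere stays separate.

```python
import ipaddress

# Hypothetical announcements: four contiguous /21s plus one unrelated /24.
announced = [
    ipaddress.ip_network(p)
    for p in (
        "10.8.0.0/21",
        "10.8.8.0/21",
        "10.8.16.0/21",
        "10.8.24.0/21",
        "192.0.2.0/24",
    )
]

# collapse_addresses merges adjacent and overlapping networks
# into the smallest equivalent set of prefixes.
aggregated = list(ipaddress.collapse_addresses(announced))

print(len(announced), "->", len(aggregated))
for net in aggregated:
    print(net)
```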

In other words, the mechanism(s) you allude to are being
worked on.  I would like to see them applied to the swamp
within the next year or two.

The "IP addresses never change within the lifetime of a
session" and "IP addresses are end-to-end" crowds who have
misthought a number of protocols will probably fight tooth
and nail to see this never happen.  Mind you, they are
mostly the same people who fought tooth and nail against
the idea of renumbering in the first place, so one can
expect roughly the same type of "discussions".

> The /19 policy is archaic. It creates an obstacle and
> only partly resolves the problem. Fixing holes in CIDR
> blocks, exchanging fragmented blocks for contiguous
> blocks, and cleaning up "The Swamp" can do more for the
> stability and size of the routing table.

If you have an implementation of something that Sprint and
its competitors (who now do precisely the same filtering)
can buy, and that can cause the swamp to be aggregated into
a small handful of prefixes from their point of view, then I
can point you at people who would be happy to sign a
cheque.

The only real problem I saw in the implementation of the
/19 filter was the bad press generated by people who
refused to listen to registries' warnings that long
prefixes probably would not be globally routable, and
possibly the lack of a tariff which would have allowed
people with money to purchase exceptions in the Sprint
filters.

> BTW - If you use a route server to do the dampening and calculation of peer
> routes you can even make a wimpy CPUed 7000 handle backbone traffic.

The wimpy 7000 still has to receive at least one copy of
the NLRI, and process changes into the forwarding
table(s).

As the number of prefixes increases, even if the level of
"background" noise (the rate at which a large set of
prefixes demonstrates instability that is considerably less
than that which would be prevented even by very aggressive
route dampening) were to remain constant, you require more
CPU even in the simple case of receiving and installing
modified forwarding tables.   In the absence of any
feedback mechanism that holds down the total number of
globally visible prefixes, the increase in CPU
requirements could easily outstrip Moore's law and
overwhelm even state-of-the-art processors in a matter of
time.

Note that the likelihood of keeping up with the economics
of dealing with things which are CPU bound and ill suited
to parallel processing and which grow along the same slope
or on a slightly greater slope than Moore's law is small.

This describes the amount of BGP processing required prior
to the installation of the first prefix-length filters at
Sprint's border routers.
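
The slope argument can be made concrete with a small
sketch. The doubling periods are assumptions chosen for
illustration (18 months is a common reading of Moore's
law), not measurements, and `relative_load` is a
hypothetical helper.

```python
def relative_load(years, load_doubling_years, capacity_doubling_years=1.5):
    """Ratio of processing demand to processing capacity after `years`.

    Both quantities start at 1 and double at fixed intervals, so a
    result above 1 means demand has outgrown the available CPU.
    """
    demand = 2 ** (years / load_doubling_years)
    capacity = 2 ** (years / capacity_doubling_years)
    return demand / capacity


# Demand growing exactly along Moore's law: the ratio stays flat.
print(relative_load(3, load_doubling_years=1.5))

# Demand doubling slightly faster (every 12 months): the gap
# itself grows exponentially.
print(relative_load(3, load_doubling_years=1.0))
print(relative_load(6, load_doubling_years=1.0))
```

Under these assumptions, a workload that cannot be spread
across processors falls further behind every year once its
doubling time is even slightly shorter than that of the
hardware.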

I was always open to suggestions that would accomplish the
same result, and helped push Cisco to develop two of them
(a large cleanup of some of their BGP implementation's
processing and an implementation of something very close
to Curtis Villamizar's route flap dampening algorithm).
However, I have yet to see anything suggested that
eliminates the need for such filters, is readily
deployable, and will keep the slope of processing
requirements below that of processing capability.
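
For readers unfamiliar with route flap dampening, the
general idea is sketched below: each flap adds a penalty,
the penalty decays exponentially, and a route is suppressed
while its penalty is high. This is a generic illustration
and not the Cisco implementation or the exact algorithm
mentioned above. The class name and all parameter values
are assumptions.

```python
class DampenedRoute:
    """Per-route flap penalty with exponential decay (illustrative only)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                 reuse_limit=750.0, half_life=900.0):
        # All four values are placeholders; times are in seconds.
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life = half_life
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Halve the accumulated penalty once per half-life elapsed.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """Record a flap at time `now`."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def usable(self, now):
        """Return True if the route may be used at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return not self.suppressed


route = DampenedRoute()
for t in (0, 60, 120):       # three flaps in two minutes
    route.flap(t)
print(route.usable(120))     # suppressed right after the third flap?
print(route.usable(1920))    # two half-lives later
```

Dampening of this kind limits how often an unstable prefix
is reprocessed, but it does nothing about the total number
of prefixes, which is the point made above.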

I still am, and I believe my successors at Sprint and
like-minded people at other ISPs who implement
prefix-length filtering are too.

Until such a thing emerges, however, I continue to believe
that inbound prefix-length filtering is a good policy that should
be implemented universally.

	Sean.


