Cogent service

Vadim Antonov avg at exigengroup.com
Sat Sep 21 01:40:56 UTC 2002



On Fri, 20 Sep 2002, Stephen Stuart wrote:

> Regarding CPU cycles for route lookups:
> 
> Back in the mid-1990s, route lookups were expensive. There was a lot
> of hand-wringing, in fact, about how doing route lookups at every hop
> in larger and larger FIBs had a negative impact on end-to-end
> performance. One of the first reasons for the existence of MPLS was to
> solve this problem.

This was a totally bogus reason from the very beginning. The worst-case
("long") lookup in a 16-4-4-4-4-bit radix tree takes 5 memory reads, and
since real backbones carry no prefixes longer than 24 bits, the lookup
never has to descend past the third level. Given that a lot of routes are
aggregated, and given the ubiquity of fast data caches (which can safely
hold the top 3 levels of a full FIB), the average number of memory reads a
general-purpose CPU of the mid-90s needed per route lookup was (surprise)
about 1.2.
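
For illustration, here is a minimal C sketch of what a 16-4-4-4-4 stride
lookup looks like; this is not anybody's actual forwarding code, and the
structure and field names are invented:

#include <stdint.h>

/* One entry per stride-table slot: the best next hop known at this depth,
 * plus a pointer to a 16-entry child table if longer prefixes exist. */
struct trie_entry {
    struct trie_entry *child;
    uint32_t           nexthop;   /* 0 = no route */
};

/* The root table is indexed by the top 16 bits, children by 4 bits each,
 * so a full 32-bit descent is 5 memory reads and a /24 match needs 3. */
static uint32_t route_lookup(const struct trie_entry *root, uint32_t dst)
{
    const struct trie_entry *e = &root[dst >> 16];   /* read 1 */
    uint32_t nh = e->nexthop;
    int shift = 12;

    while (e->child != NULL && shift >= 0) {
        e = &e->child[(dst >> shift) & 0xf];         /* reads 2..5 */
        if (e->nexthop)
            nh = e->nexthop;                         /* longest match so far */
        shift -= 4;
    }
    return nh;
}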

In fact, full-featured IP forwarding (including Fair Queueing and packet
classification) at 120 kpps was demonstrated on a 133 MHz Pentium MMX.
It beat the crap out of the 7000's SSE, and it wasn't even that hard:
133 MHz at 120 kpps works out to more than 1000 CPU cycles per packet.
The value proposition of ASIC-based packet routing (and of CAMs) was
always quite dicey.

For the arithmetically inclined, I'd suggest doing the same calculation
for a P4 running at 2 GHz, assuming 1000 cycles per packet and an average
packet size of 200 bytes. Then estimate the cost of that hardware and
spend some time wondering about the prices of OC-192 cards and the virtues
of duopoly :)  Of course, doing "line rate" forwarding of no-payload
Christmas-tree packets is neat, but does anyone really need it?
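
Spelling that arithmetic out (the 2 GHz, 1000-cycle and 200-byte figures
are just the assumptions stated above, nothing more):

#include <stdio.h>

int main(void)
{
    double hz       = 2e9;      /* CPU clock */
    double cycles   = 1000.0;   /* cycles spent per packet */
    double pkt_bits = 200 * 8;  /* average packet size in bits */

    double pps  = hz / cycles;           /* 2,000,000 packets/s */
    double gbps = pps * pkt_bits / 1e9;  /* 3.2 Gb/s */

    printf("%.1f Mpps, %.1f Gb/s\n", pps / 1e6, gbps);
    return 0;
}

That comes out to about 2 Mpps, or roughly 3.2 Gb/s of 200-byte packets
from one commodity CPU, i.e. around a third of an OC-192's line rate,
which is the point of the comparison with OC-192 card prices.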

> In the case of modern hardware, forwarding delay due to route lookup
> times is probably not a contributing factor to latency.

In real life (as opposed to benchmark tests) nobody cares about
forwarding delay. Even with dumb implementations it is two orders of
magnitude smaller than the speed-of-light delay in long-haul circuits (and
on short hops end-host performance is limited by bus speed anyway).
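
For a sense of scale (the path length and per-hop figure here are
illustrative assumptions, not numbers from the post):

#include <stdio.h>

int main(void)
{
    double path_km   = 5000.0;  /* long-haul path length */
    double km_per_ms = 200.0;   /* light in fiber, ~2/3 c */
    double fwd_ms    = 0.1;     /* generous per-hop software forwarding */

    double prop_ms = path_km / km_per_ms;         /* 25 ms one way */
    printf("propagation %.0f ms vs forwarding %.1f ms per hop (%.0fx)\n",
           prop_ms, fwd_ms, prop_ms / fwd_ms);    /* ~250x */
    return 0;
}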

> "More hops" can mean many other things in terms of delay, though - in
> both the "more delay" and "less delay" directions, and countering the
> "more hops means more delay" perception will (I think) always be a
> challenge.

"More hops" means more queues, i.e. potential congestion points.  With max
queue size selected to be RTT*BW, the maximal packet latency is,
therefore, Nhops*RTT.  In a steady mode, with all network uniformly loaded
the mean latency is, therefore, (Nhops*MeanQueueLength + 1)*RTT.  (MQL is
from 0 (empty queue) to 1 (full queue)). In the real networks, queueing
delays tend to contribute mostly to delay variance, not to the mean delay
(in an underloaded network the average queue size is < 1 packet).
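
To put numbers on that formula (the RTT, hop count and queue fill are
arbitrary picks; the formula is the one just stated):

#include <stdio.h>

int main(void)
{
    double rtt   = 0.070;  /* RTT used to size the queues, 70 ms */
    double nhops = 15.0;   /* hops on the path */
    double mql   = 0.1;    /* mean queue fill: 0 = empty, 1 = full */

    double worst = nhops * rtt;              /* every queue full: 1050 ms */
    double mean  = (nhops * mql + 1) * rtt;  /* formula above:     175 ms */

    printf("worst %.0f ms, mean %.0f ms\n", worst * 1000, mean * 1000);
    return 0;
}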

In other words, a large number of hops makes life a lot less pleasant for 
interactive applications; file transfers generally don't care.

--vadim



