Scalability issues in the Internet routing system

Thu Oct 27 14:20:47 UTC 2005

Blaine Christian wrote:
>>> I did read your comment on BGP lending itself to SMP.  Can you
>>> elaborate on where you might have seen this?  It has been a pretty
>>> monolithic implementation for as long as I can remember.  In fact,
>>> that was why I asked the question, to see if anyone had actually
>>> observed a functioning multi-processor implementation of the BGP
>>> process.
>>>
>> I can make the SMP statement with some authority as I have done the
>> internal design of the OpenBGPd RDE and my co-worker Claudio has
>> implemented it.  Given proper locking of the RIB a number of CPUs can
>> crunch on it and handle neighbor communication independently of each
>> other.  If you look at Oracle databases they manage to scale
>> performance with factor 1.9-1.97 per CPU.  There is no reason to
>> believe we can't do this with the BGP 'database'.
>>
> Neat!  So you were thinking you would leave the actual route selection
> process monolithic and create separate processes per peer?  I have seen
> folks doing something similar with separate MBGP routing instances.
> Had not heard of anyone attempting this for a "global" routing table
> with separate threads per neighbor (as opposed to per table).  What do
> you do if you have one neighbor who wants to send you all 2M routes
> though?  I am thinking of route reflectors specifically but also
> confederation EIBGP sessions.
>
> I think you hit the nail on the head regarding record locking.  This is
> the thing that is going to bite you if anything will.  I have heard
> none of the usual suspects speak up so I suspect that either this
> thread is now being ignored or no one has heard of an implementation
> like the one you just described.

There is no 'global' route (actually path) selection in BGP.
Everything is done per prefix+path.  In the RIB you can just lock the prefix,
insert the new path and recalculate which one wins.  Then issue the update
to the FIB, if any.  Work done.  Statistically there is very little
contention on the prefix and the path records.  For contention to occur, two
updates for the same prefix would have to arrive at the same time from two
different peers handled by different CPUs.  I'd guess the SMP scaling factor
for BGP is around 1.98.  The remaining 0.02 is lost to locking overhead and
negative caching effects.  Real serialization happens only at the FIB change
queue.  However, serializing queues can be handled very efficiently on SMP too.
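
To make the per-prefix locking concrete, here is a rough sketch of the
idea in Python.  It is not OpenBGPd code.  The class name Rib, the
update() method and the "shortest AS path wins" rule are all made up for
illustration; a real decision process has many more steps.  The single
table lock only guards creation of new prefix entries, and CPython
threads will not actually run in parallel, so this shows the locking
structure only, not the scaling.

```python
import queue
import threading


class Rib:
    """Toy RIB: one lock per prefix, one serialized FIB change queue."""

    def __init__(self):
        self._table_lock = threading.Lock()  # held briefly, only to create entries
        self._locks = {}                     # prefix -> per-prefix lock
        self._paths = {}                     # prefix -> {peer: as_path}
        self._best = {}                      # prefix -> (peer, as_path)
        self.fib_queue = queue.Queue()       # the only real serialization point

    def _entry(self, prefix):
        # Look up (or create) the lock and path set for this prefix.
        with self._table_lock:
            if prefix not in self._locks:
                self._locks[prefix] = threading.Lock()
                self._paths[prefix] = {}
            return self._locks[prefix], self._paths[prefix]

    def update(self, peer, prefix, as_path):
        lock, paths = self._entry(prefix)
        with lock:  # lock the prefix, not the whole table
            paths[peer] = tuple(as_path)
            # Placeholder decision (assumption): shortest AS path wins,
            # peer name breaks ties.
            best = min(paths.items(), key=lambda kv: (len(kv[1]), kv[0]))
            if self._best.get(prefix) != best:
                self._best[prefix] = best
                # Queue a FIB change only when the winner changed.
                self.fib_queue.put((prefix, best[0]))

    def best(self, prefix):
        return self._best.get(prefix)


def peer_worker(rib, peer, updates):
    # One worker per neighbor; each feeds its own updates into the shared RIB.
    for prefix, as_path in updates:
        rib.update(peer, prefix, as_path)


rib = Rib()
feeds = {
    "peerA": [("192.0.2.0/24", [64500, 64501]), ("198.51.100.0/24", [64500])],
    "peerB": [("192.0.2.0/24", [64502]), ("203.0.113.0/24", [64502, 64503])],
}
workers = [threading.Thread(target=peer_worker, args=(rib, peer, updates))
           for peer, updates in feeds.items()]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(rib.best("192.0.2.0/24"))
```

The two workers only collide on 192.0.2.0/24, the one prefix both peers
announce.  The other prefixes never contend, which is the point of the
argument above.  How many FIB changes get queued for the shared prefix
depends on which update arrives first.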

-- 
Andre