Did your BGP crash today?

Thomas Mangin thomas.mangin at exa-networks.co.uk
Sun Aug 29 20:12:35 UTC 2010


> It would seem to me that there should actually be a better option, e.g.
> recognizing the malformed update, and simply discarding it (and sending the
> originator an error message) instead of resetting the session.
> 
> Resetting of BGP sessions should only be done in the most dire of
> circumstances, to avoid a widespread instability incident.


I had the same thought before giving up on it. 

Negotiating a new error message could be a per-peer option. BGP has capabilities for exactly this reason.
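
To make the per-peer idea concrete, here is a minimal Python sketch. It assumes a made-up capability code (240, picked only for illustration, not an assigned value) meaning "discard malformed UPDATEs instead of resetting". The one-byte code, one-byte length, value layout is the usual BGP capability encoding; the function names and the capability itself are hypothetical.

```python
import struct

# Hypothetical capability code, chosen only for illustration; not an assigned value.
CAP_DISCARD_MALFORMED = 240

def encode_capability(code, value=b""):
    """Encode one capability as code (1 byte), length (1 byte), value."""
    return struct.pack("!BB", code, len(value)) + value

def parse_capabilities(data):
    """Parse concatenated capability TLVs into a {code: value} dict."""
    caps = {}
    offset = 0
    while offset + 2 <= len(data):
        code, length = struct.unpack_from("!BB", data, offset)
        caps[code] = data[offset + 2:offset + 2 + length]
        offset += 2 + length
    return caps

def negotiated(local, remote, code):
    """The behaviour is enabled only if both peers advertised the capability."""
    return code in parse_capabilities(local) and code in parse_capabilities(remote)

# Capability 1 is multiprotocol extensions (here AFI 1, SAFI 1).
ours = encode_capability(1, b"\x00\x01\x00\x01") + encode_capability(CAP_DISCARD_MALFORMED)
theirs_new = encode_capability(CAP_DISCARD_MALFORMED)
theirs_old = encode_capability(1, b"\x00\x01\x00\x01")

print(negotiated(ours, theirs_new, CAP_DISCARD_MALFORMED))
print(negotiated(ours, theirs_old, CAP_DISCARD_MALFORMED))
```

A peer that does not advertise the capability simply keeps today's behaviour, which is the point of making it negotiable.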

However, for this to make sense you would need to find a resynchronisation point so that only the one faulty message is excluded. Initially I thought that the last received KEEPALIVE (for the receiver of the error message) could do, but you find yourself with race conditions, so perhaps two KEEPALIVEs back?
Each TCP packet can contain multiple messages, so the messages would then have to be split and ACKed individually to find the faulty one. EOR could be used for that purpose.
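
The splitting step could look roughly like the Python sketch below. It relies only on the standard BGP message header (16-byte marker, 2-byte length, 1-byte type); the function name split_messages and the sample buffer are made up for illustration, and nothing here validates the message bodies or implements any ACK scheme.

```python
import struct

HEADER_LEN = 19            # 16-byte marker + 2-byte length + 1-byte type
MARKER = b"\xff" * 16      # marker is all ones
MAX_LEN = 4096             # maximum BGP message size
TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

def split_messages(buffer):
    """Split received bytes into complete messages.

    Returns (messages, leftover), where messages is a list of
    (type_name, body) tuples and leftover is the unconsumed tail.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER_LEN:
        marker = buffer[offset:offset + 16]
        length, msg_type = struct.unpack_from("!HB", buffer, offset + 16)
        if marker != MARKER or not HEADER_LEN <= length <= MAX_LEN:
            raise ValueError("bad header at offset %d" % offset)
        if len(buffer) - offset < length:
            break  # message not fully received yet
        body = buffer[offset + HEADER_LEN:offset + length]
        messages.append((TYPES.get(msg_type, "UNKNOWN"), body))
        offset += length
    return messages, buffer[offset:]

# Sample buffer: a KEEPALIVE, an empty UPDATE, and a truncated third message.
keepalive = MARKER + struct.pack("!HB", 19, 4)
update = MARKER + struct.pack("!HB", 23, 2) + b"\x00\x00\x00\x00"
msgs, rest = split_messages(keepalive + update + keepalive[:5])
print(msgs, len(rest))
```

Once messages are framed like this, each one has a position in the stream that an acknowledgement (or an error report) could in principle refer to.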

Still, it adds a lot of complexity to the conversation: are we not going to introduce bugs in that rarely used and little-tested code path as well?
Unless you have a new "ACK" capability for each message, which is another idea, but those are clearly discussions for outside NANOG.

Thomas






