Did your BGP crash today?

James Hess mysidia at gmail.com
Mon Aug 30 03:32:38 UTC 2010

On Sun, Aug 29, 2010 at 3:12 PM, Thomas Mangin
<thomas.mangin at exa-networks.co.uk> wrote:
> However to make sense you would need to find a resynchronisation point to only exclude the one faulty message. Initially I thought that the last received KEEPALIVE (for the receiver of the error message) could do - but you find yourself with race conditions - so perhaps two KEEPALIVEs back?
> Each TCP packet can contain multiple messages, so the messages would then have to be split and ACKed individually to find the faulty one. EOR could be used for that purpose.

Every BGP message header starts with 16 all-bits-1 octets (the Marker
field), kept for compatibility.
This is distinctive enough that an implementation can guess where the
next message starts.
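
A rough Python sketch of that guess, for illustration only. The
function name and the plausibility checks (length between 19 and 4096,
type between 1 and 5) are my assumptions about what a scanner might
accept, not what any particular implementation does:

```python
import struct

MARKER = b"\xff" * 16   # the 16 all-bits-1 octets that open every header
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
MAX_LEN = 4096          # largest message size in base BGP-4


def find_next_message(buf, start=0):
    """Return (offset, length, type) of the next plausible header, or None.

    Hypothetical helper: scans for the marker, then sanity-checks the
    length and type fields that follow it.
    """
    pos = buf.find(MARKER, start)
    while pos != -1:
        if pos + HEADER_LEN <= len(buf):
            length, msg_type = struct.unpack_from("!HB", buf, pos + 16)
            if HEADER_LEN <= length <= MAX_LEN and 1 <= msg_type <= 5:
                return pos, length, msg_type
        pos = buf.find(MARKER, pos + 1)
    return None


# A KEEPALIVE (type 4, length 19) preceded by bytes that do not parse.
keepalive = MARKER + struct.pack("!HB", 19, 4)
print(find_next_message(b"\x00\x01garbage" + keepalive))
```

Note that the scanner has no way to tell a real header from the same
bytes sitting inside an attribute value.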
However, suppose you have an attacker. Suppose, for example, that a BGP
speaker passes on too short a length value for an attribute, and the
attacker knows what length will be sent instead of the right one.

The attacker places an entry into the data portion of the attribute that
will appear to the other peer to be "the rest" of the malformed update.
Result: the "malformed" update is received and appears to be perfectly
valid.
The next thing the attacker inserts into the data portion of the
attribute is the 16 all-bits-1 octets, the rest of a BGP header for an
update message, and their malicious update.

This will appear properly formed when the buggy BGP speaker sends it.
As far as the buggy BGP speaker is concerned, it has propagated one
route update.

As far as the buggy BGP speaker's other peers are concerned, they
have received 3 messages from the buggy speaker:
* The update "completed" in the attribute data section. (This is
"malformed", but intentionally not detectable as malformed.)
* The maliciously injected route. (This isn't supposed to exist.
The buggy speaker is unaware of its existence; there is a
disagreement between peers about how the message is interpreted.)
* A malformed message that does not make any sense.
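
The disagreement can be shown with a toy model in Python. This is not a
real BGP parser: receiver_view() is a hypothetical receiver that treats
every marker as the start of a message and trusts that header's length
field, and the byte strings are placeholders rather than valid UPDATE
contents.

```python
import struct

MARKER = b"\xff" * 16


def bgp_message(msg_type, body):
    """Frame body with marker, 2-octet total length and 1-octet type."""
    return MARKER + struct.pack("!HB", 19 + len(body), msg_type) + body


def receiver_view(stream):
    """Toy receiver (an assumption, not real behaviour): restart at every
    marker, trust that header's length, report anything else as leftover."""
    pieces = []
    pos = stream.find(MARKER)
    while pos != -1 and pos + 19 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, pos + 16)
        nxt = stream.find(MARKER, pos + 16)
        limit = nxt if nxt != -1 else len(stream)
        end = min(pos + length, limit)
        pieces.append(("message", stream[pos:end]))
        if end < limit:
            pieces.append(("leftover", stream[end:limit]))
        pos = nxt
    return pieces


# What the attacker hides inside one attribute value.
injected = bgp_message(2, b"attacker-chosen body")
attr_value = b"rest of short attribute" + injected

# The buggy speaker believes it sent exactly this one UPDATE (type 2).
outer = bgp_message(2, attr_value + b"trailing attributes")

for kind, data in receiver_view(outer):
    print(kind, len(data))
```

The sender counts one message; the toy receiver reports two messages
plus a leftover that does not parse, matching the three items above.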

If the injection were perfect, nothing would be detectable as malformed.
But alas, the attacker does not know exactly what other attributes or
prepending the buggy router will add to the message before passing it on.
They could work this out through trial and error; however, some admin
will hopefully notice all the CEASEs before the attacker achieves
complete success.

In this case, by the time the other speakers detect something as
malformed, the two preceding updates are already in the table, and
possibly even propagated further.
A "CEASE" (strictly, the NOTIFICATION that tears down the session)
rolls this back, by rolling back the entire session.

Peers could (perhaps) safely re-synchronize in this case if there
were an extension to partially roll back some of the updates
in a session and request that a portion of the messages be resent.

Or if an extension such as authentication were used to make it
impossible to inject BGP messages within the value of an attribute.
This could also be done through data quarantine: requiring all BGP
speakers to disallow the all-bits-1 sequence in any attribute value.
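
The quarantine check itself would be small. A minimal Python sketch,
assuming the rule is "reject any attribute value containing 16
consecutive 0xFF octets"; the function name is made up, and no such
rule exists in BGP today:

```python
MARKER = b"\xff" * 16


def attribute_value_allowed(value: bytes) -> bool:
    """Hypothetical quarantine rule: refuse any attribute value that
    contains the 16 all-bits-1 octets used as the header marker."""
    return MARKER not in value


print(attribute_value_allowed(b"\x00\x01\x02"))            # ordinary value
print(attribute_value_allowed(b"pad" + MARKER + b"tail"))  # hidden marker
```

This only helps if every speaker on the path enforces it, since the
value has to be rejected before the buggy speaker passes it on.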

Or through peer-specific authentication mechanisms, or checksums and
digital signatures, in the message header portion of each BGP message.
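
As a sketch of the per-message idea, here is an HMAC tag appended to
each message, in Python. The shared key, the tag placement and the
seal()/verify() names are all assumptions for illustration; this is not
an existing BGP extension.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 tag in octets


def seal(key, message):
    """Append a per-peer HMAC-SHA256 tag to one message (hypothetical)."""
    return message + hmac.new(key, message, hashlib.sha256).digest()


def verify(key, sealed):
    """Return the message if its tag checks out, otherwise None."""
    if len(sealed) < TAG_LEN:
        return None
    message, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return message if hmac.compare_digest(tag, expected) else None


peer_key = b"shared secret between two peers"   # placeholder key
sealed = seal(peer_key, b"one BGP message")

print(verify(peer_key, sealed))                  # accepted
print(verify(peer_key, b"injected" + b"\x00" * TAG_LEN))  # rejected
```

A message smuggled inside an attribute value would arrive without a
tag the receiver can verify, because the attacker does not hold the
key shared by the two peers.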

