Did your BGP crash today?

Lucy Lynch llynch at civil-tongue.net
Fri Aug 27 18:42:17 UTC 2010


FYI:

----------------------------------------------------------------------
Dear Colleagues,

On Friday 27 August, from 08:41 to 09:08 UTC, the RIPE NCC Routing
Information Service (RIS) announced a route with an experimental BGP
attribute. During this announcement, some Internet Service Providers
reported problems with their networking infrastructure.

Investigation
--------------

Immediately after discovering this, we stopped the announcement and
started investigating the problem. Our investigation has shown that the
problem was likely to have been caused by certain router types
incorrectly modifying the experimental attribute and then further
announcing the malformed route to their peers. The announcements sent
out by the RIS were correct and complied with all standards.

The experimental attribute was part of an experiment conducted in
collaboration with a group from Duke University. This involved
announcing a large (3000 bytes) optional transitive attribute, using a
modified version of Quagga. The attribute used type code 99. The data
consisted of zeros. We used the prefix 93.175.144.0/24 for this and
announced from AS 12654 on AMS-IX, NL-IX and GN-IX to all our peers.

Reports from affected ISPs showed that the length of the attribute in
the attribute header, as seen by their routers, was not correct. The
header stated 233 bytes and the actual data in their samples was 237
bytes. This caused some routers to drop the session with the peer that
announced the route.

We have built a test set-up which is running identical software and
configurations to the live set-up. From this set-up, and the BGP packet
dumps as made by the RIS, we have determined that the length of the data
in the attribute as sent out by the RIS was indeed 3000 bytes and that
all lengths recorded in the headers of the BGP updates were correct.

Beyond the RIS systems, we can only do limited diagnosis. One possible
explanation is that the affected routers did not correctly use the
extended length flag on the attribute. This flag is set when the length
of the attribute exceeds 255 bytes, i.e. when two octets are needed to
store the length.
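
As an illustration of the extended length flag described above, here is a
minimal Python sketch of how a path attribute header is laid out. The flag
values follow RFC 4271; the function names encode_attribute and
decode_attribute are hypothetical and are not taken from Quagga or from any
router implementation.

```python
import struct

# Attribute flag bits as defined in RFC 4271, section 4.3.
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10


def encode_attribute(type_code: int, value: bytes, flags: int) -> bytes:
    """Encode one path attribute as flags, type code, length, value."""
    if len(value) > 0xFFFF:
        raise ValueError("attribute value does not fit in two length octets")
    if len(value) > 255:
        # Two octets are needed for the length, so the flag must be set.
        flags |= FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBH", flags, type_code, len(value))
    else:
        # One length octet is enough, so the flag is cleared.
        flags &= ~FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


def decode_attribute(data: bytes):
    """Decode one path attribute, honouring the extended length flag."""
    flags, type_code = struct.unpack_from("!BB", data, 0)
    if flags & FLAG_EXTENDED_LENGTH:
        (length,) = struct.unpack_from("!H", data, 2)
        offset = 4
    else:
        length = data[2]
        offset = 3
    value = data[offset:offset + length]
    if len(value) != length:
        raise ValueError("attribute is shorter than its header claims")
    return flags, type_code, value


# The attribute from the experiment: type code 99, 3000 zero bytes,
# optional and transitive.
attr = encode_attribute(99, bytes(3000), FLAG_OPTIONAL | FLAG_TRANSITIVE)
flags, type_code, value = decode_attribute(attr)
print(hex(flags), type_code, len(value), len(attr))
```

A parser that ignores the flag and always reads a single length octet would
see only the first octet of the two-octet length field and misread everything
that follows it.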

It may be that the routers do not add the higher octet of the length to
the total length, which would lead, in our test set-up, to a total
packet length of 236 bytes. If, in addition, the routers also
incorrectly trim the attribute length, the problem could occur as
observed. It is worth noting that the difference between the reported
233 and 237 bytes is the size of the flags, type code and length in the
attribute.
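
The arithmetic behind this observation can be checked with a few lines of
Python. This is only a sketch of the numbers quoted in the report; the
variable names are illustrative. How the low octet relates to the 236-byte
total depends on the other fields of the UPDATE message, which are not given
here, so the sketch does not attempt to reproduce that figure.

```python
# Value length of the attribute as announced by the RIS, per the report.
ATTR_LENGTH = 3000

# Split the length into the two octets of an extended length field.
high_octet, low_octet = divmod(ATTR_LENGTH, 256)

# Attribute header with the extended length flag set:
# flags (1 octet) + type code (1 octet) + length (2 octets).
EXTENDED_HEADER_SIZE = 1 + 1 + 2

# Figures reported by the affected ISPs.
length_in_header = 233
actual_data_length = 237

print("high octet:", high_octet, "low octet:", low_octet)
print("reported difference:", actual_data_length - length_in_header)
print("extended header size:", EXTENDED_HEADER_SIZE)
```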

We will be further investigating this problem and will report any
findings. We regret any inconvenience caused.

Kind regards,

Erik Romijn

Information Services
RIPE NCC
_______________________________________________
tech-l mailing list
tech-l at ams-ix.net
http://melix.ams-ix.net/mailman/listinfo/tech-l



- Lucy

On Fri, 27 Aug 2010, Grzegorz Janoszka wrote:

> On 27-08-10 19:31, Valdis.Kletnieks at vt.edu wrote:
>> On Fri, 27 Aug 2010 19:27:06 +0200, Kasper Adel said:
>>> Haven't seen a thread on this one so thought I'd start one.
>>> 
>>> RIPE tested a new attribute that crashed the internet, is that true?
>> 
>> If it in fact "crashed the internet", as opposed to "gave a few buggy
>> routers here and there indigestion", you wouldn't be posting to NANOG
>> looking for confirmation. :)
>
> https://www.ams-ix.net/statistics/
>
> Not whole internet, but a part. And the "few buggy routers here and there" 
> were mostly Cisco CRS-1's which didn't understand the new attribute and sent 
> a malformed message to all peers, causing them to close the BGP session.
>
> I think most of the impact was limited to Europe, especially Amsterdam area.
>
>



