Did your BGP crash today?

Thomas Mangin thomas.mangin at exa-networks.co.uk
Fri Aug 27 18:56:35 UTC 2010

So much for "better left off public mailing lists"! Sigh!


On 27 Aug 2010, at 19:42, Lucy Lynch wrote:

> FYI:
> ----------------------------------------------------------------------
> Dear Colleagues,
> On Friday 27 August, from 08:41 to 09:08 UTC, the RIPE NCC Routing
> Information Service (RIS) announced a route with an experimental BGP
> attribute. During this announcement, some Internet Service Providers
> reported problems with their networking infrastructure.
>
> Investigation
> --------------
> Immediately after discovering this, we stopped the announcement and
> started investigating the problem. Our investigation has shown that the
> problem was likely to have been caused by certain router types
> incorrectly modifying the experimental attribute and then further
> announcing the malformed route to their peers. The announcements sent
> out by the RIS were correct and complied with all standards.
> The experimental attribute was part of an experiment conducted in
> collaboration with a group from Duke University. This involved
> announcing a large (3000 bytes) optional transitive attribute, using a
> modified version of Quagga. The attribute used type code 99. The data
> consisted of zeros. We used the prefix for this and
> announced from AS 12654 on AMS-IX, NL-IX and GN-IX to all our peers.
> Reports from affected ISPs showed that the length of the attribute in
> the attribute header, as seen by their routers, was not correct. The
> header stated 233 bytes and the actual data in their samples was 237
> bytes. This caused some routers to drop the session with the peer that
> announced the route.
> We have built a test set-up which is running identical software and
> configurations to the live set-up. From this set-up, and the BGP packet
> dumps as made by the RIS, we have determined that the length of the data
> in the attribute as sent out by the RIS was indeed 3000 bytes and that
> all lengths recorded in the headers of the BGP updates were correct.
> Beyond the RIS systems, we can only do limited diagnosis. One possible
> explanation is that the affected routers did not correctly use the
> extended length flag on the attribute. This flag is set when the length
> of the attribute exceeds 255 bytes i.e. when two octets are needed to
> store the length.
> It may be that the routers do not add the higher octet of the length to
> the total length, which would lead, in our test set-up, to a total
> packet length of 236 bytes. If, in addition, the routers also
> incorrectly trim the attribute length, the problem could occur as
> observed. It is worth noting that the difference between the reported
> 233 and 237 bytes is the size of the flags, type code and length in the
> attribute.
> We will be further investigating this problem and will report any
> findings. We regret any inconvenience caused.
> Kind regards,
> Erik Romijn
> Information Services
> _______________________________________________
> tech-l mailing list
> tech-l at ams-ix.net
> http://melix.ams-ix.net/mailman/listinfo/tech-l
> - Lucy
> On Fri, 27 Aug 2010, Grzegorz Janoszka wrote:
>> On 27-08-10 19:31, Valdis.Kletnieks at vt.edu wrote:
>>> On Fri, 27 Aug 2010 19:27:06 +0200, Kasper Adel said:
>>>> Haven't seen a thread on this one so thought I'd start one.
>>>> RIPE tested a new attribute that crashed the internet, is that true?
>>> If it in fact "crashed the internet", as opposed to "gave a few buggy routers
>>> here and there indigestion", you wouldn't be posting to NANOG looking for
>>> confirmation. :)
>> https://www.ams-ix.net/statistics/
>> Not the whole internet, but a part. And the "few buggy routers here and there" were mostly Cisco CRS-1's which didn't understand the new attribute and sent a malformed message to all peers, causing them to close the BGP session.
>> I think most of the impact was limited to Europe, especially Amsterdam area.
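
For anyone who wants to poke at the extended length flag described in the RIPE NCC note above, here is a minimal sketch in Python. The flag values and header layout follow RFC 4271 section 4.3; the function names (encode_attribute, decode_header) are made up for illustration and are not taken from Quagga or any router code. It only shows how a 3000-byte attribute with type code 99 is framed and what a parser would read if it kept just the low octet of the length. It does not attempt to reproduce the 233/237-byte figures from the affected routers.

```python
import struct

# Attribute flag bits as defined in RFC 4271, section 4.3
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10


def encode_attribute(type_code, value, flags=FLAG_OPTIONAL | FLAG_TRANSITIVE):
    """Encode one path attribute (hypothetical helper, for illustration only)."""
    if len(value) > 255:
        # Length needs two octets, so the extended length flag must be set.
        # Header is flags (1) + type code (1) + length (2) = 4 octets.
        header = struct.pack("!BBH", flags | FLAG_EXTENDED_LENGTH,
                             type_code, len(value))
    else:
        # Length fits in one octet: 3-octet header, flag cleared.
        header = struct.pack("!BBB", flags & ~FLAG_EXTENDED_LENGTH,
                             type_code, len(value))
    return header + value


def decode_header(data):
    """Return (flags, type_code, length, header_size) for the leading attribute."""
    flags, type_code = struct.unpack_from("!BB", data, 0)
    if flags & FLAG_EXTENDED_LENGTH:
        (length,) = struct.unpack_from("!H", data, 2)
        return flags, type_code, length, 4
    return flags, type_code, data[2], 3


# The experiment as described: type code 99, 3000 bytes of zeros.
attr = encode_attribute(99, bytes(3000))
flags, type_code, length, header_size = decode_header(attr)

# What a parser would see if it dropped the high octet of the length.
low_octet_only = length & 0xFF

print(hex(flags), type_code, length, header_size, low_octet_only)
```

The 4-octet header (flags, type code, two-octet length) is the same size as the gap between the 233 and 237 bytes reported in the note. The low octet of 3000 on its own is 184, so a router that ignores the extended length flag or the high octet will walk off into the attribute data and misparse whatever follows.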
