Did your BGP crash today?
llynch at civil-tongue.net
Fri Aug 27 18:59:45 UTC 2010
sorry - found via google...
On Fri, 27 Aug 2010, Thomas Mangin wrote:
> So much for "better left off public mailing lists" ! sigh !
> On 27 Aug 2010, at 19:42, Lucy Lynch wrote:
>> Dear Colleagues,
>> On Friday 27 August, from 08:41 to 09:08 UTC, the RIPE NCC Routing
>> Information Service (RIS) announced a route with an experimental BGP
>> attribute. During this announcement, some Internet Service Providers
>> reported problems with their networking infrastructure.
>> Immediately after discovering this, we stopped the announcement and
>> started investigating the problem. Our investigation has shown that the
>> problem was likely to have been caused by certain router types
>> incorrectly modifying the experimental attribute and then further
>> announcing the malformed route to their peers. The announcements sent
>> out by the RIS were correct and complied with all standards.
>> The experimental attribute was part of an experiment conducted in
>> collaboration with a group from Duke University. This involved
>> announcing a large (3000 bytes) optional transitive attribute, using a
>> modified version of Quagga. The attribute used type code 99. The data
>> consisted of zeros. We used the prefix 18.104.22.168/24 for this and
>> announced from AS 12654 on AMS-IX, NL-IX and GN-IX to all our peers.
>> Reports from affected ISPs showed that the length of the attribute in
>> the attribute header, as seen by their routers, was not correct. The
>> header stated 233 bytes and the actual data in their samples was 237
>> bytes. This caused some routers to drop the session with the peer that
>> announced the route.
>> We have built a test set-up which is running identical software and
>> configurations to the live set-up. From this set-up, and the BGP packet
>> dumps as made by the RIS, we have determined that the length of the data
>> in the attribute as sent out by the RIS was indeed 3000 bytes and that
>> all lengths recorded in the headers of the BGP updates were correct.
>> Beyond the RIS systems, we can only do limited diagnosis. One possible
>> explanation is that the affected routers did not correctly use the
>> extended length flag on the attribute. This flag is set when the length
>> of the attribute exceeds 255 bytes i.e. when two octets are needed to
>> store the length.
>> It may be that the routers do not add the higher octet of the length to
>> the total length, which would lead, in our test set-up, to a total
>> packet length of 236 bytes. If, in addition, the routers also
>> incorrectly trim the attribute length, the problem could occur as
>> observed. It is worth noting that the difference between the reported
>> 233 and 237 bytes is the size of the flags, type code and length in the
>> attribute header.
>> We will be further investigating this problem and will report any
>> findings. We regret any inconvenience caused.
>> Kind regards,
>> Erik Romijn
>> Information Services
>> RIPE NCC
>> tech-l mailing list
>> tech-l at ams-ix.net
>> - Lucy
>> On Fri, 27 Aug 2010, Grzegorz Janoszka wrote:
>>> On 27-08-10 19:31, Valdis.Kletnieks at vt.edu wrote:
>>>> On Fri, 27 Aug 2010 19:27:06 +0200, Kasper Adel said:
>>>>> Haven't seen a thread on this one so thought i'd start one.
>>>>> Ripe tested a new attribute that crashed the internet, is that true?
>>>> If it in fact "crashed the internet", as opposed to "gave a few buggy routers
>>>> here and there indigestion", you wouldn't be posting to NANOG looking for
>>>> confirmation. :)
>>> Not whole internet, but a part. And the "few buggy routers here and there" were mostly Cisco CRS-1's which didn't understand the new attribute and sent a malformed message to all peers, causing them to close the BGP session.
>>> I think most of the impact was limited to Europe, especially Amsterdam area.
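
The announcement quoted above describes the extended length flag and suggests that some routers may have dropped the higher octet of the attribute length. Here is a minimal Python sketch of that header arithmetic, assuming the path attribute layout from RFC 4271 (flags, type code, then a one- or two-octet length). The function names are hypothetical and the low-octet calculation only illustrates the hypothesis in the announcement; it does not reproduce the behaviour of any real router or the 233/236/237 byte figures from the reports.

```python
import struct

# Attribute flag bits as defined in RFC 4271, section 4.3.
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10


def encode_attribute(type_code, value, flags):
    """Encode one path attribute: flags, type code, length, value."""
    if len(value) > 255:
        # Length needs two octets, so the Extended Length bit must be set.
        flags |= FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBH", flags, type_code, len(value))
    else:
        # Length fits in one octet, so the bit must be clear.
        flags &= ~FLAG_EXTENDED_LENGTH & 0xFF
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


def decode_attribute_length(data):
    """Return (header_size, value_length) for the attribute at data[0:]."""
    flags = data[0]
    if flags & FLAG_EXTENDED_LENGTH:
        (length,) = struct.unpack_from("!H", data, 2)
        return 4, length
    return 3, data[2]


# The experiment: type code 99, optional transitive, 3000 zero bytes.
attr = encode_attribute(99, bytes(3000), FLAG_OPTIONAL | FLAG_TRANSITIVE)
header_size, value_length = decode_attribute_length(attr)

# Hypothesised bug (an assumption, not a confirmed cause): only the low
# octet of the two-octet length is used.
low_octet_only = value_length & 0xFF

print(header_size, value_length, low_octet_only)
```

With the extended length flag set, the header is 4 bytes (flags, type code and a two-octet length), which matches the 4-byte difference between the reported 233 and 237 bytes. Keeping only the low octet of 3000 (0x0BB8) leaves 0xB8, a much shorter length than the data actually sent.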