Did your BGP crash today?

Sun Aug 29 05:43:03 UTC 2010

On BB, so top posting. Apologies.

In light of the recent RIPE experiment, it seems that creating a worst-case BGP test suite for all kinds of nastiness might not be a bad idea, so that we can all test implementations ourselves before we deploy new code.

Something covering all the funky attributes, all the funky AS_SETs... with knobs that scale from 1 up to memory exhaustion (for long data sets, etc.).

Unless BGP is massively more complicated than I remember, it's not a very advanced CS grad project.
I'm thinking a Quagga or Perl BGP talker would be a good place to start.
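A rough starting point for the message-generation half of such a suite, as a minimal Python sketch rather than a Quagga or Perl talker: it only builds the raw bytes of a BGP UPDATE in the RFC 4271 wire format, with an AS_PATH carrying `as_count` ASNs in AS_SET segments, so `as_count` is the "knob". The function names, the private-use ASNs, and the documentation-range next hop and prefix are illustrative assumptions. It encodes 2-octet ASNs only, and opening a session and sending the bytes are left out.

```python
import struct

# Path attribute flag bits (RFC 4271, section 4.3).
TRANSITIVE = 0x40
EXTENDED_LENGTH = 0x10

AS_SET, AS_SEQUENCE = 1, 2
UPDATE = 2


def path_attribute(flags: int, type_code: int, value: bytes) -> bytes:
    """Encode one path attribute, using a 2-byte length when the value needs it."""
    if len(value) > 255:
        flags |= EXTENDED_LENGTH
    if flags & EXTENDED_LENGTH:
        return struct.pack("!BBH", flags, type_code, len(value)) + value
    return struct.pack("!BBB", flags, type_code, len(value)) + value


def as_path_value(asns, segment_type=AS_SET) -> bytes:
    """Encode 2-octet ASNs; a segment holds at most 255 ASNs, so split as needed."""
    out = b""
    for i in range(0, len(asns), 255):
        chunk = asns[i:i + 255]
        out += struct.pack("!BB", segment_type, len(chunk))
        out += b"".join(struct.pack("!H", asn) for asn in chunk)
    return out


def build_update(as_count: int) -> bytes:
    """Build an UPDATE announcing 198.51.100.0/24 with an AS_PATH of as_count ASNs.

    as_count is the stress knob. Above roughly 2000 ASNs the message exceeds
    the 4096-byte maximum in RFC 4271 (a deliberate negative test); once it
    no longer fits a 2-byte length field, struct.pack raises struct.error.
    """
    asns = [64512 + (i % 1000) for i in range(as_count)]  # private-use ASNs
    attrs = (
        path_attribute(TRANSITIVE, 1, b"\x00")                  # ORIGIN = IGP
        + path_attribute(TRANSITIVE, 2, as_path_value(asns))    # AS_PATH
        + path_attribute(TRANSITIVE, 3, bytes([192, 0, 2, 1]))  # NEXT_HOP
    )
    nlri = bytes([24, 198, 51, 100])  # prefix length, then the 3 significant octets
    body = struct.pack("!H", 0) + struct.pack("!H", len(attrs)) + attrs + nlri
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), UPDATE)
    return header + body
```

Stepping `as_count` through 1, 255, 256 (the first segment split), the 4096-byte boundary, and on up to the 2-byte length limit covers the "1 to mem exhaust" range for a single message; going further means sending many such messages.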

Deepak

----- Original Message -----
From: Christopher Morrow <morrowc.lists at gmail.com>
To: Florian Weimer <fw at deneb.enyo.de>
Cc: nanog at nanog.org <nanog at nanog.org>
Sent: Sun Aug 29 01:12:00 2010
Subject: Re: Did your BGP crash today?

On Sat, Aug 28, 2010 at 6:14 AM, Florian Weimer <fw at deneb.enyo.de> wrote:
> * Christopher Morrow:
>
>> (you are asking your vendors to run full bit sweeps of each protocol
>> in a regimented manner checking for all possible edge cases and
>> properly handling them, right?)
>
> The real issue is that both spec and current practice say you need to
> drop the session as soon as you encounter any unexpected data.  That's

sorry, I conflated two things... or didn't mean to but did anyway.

1) users of gear that does BGP really need to ask loudly and often
(and then go test for themselves) whether their BGP speakers do the
'right thing' when faced with oddball scenarios. If someone sends you
a previously unknown attribute... don't corrupt it and pass it on:
pass it along if it's transitive, drop it if not.
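The "pass if transitive, drop if not" rule comes from RFC 4271 (section 5), and a minimal Python sketch of it might look like this. The `PathAttribute` type, the function name, and the `RECOGNIZED` set are hypothetical; a real speaker also validates the attributes it does recognize. The branch that matters here is the transitive one: only the Partial flag changes, and the value bytes are carried through untouched.

```python
from dataclasses import dataclass
from typing import Optional

# Attribute flag bits (RFC 4271, section 4.3).
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20

# Hypothetical: the type codes this particular speaker understands.
RECOGNIZED = {1, 2, 3, 4, 5, 6, 7}


@dataclass(frozen=True)
class PathAttribute:
    flags: int
    type_code: int
    value: bytes


class UnrecognizedWellKnown(Exception):
    """Corresponds to a NOTIFICATION: UPDATE Message Error, subcode 2."""


def attribute_to_propagate(attr: PathAttribute) -> Optional[PathAttribute]:
    """Decide what happens to an attribute when re-advertising the route.

    Returns the attribute to send on, or None if it must be dropped.
    """
    if attr.type_code in RECOGNIZED:
        return attr  # normal processing (out of scope for this sketch)
    if not attr.flags & OPTIONAL:
        # An unrecognized well-known attribute is a hard error.
        raise UnrecognizedWellKnown(attr.type_code)
    if attr.flags & TRANSITIVE:
        # Pass it on with the Partial bit set; the value bytes are not touched.
        return PathAttribute(attr.flags | PARTIAL, attr.type_code, attr.value)
    # Unrecognized optional non-transitive: quietly ignore, do not pass along.
    return None
```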

2) some thought and writing and code changes need to go into how the
bgp-speakers of the world deal with badly behaving bgp speakers. Is
'send notify and reset' the right answer? Is there one 'right answer'?
Should some classes of fugly exchange end with a 'dropped that update,
moved along' and some end with 'pull eject handle!'?

it's doubtful that 2 can get solved here on nanog (though some
operational thought on the right thing would certainly be great as
guidance). I would hope that 1 can get some traction here, via folks
going back to their vendors and asking: "Did you run the
Mu-security/Oulu-univ/etc fuzzing test suites against this code? Can
I see the results? I hope they match the results I'm going to be
getting from my folks in ~2wks... or we'll be having a much more
structured/loud conversation..."
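Real fuzzing suites do far more than this, but the simplest form of what they do is mutation fuzzing: take a valid message and corrupt it systematically. A minimal Python sketch, assuming a hand-encoded 45-byte seed UPDATE (ORIGIN IGP, AS_PATH {64512}, next hop 192.0.2.1, NLRI 198.51.100.0/24); delivering each case over an established session to the device under test, and watching how it reacts, is left to a hypothetical harness.

```python
from typing import Iterator, Tuple

# Hand-encoded seed UPDATE: ORIGIN IGP, AS_PATH {64512} (one AS_SET segment),
# NEXT_HOP 192.0.2.1, NLRI 198.51.100.0/24. 45 bytes in total.
SEED = bytes.fromhex(
    "ffffffffffffffffffffffffffffffff"  # marker
    "002d02"                            # length 45, type 2 (UPDATE)
    "0000"                              # no withdrawn routes
    "0012"                              # 18 bytes of path attributes
    "40010100"                          # ORIGIN = IGP
    "4002040101fc00"                    # AS_PATH: AS_SET of 1 ASN, 64512
    "400304c0000201"                    # NEXT_HOP = 192.0.2.1
    "18c63364"                          # NLRI 198.51.100.0/24
)


def single_bit_flips(seed: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (bit_index, mutated_message) for every single-bit flip of seed.

    Bits are numbered from the most significant bit of the first byte.
    """
    for bit in range(len(seed) * 8):
        buf = bytearray(seed)
        buf[bit // 8] ^= 0x80 >> (bit % 8)
        yield bit, bytes(buf)
```

Flips in the marker and length fields mostly exercise framing checks, while flips inside the attributes exercise content checks; a fuller harness would also splice in unknown attribute types and oversized lengths.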

another poster had a great point about 'all the world can screw with
you, you have no protections other than trust that the next guy won't
screw you over (inadvertently)'. There are no protections available to
you if someone sets (for example) bit 77 in an IPv4 update message to 1
when it should by all accounts be 0. Or (apparently) if they send a
previously unknown attribute on a route :( You can put in max-prefix
limits, as-path limits (length and content), and prefix filters... but
when it comes to the contents of the messages themselves, you are stuck
hoping the vendors all followed the same playbook. With everyone saying
together: "Please appropriately test your implementation for all
boundary cases", maybe we can get to where these happen less often (or
nearly never); every 3 months is a little tedious.
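Max-prefix, as-path and prefix-filter knobs all act on routes after the message has been parsed, which is why they can't help with broken message content. A minimal Python sketch of those checks, with hypothetical limits and a hypothetical deny list; it simplifies AS_PATH length to a flat list of ASNs (for path-length purposes RFC 4271 counts an AS_SET as one), and it models exceeding max-prefix as fatal to the session.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Set

# None of this runs until the UPDATE has been parsed: a parser bug triggered by
# the raw bytes fires before these checks ever see a route.


class MaxPrefixExceeded(Exception):
    """Modeled here as fatal: the session would be torn down."""


@dataclass
class InboundPolicy:
    max_prefixes: int = 1000      # hypothetical max-prefix limit
    max_as_path_len: int = 50     # hypothetical AS_PATH length limit
    denied: List = field(default_factory=lambda: [ipaddress.ip_network("10.0.0.0/8")])
    accepted: Set = field(default_factory=set)

    def check(self, prefix: str, as_path: List[int]) -> bool:
        """Return True if the route passes policy, False if it is filtered."""
        net = ipaddress.ip_network(prefix)
        if len(as_path) > self.max_as_path_len:
            return False
        for bad in self.denied:
            # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
            if net.version == bad.version and net.subnet_of(bad):
                return False
        self.accepted.add(net)
        if len(self.accepted) > self.max_prefixes:
            raise MaxPrefixExceeded(f"{len(self.accepted)} > {self.max_prefixes}")
        return True
```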

-chris