Did your BGP crash today?

Gary Buhrmaster gary.buhrmaster at gmail.com
Mon Aug 30 19:39:16 UTC 2010

On Mon, Aug 30, 2010 at 15:55, Jack Bates <jbates at brightok.net> wrote:
> As good a place to break in on the thread as any, I guess. Randy and others
> believe more testing should have been done. I'm not completely sure they
> didn't test against XR. They very likely could have tested in a 1 on 1
> connection and everything looked fine.
> I don't know the full details, but at what point did the corruption appear,
> and was it visible? We know that it was corrupt on the output which caused
> peer resets, but was it necessarily visible in the router itself?
> Do we require a researcher to set up a chain of every vendor BGP speaker in
> every possible configuration and order to verify a bug doesn't cause things
> to break? In this case, one very likely would need an XR receiving and
> transmitting updates to detect the failure, so no less than 3 routers with
> the XR in the middle.
> What about individual configurations? Perhaps the update is received and
> altered by one vendor due to specific configurations, sent to the next
> vendor, accepted and altered (due to the first alteration, whereas it
> wouldn't be altered if the original update had been received) which causes
> the next vendor to reset. Then we add to this that it may pass silently
> through several middle vendor routers without problems and we realize the
> scope of such problems and why connecting to the Internet is so
> unpredictable.

I am not aware that anyone has provided the complete details at
this point which would include any test plans that may have been
performed.  From what I have been able to discern, it does seem
likely that a test plan that would have caught this almost had to
know of the specific issue in advance.  More testing would have
been better, but there is just too much variability out there to
ensure you can do a complete test.
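
To put a rough number on that variability, here is a minimal sketch
that counts ordered chains of BGP speakers.  The vendor labels and
both function names are hypothetical, made up for illustration; it
ignores configuration differences entirely, which would only make
the count worse.

```python
from itertools import product

# Hypothetical vendor labels; not an inventory of real BGP implementations.
VENDORS = ["vendor_a", "vendor_b", "vendor_c", "vendor_d"]


def count_chains(num_vendors: int, max_hops: int) -> int:
    """Count ordered chains of 2..max_hops routers, repeats allowed."""
    return sum(num_vendors ** hops for hops in range(2, max_hops + 1))


def chains_with_middle(vendors, middle):
    """List the three-router chains that put `middle` in the transit position."""
    return [(left, middle, right) for left, right in product(vendors, repeat=2)]


if __name__ == "__main__":
    # Chains of two or three routers drawn from four implementations.
    print(count_chains(len(VENDORS), 3))
    # Allowing up to five routers in the chain.
    print(count_chains(len(VENDORS), 5))
    # Three-router chains with one particular implementation in the middle.
    print(len(chains_with_middle(VENDORS, "vendor_b")))
```

Even with only four implementations and no configuration knobs, the
count grows geometrically with chain length, and a chain with one
particular speaker in the middle is a small slice of that space.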

I am also not aware that the introduction of the attribute was
announced to the usual operational lists in advance "just in
case" (OK, in this case, I mean NANOG).  This, in my mind,
is actually the bigger faux pas.  An "Oh S***" moment has
happened to most of us.  It probably will happen again to
many of us.  But letting people know in advance of scheduled
changes is the important thing.

I would hope that in the future researchers will commit to
test plans to (at least) all the major vendor BGP speakers
(which, I admit, would likely not have caught this issue),
and that before introducing such "new" attributes into the
"Internet", they would announce it to the usual operational
lists, again, "just in case".  But my hopes are often dashed.

