Resilience: faults, causes, statistics, open issues
David Andersen
dga at lcs.mit.edu
Fri Jan 28 18:43:51 UTC 2005
On Jan 28, 2005, at 5:30 AM, András Császár (IJ/ETH) wrote:
>
> Just some comments about the root causes of BGP related problems,
> maybe you find something useful from the research perspective,
> although probably this is not going to be new for you.
>
> I found a few author groups with very related and useful papers:
>
> - Tim Griffin and co.
> - Nick Feamster and co.
> - Jennifer Rexford and co.
> - Lixin Gao and co.
Yup. That particular group you mentioned has a lot of interplay.
> These people often have joint publications but sometimes separate as
> well. Also, Craig Labovitz and co have some very useful papers in the
> area of routing convergence time.
Yes. There's also Morley Mao's convergence work.
>
>
> As I see things now, in the case of BGP, routing divergence,
> configuration, and policies are very strongly correlated.
>
> A high-level conclusion (about what you can expect from half a year of
> reading papers and presentations) is that the first root cause of BGP
> problems is the absence of a >>widely deployed and practical<< formal
> language for policies. Since there is no formal language, there is no
> compiler, and so you get unwanted anomalies resulting from your
> config.
In a sense. I think that this is one of the root causes, but it's
perhaps not the only one. I think we can group it into two areas:
a) Fundamental BGP problems
(e.g., the convergence/flap damping issues, etc.). By
"fundamental" I don't mean uncorrectable - I simply mean that they're
"features" of the protocol as it exists today. Some may be fundamental
trade-offs in global routing; I don't know.
b) The above-mentioned policy issue
Some of the issues in (a) can be corrected through (b) - for example,
the Gao/Rexford examination of what policies can be permitted if you
want to ensure stable routing. Given that BGP is a strongly
policy-driven beast, many, many of its problems do arise from this.
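
As a rough illustration of the kind of policy restriction the Gao/Rexford
work examines, here is a minimal Python sketch of the commonly cited
guidelines: prefer customer-learned routes, and export peer- or
provider-learned routes only to customers. The names (may_export,
best_route, is_valley_free) and the local-preference numbers are
hypothetical, chosen only for illustration, and the sketch leaves out the
further requirement that the customer-provider graph has no cycles.

```python
# Relationship of a neighbor, as seen from the local AS.
CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"

# Hypothetical local-preference values; only their ordering matters.
LOCAL_PREF = {CUSTOMER: 300, PEER: 200, PROVIDER: 100}


def may_export(learned_from, export_to):
    """Export rule: customer-learned routes may go to any neighbor;
    peer- or provider-learned routes may go only to customers."""
    return learned_from == CUSTOMER or export_to == CUSTOMER


def best_route(routes):
    """Preference rule: pick the route whose neighbor relationship has
    the highest local preference. `routes` is a list of
    (relationship, as_path) tuples; shorter AS path breaks ties."""
    return max(routes, key=lambda r: (LOCAL_PREF[r[0]], -len(r[1])))


def is_valley_free(links):
    """Check a path given as a list of link types in the direction of
    travel: 'up' (customer to provider), 'peer', or 'down' (provider to
    customer). Valid paths are zero or more 'up' links, at most one
    'peer' link, then zero or more 'down' links."""
    past_the_top = False  # True once a 'peer' or 'down' link is seen
    for link in links:
        if link == "up":
            if past_the_top:
                return False
        elif link == "peer":
            if past_the_top:
                return False
            past_the_top = True
        elif link == "down":
            past_the_top = True
        else:
            raise ValueError(f"unknown link type: {link!r}")
    return True
```

If every AS follows the export rule, the paths that result are exactly the
"valley-free" ones that is_valley_free accepts, e.g. ["up", "peer", "down"]
passes while ["down", "up"] does not.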
> So, in the end, although we can possibly identify the root causes
> behind BGP problems, I'm not sure they can ever be fully eliminated.
> OK, I can imagine a formal language and config compiler, and one can
> find verification tools as well, but I can hardly imagine, e.g., the
> sharing of policies (although some papers describe methods for
> inferring the necessary knowledge from measurements).
Agreed. I think we'll make steps, though, and I think that groups of
collaborating providers can probably implement some of the solutions
between themselves in ways that make sense.
> p.s. Sorry for the long mail :) :)
No worries - quite interesting. (to me, at least!)
-Dave