<br>

Access point movie goes whizzing past very quickly<br>

as Bill Fenner narrates.<br>

Lets you see where people are congregating, which<br>

talks are more interesting, and when people<br>

migrate out of talks; this could feed into the survey<br>

to tell the program committee which talks were<br>

more interesting.<br>

<br>

Netdisco collects data from network elements and<br>

plots it; he put a front end on it.<br>

<br>

If you opted in by emailing him your MAC address,<br>

it would render a map with your location on it.<br>

<br>

It has RSS feeds of your location as well.<br>

<br>

fenner at <a href="http://research.att.net">research.att.net</a><br>

<br>

<br>

2006.02.15 An Inter-domain Consistency Management Layer<br>

Nate Kushman, MIT<br>

<br>

Steve Feldman: welcome back. Nate Kushman is up first<br>

to talk about routing consistency.<br>

<br>

Transient BGP loops <br>

Was with Akamai, now at MIT.<br>

Srikanth Kandula, Dina Katabi, John Wroclawski<br>

<br>

Do loops matter?<br>

can we do something about them?<br>

<br>

what is a transient BGP loop?<br>

slide showing loop forming.<br>

<br>

How common are "transient BGP loops"<br>

<br>

Sprint study, IMC 2003, IMW 2002<br>

looked at packet traces from the Sprint backbone:<br>

up to 90% of the observed packet loss was caused by<br>

 routing loops<br>

60-100% could be attributed to BGP<br>

<br>

Is it true on the Internet?<br>

<br>

Routing loop damage<br>

<br>

20 vantage points with BGP feeds<br>

did pings and traceroutes, watched for loops.<br>

<br>

correlated BGP updates with TTL-exceeded<br>

responses to pings and traceroutes.<br>

<br>

In fact, all loops were within 100 seconds of<br>

 BGP updates.<br>

10-15% of all BGP updates caused routing loops!!<br>
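A minimal sketch of the correlation step, assuming loop observations and BGP updates are available as sorted lists of timestamps in seconds. The 100-second window comes from the talk; the function names and data layout are hypothetical, not the authors' measurement tooling.<br>

```python
from bisect import bisect_left
from typing import List

# Window from the talk; everything else here is an illustrative assumption.
WINDOW_S = 100.0


def loops_near_updates(loop_times: List[float], update_times: List[float],
                       window: float = WINDOW_S) -> List[float]:
    """Return loop observations within `window` seconds (either side) of
    at least one BGP update. `update_times` must be sorted."""
    matched = []
    for t in loop_times:
        # First update no earlier than t - window.
        i = bisect_left(update_times, t - window)
        if i < len(update_times) and update_times[i] <= t + window:
            matched.append(t)
    return matched


def fraction_of_updates_with_loops(loop_times: List[float], update_times: List[float],
                                   window: float = WINDOW_S) -> float:
    """Fraction of updates followed by at least one loop within `window`
    seconds, i.e. a loop in [u, u + window]. `loop_times` must be sorted."""
    if not update_times:
        return 0.0
    hits = 0
    for u in update_times:
        # First loop observation at or after the update.
        i = bisect_left(loop_times, u)
        if i < len(loop_times) and loop_times[i] <= u + window:
            hits += 1
    return hits / len(update_times)
```

Binary search keeps each lookup logarithmic, which matters when matching long traceroute logs against a full BGP feed.<br>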

<br>

Collateral damage.<br>

loops cause congestion on the links that are part of<br>

the loop, causing loss for traffic between networks<br>

that were not themselves rerouted.<br>

<br>

traceroute to see which links were part of the<br>

loop, see which other traces shared a link in<br>

common with the loop.<br>

there is a marked increase in packet loss in<br>

the 100-second window around the BGP loop.<br>

<br>

Prefixes sharing a loopy link see 19% packet loss<br>

in general.<br>

<br>

What should be done?<br>

We need to prevent forwarding loops.<br>

<br>

A loop occurs because:<br>

one AS pushes a route update to the data plane, but<br>

other ASes are not yet aware of that route change.<br>
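To make the mechanism concrete, here is a toy forwarding walk. It assumes each AS is a single node with one next hop toward one destination D, and it simplifies TTL handling; `forward` is a hypothetical helper. AS A has already installed its new route via B, while B has not yet heard about the change and still points at A.<br>

```python
from typing import Dict, List, Tuple


def forward(fibs: Dict[str, str], src: str, dest: str,
            ttl: int = 64) -> Tuple[str, List[str]]:
    """Follow per-AS next hops from src toward dest (hypothetical helper).

    Returns (outcome, path), where outcome is "delivered", "no_route",
    or "ttl_exceeded". Each hop decrements TTL by one (simplified).
    """
    path = [src]
    node = src
    while node != dest:
        if ttl == 0:
            return "ttl_exceeded", path
        next_hop = fibs.get(node)
        if next_hop is None:
            return "no_route", path
        node = next_hop
        ttl -= 1
        path.append(node)
    return "delivered", path


# A has moved to its new route via B; B has not heard yet and still points at A.
during_change = {"A": "B", "B": "A"}
# Once B has processed the update, it forwards straight to D.
after_change = {"A": "B", "B": "D"}

print(forward(during_change, "A", "D", ttl=8))  # bounces between A and B until TTL runs out
print(forward(after_change, "A", "D"))
```

The packet in the first case burns link capacity on every bounce, which is the mechanism behind the collateral damage described above.<br>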

<br>

What about telling everyone about the change before<br>

the change actually happens?<br>

<br>

Suspension:<br>

continue to route traffic<br>

tell control system not to propagate the route<br>

FIB stays the same for now; RIB doesn't send the route.<br>

<br>

downstream networks only update forwarding tables<br>

once upstreams have acknowledged the path change.<br>
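A rough sketch of the suspension idea as described on the slides, assuming a single prefix per router and reliable delivery of acknowledgements. `SuspendingRouter` and its methods are hypothetical names; this is not the protocol specified in the paper.<br>

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SuspendingRouter:
    """Hypothetical toy model of suspension for a single prefix: keep
    forwarding on the old FIB entry and hold the new route back until
    every upstream neighbor has acknowledged the change."""
    name: str
    upstreams: Set[str]
    fib_next_hop: Optional[str] = None      # what the data plane uses now
    pending_next_hop: Optional[str] = None  # new route, held in the RIB only
    acks: Set[str] = field(default_factory=set)

    def propose(self, new_next_hop: str) -> None:
        """Suspend: remember the new route but leave the FIB untouched."""
        self.pending_next_hop = new_next_hop
        self.acks = set()
        self._maybe_commit()  # with no upstreams, commits immediately

    def receive_ack(self, neighbor: str) -> None:
        """Record an ACK from an upstream neighbor for the pending change."""
        if neighbor in self.upstreams and self.pending_next_hop is not None:
            self.acks.add(neighbor)
            self._maybe_commit()

    def _maybe_commit(self) -> None:
        # Only touch the FIB once every upstream has acknowledged.
        if self.pending_next_hop is not None and self.acks >= self.upstreams:
            self.fib_next_hop = self.pending_next_hop
            self.pending_next_hop = None
```

Calling `propose` leaves `fib_next_hop` unchanged; it switches only after `receive_ack` has been called for every name in `upstreams`.<br>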

<br>

More generally:<br>

we have proven:<br>

 loops are prevented in general case<br>

 convergence properties similar to normal BGP<br>

<a href="http://nsm.lcs.mit.edu/~nkushman/">http://nsm.lcs.mit.edu/~nkushman/</a><br>

incrementally deployable.<br>

<br>

feedback<br>

<br>

Clearly:<br>

 works well for planned maintenance.  We can delay the move<br>

 to the backup path during those events, at least.<br>

  20% of update events caused by planned maintenance<br>

 Link up events also cause loops, no way to plan for<br>

  them smoothly now.<br>

What about:<br>

 unplanned link down events<br>

 trade-off between loss on current path and collateral damage<br>

<br>

Are we willing to do this in general, to keep unstable<br>

prefixes from impacting stable prefixes?<br>

<br>

In short: routing loops are a significant performance<br>

 concern.<br>

<br>

Bill Norton--hidden question: what is the time domain<br>

 during which these traffic impacts are seen?  Will<br>

 the propagation path take 10, 20, 30 seconds? <br>

A. one event causes many, many loops rippling out,<br>

 so one update may cause packet loss for many seconds,<br>

 up to tens of seconds total.<br>

Q. you're talking about adding MORE state information<br>

 into the network.  Also adding latency to update<br>

 acknowledgements.<br>

<br>

Jared notes that router software bugs tend to<br>

exacerbate routing loop issues.  You can tune configs<br>

to try to minimize the number of loops seen, as well<br>

as upgrading to "fixed" code to get better results<br>

without more state.<br>

<br>

Patrick Gilmore asks Jared: does tuning help internal<br>

sessions or external sessions?  Both; it really controls<br>

*when* the updates are sent out (immediately vs. batched,<br>

etc.).  Jared notes the internet is being used<br>

<br>

Someone (Bill?) asks if convergence times are similar to the<br>

current model, as the slide claims; is that within<br>

a few seconds? Convergence in the lab is similar, yes.<br>

<br>

Matt Petach asks about details of convergence; it<br>

basically puts you at the mercy of the slowest, farthest<br>

away router on the network, since it has to get the<br>

message, realize it has nobody to send to, and then<br>

acknowledge back before anyone else can update FIB;<br>

yes, true, so you'd want to put timers in to limit<br>

how long you wait; basically, like "wait 5 seconds,<br>

and either hear an ACK, or go ahead and update FIB"<br>

type timeout, so you don't wait forever for a<br>

non-conformant device on the other side of the world.<br>
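The timeout described in the answer could look roughly like this. The 5-second default is the example figure from the answer; time is passed in explicitly rather than read from a clock so the logic stays deterministic, and `TimedSuspension` is a hypothetical name.<br>

```python
from typing import Optional, Set


class TimedSuspension:
    """Hypothetical sketch: hold a pending FIB change until all expected
    ACKs arrive or `timeout_s` elapses, whichever comes first."""

    def __init__(self, old_next_hop: str, expected_acks: Set[str],
                 timeout_s: float = 5.0) -> None:
        self.fib_next_hop = old_next_hop
        self.expected = set(expected_acks)
        self.timeout_s = timeout_s
        self.pending: Optional[str] = None
        self.deadline: Optional[float] = None
        self.acks: Set[str] = set()

    def start(self, new_next_hop: str, now: float) -> None:
        """Begin waiting for ACKs; the FIB is left unchanged for now."""
        self.pending = new_next_hop
        self.deadline = now + self.timeout_s
        self.acks = set()

    def on_ack(self, neighbor: str, now: float) -> None:
        """Record an ACK, then check whether we can commit."""
        if self.pending is not None:
            self.acks.add(neighbor)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Commit if every ACK is in, or give up waiting once the deadline
        passes (e.g. a non-conformant router never answers)."""
        if self.pending is None:
            return
        if self.expected <= self.acks or now >= self.deadline:
            self.fib_next_hop = self.pending
            self.pending = None
            self.deadline = None
```

The timeout bounds how long one slow or non-conformant router can delay everyone else, at the cost of reopening a window in which a loop is possible.<br>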

<br>

Riverdomain question--with suspension, you're basically<br>

in passive mode, listening but not updating, is that<br>

correct?  Yes, with respect to the links/prefixes in<br>

question.<br>

<br>

<br>

<br>