bfd-like mechanism for LANPHY connections between providers

Sudeep Khuraijam skhuraijam at liveops.com
Thu Mar 17 05:33:39 UTC 2011


On Mar 16, 2011, at 6:05 PM, Jeff Wheeler wrote:

>>There is a difference of several orders of magnitude between BFD keepalive intervals (in ms) and BGP (in seconds), with generally configurable multipliers vs. hold timer.
>>With real-time media and ever faster last miles, the BGP hold timer may find itself inadequate, if not inappropriate, in some cases.

>For eBGP peerings, your router must re-converge to a good state in < 9
>seconds to see an order of magnitude improvement in time-to-repair.
>This is typically not the case for transit/customer sessions.



Not so, if your goal is peer deactivation and failover. You also miss the point: once the event is detected, the rest of the process starts. I am talking about event detection. One may want a hold timer longer than 30 seconds but still have the peer state deactivated instantly on link failure. If that's the design goal AND link state is not passed through, then BFD-driven BGP deactivation is a good choice.
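
As a rough sketch of the detection-time gap being argued here: the 30-second hold timer is the figure mentioned above, while the 300 ms BFD interval and the multiplier of 3 are assumed values picked only for illustration. The BFD formula is simplified to interval times multiplier.

```python
# Illustrative numbers only: the 300 ms interval and multiplier of 3 are
# assumptions; the 30 s hold timer is the figure used in the text above.

def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Simplified BFD detection time: interval times detect multiplier."""
    return interval_ms * multiplier


def bgp_detection_time_ms(hold_time_s: int) -> int:
    """Worst case for BGP alone: the peer drops when the hold timer expires."""
    return hold_time_s * 1000


bfd_ms = bfd_detection_time_ms(interval_ms=300, multiplier=3)  # 900 ms
bgp_ms = bgp_detection_time_ms(hold_time_s=30)                 # 30000 ms
ratio = bgp_ms / bfd_ms

print(f"BFD detects in {bfd_ms} ms, BGP hold timer in {bgp_ms} ms "
      f"(about {ratio:.0f}x slower)")
```

This covers event detection only. What happens after the peer is deactivated is the same work in both cases.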

>To make a risk/reward choice that is actually based in reality, you
>need to understand your total time to re-converge to a good state, and
>how much of that is BGP hold-time.  You should then consider whether
>changing BGP timers (with its own set of disadvantages) is more or
>less practical than using BFD.



Yes, I see that, and I said "in some cases", not all or most cases.
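
To make "in some cases" concrete, here is a small sketch that treats time-to-repair as detection time plus re-convergence time. All inputs are assumptions chosen for illustration: a 90-second hold timer, 1 second of BFD detection, and two hypothetical re-convergence times.

```python
# Hypothetical inputs: none of these numbers come from a real network.

def time_to_repair_s(detection_s: float, reconvergence_s: float) -> float:
    """Total outage: time to notice the failure plus time to re-converge."""
    return detection_s + reconvergence_s


def improvement(hold_time_s: float, bfd_detection_s: float,
                reconvergence_s: float) -> float:
    """How many times faster the repair is when BFD replaces the hold timer."""
    without_bfd = time_to_repair_s(hold_time_s, reconvergence_s)
    with_bfd = time_to_repair_s(bfd_detection_s, reconvergence_s)
    return without_bfd / with_bfd


# Fast re-convergence: detection dominates, so BFD helps a lot.
fast = improvement(hold_time_s=90, bfd_detection_s=1, reconvergence_s=5)

# Slow re-convergence: detection is a smaller share of the outage.
slow = improvement(hold_time_s=90, bfd_detection_s=1, reconvergence_s=60)

print(f"fast re-convergence: {fast:.1f}x, slow re-convergence: {slow:.1f}x")
```

Which of the two cases applies depends on the environment, which is the point of "some cases".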


>Let's put it another way: if CPU/FIB convergence time were not a
>significant issue, do you think vendors would be working to optimize

This is orthogonal to my point. The table-size tax, the best-path algorithm, and the speed with which you can re-FIB and rewrite the ASICs are constant in both cases. But that's post-event.

>this process, that we would have concepts like MPLS FRR and PIC, and

Those are out of scope in the context of this thread and have completely different roles.

>that each new router product line upgrade comes with a yet-faster CPU?


For things they can sell more licenses for, such as 3DES, keying algorithms, virtual instances, and other features on top of BGP: things that allow service providers to charge a lot more money
while running on common infrastructure such as MPLS and FRR, plus a zillion other things like stateful redundancy, higher housekeeping needs, in-service upgrades, and anything else with a list price. And it's cheaper than the old CPU.

>Of course not.  Vendors would just have said, "hey, let's get
>together on a lower hold time for BGP."


Because it would be horrible code design. Link detection is a common service. Besides, BGP process threads can run longer than the minimum intervals for link detection. Vendors would have to write checkpoints within the BGP code to come up and service the link state machine. And wait, it's a user-configurable checkpoint!! So came BFD: write a simple state machine and make it available to all protocols.
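
A minimal sketch of that design idea, not of any vendor's implementation: one small liveness state machine that several protocols subscribe to, instead of each protocol polling the link itself. The class name, the two-state up/down model, and the callback interface are all hypothetical; real BFD uses a richer state machine with a handshake.

```python
from typing import Callable, List


class LivenessSession:
    """Hypothetical shared liveness detector for one peer (up/down only)."""

    def __init__(self, peer: str, interval_ms: int, multiplier: int) -> None:
        self.peer = peer
        self.detection_time_ms = interval_ms * multiplier
        self.last_rx_ms = 0
        self.up = False
        self._clients: List[Callable[[str, bool], None]] = []

    def register(self, callback: Callable[[str, bool], None]) -> None:
        """Any protocol (BGP, OSPF, ...) can subscribe to state changes."""
        self._clients.append(callback)

    def on_packet(self, now_ms: int) -> None:
        """A control packet arrived from the peer."""
        self.last_rx_ms = now_ms
        self._set_state(True)

    def on_tick(self, now_ms: int) -> None:
        """Periodic timer: declare the peer down after the detection time."""
        if self.up and now_ms - self.last_rx_ms >= self.detection_time_ms:
            self._set_state(False)

    def _set_state(self, up: bool) -> None:
        if up != self.up:
            self.up = up
            for notify in self._clients:
                notify(self.peer, up)


events = []
session = LivenessSession("192.0.2.1", interval_ms=300, multiplier=3)
session.register(lambda peer, up: events.append(("bgp", peer, up)))
session.register(lambda peer, up: events.append(("ospf", peer, up)))

session.on_packet(now_ms=0)   # peer comes up, both clients are told
session.on_tick(now_ms=500)   # still within the 900 ms detection time
session.on_tick(now_ms=900)   # detection time reached, both clients are told
print(events)
```

The routing protocols never run the timer themselves. They only react to the up/down notification, so a long-running BGP thread does not delay detection.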


>As I stated, I'll change my opinion of BFD when implementations
>improve.  I understand the risk/reward situation.  You don't seem to
>get this, and as a result, your overly-simplistic view is that "BGP
>takes seconds" and "BFD takes milliseconds."

I have no doubt that you understand your own risk/reward, but you don't understand it for every other environment.

For event detection leading to a state change leading to peer deactivation, "my overly-simplistic view" is the fact (not as you put it, but as it was written, unedited). How you want to act in response depends on the design.

>is that "BGP
>takes seconds" and "BFD takes milliseconds."

That's what you read, not what I wrote. I was comparing the speed of event detection.

Now, as I said, for speed of deactivation "the BGP hold timer may find itself inadequate, if not inappropriate, in some cases" in this same context. But as I mentioned, we don't know the pain we are trying to solve, that is, the requirements that drove this thread in the first place. So I simply stated the facts and a business driver.


BFD is no different from deactivating a peer based on link failure. Your view is that there is no case for it. My point is that the case arrived yesterday; it's just a damn hard thing to monetize upstream in transit.


>>For a provider to require a vendor instead of RFC compliance is sinful.

>Many sins are more practical than the alternatives.

Few, maybe.


--
Jeff S Wheeler <jsw at inconcepts.biz>
Sr Network Operator  /  Innovative Network Concepts












