link monitoring and BFD in SDN networks
skhuraijam at liveops.com
Thu Jan 22 19:32:10 UTC 2015
We need to separate the context of fast reroute driven by the control plane's
topology map from that of local link protection using OAM at the MAC/PHY
sub-layer, and consider the time frames in which each is relevant.
There are efforts under way at the media level, but there are also existing
solutions that are media- and encapsulation-independent, and these need to
be juxtaposed with the SDN paradigm.
Going back to the original question that Glen posed, it is more a question
of implementation complexity.
The more state machines are pushed down to the nodes of an SDN network,
away from the control plane, the higher the cost and the barriers to entry
for OEM products, and the more interop issues, etc.
Now, looking squarely at BFD, its popular application is bootstrapping BFD
link state to the routing topology and to the peer pathway, which may
traverse multiple nodes, switches, media, and encapsulations.
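To make the "state machine" cost concrete, here is a minimal Python sketch of the per-session BFD state machine, simplified from the reception rules in RFC 5880 (section 6.8.6) and the detection-timeout rule (section 6.8.4). It omits diagnostics, Poll/Final, and authentication; the function names `on_packet` and `on_detection_timeout` are hypothetical.

```python
from enum import Enum


class State(Enum):
    # Session states from RFC 5880, section 4.1 (values as encoded on the wire).
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


def on_packet(local: State, remote: State) -> State:
    """Next local state after receiving a packet whose State field is `remote`."""
    if local is State.ADMIN_DOWN:
        return local  # administratively down: received packets are discarded
    if remote is State.ADMIN_DOWN:
        return State.DOWN  # peer went admin down (no-op if already Down)
    if local is State.DOWN:
        if remote is State.DOWN:
            return State.INIT
        if remote is State.INIT:
            return State.UP
        return local
    if local is State.INIT:
        return State.UP if remote in (State.INIT, State.UP) else local
    # local is UP: only a Down from the peer tears the session down
    return State.DOWN if remote is State.DOWN else local


def on_detection_timeout(local: State) -> State:
    """No valid packet within the Detection Time: Init or Up falls to Down."""
    return State.DOWN if local in (State.INIT, State.UP) else local


# Three-way handshake between two endpoints that both start Down.
a = on_packet(State.DOWN, State.DOWN)  # A hears B's Down -> Init
b = on_packet(State.DOWN, a)           # B hears A's Init -> Up
a = on_packet(a, b)                    # A hears B's Up   -> Up
print(a, b)
```

Each session needs only this small table plus one timer, which is part of why it can live in either the node or the controller.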
BFD is a next-hop communication failure detection mechanism which may
itself rely (bootstrap) on the routing topology to find alternate paths. It
is therefore a larger-time-frame event than PHY/MAC sub-layer protection,
and it is media/encapsulation independent. And the fact that such a state
change has a high probability of triggering a topology-wide or network-wide
event (if not, there is less need to run BFD) makes it a controller-centric
state on which the controller needs to bootstrap its routing services. Link
layer OAM, on the other hand, may be a mechanism that protects the BFD event from
Further, on hardware that does not support OAM features, BFD enables faster
end-to-end connectivity/reachability detection than hold-down timers allow.
Finally, BFD is used at a far smaller scale than the number of links. For
example, in a 10K-port network you are likely running BFD on maybe a few
tens of them (for data centers), and the timescale is typically in the
hundreds of milliseconds. Any control plane software module can handle that
at large scale, so BFD should be run just like any other hello protocol for
routing services. Link layer state machines on the nodes, on the other hand,
operate in the sub-1 ms timeframe. Running BFD in the control plane is an
overhead, but an insignificantly small tax.
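The scale argument can be checked with a little arithmetic. The sketch below uses the asynchronous-mode detection time from RFC 5880 (section 6.8.4): the remote Detect Mult times the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. The figures (50 sessions, 300 ms intervals, multiplier 3) are hypothetical values chosen to match "a few tens" of sessions "in the 100s ms".

```python
def detection_time_ms(remote_detect_mult: int,
                      local_required_min_rx_ms: float,
                      remote_desired_min_tx_ms: float) -> float:
    """Asynchronous-mode BFD detection time (RFC 5880, section 6.8.4)."""
    # The agreed interval is the slower of what we can receive and what the peer wants to send.
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


def controller_pps(sessions: int, tx_interval_ms: float) -> float:
    """Control packets per second the control plane transmits (and, symmetrically, receives)."""
    return sessions * 1000.0 / tx_interval_ms


# Hypothetical figures: 50 sessions, 300 ms intervals, Detect Mult 3.
print(detection_time_ms(3, 300, 300))   # 900
print(round(controller_pps(50, 300)))   # 167
```

Roughly 167 packets per second in each direction, with failures detected in under a second, is well within what a software control plane can sustain.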
On 1/21/15, 3:14 PM, "Nitin Sharma" <nitinics at gmail.com> wrote:
>On Wed, Jan 21, 2015 at 12:22 PM, Ronald van der Pol <
>Ronald.vanderPol at rvdp.org> wrote:
>> On Mon, Jan 19, 2015 at 22:55:04 +0000, Dave Bell wrote:
>> > "http://www.rvdp.org/presentations/SC11-SRS-8021ag.pdf"
>> The 802.1ag code used is open source and available on:
>> > Of course if you want fast failover, you need to send packets very
>> > rapidly. Every 250ms is not unreasonable. This is going to cause the
>> > control plane to get very chatty. Typically on high end routers,
>> > processes such as BFD are actually run on line cards as opposed to on
>> > the routing engine. When a failure is detected this reports up into
>> > the control plane to trigger a reconvergence event. I see no reason
>> > why this couldn't occur using SDN.
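A minimal Python sketch of the hello-loss detection described above, assuming the 250 ms transmit interval mentioned and borrowing the 3.5-interval loss threshold that 802.1ag applies to CCMs. The class `HelloMonitor` is hypothetical and takes explicit timestamps instead of running real timers, so it can be driven by any clock.

```python
class HelloMonitor:
    """Declare a neighbour down if no hello arrives within `multiplier` intervals."""

    def __init__(self, interval_s: float, multiplier: float, now_s: float = 0.0):
        # Loss threshold, e.g. 0.25 s * 3.5 = 0.875 s.
        self.timeout_s = interval_s * multiplier
        self.last_rx_s = now_s

    def hello_received(self, now_s: float) -> None:
        # Any valid hello refreshes the liveness timestamp.
        self.last_rx_s = now_s

    def is_up(self, now_s: float) -> bool:
        # Up while the gap since the last hello is below the loss threshold.
        return (now_s - self.last_rx_s) < self.timeout_s


mon = HelloMonitor(interval_s=0.25, multiplier=3.5)
mon.hello_received(0.25)
print(mon.is_up(1.0))   # True: 0.75 s since the last hello
print(mon.is_up(1.2))   # False: 0.95 s exceeds 0.875 s
```

At a 250 ms interval each monitored link costs 4 packets per second per direction, which is where the chattiness comes from if every link is monitored from the control plane.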
>> Exactly. This is something you want to do in hardware, especially
>> if you want to do fast reroute with the OpenFlow group table.
>> Problem is that many 1U OpenFlow switches do not support 802.1ag.
>> We made the prototype mentioned above to show and investigate the
>> benefits of OAM. The closed "open" networking foundation is supposed
>> to be working on this, but I don't know the status because their
>> mailing lists are closed.
>> In SDN/OpenFlow I think a couple of things are needed:
>> - configure 802.1ag on the interfaces (via ofconfig?)
>> - configure OpenFlow paths (e.g. primary and backup) and also create
>> forwarding entries for 802.1ag datagrams along those paths
>> - configure fast reroute with the group table (ofconfig?)
>Fast reroute (in the form of fast failover) is supported in the OF spec
>(1.3+), using Group Tables.
>> By doing this, detection and failover are handled in hardware.
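The fast-failover group selection rule (execute the first bucket whose watched port is live) can be modelled in a few lines of Python. This is a sketch of the semantics only, not the OpenFlow wire format: `watch_port` mirrors the spec's bucket field name, while `Bucket`, `out_port`, and `select_bucket` are hypothetical, and liveness is assumed to come from something like 802.1ag running on the switch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Bucket:
    watch_port: int  # port whose liveness gates this bucket
    out_port: int    # simplified action: forward out of this port


def select_bucket(buckets: List[Bucket],
                  port_is_live: Callable[[int], bool]) -> Optional[Bucket]:
    """Fast failover: return the first bucket whose watched port is live."""
    for bucket in buckets:
        if port_is_live(bucket.watch_port):
            return bucket
    return None  # no live bucket: the packet is dropped


# Primary path via port 1, backup via port 2.
group = [Bucket(watch_port=1, out_port=1), Bucket(watch_port=2, out_port=2)]
live = {1: False, 2: True}  # e.g. OAM has declared port 1 down
print(select_bucket(group, lambda p: live[p]).out_port)  # 2
```

Because the switch evaluates this locally, failover does not wait for a controller round trip; the controller only has to install the primary and backup buckets in advance.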
>Data plane reachability detection could be performed in SDN/OpenFlow networks
>using BFD, Ethernet CFM (802.1ag), or Y.1731, preferably in silicon where there
>is support (which I believe every silicon vendor should work on). It would
>be ideal if these OAM frames were forwarded to a central controller. Today
>I think it is done in some form of software layer (OVS, SDKs) that resides
>on these OF switches.
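If OAM frames are punted to a controller, the controller has to decode them. As one illustration, here is a Python sketch that decodes the 24-byte mandatory section of a BFD control packet following the layout in RFC 5880 (section 4.1). It ignores the optional authentication section and validates nothing beyond length; `BfdControl` and `parse_bfd` are hypothetical names.

```python
import struct
from typing import NamedTuple

STATE_NAMES = ("AdminDown", "Down", "Init", "Up")


class BfdControl(NamedTuple):
    version: int
    diag: int
    state: str
    poll: bool
    final: bool
    detect_mult: int
    length: int
    my_disc: int
    your_disc: int
    desired_min_tx_us: int
    required_min_rx_us: int
    required_min_echo_rx_us: int


def parse_bfd(packet: bytes) -> BfdControl:
    """Decode the mandatory section of a BFD control packet (network byte order)."""
    if len(packet) < 24:
        raise ValueError("BFD control packet shorter than 24 bytes")
    (b0, b1, mult, length, my_disc, your_disc,
     tx, rx, echo) = struct.unpack("!BBBBIIIII", packet[:24])
    return BfdControl(
        version=b0 >> 5,               # top 3 bits
        diag=b0 & 0x1F,                # low 5 bits
        state=STATE_NAMES[b1 >> 6],    # top 2 bits of the flags byte
        poll=bool(b1 & 0x20),          # P bit
        final=bool(b1 & 0x10),         # F bit
        detect_mult=mult,
        length=length,
        my_disc=my_disc,
        your_disc=your_disc,
        desired_min_tx_us=tx,          # intervals are in microseconds
        required_min_rx_us=rx,
        required_min_echo_rx_us=echo,
    )


# Version 1, state Up, Detect Mult 3, 300 ms intervals, discriminators 1 and 2.
pkt = struct.pack("!BBBBIIIII", 1 << 5, 3 << 6, 3, 24, 1, 2, 300_000, 300_000, 0)
msg = parse_bfd(pkt)
print(msg.state, msg.detect_mult, msg.desired_min_tx_us)  # Up 3 300000
```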