Juniper BGP Convergence Time

Adam Kajtar akajtar at wadsworthcity.org
Wed May 30 19:49:53 UTC 2018


“I'm running two Juniper MX104s. Each MX has one ISP connected, running
BGP (full routes). iBGP is running between the routers via a two-port 20G
LAG. When one of the ISPs fails, it can take upwards of 2 minutes for
traffic to start flowing correctly. The router has the correct route in the
routing table, but it doesn't install it in the forwarding table for the
full two minutes.”



I finished my testing and concluded that I will continue running full
routes without any extras. Below I detail the tests and their outcomes, and
explain why I decided to keep running full routes.



*Receiving Full Routes*

Convergence time was 180 seconds. The routing table updated and showed the
correct path in under a minute, but the forwarding table took 180 seconds
for most of the routes to update.



*BGP Multipath*

There was no effect on convergence speed. I think paths learned from eBGP
neighbors are preferred over paths learned via iBGP. Therefore, no routes
are ever equal in this case.
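
To illustrate why that would leave nothing for multipath to balance across,
here is a minimal Python sketch of a heavily simplified tie-break. The Path
fields and the multipath_set helper are hypothetical names chosen for
illustration. Real BGP path selection has more steps than shown, and this is
not how Junos implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical, trimmed-down view of a BGP path."""
    name: str
    local_pref: int
    as_path_len: int
    is_ebgp: bool  # True if learned from an external neighbor


def rank(path):
    # Higher tuple wins: higher local-pref, then shorter AS path,
    # then eBGP over iBGP. Later tie-breakers are left out.
    return (path.local_pref, -path.as_path_len, path.is_ebgp)


def multipath_set(paths):
    # In this simplified model, only paths that tie with the best
    # path on every step above are candidates for multipath.
    best = max(rank(p) for p in paths)
    return [p for p in paths if rank(p) == best]


paths = [
    Path("ISP A via local eBGP", local_pref=100, as_path_len=3, is_ebgp=True),
    Path("ISP B via iBGP peer", local_pref=100, as_path_len=3, is_ebgp=False),
]
print([p.name for p in multipath_set(paths)])
```

With otherwise identical attributes, only the eBGP-learned path survives the
comparison, so in this model there is never a second equal path to use.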



*BFD*

The slower-to-converge ISP refused my request to set up BFD between our
routers, so this option is out of the question.



*BGP Timers*

I adjusted the BGP hold timer to 30 seconds and the stale route timer to 5
seconds. This change appeared to have no effect on convergence speed.
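
For reference, a small sketch of the timer arithmetic, assuming the standard
BGP behavior from RFC 4271: a session uses the lower of the two proposed hold
times, and keepalives are commonly sent at one third of it. The function names
are made up for illustration, and the 90-second peer value is the Active
Holdtime quoted later in this thread (assuming the ISP side still proposes
it).

```python
def negotiated_hold_time(local_hold, peer_hold):
    # RFC 4271: the session uses the smaller of the two proposed hold times.
    return min(local_hold, peer_hold)


def keepalive_interval(hold_time):
    # Common convention: send keepalives at one third of the hold time.
    return hold_time // 3


hold = negotiated_hold_time(local_hold=30, peer_hold=90)
print(hold, keepalive_interval(hold))  # 30 10
print(keepalive_interval(90))          # 30
```

The hold timer only bounds how long a silent peer can go undetected. It says
nothing about how long the forwarding table takes to update afterwards, which
would be consistent with the change making no difference here.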



*Receiving Full Routes with a Default*

I suspected receiving a default route would fix the issue because it would
be the only route that needed to be updated in the forwarding table for
traffic to flow. I assumed that the router would process the lowest binary
route (0.0.0.0/0) first. Once the full table was updated, traffic would take
the optimal path (this would avoid customer complaints due to latency with
VPN and voice traffic). I also suspected that exporting the BGP default
route into OSPF would speed up OSPF convergence by avoiding a generated
default route based on neighbor state.



Unfortunately, it appears that the forwarding table of the MX104 converges
all at once rather than gradually as the router processes the routes. Also,
traffic would fail as the ISP connection came back up, due to BGP exporting
the route into OSPF.
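
One point that may be relevant here, sketched in Python under stated
assumptions: forwarding uses longest-prefix match, so a default route only
carries traffic for destinations that have no more-specific entry. The table
contents, next-hop labels, and lookup helper below are made up for
illustration (the prefixes are documentation ranges) and do not model the
MX104's actual forwarding table.

```python
import ipaddress


def lookup(fib, destination):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


fib = {
    # Default route pointing at the surviving ISP.
    ipaddress.ip_network("0.0.0.0/0"): "isp-b",
    # More-specific entry still pointing at the failed ISP.
    ipaddress.ip_network("203.0.113.0/24"): "isp-a",
}

print(lookup(fib, "203.0.113.10"))   # isp-a
print(lookup(fib, "198.51.100.10"))  # isp-b
```

In this model the default only takes over for a destination once the
more-specific entry for it has been removed or updated.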



*Receiving Full Routes with forwarding engine commands*

After I completed the above tests, I concluded that the forwarding engine
would need to speed up, and that some sort of hack was in order. I tested
the commands described on the following pages.



https://www.juniper.net/documentation/en_US/junos/topics/concept/use-case-for-bgp-pic-for-inet-inet6-lu.html



https://www.juniper.net/documentation/en_US/junos/topics/topic-map/forwarding-indirect-next-hop.html



With these commands enabled, equal-cost routes were installed into the
forwarding table. Failover took 40 to 50 seconds on equal-cost routes and
180 seconds on non-equal-cost routes. This was unacceptable because most of
the routes are preferred out one ISP over the other.



I disabled ECMP and the router began installing all routes into the
forwarding table, including the secondary route. The router would dump
sections of the forwarding table and act very flaky.





*Receiving Default Only*

I tested filtering out all routes except the default route. Convergence
took 30 to 45 seconds, depending on which upstream ISP connection I
disconnected. This solution was unacceptable because outbound traffic did
not take the optimal path.



I concluded that 180 seconds was an acceptable failover time given that I
had exhausted all other options. I would prefer to have a more reliable
failover mechanism than a faster one. Also, everyday speed and usability
are more important than failover speed (failover rarely happens, and almost
never during peak hours) in my use case.



Thank you to anyone who gave me suggestions on this issue. It helped me
understand and accept the outcome.












On Sat, May 26, 2018 at 12:15 PM Baldur Norddahl <baldur.norddahl at gmail.com>
wrote:

> Add a static default route on both routers. This will be invalidated as
> soon as the interface goes down. It should be faster than relying on the
> BGP process to withdraw the route. It also does not require any config
> changes at your upstreams.
>
> Regards
> Baldur
>
>
> On Wed, May 16, 2018 at 18:52, Adam Kajtar <akajtar at wadsworthcity.org> wrote:
>
> > Erich,
> >
> > Good idea. I can't believe I didn't think of that earlier. Simple and
> > effective. I will go ahead and request the defaults from my ISP and
> > update the thread with the findings.
> >
> > Thanks!
> >
> > On Wed, May 16, 2018 at 10:03 AM Kaiser, Erich <erich at gotfusion.net>
> > wrote:
> >
> > > A last resort route (default route) could still be good to take from
> > > your ISP(s) even if you still do full routes. As the propagation is
> > > happening on the internet side, you should at least have a path inbound
> > > through the other provider. The default route at least would send the
> > > traffic out if it does not see the route locally. Just an idea.
> > >
> > >
> > >
> > > On Wed, May 16, 2018 at 8:22 AM, Adam Kajtar
> > > <akajtar at wadsworthcity.org> wrote:
> > >
> > > > I could use static routes, but I noticed that since I moved to full
> > > > routes I have had a lot fewer customer complaints about latency
> > > > (especially when it comes to Voice and VPN traffic).
> > > >
> > > > I wasn't using per-packet load balancing. I believe the Juniper
> > > > default is per IP.
> > > >
> > > > My timers are as follows
> > > >  Active Holdtime: 90
> > > >  Keepalive Interval: 30
> > > >
> > > > Would I be correct in thinking I need to contact my ISP to lower
> > > > these values?
> > > >
> > > > An interesting note is that when I had both ISPs connected to a single
> > > > MX104, the failover took just a few seconds.
> > > >
> > > > Thanks again.
> > > >
> > > >
> > > >
> > > > On Tue, May 15, 2018 at 8:42 PM Ben Cannon <ben at 6by7.net> wrote:
> > > >
> > > >> Have you checked your timeouts ?
> > > >>
> > > >> -Ben
> > > >>
> > > > >> > On May 15, 2018, at 4:09 PM, Kaiser, Erich <erich at gotfusion.net>
> > > > >> > wrote:
> > > >> >
> > > > >> > Do you need full routes? What about just a default route from
> > > > >> > BGP?
> > > >> >
> > > >> > Erich Kaiser
> > > >> > The Fusion Network
> > > >> > erich at gotfusion.net
> > > >> > Office: 815-570-3101
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > > >> >> On Tue, May 15, 2018 at 5:38 PM, Aaron Gould <aaron1 at gvtc.com>
> > > > >> >> wrote:
> > > >> >>
> > > > >> >> Are you sure it doesn't have something to do with 60 seconds * 3 =
> > > > >> >> 180 secs of BGP neighbor timeout before it believes the neighbor is
> > > > >> >> dead and removes routes to that neighbor?
> > > >> >>
> > > >> >> Aaron
> > > >> >>
> > > > >> >>> On May 15, 2018, at 9:10 AM, Adam Kajtar
> > > > >> >>> <akajtar at wadsworthcity.org> wrote:
> > > >> >>>
> > > >> >>> Hello:
> > > >> >>>
> > > > >> >>> I'm running two Juniper MX104s. Each MX has 1 ISP connected running
> > > > >> >>> BGP(full routes). iBGP is running between the routers via a two port
> > > > >> >>> 20G lag. When one of the ISPs fails, it can take upwards of 2 minutes
> > > > >> >>> for traffic to start flowing correctly. The router has the correct
> > > > >> >>> route in the routing table, but it doesn't install it in the
> > > > >> >>> forwarding table for the full two mins.
> > > >> >>>
> > > >> >>> I have a few questions if anyone could answer them.
> > > >> >>>
> > > >> >>>  - What would a usual convergence time be for this setup?
> > > >> >>>  - Is there anything I could do speed this process up? (I tried
> > > >> >> Multipath)
> > > >> >>>  - Any tips and tricks would be much appreciated
> > > >> >>>
> > > >> >>> Thanks in Advance
> > > >> >>> --
> > > >> >>> Adam Kajtar
> > > >> >>> Systems Administrator
> > > >> >>> City of Wadsworth
> > > >> >>> akajtar at wadsworthcity.org
> > > >> >>> -----------------------------------------------------
> > > >> >>> http://www.wadsworthcity.com
> > > >> >>>
> > > >> >>> Facebook <http://www.facebook.com/cityofwadsworth>* |* Twitter
> > > >> >>> <https://twitter.com/CityOfWadsworth> *|* Instagram
> > > >> >>> <https://www.instagram.com/cityofwadsworth/> *|* YouTube
> > > >> >>> <https://www.youtube.com/channel/UCymlH-AZgvxTaHtgp3-AmDQ>
> > > >> >>
> > > >> >>
> > > >>
> > > >
> > > >
> > > > --
> > > > Adam Kajtar
> > > > Systems Administrator, Safety Services
> > > > City of Wadsworth
> > > > Office 330.335.2865
> > > > Cell 330.485.6510
> > > > akajtar at wadsworthcity.org
> > > > -----------------------------------------------------
> > > > http://www.wadsworthcity.com
> > > >
> > > > Facebook <http://www.facebook.com/cityofwadsworth>* |* Twitter
> > > > <https://twitter.com/CityOfWadsworth> *|* Instagram
> > > > <https://www.instagram.com/cityofwadsworth/> *|* YouTube
> > > > <https://www.youtube.com/channel/UCymlH-AZgvxTaHtgp3-AmDQ>
> > > >
> > >
> >
> >
>


-- 
Adam Kajtar
Systems Administrator, Safety Services
City of Wadsworth
Office 330.335.2865
Cell 330.485.6510
akajtar at wadsworthcity.org
-----------------------------------------------------
http://www.wadsworthcity.com