tools and techniques to pinpoint and respond to loss on a path
ikiris at gmail.com
Mon Jul 15 21:38:47 UTC 2013
Personally I would never expect simple routed connectivity across the
public internet to deliver such a high level of reliability, without at
least diverse-path tunnels running routing protocols internally.
While any provider will attempt to fix peer / upstream issues as they can,
any SLA you would have is between two points on their private network, not
from point A to point Z that they have no control over across multiple
peers and the public internet itself. The much more common design is using
a single provider for each thread between sites. Then at least you have an
end-to-end SLA in effect, as well as a single entity that is responsible
for the entire link in question.
This sounds like you're trying to achieve private link IGP / FRR level site
to site failover/convergence across the public internet. Perhaps you should
rethink your goals here or your design?
On Mon, Jul 15, 2013 at 4:18 PM, Andy Litzinger <
Andy.Litzinger at theplatform.com> wrote:
> Does anyone have any recommendations on how to pinpoint and react to
> packet loss across the internet? preferably in an automated fashion. For
> detection I'm currently looking at trying smoketrace to run from inside my
> network, but I'd love to be able to run traceroutes from my edge routers
> triggered during periods of loss. I have Juniper MX80s on one end, where
> I'm hopeful I'll be able to cobble together some combo of RPM and event
> scripting to kick off a traceroute. We have Cisco 4900Ms on the other end
> and maybe the same thing is possible but I'm not so sure.
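As a host-based complement to the RPM and event-script idea, a loss-triggered traceroute can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it runs on a Linux host inside the network rather than on the routers, it shells out to the iputils `ping` and the usual Linux `traceroute` binaries, and the names `LossTrigger` and `monitor`, the target 192.0.2.1, and the log path are all made up for illustration.

```python
import subprocess
import time
from datetime import datetime, timezone


def probe(target, timeout_s=1):
    """Send one ICMP echo with the system ping; True if a reply came back.

    Assumes Linux iputils ping, where -W is the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def trace(target):
    """Run a quick numeric traceroute (one probe per hop) and return its text."""
    result = subprocess.run(
        ["traceroute", "-n", "-q", "1", "-w", "1", target],
        capture_output=True,
        text=True,
    )
    return result.stdout


class LossTrigger:
    """Fires exactly once when `threshold` consecutive probes are lost."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.misses = 0

    def update(self, ok):
        if ok:
            self.misses = 0
            return False
        self.misses += 1
        return self.misses == self.threshold


def monitor(target, logfile, interval_s=1.0, threshold=3,
            probe_fn=probe, trace_fn=trace, max_probes=None):
    """Probe `target` forever (or max_probes times); log a trace on loss."""
    trigger = LossTrigger(threshold)
    sent = 0
    while max_probes is None or sent < max_probes:
        ok = probe_fn(target)
        sent += 1
        if trigger.update(ok):
            stamp = datetime.now(timezone.utc).isoformat()
            with open(logfile, "a") as fh:
                fh.write(f"--- loss to {target} at {stamp} ---\n")
                fh.write(trace_fn(target) + "\n")
        time.sleep(interval_s)
```

Something like `monitor("192.0.2.1", "/var/tmp/loss-traces.log")` would then capture a timestamped trace a few seconds into each blackout, which fits inside a 30 second window. Since both ends are under your control, running the same thing in each direction helps, because the forward and return paths may differ.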
> I'd love to hear other suggestions and experience for detection and also
> for options on what I might be able to do when loss is detected on a path.
> In my specific situation I control equipment on both ends of the path that
> I care about with details below.
> We are a hosted service company and we currently have two data centers, DC
> A and DC B. DC A uses Juniper MX routers, advertises our own IP space and
> takes full BGP feeds from two providers, ISPs A1 and A2. At DC B we have a
> smaller installation and instead take redundant drops (and IP space) from a
> single provider, ISP B1, who then peers upstream with two providers, B2 and
> We have a fairly consistent bi-directional stream of traffic between DC A
> and DC B. Both ISP A1 and A2 have good peering with ISP B2 so under
> normal network conditions traffic flows across ISP B1 to B2 and then to
> either ISP A1 or A2
> oversimplified ascii pic showing only the normal best paths:
>        -- ISP A1----------------------ISP B2--
> DC A--|                                       |--- ISP B1 ----- DC B
>        -- ISP A2----------------------ISP B2--
> With increasing frequency we've been experiencing packet loss along the
> path from DC A to DC B. Usually the periods of loss are brief, 30 seconds
> to a minute, but they are total blackouts.
> I'd like to be able to collect enough relevant data to pinpoint the
> trouble spot as much as possible so I can take it to the ISPs and request a
> solution. The blackouts are so quick that it's impossible to log in and
> get a trace- hence the desire to automate it.
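Once traces are being captured automatically, narrowing down the trouble spot is mostly a matter of comparing a trace taken during loss against a known-good one. A rough sketch of that comparison follows; it assumes numeric `traceroute -n -q 1` style output with one address per hop, and the function names and sample addresses are purely illustrative.

```python
import re

# Matches a hop line such as " 3  203.0.113.9  9.8 ms" or " 3  *".
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_hops(trace_text):
    """Return {ttl: address or None} from numeric traceroute output."""
    hops = {}
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skips the "traceroute to ..." header line
        ttl, addr = int(match.group(1)), match.group(2)
        hops[ttl] = None if addr == "*" else addr
    return hops


def last_responding_hop(hops):
    """Return (ttl, address) of the furthest hop that answered, or None."""
    answered = [ttl for ttl, addr in hops.items() if addr is not None]
    if not answered:
        return None
    ttl = max(answered)
    return ttl, hops[ttl]


def first_divergence(baseline, during):
    """Return (ttl, baseline_addr, during_addr) at the first differing hop."""
    for ttl in sorted(baseline):
        if baseline[ttl] != during.get(ttl):
            return ttl, baseline[ttl], during.get(ttl)
    return None
```

Treat the result as a hint rather than proof: a hop that stops answering may only be rate limiting ICMP, and the loss can just as well sit on the return path, so traces from both DC A and DC B for the same event make a stronger case to take to the ISPs.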
> I can provide more details off list if helpful- I'm trying not to vilify
> anyone- especially without copious amounts of data points.
> As a side question, what should my expectation be regarding packet loss
> when sending packets from point A to point B across multiple providers
> across the internet? Is 30 seconds to a minute of blackout between two
> destinations every couple of weeks par for the course? My directly
> connected ISPs offer me an SLA, but what should I reasonably expect from
> them when one of their upstream peers (or a peer of their peers) has
> issues? If this turns out to be BGP reconvergence or similar do I have any
> many thanks,
More information about the NANOG