Routing issues to AWS environment.

Job Snijders job at ntt.net
Thu May 9 15:40:48 UTC 2019


Dear Nick,

I sympathize with your plight; network debugging can be quite a test of
character at times.

I am snipping some text as I can't comment on specific details in
this case, but you do raise two excellent questions which I can maybe
help with.

On Thu, May 09, 2019 at 03:05:43PM +0000, Nick Ellermann wrote:
> Is ignoring AS prepending common?

It is not common, but yes, it does happen. Some cloud providers and CDNs
have broken away from the traditional BGP best path selection and use
SDN controllers to steer traffic. I don't know whether that is in play
here or not.

> Given my example issue, what direction would you normally take? 

Your issue reminds me of an issue I encountered some years ago. A member
of the Dutch community reported that seemingly random pairs of IP
addresses could not reach each other across an Internet Exchange fabric.
It drove this person crazy because none of the involved parties could
find anything wrong within their domain. The debugging process was hard
because the person had to ask others for pingsweeps and traceroutes,
would get information back without timestamps, and didn't have the
ability to alter source and destination ports on packets sent for
debugging. It turned out to be a faulty linecard that, under specific
circumstances, would hash traffic into a blackhole. It took WEEKS to
find this.

So, I identified a need for a more advanced debugging platform, one
that wouldn't require human-to-human interaction to help operators debug
things. In other words, it seemed to make sense to stand up Linux shell
servers in lots of networks and share access with each other. This
project is the NLNOG RING, and I'd recommend that you participate.

An introduction can be found at
https://www.youtube.com/watch?v=TlElSBBVFLw and a nice use case video is
available at https://www.youtube.com/watch?v=mDIq8xc2QcQ

NTT, Amazon, and many others are part of it. I assume that you have SSH
access to the problematic destination, so I hope you can use tcpdump
there to verify whether or not you receive packets coming from NLNOG
RING nodes.

You mentioned that altering your announcements (deaggregating,
prepending) resolves the issue. This strongly suggests that something
somewhere is broken, and it is a matter of triangulating until you've
found the shortest path that exhibits the problem. Perhaps you can find
something like "Between these two nodes, when I use source port X,
protocol Y, destport Z, traffic doesn't arrive".
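
As a rough illustration of such a sweep, here is a minimal Python sketch
that sends one UDP datagram per source port towards a destination, and
then compares the ports sent against the ports you saw arrive in tcpdump
on the far side. The function names, the payload tag, and the port
numbers are my own placeholders and not part of any RING tooling; it
assumes IPv4 and UDP only, so treat it as a starting point rather than a
finished tool.

```python
import socket


def send_probes(dest_host, dest_port, source_ports, tag=b"probe"):
    """Send one UDP datagram per source port; return the ports actually used.

    Hypothetical helper for illustration: IPv4 and UDP only.
    """
    sent = []
    for sport in source_ports:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            try:
                # Pin the source port so the path's hashing sees a known value.
                sock.bind(("", sport))
            except OSError:
                # Port already in use or not permitted; skip this one.
                continue
            # Put the source port in the payload so it is easy to spot
            # in a packet capture on the receiving side.
            sock.sendto(tag + b" sport=%d" % sport, (dest_host, dest_port))
            sent.append(sport)
    return sent


def missing_ports(sent, received):
    """Return the source ports that were sent but never observed arriving."""
    return sorted(set(sent) - set(received))
```

For example, after running send_probes() from a RING node and noting
which source ports showed up in the capture at the destination (say with
"tcpdump -n udp port <destport>"), missing_ports(sent, seen) gives the
list of source ports whose packets went missing. If the same ports go
missing every time, that points towards a hashing problem like the
linecard case above.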

    Website: https://ring.nlnog.net/

There is also an IRC channel where people may be able to help you make
the best use of this tool.

Kind regards,

Job


