freedman at freedman.net
Tue Oct 27 03:08:00 UTC 2015
> I had an idea to create a product where we would have a host on every eyeball network. Customers could then connect to these hosts and check connectivity back to their network. For instance, you may want to see what the speed is like from CableVision in central NJ to your network in South Florida, or the latency, etc. Before I go large scale, I wanted to know how much demand there is for such a service.
Another approach is to enable passive monitoring of your infrastructure,
and then run active tests on top against web servers and other endpoints.
Passive instrumentation gives you the even bigger advantage of giving
you insight into issues actually affecting your users' traffic.
Just did a talk about this at NANOG 65:
If you set up a tap or SPAN and grab a box with Intel (or many other kinds
of) NICs, you can use PF_RING and nprobe to monitor at 100gig+ speeds.
For nprobe in particular as an "agent", some of the extended/augmented
data you can get via NetFlow includes:
[NFv9 57595][IPFIX 35632.123] %CLIENT_NW_DELAY_MS Network latency client <-> nprobe (msec)
[NFv9 57596][IPFIX 35632.124] %SERVER_NW_DELAY_MS Network latency nprobe <-> server (residual msec)
[NFv9 57597][IPFIX 35632.125] %APPL_LATENCY_MS Application latency (msec)
[NFv9 57581][IPFIX 35632.109] %RETRANSMITTED_IN_PKTS Number of retransmitted TCP flow packets (src->dst)
[NFv9 57582][IPFIX 35632.110] %RETRANSMITTED_OUT_PKTS Number of retransmitted TCP flow packets (dst->src)
[NFv9 57583][IPFIX 35632.111] %OOORDER_IN_PKTS Number of out of order TCP flow packets (src->dst)
[NFv9 57584][IPFIX 35632.112] %OOORDER_OUT_PKTS Number of out of order TCP flow packets (dst->src)
[NFv9 57585][IPFIX 35632.113] %UNTUNNELED_PROTOCOL Untunneled IP protocol byte
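As a rough illustration of how the three latency fields above fit together, here is a minimal Python sketch. It assumes each exported flow has already been decoded into a dict keyed by the template names without the leading %; that decoding step, the function name latency_breakdown and the sample values are all hypothetical, not something nprobe provides.

```python
def latency_breakdown(flow):
    """Split a flow's latency into network and application parts (msec).

    Assumes `flow` is a dict keyed by nprobe template names; missing
    fields are treated as 0.
    """
    client = flow.get("CLIENT_NW_DELAY_MS", 0.0)   # client <-> probe
    server = flow.get("SERVER_NW_DELAY_MS", 0.0)   # probe <-> server
    app = flow.get("APPL_LATENCY_MS", 0.0)         # application latency
    network = client + server                      # approx. client <-> server
    return {
        "network_ms": network,
        "application_ms": app,
        "dominant": "application" if app > network else "network",
    }


# Made-up sample record, for illustration only.
sample = {"CLIENT_NW_DELAY_MS": 12.0, "SERVER_NW_DELAY_MS": 3.0,
          "APPL_LATENCY_MS": 40.0}
breakdown = latency_breakdown(sample)
print(breakdown)
```

The point of the split is just to tell "the network is slow" apart from "the server is slow" before digging further.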
The NANOG PPT shows an example of some of the slicing and dicing
you can then do (focused around retransmitted TCP packets, which
is what most of our customers are interested in focusing on as a
simple proxy metric for 'network performance'). Not soliciting
flames on what the magic metrics should be - store them all and
use the ones that best correlate for you :)
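For a sense of what that slicing can look like, here is a small Python sketch that computes a retransmit rate per group of flows. It is only an illustration under assumptions: flows are taken to be dicts keyed by template names, the packet counters IN_PKTS/OUT_PKTS and the grouping key SRC_AS are assumed to be part of the exported template, and the sample records are invented.

```python
from collections import defaultdict


def retransmit_rates(flows, key="SRC_AS"):
    """Return {group: retransmitted packets / total packets}."""
    retrans = defaultdict(int)
    total = defaultdict(int)
    for f in flows:
        group = f.get(key, "unknown")
        retrans[group] += (f.get("RETRANSMITTED_IN_PKTS", 0)
                           + f.get("RETRANSMITTED_OUT_PKTS", 0))
        total[group] += f.get("IN_PKTS", 0) + f.get("OUT_PKTS", 0)
    # Skip groups with no packets to avoid dividing by zero.
    return {g: retrans[g] / total[g] for g in total if total[g] > 0}


# Invented sample flows (AS numbers are from the documentation range).
flows = [
    {"SRC_AS": 64500, "IN_PKTS": 900, "OUT_PKTS": 100,
     "RETRANSMITTED_IN_PKTS": 40, "RETRANSMITTED_OUT_PKTS": 10},
    {"SRC_AS": 64500, "IN_PKTS": 500, "OUT_PKTS": 500,
     "RETRANSMITTED_IN_PKTS": 0, "RETRANSMITTED_OUT_PKTS": 0},
    {"SRC_AS": 64501, "IN_PKTS": 100, "OUT_PKTS": 100,
     "RETRANSMITTED_IN_PKTS": 1, "RETRANSMITTED_OUT_PKTS": 1},
]
rates = retransmit_rates(flows)
# Worst groups first.
for asn, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"AS{asn}: {rate:.2%}")
```

Swapping the key for a prefix, interface or site tag gives the other cuts; nothing here depends on AS being the grouping dimension.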
Luca/ntop are actively working on nprobe, so I'm sure you could
get him to add throughput and other metrics as well.
The same approach should work with Cisco AVC on ASRs, though it's
something we're just starting to test and may only work with
specific sets of filters (vs blanket apply to 40gig of traffic
through an ASR).
Definitely curious if anyone in the NANOG community has tried AVC?
Or any other switch/router-layer performance instrumentation?
We've been interested in putting an agent on some of the Linux white
box switches, but the Broadcom chips in the current gens don't
allow 'flow sampling' - getting all headers or none for a flow,
for a % of flows matching a profile. And that's needed to do
retransmit/OOO/latency tracking (vs just seeing samples of packets).
Again, pointers to switches that have that capability and can run
*nix apps would be appreciated :)
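To make the 'flow sampling' idea concrete, here is a software-only Python sketch of the all-or-nothing decision: hash the flow's 5-tuple and keep every packet of the flow if the hash lands in the sampled percentage. This is just one common way to do it and says nothing about how any switch ASIC would implement it; the function names and addresses are made up for illustration.

```python
import hashlib


def flow_key(src, dst, sport, dport, proto):
    # Order the endpoints so both directions of a connection share one key.
    a, b = sorted([(src, sport), (dst, dport)])
    return f"{a[0]}:{a[1]}-{b[0]}:{b[1]}-{proto}".encode()


def flow_sampled(src, dst, sport, dport, proto, percent):
    """True if this packet's flow is in the sampled set.

    The decision depends only on the 5-tuple, so a flow is either fully
    captured or fully skipped; that is what retransmit/OOO/latency
    tracking needs.
    """
    digest = hashlib.sha256(flow_key(src, dst, sport, dport, proto)).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Both directions of the same (made-up) TCP connection get the same answer.
pkt_fwd = ("192.0.2.10", "198.51.100.7", 51514, 443, 6)
pkt_rev = ("198.51.100.7", "192.0.2.10", 443, 51514, 6)
same = flow_sampled(*pkt_fwd, percent=10) == flow_sampled(*pkt_rev, percent=10)
print("directions agree:", same)
```

Contrast that with per-packet sampling, where the keep/drop decision is made independently for each packet and you can no longer tell a retransmit from a packet you simply did not see the first time.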
avi at kentik dot com