sFlow vs netFlow/IPFIX

Peter Phaal peter.phaal at gmail.com
Thu Mar 3 15:26:46 UTC 2016


While it would be nice if the Nexus switches supported ingress
sampling, you can get exactly the same result at the receiving end by
dropping the egress samples. The following sflowtool output shows some
of the metadata contained in the packet sample:

startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 1022129
sourceId 0:7
meanSkipCount 128
samplePool 130832512
dropEvents 0
inputPort 7
outputPort 10

The two fields of interest are sourceId and inputPort. The sourceId
(0:7) indicates that this measurement came from a data source of type
ifIndex (0) and that the ifIndex of the data source is 7. The inputPort
is the ifIndex of the port that received the packet. In this case,
because the data source ifIndex and the inputPort ifIndex are the same,
this is an ingress sampled packet. A simple filter along these lines:

if ( sourceId.split(':')[1] != inputPort) return;

would allow your sFlow analyzer to eliminate the unwanted samples. You
could also enable / disable ports on your switches to ensure that each
path is sampled once, but that does limit the types of analysis you
can do with the data. A better approach is to simply add additional
input filters to specify which edge data sources you want to include /
exclude in your traffic accounting application since this would allow
the full sFlow feed to be used for other purposes as well (identifying
traffic on busy links, etc.).
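
As a slightly fuller illustration of the filter above, here is a
minimal Python sketch that reads sflowtool's text output, classifies
each flow sample as ingress or egress by comparing sourceId with
inputPort / outputPort, and keeps only the ingress samples. The
function names, the optional edge_ifindexes allow-list and the second
(egress) sample in the example text are hypothetical, made up for
illustration; only the field names come from the output shown earlier.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set


def parse_samples(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {field: value} dict per sample block in sflowtool text."""
    sample: Optional[Dict[str, str]] = None
    for raw in lines:
        parts = raw.split(None, 1)
        if not parts:
            continue  # blank line
        key = parts[0]
        if key in ("startSample", "endSample"):
            if sample:
                yield sample  # close the block collected so far
            sample = {} if key == "startSample" else None
        elif sample is not None and len(parts) == 2:
            sample[key] = parts[1].strip()
    if sample:
        yield sample  # input ended without an endSample line


def direction(sample: Dict[str, str]) -> str:
    """Return 'ingress', 'egress' or 'unknown' for a flow sample."""
    ds_type, ds_index = sample["sourceId"].split(":")
    if ds_type != "0":
        return "unknown"  # data source is not identified by ifIndex
    if ds_index == sample.get("inputPort"):
        return "ingress"
    if ds_index == sample.get("outputPort"):
        return "egress"
    return "unknown"


def keep_ingress(samples: Iterable[Dict[str, str]],
                 edge_ifindexes: Optional[Set[str]] = None) -> List[Dict[str, str]]:
    """Keep ingress samples, optionally only from the given edge ifIndexes."""
    kept = []
    for s in samples:
        if direction(s) != "ingress":
            continue
        if edge_ifindexes is not None and s["inputPort"] not in edge_ifindexes:
            continue
        kept.append(s)
    return kept


def estimated_packets(samples: Iterable[Dict[str, str]]) -> int:
    """Scale each sample by its meanSkipCount (1-in-N sampling)."""
    return sum(int(s["meanSkipCount"]) for s in samples)


# First block: fields from the email. Second block: hypothetical egress
# sample of the same path, as a bidirectional sampler might export it.
EXAMPLE = """\
startSample ----------------------
sampleType FLOWSAMPLE
sourceId 0:7
meanSkipCount 128
inputPort 7
outputPort 10
endSample ----------------------
startSample ----------------------
sampleType FLOWSAMPLE
sourceId 0:10
meanSkipCount 128
inputPort 7
outputPort 10
endSample ----------------------
"""

all_samples = list(parse_samples(EXAMPLE.splitlines()))
ingress_only = keep_ingress(all_samples)
print(len(all_samples), estimated_packets(all_samples))
print(len(ingress_only), estimated_packets(ingress_only))
```

With both samples counted, the estimate is twice what it is after the
ingress filter, which is the over-counting the filter is meant to
avoid. Passing something like edge_ifindexes={"7"} would restrict the
accounting to chosen edge ports while leaving the raw feed untouched
for other uses.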

The overhead of enabling sFlow on all ports and all devices is
generally quite small since packets are sampled in hardware and
production sampling rates tend to be in the range of 1-in-1,000 to
1-in-50,000, so very little measurement traffic is actually generated.
A more
important consideration is operational complexity. If you have
thousands of switches, designing customized configurations for each
one doesn't make a lot of sense. It's much better if the intelligence
is applied at the collecting end. Taking this approach and including
sensible defaults in the agents can get the sFlow agent configuration
down to something as simple as:

sflow {
  DNSSD = off
  collector {
    ip = 10.0.0.162
  }
}

And you could go even simpler if you use DNS SRV records to identify
the sFlow collector(s):

sflow {
  DNSSD = on
}

These configurations are from Cumulus Linux.
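
To put the earlier point about overhead into rough numbers, here is a
back-of-the-envelope sketch in Python. The figure of 200 bytes per
exported sample (sampled header plus metadata) and the packet rate of
one million packets per second are assumptions chosen for illustration,
not measurements, and counter samples and datagram headers are ignored.

```python
def export_rate_bps(packets_per_second: float,
                    sampling_n: int,
                    bytes_per_sample: int = 200) -> float:
    """Approximate sFlow export bandwidth in bits per second.

    sampling_n is the N in 1-in-N packet sampling.
    bytes_per_sample is an assumed average size of one flow sample.
    """
    samples_per_second = packets_per_second / sampling_n
    return samples_per_second * bytes_per_sample * 8


# Hypothetical port forwarding 1,000,000 packets per second.
for n in (1_000, 50_000):
    print(n, export_rate_bps(1_000_000, n))
```

Under these assumptions the export stream is on the order of 1.6 Mbit/s
at 1-in-1,000 and about 32 kbit/s at 1-in-50,000 for that port.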

One of the trends in merchant silicon based platforms is the inclusion
of the ONIE boot loader. If you don't like the network operating
system, you can install a different operating system to better suit
your requirements without ripping and replacing hardware. There are many
virtually identical switches built around the Broadcom ASICs, giving a
lot of choice in hardware and network operating system vendor.

On Thu, Mar 3, 2016 at 3:53 AM, Nick Hilliard <nick at foobar.org> wrote:
> Peter Phaal wrote:
>> I think "pathologically broken" somewhat overstates the case.
>> Bidirectional sampling is allowed by the sFlow spec and other vendors
>> have made that choice. Another vendor used to implement egress only
>> sampling (also allowed) but unusual. I agree that ingress is the most
>> common and easiest to deal with, but a decent sFlow analyzer should be
>> able to handle all three cases without over / under counting.
>
> Bidirectional sampling doesn't allow you to define a sampling perimeter
> on your switch topology.  This means that if you have anything
> other than a trivial topology, you will end up double-counting your
> traffic.  The only way to work around this is to get the collector to
> discard 50% of the samples or otherwise write down the amount of traffic
> by 50%, assuming a standard accounting perimeter configuration.  This is
> broken.
>
> The thing is, this is ridiculously easy to fix in code.  The hooks are
> already there.
>
> Nick


