Polling Bandwidth as an Aggregate

Fri Jan 20 15:32:20 UTC 2012

In a message written on Fri, Jan 20, 2012 at 12:16:14AM -0600, Jimmy Hess wrote:
> Except Cacti/RRDTOOL is really just a great visualization tool, while you
> can build stacks, it is not something that accurately meters data for
> billing purposes.   The right kind of tool to use would be a netflow or
> network tap-based billing tool,  that  actually meters/samples specific
> datapoints at a specific interval and applies the billing business logic
> for reporting based on sampled data points,  instead of  smoothed averages
> of approximations.

To suggest that Netflow is more accurate than RRDTool seems rather
strange to me.  It can be as accurate, but that is not how most people
deploy it.

RRDTool pulls the SNMP counters from an interface and records them to a
file.  With no aggregation, and assuming your device reports accurate
SNMP counters, this should be 100% accurate.  While you are right that
the RRDTool defaults aggregate data (after approximately a day, a week,
and a month), those aggregates can be disabled so the raw data is kept.
I know several ISPs that keep the raw data and use it for billing with
these tools.
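For reference, the arithmetic a poller does with those counters is
small.  The sketch below is a hypothetical illustration, not RRDTool's
own code: it turns two readings of a 32-bit interface octet counter
(ifInOctets/ifOutOctets) into bytes and an average bit-rate, assuming
the counter wrapped at most once between polls and the device did not
reboot in between.

```python
COUNTER32_MODULUS = 2**32  # ifInOctets / ifOutOctets are 32-bit counters


def counter_delta(prev, curr, modulus=COUNTER32_MODULUS):
    """Octets counted between two polls, assuming at most one wrap."""
    return (curr - prev) % modulus


def rate_bps(prev, curr, seconds, modulus=COUNTER32_MODULUS):
    """Average bit-rate over the polling interval."""
    return counter_delta(prev, curr, modulus) * 8 / seconds


# The counter wrapped between these two polls, 300 seconds apart.
print(counter_delta(4_294_967_000, 1_000))  # 1296
print(rate_bps(4_294_967_000, 1_000, 300))
```

A 64-bit counter (ifHCInOctets) can be handled by passing
modulus=2**64.  At 1 Gb/s a 32-bit octet counter wraps in roughly 34
seconds, so it can wrap several times in a 5-minute interval, which
this arithmetic cannot detect.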

Netflow often suffers right at the source.  If you want to bill from
Netflow data, 1:1 (unsampled) Netflow is almost required, while most
ISPs run sampled Netflow at 1:100 or 1:1000.  Those sampling rates
introduce more inaccuracy than RRDTool's aggregation function.  What's
more, once the data reaches the Netflow collector, the collectors all
aggregate as well, just like RRDTool.  Again, you can disable much of
that with careful configuration.
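To see why sampling hurts billing, here is a toy simulation under
assumed conditions: random 1-in-N packet sampling, with the collector
scaling the sampled bytes back up by N.  The traffic mix and packet
count are made up for illustration, and real routers may sample
deterministically rather than randomly.

```python
import random


def sampled_estimate(packet_sizes, n, rng):
    """Estimate total bytes from random 1-in-n sampling by scaling up by n."""
    sampled = sum(size for size in packet_sizes if rng.random() < 1.0 / n)
    return sampled * n


rng = random.Random(2012)
# Hypothetical traffic: 10,000 packets drawn from a few common sizes.
packets = [rng.choice([64, 576, 1500]) for _ in range(10_000)]
actual = sum(packets)

for n in (1, 100, 1000):
    est = sampled_estimate(packets, n, random.Random(n))
    print(f"1:{n:<5} estimate={est:>10} error={(est - actual) / actual:+.2%}")
```

At 1:1 the estimate equals the true total.  At 1:1000 only about ten
of the 10,000 packets are seen, so the scaled-up figure for a small
customer can be off by a wide margin.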

But let's compare apples to apples: RRDTool configured not to aggregate
versus 1:1 Netflow configured not to aggregate.  RRDTool polls a
monotonically increasing counter.  Should a poll be missed, no
information about the total number of bytes transferred is lost, so you
can bill by bytes transferred with 100% accuracy even with missed
polls.  If you bill by bit-rate, you can interpolate a single missing
data point with high accuracy as well.
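A minimal sketch of both points, assuming 64-bit counters
(ifHCInOctets) that do not wrap over the billing period and polls
every 300 seconds: the byte total comes from the first and last
readings no matter what is missing in between, and a single missed
reading is linearly interpolated from its neighbours.  The function
names and sample values are hypothetical.

```python
def total_bytes(samples):
    """Bytes moved over the period: last reading minus first, gaps ignored."""
    readings = [c for _, c in samples if c is not None]
    return readings[-1] - readings[0]


def fill_single_gaps(samples):
    """Linearly interpolate any missed poll whose two neighbours were polled."""
    filled = list(samples)
    for i in range(1, len(filled) - 1):
        t, c = filled[i]
        # Use the original neighbours so runs of missed polls are left alone.
        (t0, c0), (t1, c1) = samples[i - 1], samples[i + 1]
        if c is None and c0 is not None and c1 is not None:
            filled[i] = (t, c0 + (c1 - c0) * (t - t0) // (t1 - t0))
    return filled


def rates_bps(samples):
    """Bit-rate for each polling interval (all readings must be present)."""
    return [(c1 - c0) * 8 / (t1 - t0)
            for (t0, c0), (t1, c1) in zip(samples, samples[1:])]


# (seconds, ifHCInOctets) with the poll at t=600 missed.
samples = [(0, 1_000_000), (300, 4_000_000), (600, None),
           (900, 10_000_000), (1200, 12_000_000)]

print(total_bytes(samples))  # 11000000
print(rates_bps(fill_single_gaps(samples)))
```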

Netflow is a continuous stream of UDP across the network.  If a UDP
packet is lost between the router and the collector, there is no way to
reconstruct that data; it is lost forever.  Thus any network event
means you won't have the data to bill your customer, and you're pretty
much stuck always underbilling them, since you can only bill for the
data actually collected.
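Where the export format carries a sequence number, a collector can at
least detect the loss, though not recover it.  As an illustration,
NetFlow v5 export headers include a flow_sequence field counting the
flows exported so far.  The hypothetical sketch below counts how many
flow records went missing, ignoring sequence wraparound and packet
reordering.

```python
import struct

# NetFlow v5 export header: 24 bytes in network byte order.
# version, count, sys_uptime, unix_secs, unix_nsecs, flow_sequence,
# engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")


def parse_v5_header(packet):
    """Return (record count, flow_sequence) from a v5 export packet."""
    (version, count, _uptime, _secs, _nsecs, flow_sequence,
     _engine_type, _engine_id, _sampling) = V5_HEADER.unpack_from(packet)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 packet (version {version})")
    return count, flow_sequence


def lost_flow_records(packets):
    """Count flow records missing from received v5 packets, in arrival order."""
    lost = 0
    expected = None
    for pkt in packets:
        count, seq = parse_v5_header(pkt)
        if expected is not None and seq > expected:
            lost += seq - expected  # records exported but never received
        expected = seq + count
    return lost


# Three packets of 30 records each; the one starting at sequence 60 was lost.
received = [V5_HEADER.pack(5, 30, 0, 0, 0, seq, 0, 0, 0) for seq in (0, 30, 90)]
print(lost_flow_records(received))  # 30
```

Knowing that 30 records vanished says nothing about whose traffic they
described, which is exactly the billing problem.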

> If data is not gathered using a mechanism that communicates timestamp to
> the poller, datapoints will still be imprecise, SNMP would be an example
> --  the cacti application may assume the SNMP response is current data, but
> possibly on the actual hardware, the internal MIB on the device was
> actually updated 10 seconds ago,  which means there will be  small spikes
> in traffic rate graphs that do not represent actual spikes in traffic.

Most of the large ISPs I know of have moved away from both of the
solutions above to proprietary, custom solutions.  They SNMP-poll the
counters and store the raw values in a database at high resolution,
forever, never aggregated.  The Perl/Python/Ruby code needed to do that
and stick it in MySQL or Postgres is only a few pages long and easy to
audit.
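As a rough sketch of that approach, the code below uses Python's
built-in sqlite3 as a stand-in for MySQL or Postgres.  It stores every
raw counter reading and never aggregates; billing for a period is the
difference between the first and last readings in it.  The table
layout and function names are hypothetical, and the SNMP fetch itself
is left out: the values would come from polling ifHCInOctets and
ifHCOutOctets with whatever SNMP library you use.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Create (if needed) a table holding every raw counter reading."""
    db = sqlite3.connect(path)
    # SQLite INTEGER is signed 64-bit; a Counter64 above 2**63 - 1
    # would need a different encoding.
    db.execute("""CREATE TABLE IF NOT EXISTS samples (
                      device     TEXT    NOT NULL,
                      ifindex    INTEGER NOT NULL,
                      polled_at  REAL    NOT NULL,
                      in_octets  INTEGER NOT NULL,
                      out_octets INTEGER NOT NULL,
                      PRIMARY KEY (device, ifindex, polled_at))""")
    return db


def record(db, device, ifindex, in_octets, out_octets, polled_at=None):
    """Store one raw poll result; nothing is ever averaged or dropped."""
    when = time.time() if polled_at is None else polled_at
    with db:  # commits on success
        db.execute("INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
                   (device, ifindex, when, in_octets, out_octets))


def bytes_between(db, device, ifindex, start, end):
    """(in, out) bytes for a billing period from first and last readings."""
    rows = db.execute("""SELECT in_octets, out_octets FROM samples
                         WHERE device = ? AND ifindex = ?
                           AND polled_at BETWEEN ? AND ?
                         ORDER BY polled_at""",
                      (device, ifindex, start, end)).fetchall()
    if len(rows) < 2:
        return 0, 0
    return rows[-1][0] - rows[0][0], rows[-1][1] - rows[0][1]


db = open_store()
record(db, "edge1", 3, 1_000, 2_000, polled_at=0)
record(db, "edge1", 3, 5_000, 9_000, polled_at=300)
record(db, "edge1", 3, 8_000, 15_000, polled_at=600)
print(bytes_between(db, "edge1", 3, 0, 600))  # (7000, 13000)
```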

-- 
       Leo Bicknell - bicknell at ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/