Pinging a Device Every Second

Stephen Satchell list at satchell.net
Sat Dec 15 16:48:50 UTC 2018


On 12/15/18 7:48 AM, Colton Conor wrote:
> How much compute and network resources does it take for a NMS to:
> 
> 1. ICMP ping a device every second
> 2. Record these results.
> 3. Report an alarm after so many seconds of missed pings.
> 
> We are looking for a system to in near real-time monitor if an end
> customers router is up or down. SNMP I assume would be too resource
> intensive, so ICMP pings seem like the only logical solution.
> 
> The question is once a second pings too polling on an NMS and a consumer
> grade router? Does it take much network bandwidth and CPU resources from
> both the NMS and CPE side?
> 
> Lets say this is for a 1,000 customer ISP.

What problem are you trying to solve, exactly?  That more than anything
will dictate what you do.

Short answer: about 1500 bits of bandwidth per ping (request plus
reply), and the CPU loading on the remote device is almost invisible.
Remember the only real difference between ping and SNMP monitoring
(UDP) is the organization of the bits in the packet and the protocol
number in the IP header.  It's still one packet pair exchanged, unless
you get really ambitious with your SNMP OID list.
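
As a rough check on that figure, here is a back-of-the-envelope sketch
in Python.  The packet sizes are assumptions (the common 56-byte ping
payload, IPv4 with no options, Ethernet II framing without preamble or
FCS), and the function names are made up for illustration.

```python
# Back-of-the-envelope bandwidth for ICMP echo monitoring.
# All sizes are in bytes and are assumptions, not measurements.
ICMP_HEADER = 8    # ICMP echo header
IPV4_HEADER = 20   # IPv4 header without options
ETH_HEADER = 14    # Ethernet II header (no preamble, FCS, or inter-frame gap)
PAYLOAD = 56       # default payload size of many ping implementations


def bits_per_exchange(payload=PAYLOAD, l2_overhead=ETH_HEADER):
    """Bits on the wire for one echo request plus its echo reply."""
    one_way_bytes = payload + ICMP_HEADER + IPV4_HEADER + l2_overhead
    return 2 * one_way_bytes * 8


def aggregate_bps(devices, interval_s, payload=PAYLOAD):
    """Average bits per second seen at the NMS for a fleet of devices."""
    return devices * bits_per_exchange(payload) / interval_s


if __name__ == "__main__":
    print(bits_per_exchange())        # one device, one ping
    print(aggregate_bps(1000, 1))     # 1,000 devices, once a second
    print(aggregate_bps(1000, 6))     # 1,000 devices, six-second cycle
```

Under those assumptions one exchange is 1568 bits, so 1,000 devices
pinged once a second come to roughly 1.57 Mbit/s at the NMS, and a
six-second cycle drops that to roughly 261 kbit/s.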

When I was in a medium-sized hosting company, I developed an SNMP-based
monitoring system that would query a number of load parameters (CPU,
disk, network, overall) on a once a minute schedule, and would keep
history for hours on the monitoring server.  The boss fretted about the
load such monitoring would impose.  He never saw any.

For pure link monitoring, which is what I'm hearing you want to do, I
have found that a six-second ping cycle gives lots of early warning for
link failures.  Again, it depends on the specifications and detection
targets.
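
The "alarm after so many missed pings" part can be sketched as a small
state machine.  This is only an illustration: the class name, the
threshold of three misses, and the "DOWN"/"UP" event strings are
assumptions, not something taken from a particular NMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkState:
    """Tracks consecutive missed probes for one device (hypothetical helper)."""
    miss_threshold: int = 3   # assumed value; tune to your detection target
    misses: int = 0
    down: bool = False

    def record(self, replied: bool) -> Optional[str]:
        """Feed one probe result; return "DOWN" or "UP" on a state change."""
        if replied:
            self.misses = 0
            if self.down:
                self.down = False
                return "UP"
            return None
        self.misses += 1
        if not self.down and self.misses >= self.miss_threshold:
            self.down = True
            return "DOWN"
        return None


if __name__ == "__main__":
    link = LinkState(miss_threshold=3)
    for replied in [True, False, False, True, False, False, False, True]:
        event = link.record(replied)
        if event:
            print(event)
```

With a six-second cycle and a threshold of three, an alarm would fire
about 18 seconds after the link drops, and a one- or two-probe dropout
never raises one.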

Some things to consider:

1.  Router restarts take a while.  Consumer-grade routers can take a
minute or more to complete a restart to the point where they will
respond to ping.  Carrier-grade routers are more variable but in general
have so many options built into them that they take longer to complete a
restart cycle.  Since you are talking consumer-grade gear, you probably
don't want to be sensitive to CPE power sags.

2.  Depending on the technology used on the link, you may get some
short-term outages, on the order of seconds, so doing "rapid" pings does
nothing for you.  During my DSL time, ATM would drop out for short
intervals -- so watch out for nuisance trips.

3.  Some routers implement ping limiting, so you have to balance your
monitoring sample rate against DoS susceptibility. Offhand, I don't know
the granularity of consumer router ping limiting, as I've never had that
question pop up.

4.  How large a monitoring server are you willing to devote to such a
system?  My web host monitoring used a 400-MHz Pentium II box, and it
didn't even breathe hard.  (A 1U Cobalt box, repurposed with Red Hat
Linux, pulled from a junk pile.)  I was monitoring about 150 web host
servers. Extrapolating the system load on that Cobalt box, I could have
handled 1500 web host servers and more.
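
One way to keep the load on the monitoring server flat is to spread the
probes evenly across the cycle instead of sending them all at once.  A
minimal sketch, assuming a fixed device list and a fixed cycle length
(the function name and the example hostnames are hypothetical):

```python
def stagger_offsets(devices, interval_s):
    """Return (device, offset_seconds) pairs spread evenly over one cycle."""
    if not devices:
        return []
    step = interval_s / len(devices)
    return [(dev, i * step) for i, dev in enumerate(devices)]


if __name__ == "__main__":
    # Four placeholder devices on a six-second cycle.
    for dev, offset in stagger_offsets(["cpe-a", "cpe-b", "cpe-c", "cpe-d"], 6.0):
        print(f"{dev}: probe at +{offset:.1f}s in each cycle")
```

For 1,000 devices on a six-second cycle that works out to one probe
every 6 ms, or about 167 probes a second.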



