Pinging a Device Every Second

Sun Dec 16 01:21:13 UTC 2018

Hi

Customers do not usually complain about 2 minutes of downtime unless it is
a repeating event. We will therefore offer to put such customers' lines in
monitor mode, which means we will add them to SmokePing. You could also
start pinging those lines once a second, which would be no problem if only
a few customers are in monitor mode.

However, 2 minutes of downtime is more often a symptom of bad Wi-Fi than of
a problem with the internet connection.

Regards,

Baldur

On Sat, Dec 15, 2018 at 7:33 PM Colton Conor <colton.conor at gmail.com> wrote:

> The problem I am trying to solve is being able to tell a customer
> accurately whether their home internet connection was up or down. For
> example, a customer calls in and says their internet was down for 2 minutes
> yesterday. We need to be able to verify that their internet connection was
> indeed down. Right now we have no easy way to do this. Getting metrics like
> packet loss and jitter would be great too, though I realize the ICMP data
> path does not always match customer experience, as many network devices
> handle ICMP traffic with a different priority. However, ICMP pings over the
> internet usually do tell accurately whether a customer's modem is online
> or not.
>
> Most devices out in the field, like ONTs and DSL modems, do not support
> SNMP but rather use TR-069 for management. Most of these devices only check
> in with the TR-069 ACS server once a day.
> If the consumer device does support SNMP, it usually has a weak Broadcom
> or Qualcomm SoC, an outdated embedded Linux kernel, and limited RAM and
> storage. Most of these can't handle SNMP walks every 5 minutes, let alone
> every minute. We are talking about sub-$100 routers here, not Juniper,
> Cisco, Arista, etc.
>
> Almost all of these consumer devices are connected to a carrier aggregation
> device like a DSLAM, OLT, Ethernet switch, or wireless access point. These
> access devices do support SNMP, but most manufacturers recommend SNMP
> polling only every 5 minutes, so a 2-minute outage would not easily be
> detected. Plus, it's hard to correlate that consumer X is on port Y of an
> access switch, and to get that right for a tier 1 CSR.
>
> The only two ways I think I can accomplish this are:
> 1. ICMP pings to a device every so many seconds. Almost every device
> supports responding to ICMP pings on its WAN side.
> or
> 2. IPFIX sampling at the core router, then drilling down by customer IP. I
> think this will tell me whether any data was flowing to this customer's IP
> on a second-by-second basis, but won't necessarily give us an up or down
> indicator. It requires nothing from the consumer's router.
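A minimal Python sketch of how option 1 could answer the "was I down for 2 minutes yesterday?" question after the fact. It assumes the monitoring system already stores one (timestamp, replied) record per ping for each customer; that record format and the find_outages helper are hypothetical, not part of any particular NMS. An outage is measured from the first missed ping to the first reply after it, so its precision is limited by the ping interval.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical storage format: one (timestamp, replied) record per ping,
# for a single customer, in time order.
PingResult = Tuple[datetime, bool]


def find_outages(results: List[PingResult],
                 min_duration: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (first missed ping, first reply afterwards) for every run of
    consecutive missed pings lasting at least min_duration."""
    outages = []
    start: Optional[datetime] = None  # first missed ping of the current run
    for ts, replied in results:
        if not replied and start is None:
            start = ts
        elif replied and start is not None:
            if ts - start >= min_duration:
                outages.append((start, ts))
            start = None
    # A run still open when the log ends is reported up to the last record.
    if start is not None and results[-1][0] - start >= min_duration:
        outages.append((start, results[-1][0]))
    return outages


if __name__ == "__main__":
    # Simulated log: one ping per second, no replies from 18:00:30 until
    # 18:02:30, plus a single lost ping at 18:03:20 (too short to report).
    t0 = datetime(2018, 12, 14, 18, 0, 0)
    log = [(t0 + timedelta(seconds=s), not (30 <= s < 150 or s == 200))
           for s in range(300)]
    for begin, end in find_outages(log, timedelta(minutes=1)):
        print(f"down from {begin:%H:%M:%S} to {end:%H:%M:%S}")
        # prints: down from 18:00:30 to 18:02:30
```

The minimum-duration filter is what keeps a lone lost ping from being reported to a CSR as an outage.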
>
> On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell <list at satchell.net>
> wrote:
>
>> On 12/15/18 7:48 AM, Colton Conor wrote:
>> > How much compute and network resources does it take for an NMS to:
>> >
>> > 1. ICMP ping a device every second
>> > 2. Record these results.
>> > 3. Report an alarm after so many seconds of missed pings.
>> >
>> > We are looking for a system to monitor, in near real time, whether an
>> > end customer's router is up or down. I assume SNMP would be too
>> > resource-intensive, so ICMP pings seem like the only logical solution.
>> >
>> > The question is: are once-a-second pings too much polling for an NMS and
>> > a consumer-grade router? Do they take much network bandwidth and CPU from
>> > both the NMS and CPE side?
>> >
>> > Let's say this is for a 1,000-customer ISP.
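Step 3 in the question (report an alarm after so many missed pings) reduces to a small per-customer state machine. The Python sketch below is a hypothetical illustration: LinkMonitor and its threshold parameter are made-up names, and the probe that actually sends the ICMP echo is left out. It counts consecutive missed pings, raises an alarm once the count reaches the threshold, and clears it on the next reply.

```python
from dataclasses import dataclass


@dataclass
class LinkMonitor:
    """Hypothetical per-customer tracker: alarm after `threshold`
    consecutive missed pings, clear on the next reply."""
    threshold: int
    misses: int = 0
    alarmed: bool = False

    def record(self, replied: bool) -> str:
        """Feed one ping result; return "ALARM", "CLEAR", or "" (no change)."""
        if replied:
            self.misses = 0
            if self.alarmed:
                self.alarmed = False
                return "CLEAR"
            return ""
        self.misses += 1
        if not self.alarmed and self.misses >= self.threshold:
            self.alarmed = True
            return "ALARM"
        return ""


if __name__ == "__main__":
    # One result per second: two lost pings (below the threshold),
    # then four in a row.
    mon = LinkMonitor(threshold=3)
    pattern = [True, False, False, True, False, False, False, False, True]
    for second, ok in enumerate(pattern):
        event = mon.record(ok)
        if event:
            print(second, event)  # prints "6 ALARM", then "8 CLEAR"
```

The state per customer is one counter and one flag, so keeping a thousand of these in memory costs next to nothing; storing the raw results for later lookup is a separate concern.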
>>
>> What problem are you trying to solve, exactly?  That more than anything
>> will dictate what you do.
>>
>> Short answer: about 1500 bits of bandwidth per ping exchange, and the CPU
>> load on the remote device is almost invisible.  Remember that the only
>> real difference between ping and SNMP monitoring (UDP) is the organization
>> of the bits in the packet and the protocol number in the IP header.  It's
>> still one packet pair exchanged, unless you get really ambitious with your
>> SNMP OID list.
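The "about 1500 bits" figure can be checked with back-of-the-envelope arithmetic. This Python sketch assumes a default-size echo (56 data bytes, as Linux iputils ping sends by default), an IPv4 header without options, and untagged Ethernet; preamble and inter-frame gap are not counted. One request/reply pair comes to 1344 bits at the IP layer and 1632 bits with Ethernet headers, so 1,000 customers pinged once a second need roughly 1.3 to 1.6 Mbit/s in total.

```python
# Assumed packet sizes: default 56-byte echo payload, IPv4 header without
# options, untagged Ethernet II framing (preamble/inter-frame gap ignored).
ICMP_PAYLOAD = 56
ICMP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_OVERHEAD = 14 + 4  # MAC/EtherType header plus frame check sequence


def bits_per_exchange(include_ethernet: bool = False) -> int:
    """Bits for one echo request plus its echo reply."""
    packet_bytes = ICMP_PAYLOAD + ICMP_HEADER + IPV4_HEADER  # 84 bytes
    if include_ethernet:
        packet_bytes += ETHERNET_OVERHEAD  # 102 bytes per frame
    return 2 * packet_bytes * 8


def monitoring_bps(customers: int, interval_s: float,
                   include_ethernet: bool = False) -> float:
    """Aggregate bits per second for pinging every customer once per interval."""
    return customers * bits_per_exchange(include_ethernet) / interval_s


if __name__ == "__main__":
    print(bits_per_exchange(), bits_per_exchange(True))  # 1344 1632
    print(monitoring_bps(1000, 1), monitoring_bps(1000, 1, True))  # 1344000.0 1632000.0
    print(monitoring_bps(1000, 6))  # 224000.0 with a six-second cycle
```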
>>
>> When I was in a medium-sized hosting company, I developed an SNMP-based
>> monitoring system that would query a number of load parameters (CPU,
>> disk, network, overall) on a once-a-minute schedule, and would keep
>> history for hours on the monitoring server.  The boss fretted about the
>> load such monitoring would impose.  He never saw any.
>>
>> For pure link monitoring, which is what I'm hearing you want to do, I
>> found in my experience that a six-second ping cycle gives plenty of early
>> warning of link failures.  Again, it depends on the specifications and
>> detection targets.
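The trade-off between the ping cycle and how quickly, and how reliably, a failure gets flagged can be worked out directly. Assuming pings every T seconds and an alarm on the K-th consecutive miss, a hard failure is flagged between (K-1)T and KT seconds after it starts, plus the reply timeout; an outage shorter than (K-1)T can never trip the alarm, and one lasting at least KT always will. The Python sketch below uses K = 3, which is an illustrative choice; the helper names are hypothetical.

```python
from typing import Tuple


def detection_latency(interval_s: float, misses: int) -> Tuple[float, float]:
    """Best and worst case seconds from a hard link failure to the alarm,
    when the alarm fires on the `misses`-th consecutive missed ping.
    The reply timeout is ignored; it adds to both figures."""
    return ((misses - 1) * interval_s, misses * interval_s)


def outage_alarm(duration_s: float, interval_s: float, misses: int) -> str:
    """Whether an outage of duration_s seconds trips the alarm: "never",
    "always", or "maybe" (depends on how the outage lines up with the
    ping schedule)."""
    if duration_s < (misses - 1) * interval_s:
        return "never"   # too short to cover `misses` ping instants
    if duration_s >= misses * interval_s:
        return "always"  # covers at least `misses` ping instants
    return "maybe"


if __name__ == "__main__":
    # Six-second cycle, alarm on the third consecutive miss (illustrative K).
    print(detection_latency(6, 3))   # (12, 18)
    print(outage_alarm(5, 6, 3))     # never: a few-second dropout is ignored
    print(outage_alarm(120, 6, 3))   # always: a 2-minute outage is caught
    # Once-a-second pings with the same threshold flag failures much sooner.
    print(detection_latency(1, 3))   # (2, 3)
```

Raising K or T suppresses more short dropouts at the cost of slower warning; lowering them does the opposite.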
>>
>> Some things to consider:
>>
>> 1.  Router restarts take a while.  Consumer-grade routers can take a
>> minute or more to complete a restart to the point where they will respond
>> to ping.  Carrier-grade routers are more variable, but in general have so
>> many options built into them that it takes longer to complete a restart
>> cycle.  Since you are talking about consumer-grade gear, you probably don't
>> want to be sensitive to CP power sags.
>>
>> 2.  Depending on the technology used on the link, you may get some
>> short-term outages, on the order of seconds, so "rapid" pings do
>> nothing for you.  During my DSL time, ATM would drop out for short
>> intervals -- so watch out for nuisance trips.
>>
>> 3.  Some routers implement ping limiting, so you have to balance your
>> monitoring sample rate against DoS susceptibility. Offhand, I don't know
>> the granularity of consumer router ping limiting, as I've never had that
>> question pop up.
>>
>> 4.  How large a monitoring server are you willing to devote to such a
>> system?  My web host monitoring used a 400-MHz Pentium II box, and it
>> didn't even breathe hard.  (A 1U Cobalt box, repurposed with Red Hat
>> Linux, pulled from a junk pile.)  I was monitoring about 150 web host
>> servers.  Extrapolating the system load on that Cobalt box, I could have
>> handled 1500 web host servers and more.
>>
>>