NOC Automation / Best Practices

Wed Sep 8 15:54:20 UTC 2010

  NOGGERS,

The recent thread on ISP port blocking practices mentioned a way to
identify infected machines in a highly automated manner. This got
me thinking about other ways to automate aspects of network/system
operations, both for tier-1 end user support (is it plugged in, is
your wireless working, etc.) and for tier-2/3 NOC support (abuse
desk, incident response, routing issues, etc.).

I'm putting a very high degree of monitoring and self-healing in place to
reduce the number of end user support calls that come in, and to
bother a human only when there's a real issue.
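One common way to "only bother a human when it's a real issue" is to page only after several consecutive failed checks, so a single dropped poll doesn't wake anyone up. The Python below is a minimal sketch of that idea. The `CheckState` class, the `record_result` function, and the threshold of three are illustrative assumptions and don't come from any particular monitoring package. Any self-healing step (restarting a service, bouncing a radio) would be attempted before the page fires, and is not shown here.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Consecutive-failure state for one monitored target (hypothetical)."""
    threshold: int = 3   # assumed: page after 3 failed polls in a row
    failures: int = 0
    paged: bool = False


def record_result(state: CheckState, ok: bool) -> str:
    """Update state after one poll and return 'none', 'page', or 'recovered'."""
    if ok:
        was_paged = state.paged
        state.failures = 0
        state.paged = False
        # Only announce recovery if we actually paged someone about it.
        return "recovered" if was_paged else "none"
    state.failures += 1
    if state.failures >= state.threshold and not state.paged:
        state.paged = True   # page once per incident, not on every failed poll
        return "page"
    return "none"


if __name__ == "__main__":
    s = CheckState()
    polls = [True, False, False, False, False, True]
    print([record_result(s, ok) for ok in polls])
    # prints ['none', 'none', 'none', 'page', 'none', 'recovered']
```

Paging once per incident, rather than on every failed poll, is what keeps a single long outage from flooding the on-call tech.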

I'm in the process of launching a small regional wireless ISP / ad
delivery network in Los Angeles, CA. I have a small staff: I'm the only
full-time engineer, and I have a couple of NOC techs and one help desk tech who
will provide escalation for any serious issues.

My initial thoughts/questions on the matter:
1) Are people integrating their PBX with their OSS/CRM systems, so that when
a call comes in the tech has all the relevant information (perhaps even
traceroute/port scan/AV/security health status looked up by the caller's
phone number or customer number)? That way, if I take a user
offline because they are spewing spam/viruses, the tech can refer them to
our IT support partner organization to clean up their PC. :)
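A "screen pop" of this kind boils down to normalizing the caller ID and looking it up in the customer database. The Python below is a minimal sketch, assuming North American 10-digit numbers and an in-memory dict standing in for the CRM. The `Customer` fields, `normalize_phone`, and `screen_pop` are hypothetical names, not any PBX or CRM vendor's API.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Customer:
    # Hypothetical CRM record; field names are illustrative only.
    account_id: str
    name: str
    phone: str
    quarantined: bool = False
    quarantine_reason: str = ""


def normalize_phone(raw: str) -> str:
    """Reduce caller ID to 10 digits (assumes North American numbers)."""
    digits = re.sub(r"\D", "", raw)   # drop spaces, dashes, parens, '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]           # strip the NANP country code
    return digits


def screen_pop(caller_id: str, crm: Dict[str, Customer]) -> str:
    """Build the one-line summary a tech would see when the call rings through."""
    cust: Optional[Customer] = crm.get(normalize_phone(caller_id))
    if cust is None:
        return "Unknown caller: create ticket manually"
    if cust.quarantined:
        return (f"{cust.name} ({cust.account_id}): QUARANTINED - "
                f"{cust.quarantine_reason}; refer to IT support partner")
    return f"{cust.name} ({cust.account_id}): account in good standing"


if __name__ == "__main__":
    crm = {"3105550123": Customer("A-1001", "Example Customer", "3105550123",
                                  quarantined=True,
                                  quarantine_reason="outbound spam detected")}
    print(screen_pop("+1 (310) 555-0123", crm))
    # prints Example Customer (A-1001): QUARANTINED - outbound spam detected; refer to IT support partner
```

In a real deployment the dict lookup would become a query against the OSS/CRM, and the live health checks (traceroute, port scan) would run asynchronously so the screen pop isn't delayed.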

2) What sort of automated alerting/reporting/circuit turn-down/RADIUS
lockout is done with regard to notifying customers, or even taking them
offline, when they have a security issue?
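One possible policy is to notify on each abuse report and to quarantine only when reports pile up within a time window. The Python below is a minimal sketch using a per-subscriber sliding window. The `AbuseTracker` class, the thresholds, and the assumption that reports arrive in time order are all illustrative. What "quarantine" actually triggers (for example a RADIUS Change-of-Authorization as described in RFC 5176, or moving the subscriber into a walled-garden address pool) depends on your gear and is not shown.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class AbuseTracker:
    """Sliding-window count of abuse events per subscriber (hypothetical policy)."""

    def __init__(self, threshold: int = 5, window_s: int = 3600):
        self.threshold = threshold   # assumed: this many events...
        self.window_s = window_s     # ...within this many seconds triggers quarantine
        self.events: Dict[str, Deque[float]] = defaultdict(deque)
        self.quarantined: Set[str] = set()

    def report(self, subscriber: str, ts: float) -> str:
        """Record one event (timestamps assumed non-decreasing).

        Returns 'notify', 'quarantine', or 'already'.
        """
        if subscriber in self.quarantined:
            return "already"
        q = self.events[subscriber]
        q.append(ts)
        # Drop events that have aged out of the window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.threshold:
            self.quarantined.add(subscriber)
            return "quarantine"
        return "notify"


if __name__ == "__main__":
    t = AbuseTracker(threshold=3, window_s=60)
    print([t.report("sub-42", ts) for ts in (0, 10, 100, 110, 120, 130)])
    # prints ['notify', 'notify', 'notify', 'notify', 'quarantine', 'already']
```

Notifying first and quarantining only on repeat reports gives customers a chance to clean up before they lose service, which should also cut down on angry calls.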

3) What are folks doing in terms of frontline offloading? Do you have
your PBX set to play a different recording during an outage so
the NOC techs' phones don't go crazy, leaving them free to deal with the
issue?
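The decision logic behind an outage recording can be small. If the caller ID can be mapped to the tower/site that serves the customer, the PBX could play a site-specific message only to affected callers. The Python below is a minimal sketch of that choice. `choose_greeting` and the prompt names are hypothetical, and wiring the result into a particular PBX's dialplan is not shown.

```python
from typing import Optional, Set


def choose_greeting(caller_site: Optional[str], sites_down: Set[str]) -> str:
    """Return the name of the IVR prompt to play (prompt names are hypothetical)."""
    if caller_site is not None and caller_site in sites_down:
        # Caller is served by an affected tower/site: tell them directly.
        return f"outage_{caller_site}"
    if caller_site is None and sites_down:
        # Caller ID didn't match an account: play the general outage notice.
        return "outage_general"
    # Caller's site is healthy (or nothing is down): normal menu.
    return "normal_menu"


if __name__ == "__main__":
    print(choose_greeting("tower7", {"tower7"}))
    # prints outage_tower7
```

Customers on healthy sites still reach the normal menu, so a localized outage doesn't hide unrelated problems from the help desk.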

4) Your comments here. :)

The way I see it, an ounce of prevention is worth a pound of cure. Along
those lines, I'm putting in place the following mitigation techniques
(hopefully they will reduce the number of incidents and therefore calls
to the abuse desk). I would appreciate any feedback folks can give me.

A) Force any outbound mail through my SMTP server with AV/spam filtering.
B) Force HTTP traffic through a Squid proxy with Snort/ClamAV running
(several other WISPs are doing this with fairly substantial bandwidth
savings). However, I realize that many sites aren't cache-friendly. Does anyone
know of a good way to check for that? Look at the HTTP headers? Do the
bandwidth savings and security checking outweigh the increased support calls
due to "broken" web sites?
C) Force DNS to go through my server. I hope to reduce DNS hijacking 
attacks this way.
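For the cache-friendliness question in B), the response headers do carry most of the signal: `Cache-Control` (`no-store`, `private`, `no-cache`, `max-age`, `s-maxage`), `Pragma`, `Expires`, `Last-Modified`, and `Set-Cookie`. The Python below is a minimal heuristic sketch, not Squid's own decision logic. The function names and verdict labels are assumptions. It doesn't check whether an `Expires` date is already in the past, and it treats any `Set-Cookie` as per-user and therefore uncacheable, which is deliberately conservative.

```python
from typing import Dict


def parse_cache_control(value: str) -> Dict[str, str]:
    """Split a Cache-Control header into {directive: argument} ('' if no argument)."""
    directives: Dict[str, str] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def classify_cacheability(headers: Dict[str, str]) -> str:
    """Rough shared-cache verdict from response headers (a heuristic only)."""
    h = {k.lower(): v for k, v in headers.items()}   # header names are case-insensitive
    cc = parse_cache_control(h.get("cache-control", ""))
    if "no-store" in cc or "private" in cc:
        return "uncacheable"
    if "set-cookie" in h:
        return "uncacheable"          # conservative: likely a per-user response
    if "no-cache" in cc or "no-cache" in h.get("pragma", "").lower():
        return "revalidate"           # may be stored but must be rechecked each time
    for key in ("s-maxage", "max-age"):   # s-maxage takes precedence for shared caches
        if key in cc:
            try:
                return "cacheable" if int(cc[key]) > 0 else "revalidate"
            except ValueError:
                return "revalidate"   # malformed value: treat as stale
    if "expires" in h:
        return "cacheable"            # sketch does not check if the date is past
    if "last-modified" in h:
        return "heuristic"            # cache may guess a freshness lifetime
    return "no-explicit-freshness"


if __name__ == "__main__":
    print(classify_cacheability({"Cache-Control": "public, max-age=3600"}))
    # prints cacheable
```

Running this over the headers of the sites your customers visit most (for example, responses fetched with Python's `urllib.request`) would give a rough idea of how much traffic is worth caching before you commit to the proxy.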

Thanks!