NOC Automation / Best Practices
Charles N Wyble
charles at knownelement.com
Wed Sep 8 10:54:20 CDT 2010
The recent thread on ISP port blocking practice mentioned a way to
identify infected machines in a highly automated manner. This got
me thinking about other ways to automate aspects of network/system
operations when it comes to tier-1 end user support (is it plugged in/is
your wireless working etc) and tier-2/3 NOC support (abuse
desk/incident response/routing issues etc).
I'm putting a very high degree of monitoring/healing in place to
reduce the number of end user support calls that come in, and to only
bother a human when it's a real issue.
I'm in the process of launching a small regional wireless ISP / ad
delivery network in Los Angeles CA. I have a small staff (I'm the only
full time engineer, I have a couple NOC techs and 1 help desk tech who
will provide escalation for any serious issues).
My initial thoughts/questions on the matter:
1) Are people integrating their PBX with their OSS/CRM systems, so that
when a call comes in the tech has all the relevant information (perhaps
even things like traceroute/port scan/AV/security health status, looked
up by phone number or customer number)? This way if I take a user
offline because they are spewing spam/viruses, the tech can refer them to
our IT support partner organization to clean up their PC. :)
2) What sort of automated alerting/reporting/circuit turn down/RADIUS
lockout are folks doing to notify customers, or even take them
offline, when they have a security issue?
3) What are folks doing in terms of frontline offloading? Do you have
your PBX set to play a different recording when you have an outage, so
the NOC techs' phones don't go crazy and they are left free to deal with the
outage?
4) Your comments here. :)
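To make questions 1 and 3 a bit more concrete, here is a rough Python sketch of the kind of lookup I have in mind. Everything in it is an assumption for illustration: the Customer record, the route names ("outage_recording", "it_partner_referral", "helpdesk") and the in-memory dictionary standing in for the OSS/CRM are all hypothetical, and a real PBX would need its own glue code to call something like this.

```python
import re
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Customer:
    # Hypothetical CRM record; the field names are illustrative only.
    number: str
    region: str
    suspended_for_abuse: bool = False


def normalize_phone(raw: str) -> str:
    """Keep digits only and drop a leading US country code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def route_call(caller_id: str,
               customers: Dict[str, Customer],
               outage_regions: Set[str]) -> str:
    """Pick a (hypothetical) PBX destination from the caller ID."""
    customer = customers.get(normalize_phone(caller_id))
    if customer is None:
        return "helpdesk"             # unknown caller, let a human sort it out
    if customer.region in outage_regions:
        return "outage_recording"     # keep the NOC phones quiet during an outage
    if customer.suspended_for_abuse:
        return "it_partner_referral"  # taken offline for spam/viruses
    return "helpdesk"
```

The outage check comes first on purpose: during an outage nearly every caller from the affected region is calling about the same thing, so the recording should win over any per-customer state.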
The way I see it, an ounce of prevention is worth a pound of cure. Along
those lines, I'm putting in some mitigation techniques, as follows
(hopefully this will reduce the number of incidents and therefore calls
to the abuse desk). I would appreciate any feedback folks can give me.
A) Force any outbound mail through my SMTP server with AV/spam filtering.
B) Force HTTP traffic through a Squid proxy with Snort/ClamAV running
(several other WISPs are doing this with fairly substantial bandwidth
savings). However, I realize that many sites aren't cache friendly. Anyone
know of a good way to check for that? Look at HTTP headers? Do the
bandwidth savings/security checking outweigh the increased support calls
due to "broken" web sites?
C) Force DNS to go through my server. I hope to reduce DNS hijacking
attacks this way.
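On the "look at HTTP headers" idea in B, something along these lines might work as a first pass. This is only a heuristic sketch in Python, not a full implementation of the HTTP caching rules: the function name is_cache_friendly is made up, and it takes a plain dictionary of response headers that you would have to collect yourself (from proxy logs or test requests).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def is_cache_friendly(headers: Mapping[str, str],
                      now: Optional[datetime] = None) -> bool:
    """Rough guess at whether a shared cache could usefully store a response."""
    h = {k.lower(): v for k, v in headers.items()}
    now = now or datetime.now(timezone.utc)

    # Split Cache-Control into a {directive: value} dictionary.
    directives = {}
    for part in h.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # Treated as unfriendly for a shared proxy. Strictly speaking, no-cache
    # allows storing with revalidation; this sketch is conservative.
    if directives.keys() & {"no-store", "no-cache", "private"}:
        return False
    if "no-cache" in h.get("pragma", "").lower():
        return False

    # An explicit lifetime wins; s-maxage takes precedence for shared caches.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            return directives[key].isdigit() and int(directives[key]) > 0

    if "expires" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
        except (TypeError, ValueError):
            return False  # invalid dates such as "0" count as already expired
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        return expires > now

    # No explicit lifetime: a validator at least permits cheap revalidation.
    return "etag" in h or "last-modified" in h
```

Running this over a sample of the sites customers actually visit would give a rough idea of what fraction of traffic the proxy could serve from cache, which speaks to the savings half of the trade-off. It says nothing about which sites would break.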