OS, Hardware, Network - Logging, Monitoring, and Alerting

Paul Armstrong psa at otoh.org
Fri Jun 27 05:34:18 UTC 2008


At 2008-06-26T02:22-0700, Rev. Jeffrey Paul wrote:
> Other stuff we really need to keep an eye on is hardware - redundant
> PSU status in our 7204s and Dells, temperatures and voltages 

Do yourself a favor and monitor temperature in C. Most kit only does C,
and people burn routers when there's a mix of C and F ("I set the alarm
to 90, why didn't it shut down?" "Well, you should have set it to 30;
the router only understands C.").
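
If thresholds do come in from people or tools that think in F, one way
to avoid the mix-up is to normalize everything to C before it reaches
the device or the monitoring config. A minimal sketch in Python; the
function names and the "C"/"F" unit labels are made up for illustration
and are not part of any particular tool:

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading or threshold to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def threshold_in_celsius(value: float, unit: str) -> float:
    """Return an alarm threshold in Celsius, refusing unknown units.

    `unit` is a hypothetical label ("C" or "F") attached to the value
    by whoever entered it; nothing here talks to a real device.
    """
    unit = unit.strip().upper()
    if unit == "C":
        return float(value)
    if unit == "F":
        return fahrenheit_to_celsius(value)
    raise ValueError("unknown temperature unit: %r" % unit)


if __name__ == "__main__":
    # An alarm someone meant as 90 F, expressed in the unit the router uses.
    print(round(threshold_in_celsius(90, "F"), 1))
```

Rejecting an unlabeled or unknown unit is deliberate: a bare "90" is
exactly the value that gets misread.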

> 1) Is SNMP the best way to do this?  Obviously some of the data (service
> checks) will need to be collected other ways.
 
Pretty much.
Particularly with Net-SNMP, you can hook in external commands etc.

Check out the "Arbitrary Extension Commands" section of
http://www.net-snmp.org/docs/man/snmpd.conf.html
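
As a rough illustration of what such a hook can look like, here is a
sketch of a check script that snmpd could run through its `extend`
directive. The script name, the install path, the `psu-status` label,
and the `read_psu_states()` placeholder are all assumptions for the
example; how you actually read PSU state depends on your hardware.

```python
#!/usr/bin/env python3
"""Hypothetical check script for snmpd's `extend` directive.

Assumed snmpd.conf line (name and path are made up for illustration):
    extend psu-status /usr/local/bin/check_psu.py
"""
import sys


def summarize(psu_states: dict) -> tuple:
    """Turn {psu_name: is_healthy} into (exit_code, one_line_summary)."""
    failed = sorted(name for name, ok in psu_states.items() if not ok)
    if failed:
        return 2, "CRITICAL: failed PSU(s): " + ", ".join(failed)
    return 0, "OK: %d PSU(s) healthy" % len(psu_states)


def read_psu_states() -> dict:
    """Placeholder: real code would query IPMI, a vendor tool, etc."""
    return {"psu1": True, "psu2": True}


def main() -> int:
    code, line = summarize(read_psu_states())
    print(line)   # the agent captures the script's stdout
    return code   # and its exit status


if __name__ == "__main__":
    sys.exit(main())
```

With a line like that in snmpd.conf, the output and exit status should
show up under NET-SNMP-EXTEND-MIB, so the poller can treat the result
like any other SNMP value.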

If you don't use SNMP for everything, you're going to be stuck with
hooking SNMP into whatever you do use so that all your networking kit
and environmental monitors can be monitored.

> 2) Is there any good solution that does both logging/trending of this
> data and also notification/monitoring/alerting?  I've used both Nagios
> and Cacti in the past, and, due to the number of individual things being
> monitored (3-5 items per OS instance, 5-10 items per physical server,
> 10-50 things per network device), setting them both up independently
> seems like a huge pain.  Also, I've never really liked Nagios that much.

Take a look at OpenNMS....

> There's got to be a better way.  What do you guys use?
 
We wrote our own, but that's a company culture thing.

Paul
 
-- 
End dual-measurement, let's finish going metric!
http://gometric.us/
http://www.metric.org/



