OS, Hardware, Network - Logging, Monitoring, and Alerting
psa at otoh.org
Fri Jun 27 05:34:18 UTC 2008
At 2008-06-26T02:22-0700, Rev. Jeffrey Paul wrote:
> Other stuff we really need to keep an eye on is hardware - redundant
> PSU status in our 7204s and Dells, temperatures and voltages
Do yourself a favor: monitor temp in C. Most stuff only does C, and people
burn routers when there's a mix of C and F ("I set the alarm to 90, why
didn't it shut down?" "Well, you should have set it to 30; the router only
does C.").
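To make the unit mix-up concrete, here is a minimal Python sketch that normalizes an alarm threshold to Celsius before it gets pushed to a device. The function names and the "C"/"F" unit labels are made up for illustration; nothing here comes from a particular router or monitoring package.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def alarm_threshold_celsius(value: float, unit: str) -> float:
    """Return an alarm threshold in Celsius, whatever unit it was entered in.

    `unit` is a hypothetical label ("C" or "F") supplied by whoever
    configures the alarm. Anything else is rejected instead of guessed at.
    """
    normalized = unit.strip().upper()
    if normalized == "C":
        return float(value)
    if normalized == "F":
        return fahrenheit_to_celsius(value)
    raise ValueError(f"unknown temperature unit: {unit!r}")


if __name__ == "__main__":
    # Someone thinking in Fahrenheit enters 90; the device wants Celsius.
    print(round(alarm_threshold_celsius(90, "F"), 1))
```

A threshold entered as 90 F comes out a little over 32 C, which is in the neighborhood of the 30 mentioned above. Rejecting unknown units outright is a deliberate choice: a silently misread threshold is exactly the failure being described.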
> 1) Is SNMP the best way to do this? Obviously some of the data (service
> checks) will need to be collected other ways.
Particularly with Net-SNMP, you can hook in external commands, etc. See
"Arbitrary Extension Commands" in the snmpd.conf man page.
If you don't use SNMP for everything, you're going to be stuck with
hooking SNMP into whatever you do use so that all your networking kit
and environmental monitors can be monitored.
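As a rough illustration of the Net-SNMP extension hook mentioned above, here is a small Python script that an snmpd `extend` line could call. The script name, the extension name `cpu_temp_c`, the install path, and the choice of sensor file are all assumptions for the sake of the example. The sysfs path is the usual Linux thermal zone file, which reports millidegrees Celsius, but your hardware may expose something different.

```python
#!/usr/bin/env python3
"""Print one temperature reading in Celsius for use as an snmpd extension.

Hypothetical snmpd.conf line (names and path are placeholders):

    extend cpu_temp_c /usr/local/bin/cpu_temp_c.py
"""
import sys

# Assumed sensor location; Linux thermal zones report millidegrees Celsius.
SENSOR_PATH = "/sys/class/thermal/thermal_zone0/temp"


def format_celsius(raw_millidegrees: str) -> str:
    """Turn a raw millidegree string such as '45500\\n' into '45.5'."""
    return f"{int(raw_millidegrees.strip()) / 1000:.1f}"


def main() -> int:
    try:
        with open(SENSOR_PATH, "r", encoding="ascii") as sensor:
            print(format_celsius(sensor.read()))
    except (OSError, ValueError) as exc:
        # A non-zero exit status lets the poller tell "no data" from a reading.
        print(f"cannot read {SENSOR_PATH}: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

With a line like that in snmpd.conf, the script's output should show up under NET-SNMP-EXTEND-MIB (for example nsExtendOutput1Line, indexed by the extension name), so whatever polls the rest of your kit over SNMP can pick it up as well. It also reports Celsius only, in keeping with the advice earlier in this message.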
> 2) Is there any good solution that does both logging/trending of this
> data and also notification/monitoring/alerting? I've used both Nagios
> and Cacti in the past, and, due to the number of individual things being
> monitored (3-5 items per OS instance, 5-10 items per physical server,
> 10-50 things per network device), setting them both up independently
> seems like a huge pain. Also, I've never really liked Nagios that much.
Take a look at OpenNMS....
> There's got to be a better way. What do you guys use?
We wrote our own, but that's a company culture thing.
End dual-measurement, let's finish going metric!