What NMS do you use and why?

Wed Aug 15 16:37:34 UTC 2018

As a small operator, we mainly use Icinga for the reasons Chuck mentioned.

The API allows us to do updates based on configuration parameters we've
created in a custom MySQL database.
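
A minimal sketch of what that kind of sync could look like, assuming the Icinga 2 REST API (`PUT /v1/objects/hosts/<name>` with `templates` and `attrs`). The base URL, the row columns (`name`, `address`, `template`, `site`) and the helper name are hypothetical stand-ins, not our actual schema or script. It only builds the request; sending it would also need API credentials and TLS settings.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical API endpoint; 5665 is the default Icinga 2 API port.
ICINGA_URL = "https://icinga.example.net:5665"


def build_host_request(row, base_url=ICINGA_URL):
    """Build a PUT request that creates one Icinga 2 host object.

    `row` stands in for one record from the configuration database.
    The column names used here are assumptions for illustration.
    """
    body = {
        "templates": [row["template"]],
        "attrs": {
            "address": row["address"],
            # Custom variables can be set with dotted attribute names.
            "vars.site": row["site"],
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/objects/hosts/{quote(row['name'], safe='')}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


# Example row, as it might come back from a database query.
example = {
    "name": "sw1.example.net",
    "address": "192.0.2.10",
    "template": "generic-host",
    "site": "sjc1",
}
req = build_host_request(example)
print(req.get_method(), req.full_url)
```

Keeping the request construction separate from the network call makes it easy to check what would be pushed before anything touches the monitoring server.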

Peter

Peter Harrison
CTO, Colovore LLC

On Wed, Aug 15, 2018 at 9:19 AM, Chuck Anderson <cra at wpi.edu> wrote:

> On Wed, Aug 15, 2018 at 08:49:12AM -0500, Colton Conor wrote:
> > We are looking for a new network monitoring system. Since there are so
> > many operators on this list, I would like to know which NMS do you use
> > and why? Is there one that you really like, and others that you hate?
> >
> > For free options (open source), LibreNMS and NetXMS come highly
> > recommended by many wireless ISPs on low budgets. However, I am not sure
> > of the commercial options available or their price points.
>
> For monitoring network device/interface data plane reachability with
> ping, we are still using an ancient piece of open source software
> called Autostatus.  I find it invaluable for notifying us about
> reachability issues with its simple-to-understand parent/child
> relationships and graph-based fping methodology.  It isn't perfect--it
> doesn't scale very well, it doesn't have HA/clustering, it has no
> fancy dependencies (just basic parent-child) and no event correlation,
> no contact scheduling, no API, etc. but it is very easy to understand
> why you are getting an alert or not and boiling that down to a single
> point of failure and as such it provides reliable, trustable
> information about data plane reachability from one vantage point on
> the network.
>
> For monitoring server & network service availability,
> device/environmental health, etc. we are currently using Nagios.  My
> problems with it are that it has complex rules for how/when to perform
> a specific health check and send or suppress a notification (and
> perhaps bugs in our old version that never ever seems to send any Host
> notifications except when it does) and the whole idea of "suppress the
> Host check unless all Service checks for all services on the host are
> down" doesn't really fit well with the idea of monitoring
> device/interface reachability on routers & switches that make up a
> complex graph of dependencies.  Trying to shoehorn Nagios into
> alerting on just the one IP address/device/interface that is causing
> all the others behind it to be unreachable doesn't work very well.
> You can't use Host Dependencies because Host checks are suppressed by
> default, and Host Dependencies don't affect Service
> Checks/notifications.  Forcing Host checks to always run causes
> performance problems.  Creating a "Ping" service for every host
> requires creating manual Service Dependencies between all the "Ping"
> services on every Host.  Then you end up with a complex configuration
> that is very hard to understand.  But for things like telling you when
> a power supply or fan has died, or if the web service crashed, it
> works well.
>
> We did a survey of a bunch of open source tools to replace Nagios and
> have settled on Icinga for its APIs, dynamic rules with pattern
> matching and boolean logic, and compatibility with Nagios plugins.
> But it still doesn't change the basic architectural choices of the
> Nagios core engine and hence isn't a good fit for network
> device/interface reachability monitoring IMO.
>
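
The basic parent/child suppression Chuck describes for reachability alerts can be sketched in a few lines. This is not Autostatus code, only an illustration of the idea under a simplifying assumption: each device has at most one parent, and a down device is reported only if its parent is not also down. The device names and the `root_causes` helper are hypothetical.

```python
def root_causes(parent_of, down):
    """Return the down devices whose parent is not itself down.

    parent_of maps each device to its single upstream parent; devices
    with no parent are simply absent from the mapping. Anything behind
    a failed parent is suppressed, so only the likely point of failure
    is reported.
    """
    down = set(down)
    return sorted(dev for dev in down if parent_of.get(dev) not in down)


# Hypothetical topology: core -> dist1 -> access1, access2
parent_of = {
    "dist1": "core",
    "access1": "dist1",
    "access2": "dist1",
}

# All three devices fail their ping, but only dist1 is worth an alert.
print(root_causes(parent_of, {"dist1", "access1", "access2"}))  # ['dist1']
```

A single-parent tree like this is easy to reason about, which is the appeal described above, but it cannot express redundant paths or the richer dependency graphs that real networks have.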