recommendations for external monitoring services?

Mark Gauvin MGauvin at dryden.ca
Wed Dec 14 01:43:35 UTC 2011


SolarWinds: you send in the specific MIB you need monitored, and a week
later it's in the general release.



On 2011-12-13, at 7:11 PM, "Robert Brockway" <robert at timetraveller.org> wrote:

> On Mon, 12 Dec 2011, Eric J Esslinger wrote:
>
>> I'm not looking to monitor a massive infrastructure: 3 web sites, 2 mail
>> servers (pop,imap,submission port, https webmail), 4 dns servers
>> (including lookups to ensure they're not listening but not talking), and
>> one inbound mx. A few network points to ping to ensure connectivity
>> throughout my system. Scheduled notification windows (for example,
>> during work hours I don't want my phone pinged unless it's everything
>> going offline. Off hours I do. Secondary notifications if problem
>> persists to other users, or in the event of many triggers. That sort of
>> thing). Sensitivity settings (If web server 1 shows down for 5 min,
>> that's not a big deal. Another one if it doesn't respond to repeated
>> queries within 1 minute is a big deal) A Weekly summary of issues would
>> be nice. (especially the 'well it was down for a short bit but we didn't
>> notify as per settings') I don't have a lot of money to throw at this. I
>
> Hi Eric.  The feature set you are describing should be in any monitoring
> system worthy of the name.  I've used Nagios to good effect for the best
> part of the last 12 years or so.  Before that I used Big Brother, which
> sucked in various ways.
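The notification windows, sensitivity settings and secondary notifications
described in the quoted requirements can be sketched as a small decision
function. This is only an illustration in Python, not Nagios or Icinga
configuration; the Policy fields, the who_to_notify name, the contact labels
and every threshold (09:00-17:00 work hours, 5 minute grace, 30 minute
escalation) are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class Policy:
    # All values are illustrative assumptions, not settings from the thread.
    work_start: time = time(9, 0)
    work_end: time = time(17, 0)
    grace: timedelta = timedelta(minutes=5)            # tolerated downtime
    escalate_after: timedelta = timedelta(minutes=30)  # add secondary contacts


def in_work_hours(now: datetime, policy: Policy) -> bool:
    """True on Monday-Friday between work_start and work_end."""
    return now.weekday() < 5 and policy.work_start <= now.time() < policy.work_end


def who_to_notify(now, down_since, services_down, services_total,
                  policy=Policy()):
    """Return the list of contacts to notify for the current outage state."""
    if services_down == 0:
        return []
    outage = now - down_since
    if outage < policy.grace:
        return []  # short blip: keep it for the weekly summary, do not page
    everything_down = services_down == services_total
    if in_work_hours(now, policy) and not everything_down:
        return []  # during work hours only a total outage pages the phone
    contacts = ["primary"]
    if outage >= policy.escalate_after:
        contacts.append("secondary")  # problem persisted: widen the audience
    return contacts
```

For example, a single service that has been down for ten minutes at 03:00
pages the primary contact, while the same outage at 10:00 on a weekday
pages nobody.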
>
> I did an evaluation on a wide variety of FOSS monitoring systems 2-3 years
> ago and Nagios won at the time (again).  Generally I found the
> alternatives had problems that I considered to be quite serious (such as
> being overly complicated or doing checks so frequently that they loaded
> the systems they were supposed to be monitoring[1]).
>
> I'm currently trialing Icinga, a fork of Nagios.
>
> Puppet can be set up to manage Nagios/Icinga config which cuts down on the
> admin overhead.
>
> Nagios/Icinga can be hooked up to Collectd to provide performance data as
> well as alert monitoring.
>
> One concern about external monitoring services is the level of visibility
> they need to have into your network to monitor it adequately.
>
> My recommendation is to do a proper risk assessment on the available
> options.
>
>> DO have detailed internal monitoring of our systems but sometimes that
>> is not entirely useful, due to the fact that there are a few 'single
>> points of failure' within our network/notification system, not to
>> mention if the monitor itself goes offline it's not exactly going to be
>> able to tell me about it. (and that happened once, right before the mail
>> server decided to stop receiving mail).
>
> There are a couple of ways to deal with this.  Some monitoring
> applications can fail-over to a standby server if the primary fails.  But
> this isn't even really necessary.  You will arguably gain higher
> reliability by running multiple _independent_ monitors and having them
> monitor each other[2].  I have often used this approach.
>
> The principal aim here is to guarantee that you are alerted to any single
> failure (a production service, system or a monitor).  Multiple
> simultaneous failures could still produce a blackspot.  It is possible to
> design a system that will discover multiple simultaneous failures, but it
> takes more effort and resources.
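The single-failure guarantee and the blackspot described above can be
checked with a toy model. This is a sketch under stated assumptions: every
monitor watches all services and every other monitor, a failed monitor
reports nothing, and the node names (mon-a, mon-b, web1, mail1, dns1) and
the detected_failures function are hypothetical.

```python
def detected_failures(monitors, services, failed):
    """Return the set of failed nodes that at least one live monitor reports.

    Every monitor watches all services and every *other* monitor; a monitor
    that has itself failed reports nothing.
    """
    failed = set(failed)
    live_monitors = [m for m in monitors if m not in failed]
    seen = set()
    for m in live_monitors:
        watched = set(services) | (set(monitors) - {m})
        seen |= watched & failed
    return seen


monitors = ["mon-a", "mon-b"]         # hypothetical independent monitors
services = ["web1", "mail1", "dns1"]  # hypothetical production services

# Any single failure, service or monitor, is reported by a surviving monitor.
for node in monitors + services:
    assert detected_failures(monitors, services, {node}) == {node}

# Both monitors failing at once leaves nobody to report: the blackspot.
assert detected_failures(monitors, services, {"mon-a", "mon-b"}) == set()
```

In this model a third independent monitor would be needed before a double
monitor failure could be reported, which matches the point that covering
simultaneous failures costs more effort and resources.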
>
>
> [1] Sometimes I wonder if the people developing certain systems have any
> operational experience at all.
>
> [2] A system designed to fail-over on certain conditions may fail to
> fail-over, ah, so to speak.
>
> Cheers,
>
> Rob
>
> -- 
> Email: robert at timetraveller.org        Linux counter ID #16440
> IRC: Solver (OFTC & Freenode)
> Web: http://www.practicalsysadmin.com
> Director, Software in the Public Interest (http://spi-inc.org/)
> Free & Open Source: The revolution that quietly changed the world
> "One ought not to believe anything, save that which can be proven by
> nature and the force of reason" -- Frederick II (26 December 1194 –
> 13 December 1250)


More information about the NANOG mailing list