recommendations for external monitoring services?
Mark Gauvin
MGauvin at dryden.ca
Wed Dec 14 01:43:35 UTC 2011
SolarWinds: you send in the specific MIB you need monitored, and a
week later it's in the general release.
On 2011-12-13, at 7:11 PM, "Robert Brockway"
<robert at timetraveller.org> wrote:
> On Mon, 12 Dec 2011, Eric J Esslinger wrote:
>
>> I'm not looking to monitor a massive infrastructure: 3 web sites, 2 mail
>> servers (pop,imap,submission port, https webmail), 4 dns servers
>> (including lookups to ensure they're not listening but not talking), and
>> one inbound mx. A few network points to ping to ensure connectivity
>> throughout my system. Scheduled notification windows (for example,
>> during work hours I don't want my phone pinged unless it's everything
>> going offline. Off hours I do. Secondary notifications if problem
>> persists to other users, or in the event of many triggers. That sort of
>> thing). Sensitivity settings (If web server 1 shows down for 5 min,
>> that's not a big deal. Another one if it doesn't respond to repeated
>> queries within 1 minute is a big deal) A Weekly summary of issues would
>> be nice. (especially the 'well it was down for a short bit but we didn't
>> notify as per settings') I don't have a lot of money to throw at this. I
>
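The notification windows and sensitivity settings requested above can be sketched roughly as follows. This is only an illustration, not how any particular product implements it: the working hours, the service names `web1`/`web2`, the `GRACE` table and the `should_page` function are all assumptions made up for the example.

```python
from datetime import datetime, time, timedelta

# Assumed working hours (Mon-Fri, 09:00-17:00); adjust to taste.
WORK_START = time(9, 0)
WORK_END = time(17, 0)

# Hypothetical per-service sensitivity: how long a service may keep
# failing before the outage is considered worth a notification.
GRACE = {
    "web1": timedelta(minutes=5),
    "web2": timedelta(minutes=1),
}


def in_work_hours(now: datetime) -> bool:
    """True on weekdays between WORK_START and WORK_END."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def should_page(service, failing_since, now, down, monitored):
    """Decide whether to page the phone for `service`.

    `down` is the set of services currently failing and `monitored`
    the set of all services being watched.
    """
    if now - failing_since < GRACE[service]:
        # Still inside the grace period: log it for the weekly
        # summary, but do not notify.
        return False
    if in_work_hours(now):
        # During work hours, page only if everything is offline.
        return set(monitored) <= set(down)
    # Off hours: any outage past its grace period pages.
    return True
```

For example, with both services monitored, a two-minute outage of `web2` at 22:00 would page, while the same outage of `web1` would not, because `web1` has a five-minute grace period.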
> Hi Eric. The feature set you are describing should be in any monitoring
> system worthy of the name. I've used Nagios to good effect for the best
> part of the last 12 years or so. Before that I used Big Brother, which
> sucked in various ways.
>
> I did an evaluation on a wide variety of FOSS monitoring systems 2-3 years
> ago and Nagios won at the time (again). Generally I found the
> alternatives had problems that I considered to be quite serious (such as
> being overly complicated or doing checks so frequently that they loaded
> the systems they were supposed to be monitoring[1]).
>
> I'm currently trialing Icinga, a fork of Nagios.
>
> Puppet can be set up to manage Nagios/Icinga config which cuts down on the
> admin overhead.
>
> Nagios/Icinga can be hooked up to Collectd to provide performance data as
> well as alert monitoring.
>
> One concern about external monitoring services is the level of visibility
> they need to have into your network to adequately monitor it.
>
> My recommendation is to do a proper risk assessment on the available
> options.
>
>> DO have detailed internal monitoring of our systems but sometimes that
>> is not entirely useful, due to the fact that there are a few 'single
>> points of failure' within our network/notification system, not to
>> mention if the monitor itself goes offline it's not exactly going to be
>> able to tell me about it. (and that happened once, right before the mail
>> server decided to stop receiving mail).
>
> There are a couple of ways to deal with this. Some monitoring
> applications can fail-over to a standby server if the primary fails. But
> this isn't even really necessary. You will arguably gain higher
> reliability by running multiple _independent_ monitors and having them
> monitor each other[2]. I have often used this approach.
>
> The principal aim here is to guarantee that you are alerted to any single
> failure (a production service, system or a monitor). Multiple
> simultaneous failures could still produce a blackspot. It is possible to
> design a system that will discover multiple simultaneous failures, but it
> takes more effort and resources.
>
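The reasoning about independent monitors watching each other can be checked with a small model. This is a sketch under assumptions, not a description of any real deployment: the node names (`mon-a`, `mon-b`, `web`, `mail`) and the `TOPOLOGY` layout are hypothetical, and a "failure" here simply means a node that a surviving monitor would see as down.

```python
# Hypothetical layout: two independent monitors, each watching the
# production services and the other monitor.
TOPOLOGY = {
    "mon-a": {"mon-b", "web", "mail"},
    "mon-b": {"mon-a", "web", "mail"},
}


def detected(topology, failed):
    """Failed nodes that at least one surviving monitor is watching."""
    seen = set()
    for monitor, targets in topology.items():
        if monitor in failed:
            continue  # a dead monitor reports nothing
        seen |= failed & targets
    return seen


def undetected(topology, failed):
    """Failed nodes that nobody is left to report (the blackspot)."""
    return failed - detected(topology, failed)


if __name__ == "__main__":
    nodes = {"mon-a", "mon-b", "web", "mail"}
    for node in sorted(nodes):
        # Every single failure is seen by some surviving monitor.
        print(node, "undetected:", undetected(TOPOLOGY, {node}))
    # Both monitors failing together leaves a blackspot.
    print(undetected(TOPOLOGY, {"mon-a", "mon-b"}))
```

In this model every single failure is reported, while losing both monitors at once goes unnoticed. Closing that gap takes a third observer, which is the extra effort and resources mentioned above.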
>
> [1] Sometimes I wonder if the people developing certain systems have any
> operational experience at all.
>
> [2] A system designed to fail-over on certain conditions may fail to
> fail-over, ah, so to speak.
>
> Cheers,
>
> Rob
>
> --
> Email: robert at timetraveller.org Linux counter ID #16440
> IRC: Solver (OFTC & Freenode)
> Web: http://www.practicalsysadmin.com
> Director, Software in the Public Interest (http://spi-inc.org/)
> Free & Open Source: The revolution that quietly changed the world
> "One ought not to believe anything, save that which can be proven by
> nature and the force of reason" -- Frederick II (26 December 1194 –
> 13 December 1250)