Monitoring system recommendation

Crier, Brent Brent.Crier at nsight.com
Tue Jun 7 12:32:50 UTC 2016


We use Zabbix here pretty heavily, monitoring roughly 10,000 hosts, 13,000 interfaces and a myriad of services.

-Brent


> On Jun 7, 2016, at 2:42 AM, Mikael Falkvidd <mikael.falkvidd at op5.com> wrote:
> 
>> 
>> On Monday, June 6, 2016, Manuel Marín <mmg at transtelco.net> wrote:
>> 
>>> Dear NANOG community,
>>> 
>>> We are currently planning to upgrade our monitoring system (Opsview) due to
>>> scalability issues, and I was wondering what you would recommend for
>>> monitoring 5,000 hosts and 35,000 services. We would like to use a
>>> monitoring system that is compatible with the Nagios plugin format;
>>> however, we are not sure whether systems like Icinga/Shinken/Op5 are the
>>> way to go.
>>> 
>>> Is anyone using systems like Op5 or Icinga2 for monitoring > 5,000 hosts?
>>> Would you recommend commercial systems like SevOne, Zabbix, etc. instead
>>> of open source ones?
>> 
> 
> We (op5) have customers running > 50,000 hosts and > 300,000 services. So
> 5,000 hosts is generally not a problem.
> 
> As mentioned by Jeff, the forking model *can* become a problem. Small
> binaries that don't load a lot of libraries fork pretty fast. A test we ran
> some time ago showed the 15-minute load average peaking at 3.89 (on 24
> cores/hyperthreads) when checking 100,000 services every 5 minutes. Check
> latencies were 0.8 seconds max and 0.002 seconds avg. Average CPU load was
> 15%.
> 
> Specs for the machine used:
> Dell PowerEdge R620
> 2x Intel Xeon E5-2620
> 24 GB RAM
> Dell PERC H710 hardware RAID card
> RAID10 on 4x300GB 15kRPM SAS drives
> 
> So a single (now almost vintage) server can handle more than 300 plugin
> executions per second (100,000 checks / 300 s ≈ 333/s) without breaking a
> sweat. Scaling up is definitely a possibility, but scaling out (using
> mod_gearman, mk or Merlin, all open source) is available as well.
> 
> Complex plugins, for example check_vmware_api, which loads the large VMware
> Perl SDK, can get you in trouble though. I suggest you run a test with the
> plugin mix you are planning to use.
> 
> If scaling out is not an option and you want to stay in the Nagios/Naemon
> world, a custom worker can be developed to get rid of the loading overhead.
> Documentation is available at
> http://www.naemon.org/documentation/developer/workers.html
> 
> Full disclosure: I work as development team lead at op5
> 
> best regards
> Mikael Falkvidd
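
As a quick sanity check on the throughput figures quoted above: average plugin load is simply the number of services divided by the check interval. The Python sketch below is a minimal illustration; the function name is hypothetical, and applying a 5-minute interval to the 35,000-service case is an assumption, since the original question does not state an interval.

```python
def checks_per_second(services: int, interval_seconds: float) -> float:
    """Average plugin executions per second if every service is checked once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return services / interval_seconds


FIVE_MINUTES = 5 * 60  # check interval used in the op5 test, in seconds

# op5 test: 100,000 services every 5 minutes
test_rate = checks_per_second(100_000, FIVE_MINUTES)
# Question's 35,000 services, ASSUMING the same 5-minute interval
planned_rate = checks_per_second(35_000, FIVE_MINUTES)

print(f"op5 test: {test_rate:.1f} checks/s")
print(f"35,000 services: {planned_rate:.1f} checks/s")
print(f"test load / planned load: {test_rate / planned_rate:.2f}x")
```

Run as-is, it prints 333.3 checks/s for the op5 test, 116.7 checks/s for 35,000 services, and a ratio of 2.86. This covers average load only; it says nothing about per-plugin cost, which is why heavy plugins such as check_vmware_api still need testing.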

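For the suggested test with your own plugin mix, a rough per-plugin cost can be measured with something like the Python sketch below. It is an illustration under stated assumptions, not an op5, Naemon or Zabbix tool: the command lines in `plugin_mix` are stand-ins to be replaced with your real plugins, and it times sequential runs from a single process, whereas a real scheduler runs many checks in parallel across workers.

```python
import subprocess
import sys
import time
from statistics import mean


def time_plugin(argv: list[str], runs: int = 20) -> tuple[float, float]:
    """Run one plugin command `runs` times; return (average, max) wall-clock seconds.

    Each run pays the full fork/exec plus interpreter/library loading cost.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Exit status is ignored: Nagios plugins use 0-3 for OK/WARNING/CRITICAL/UNKNOWN.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return mean(durations), max(durations)


def sequential_rate(avg_seconds: float) -> float:
    """Executions per second one worker could sustain running checks back to back."""
    return 1.0 / avg_seconds


if __name__ == "__main__":
    # HYPOTHETICAL plugin mix: replace with the real command lines you plan to run,
    # e.g. ["/usr/lib/nagios/plugins/check_ping", "-H", "192.0.2.1",
    #       "-w", "100.0,20%", "-c", "500.0,60%"].
    # These stand-ins only contrast a tiny process with one that loads more code.
    plugin_mix = {
        "small": [sys.executable, "-S", "-c", "pass"],
        "heavier": [sys.executable, "-c", "import json, decimal, email"],
    }
    for name, argv in plugin_mix.items():
        avg, worst = time_plugin(argv)
        print(f"{name:8s} avg {avg * 1000:7.1f} ms  max {worst * 1000:7.1f} ms  "
              f"~{sequential_rate(avg):6.1f}/s per worker")
```

Dividing the checks per second you expect by the per-worker rate gives a rough idea of how many parallel executions you would need; a heavyweight plugin shows up immediately as a much higher average.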


More information about the NANOG mailing list