SLA monitoring and reporting to customers

Mon Mar 19 10:01:08 UTC 2007

On Sun, 18 Mar 2007, Rubens Kuhl Jr. wrote:

>
>> > What open-source or low-budget tools are operators using for SLA
>> > monitoring when the reports (current state and historical) should be
>> > available to customers ?
>> 
>> Please define SLA in terms of monitoring.
>
> - 99.x% availability (defined by packet loss and response time) monthly
> - A certain number of hours from service interruption to service recovery

So what you're looking for is a number for a monthly report to be 
calculated based on known downtime as measured by monitoring software.
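
As a rough illustration of that calculation, here is a minimal Python sketch. It assumes the monitoring software can export each outage as a (start, end) pair of timestamps and that outages do not overlap; the function name and data layout are hypothetical and not taken from any particular tool. It returns the availability percentage for the period and the longest single interruption, which correspond to the two criteria quoted above.

```python
from datetime import datetime, timedelta

def monthly_sla(outages, period_start, period_end):
    """Return (availability_percent, longest_outage) for one reporting period.

    outages: list of (start, end) datetime pairs, assumed non-overlapping.
    """
    period = period_end - period_start
    downtime = timedelta(0)
    longest = timedelta(0)
    for start, end in outages:
        # Clip each outage to the reporting period.
        start = max(start, period_start)
        end = min(end, period_end)
        if end <= start:
            continue  # outage lies entirely outside the period
        duration = end - start
        downtime += duration
        longest = max(longest, duration)
    availability = 100.0 * (1 - downtime / period)
    return availability, longest

# Example: one outage in March 2007.
march = (datetime(2007, 3, 1), datetime(2007, 4, 1))
outages = [(datetime(2007, 3, 10, 0, 0), datetime(2007, 3, 10, 7, 26, 24))]
availability, longest = monthly_sla(outages, *march)
print(f"availability {availability:.3f}%, longest outage {longest}")
```

The longest outage can then be compared against the agreed recovery time, and the percentage against the 99.x% target.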

>> > Looking at NANOG archives, NAGIOS is the most prevalent tool, but its
>> > authorization mechanisms are somewhat below what I would like:
>> > customers should not be able to change anything, either in
>> > configuration or in SLA software state
>> 
>> You can set it up so that a customer only sees status data for the
>> services he or she has access to, by adding the customer as a contact
>> for those hosts or services.
>
> There are 2 main issues on my reading of
> http://nagios.sourceforge.net/docs/2_0/cgiauth.html
> - Users can issue commands for hosts/services they are contact for.
> They could acknowledge an outage even when we should know about it.

If they acknowledge an outage you'll know about it (acknowledgement
notification). I also don't necessarily see it as bad for a user of
some service to acknowledge that a certain service you monitor (say
HTTP) is down and to say that they purposely took apache down.

But I guess what you're asking for is an additional permission list
that gives nagios users view-only access...

> - Some devices of interest to a customer are not specific to a
> customer: a switch, a router. If they are considered contact for such
> devices, they can issue commands for it.

Depends on how you set it up. In the setup that I use, each
router & switch port is a separate service and can have a separate
list of associated users; they will see no other data about
the switch, nor can they issue commands for anything other than
that switch port.
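
A per-port layout like this could be generated rather than written by hand. The Python sketch below renders one Nagios service definition per port, each with its own contact group. The template name `generic-service`, the check command `check_port`, and the host, port and group names are all assumptions for illustration; the thread does not show the actual configuration.

```python
def port_service(host, port, contact_group):
    """Render one Nagios service definition for a single switch/router port."""
    return (
        "define service{\n"
        "    use                  generic-service   ; assumed template name\n"
        f"    host_name            {host}\n"
        f"    service_description  Port {port}\n"
        f"    check_command        check_port!{port}   ; hypothetical command\n"
        f"    contact_groups       {contact_group}\n"
        "}\n"
    )

# Hypothetical mapping of ports on one switch to customer contact groups.
ports = {"Gi0/1": "customer-a", "Gi0/2": "customer-b"}
config = "\n".join(port_service("switch1", p, g) for p, g in ports.items())
print(config)
```

Because each port is its own service, members of `customer-a` would be contacts only for "Port Gi0/1" and not for the switch host or its other ports.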

>> Do you think that your customers should or
>> should not have such access to your central nagios system?
>
> That's something I would like to hear opinions on, but even with NAGIOS
> such an issue could be solved by having one NOC-only NAGIOS and one
> customers-only NAGIOS. Using NagiosQL would probably make
> replication easier.

Yes that can be done. But maintaining separate parallel systems is
actually a pain. I also would like to hear opinions on whether a more
complex user permission system would be good to have for the nagios
web interface, and if so what those permissions should be.

>> > I'm looking for something more like Cacti, where customers can be
>> > contained to only see some of the generated graphs.
>> 
>> Would you be satisfied with a graphing extension to nagios that
>> replicates the nagios security mechanism, where a customer can see
>> graphs for the services he/she is listed as a contact for?
>
> Is it http://nagiosgraph.sourceforge.net/ ? Can a user be a
> nagiosgraph contact without being a NAGIOS contact ?

I'm actually asking because I wrote my own web interface (see
ngraph.cgi at http://www.elan.net/~william/nagios/), originally
for nagiosgrapher, but it is now being decoupled from any particular
graphing package and I plan to have it support multiple nagios
data collection & backend systems.

The next step on the TODO list is user access & authentication, which
is supposed to replicate how nagios itself does it by allowing
only authenticated users who are contacts for the service to see
the graphs, BUT you do have an opportunity here to say what else
such an interface should support as far as user access rights control.
(BTW, the current cgi does support specifying users who would have
access to graphs but not nagios itself - however such a user would
then have access to see all graphs...)
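
The access rule described here can be summarised in a few lines of Python. This is only a sketch of the stated behaviour, not code from ngraph.cgi: the function name, the dictionary of contacts, and the `graph_only_users` set are hypothetical.

```python
def visible_graphs(user, service_contacts, graph_only_users=frozenset()):
    """Return the (host, service) pairs whose graphs `user` may view.

    service_contacts: dict mapping (host, service) -> set of contact names.
    graph_only_users: users granted graph access without being nagios
                      contacts; as described above, they see every graph.
    """
    if user in graph_only_users:
        return sorted(service_contacts)
    return sorted(
        key for key, contacts in service_contacts.items() if user in contacts
    )

# Hypothetical contact data for two ports on one switch.
contacts = {
    ("switch1", "Port Gi0/1"): {"alice"},
    ("switch1", "Port Gi0/2"): {"bob"},
}
print(visible_graphs("alice", contacts))
print(visible_graphs("carol", contacts, graph_only_users={"carol"}))
```

A finer-grained scheme would replace the all-or-nothing `graph_only_users` set with a per-user list of permitted services.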

-- 
William Leibzon
Elan Networks
william at elan.net