HE.net, Fremont-2 outage?
Robert Mathews (OSIA)
mathews at hawaii.edu
Wed Nov 4 23:32:02 CST 2009
Alex Rubenstein wrote:
>> Yup. Related: "100% availability" is a marketing person's dream; it
>> sounds good in theory but is unattainable in practice, and is a
>> reliable sign of non-100%-reliability.
> You are confusing two different things.
> Availability != Reliability.
Pardon the interruption...
The statement above draws a flagrant separation between the two terms
without sufficient explanation. Note that in being available, one
criterion for ensuring reliability is met. If one has the desire to
delve into some of the nuanced operational perspective, see:
http://ow.ly/zmQg (pdf) or http://ow.ly/zmTB (web friendly). The
article is also available through the IEEE Portal at http://ow.ly/zn3a
(in case either of the other links appears to be unavailable at any
time).
> For instance, an airplane is designed to be 100% reliable, but much less available. To keep a 747 from not crashing (100% reliability) it needs significant downtime (not 100% available).
This explanation, aside from being unsatisfactory, is misleading.
Operating times and maintenance times are very much separate quantities.
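As a rough illustration only, using common textbook definitions rather
than anything taken from this thread or the linked article: steady-state
availability is often computed as MTBF / (MTBF + MTTR), while
reliability over a mission of length t, assuming a constant failure
rate, is exp(-t / MTBF). The function names and figures below are
hypothetical.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up.

    Assumes the textbook form MTBF / (MTBF + MTTR); whether scheduled
    maintenance counts toward MTTR is a policy choice, not a given.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of running mission_hours without a failure.

    Assumes a constant failure rate (exponential model), which is a
    simplification and not a claim about any real aircraft or facility.
    """
    return math.exp(-mission_hours / mtbf_hours)
```

With these assumed definitions, availability(999.0, 1.0) is 0.999, while
reliability(10.0, 1000.0) is roughly 0.990: the two figures answer
different questions, yet both are driven by the same failure behavior.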
>> And even for those who follow best practices... You can inspect and
>> maintain things until you're blue in the face. One day a contractor
>> will drop a wrench into a PDU or UPS or whatever and spectacular things
>> will happen.
> That's where policies, procedures and methods come in (read: SAS70)
For the operationally minded -- on one hand, there is an assumption here
that 'accidents' are not preventable; on the other hand, there is an
assumption that SAS 70 is the cure for 'accidents.' To be brief,
accounting for human behavior as an underlying contributor to accidents
can be a backbreaking and immensely messy endeavor. In this respect,
SAS 70 can only assist.
All the best,