Converged Networks Threat (Was: Level3 Outage)
David Meyer
dmm at 1-4-5.net
Wed Feb 25 19:19:16 UTC 2004
Petri,
>> I think it has been proven a few times that physical fate sharing is
>> only a minor contributor to total connectivity availability, while
>> system complexity, mostly controlled by software written and operated by
>> imperfect humans, contributes a major share to end-to-end availability.
Yes, and at the very least this would seem to match our
intuition and experience.
>> From this, it can be deduced that reducing unnecessary system
>> complexity and shortening the strings of pearls that make up the system
>> contribute to better availability and resiliency of the system. Diversity
>> works both ways in this equation. It lessens the probability of the same
>> failure hitting the majority of your boxes, but at the same time increases
>> the knowledge needed to understand and maintain the whole system.
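The "string of pearls" point is just serial availability: every element of the chain must be up, so overall availability is the product of the per-element availabilities, and shorter chains win. A minimal sketch (the function name and the 99.9% figures are illustrative assumptions, not from the thread):

```python
# Serial "string of pearls": the system is up only if every element is up,
# so overall availability is the product of the per-element availabilities.
def serial_availability(availabilities):
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# Illustrative numbers: ten elements at 99.9% each vs. three.
long_chain = serial_availability([0.999] * 10)   # ~0.990
short_chain = serial_availability([0.999] * 3)   # ~0.997
print(long_chain, short_chain)
```

Shortening the chain from ten pearls to three recovers most of a nine in this toy calculation, which is the intuition behind the deduction above.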
No doubt. However, the problem is: what constitutes
"unnecessary system complexity"? A designed system's
robustness comes in part from its complexity, so it's not
that complexity is inherently bad; rather, if you push a
system's complexity past some point, you wind up with
extreme sensitivity to outlying events, exhibited as
catastrophic cascading failures. These are the so-called
"robust yet fragile" systems (think of the NE power outage).
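The "robust yet fragile" behavior can be illustrated with a toy load-redistribution cascade (a generic illustration, not a model of any specific network; all names and numbers here are assumptions): nodes absorb a failed neighbor's load easily when lightly loaded, but past some loading point a single failure takes down everything.

```python
# Toy cascade: each node carries some load and has a fixed capacity.
# When nodes fail, their total load is split evenly among survivors;
# any survivor pushed past capacity fails in turn, and we iterate.
def cascade(loads, capacity, first_failure):
    failed = {first_failure}
    while True:
        shed = sum(loads[i] for i in failed)
        alive = [i for i in range(len(loads)) if i not in failed]
        if not alive:
            return len(failed)          # total collapse
        newly = {i for i in alive
                 if loads[i] + shed / len(alive) > capacity}
        if not newly:
            return len(failed)          # cascade has stopped
        failed |= newly

# Same topology, same trigger; only the operating point differs:
print(cascade([0.5] * 10, capacity=0.95, first_failure=0))  # 1 (absorbed)
print(cascade([0.9] * 10, capacity=0.95, first_failure=0))  # 10 (collapse)
```

The lightly loaded system shrugs off the initial failure; the heavily loaded one loses every node to the same trigger, which is the extreme sensitivity to outlying events described above.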
BTW, this extreme sensitivity to outlying events and
catastrophic cascading failure is a signature of a class of
dynamical systems of which we believe the Internet is an
example; unfortunately, the machinery we currently have
(in dynamical systems theory) isn't yet mature enough to
provide us with engineering rules.
>> I would vote for the KISS principle if in doubt.
Truly. See RFC 3439 and/or
http://www.1-4-5.net/~dmm/complexity_and_the_internet. I
also said a few words about this topic at NANOG26,
where we had a panel on it (my slides are at
http://www.maoz.com/~dmm/NANOG26/complexity_panel).
Dave