Resilience: faults, causes, statistics, open issues

András Császár (IJ/ETH) Andras.Csaszar at
Thu Jan 27 11:39:32 UTC 2005

Hi people!

I've begun research on carrier-grade (aka telecom-grade) resiliency in IP transport networks. The first step is to collect possible failure events, their causes and consequences, and statistics on downtime (mean time to repair) and mean time between failures. I would also like to identify which problems are most typical: HW bugs, SW bugs, cable cuts, unplugged cables (link going down), or severe misconfiguration.

I think this is the perfect forum to get some feedback from real network-operational experience.

Is there anyone out there who has statistics or documents that would help me in any way?

Also, do you have any suggestions on open research issues to be solved in the area?

Any thoughts or comments would be most welcome!



More information about the NANOG mailing list