Do you care about "gray" failures? Can we (network academics) help? A 10-min survey

Thu Jul 8 11:57:14 UTC 2021

Dear NANOG,

Detecting whole-link and node failures is relatively easy nowadays (e.g., using BFD). But what about detecting gray failures that affect only a *subset* of the traffic, e.g., a router randomly dropping 0.1% of the packets? Does your network often experience these gray failures? Are they problematic? Do you care? And can we (network researchers) do anything about it?

Please help us find out by answering our short (<10 minutes) anonymous survey.

Survey URL: https://forms.gle/v99mBNEPrLjcFCEu8

## Context:

When we think about network failures, we often think about a link or a network device going down. These failures are "obvious" in that *all* the traffic crossing the corresponding resource is dropped. But network failures can also be more subtle and affect only a *subset* of the traffic (e.g., 0.01% of the packets crossing a link/router). These failures are commonly referred to as "gray" failures. Because they don't drop *all* the traffic, gray failures are much harder to detect.

Many studies have revealed that cloud and datacenter networks routinely suffer from gray failures and, as such, many techniques exist to track them down in these environments (see, e.g., this study from Microsoft Azure https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/paper-1.pdf). What is less well known, though, is how much gray failures affect *other* types of networks such as Internet Service Providers (ISPs), Wide Area Networks (WANs), or Enterprise networks. While the bug reports submitted to popular routing vendors (Cisco, Juniper, etc.) suggest that gray failures are pervasive and hard to catch in all networks, we would love to hear about first-hand experiences.

## About the survey:

The questionnaire is intended for network operators. It has a total of 15 questions and should take at most 10 minutes to complete. The survey and the collected data are fully anonymous (so please do not include information that could identify you or your organization). All questions are optional, so if you don't like a question or don't know the answer, just skip it.

Thank you so much in advance, and we look forward to reading your responses!

Laurent Vanbever, ETH Zurich

PS: Of course, we would be extremely grateful if you could forward this email to any operators you know who may not read NANOG (assuming those even exist? :-))!