Open letter to Level3 concerning the global routing issues on June 12th

Hank Nussbacher hank at efes.iucc.ac.il
Sat Jun 13 18:54:54 UTC 2015


At 17:32 12/06/2015 +0200, Martin Millnert wrote:

Interesting that Level3 is a member of http://www.routingmanifesto.org/

or see

http://www.internetsociety.org/news/network-operators-around-world-demonstrate-their-commitment-secure-and-resilient-internet

to quote Level3
"As one of the most connected Internet providers in the world, security of 
the Internet is top-of-mind at Level 3 Communications. We are dedicated to 
supporting and protecting the Internet ecosystem and work each day to 
safeguard customers' critical communications. The Internet is a shared 
responsibility, and only through these important collaborative efforts can 
we continue to ensure the protection of this collective infrastructure."

-Hank

>Dear Level3,
>
>The Internet is a cooperative effort, and it works well only when its
>participants take constructive actions to address errors and remedy
>problems.
>Your position as a major Internet Carrier bestows upon you a certain
>degree of responsibility for the correct operation of the Internet all
>across (and beyond) the planet. You have many customers. Customers will
>always occasionally make mistakes. You as a major Internet Carrier have
>a responsibility to limit, not amplify, your customers' mistakes.
>Other major carriers implement technical measures that severely limit
>the damage from customer mistakes and keep it from having global impact.
>Other major carriers also implement operational procedures in addition
>to technical measures.
>In combination, these measures drastically reduce the outage-hours as a
>result of customer configuration errors.
>
>At 08:44 UTC on Friday 12th of June, one of your transit customers,
>Telekom Malaysia (AS4788) began announcing the full Internet table back
>to you, which you accepted and propagated to your peers and customers,
>causing global outages for close to 3 hours.
>[ https://twitter.com/DynResearch/status/609340592036970496 ]
>During this 3-hour window, it appears (from your own service outage
>reports) that you did nothing to stop the global Internet outage, but
>that Telekom Malaysia themselves eventually resolved it. This lack of
>action on your end, and your disregard for the correct operation of the
>global Internet is astonishing. These mistakes do not need to happen.
>AS4788 under normal circumstances announces ~1900 IPv4 prefixes to the
>Internet. You accepted several hundred thousand prefixes from them - a
>max-prefix setting would have severely limited the damage. We expect
>that these are your practices as well, but they failed. When they do, it
>should not take ~3 hours to shut down the session(s).
>
>Many operators, in despair, turned down their peering sessions with you
>once it was clear you were causing the outages and no immediate fix was
>in sight. This improved the situation for some, but not for all. Had
>you deployed proper IRR filtering to reject the bad announcements, the
>impact would have been far less severe.
>
>As a direct consequence of your ~3 hours of inaction, as a local
>example, Swedish payment terminals were experiencing problems all over
>the country. The Swedish economy was directly affected by your inaction.
>There were queues when I was buying lunch! Imagine the food rage. The
>situation was probably similar at other places around the globe where
>people were awake.
>
>Operators around the planet are curious:
>   - Did Level3 not detect or understand that it was causing global
>Internet outages for ~3 hours?
>   - If Level3 did in fact detect or understand it was causing global
>Internet outages, why did it not properly and immediately remedy the
>situation?
>   - What is Level3 going to do to address these questions and begin work
>on restoring its credibility as a carrier?
>
>We all understand that mistakes do happen (in applying customer
>interface templates, etc.). However, the Internet is all too pervasive in
>everyday life today for anything but swift action by carriers to remedy
>breakage after the fact. It is absolutely not sufficient to let a
>customer spend 3 hours to detect and fix a situation like this one. It
>is unacceptable that no swift action was taken on your end to limit the
>global routing issues you caused.
>
>Sincerely,
>Martin Millnert
>Member of Internet Community - no carrier / ISP affiliation.
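
The two safeguards named in the letter, a max-prefix limit and IRR-based
filtering, can be illustrated with a small toy model. This is only a sketch
under stated assumptions, not how Level3 or any router vendor implements
them: the class CustomerSession and its methods are hypothetical, the AS
number and prefixes are reserved documentation values, the filter accepts
exact matches only (real filters often also allow more-specifics up to some
length), and the limit is counted on received prefixes before filtering.

```python
from ipaddress import ip_network


class CustomerSession:
    """Toy model of one BGP customer session (hypothetical, not a real API)."""

    def __init__(self, asn, registered_prefixes, max_prefixes):
        self.asn = asn
        # Prefixes the customer has registered, e.g. as IRR route objects.
        self.registered = {ip_network(p) for p in registered_prefixes}
        self.max_prefixes = max_prefixes
        self.received = set()  # every distinct prefix the customer sent
        self.accepted = set()  # prefixes that passed the filter
        self.up = True

    def receive(self, prefix):
        """Process one announcement; return True if it is accepted."""
        if not self.up:
            return False
        net = ip_network(prefix)
        self.received.add(net)
        # Safety net: far more prefixes than expected means a leak.
        if len(self.received) > self.max_prefixes:
            self.shutdown()
            return False
        # Filter: accept only exact matches of registered prefixes.
        if net not in self.registered:
            return False
        self.accepted.add(net)
        return True

    def shutdown(self):
        """Tear the session down and discard everything learned from it."""
        self.up = False
        self.accepted.clear()


# Assumed example values: AS64500 registers two prefixes, limit of four.
session = CustomerSession(64500, ["192.0.2.0/24", "198.51.100.0/24"],
                          max_prefixes=4)
session.receive("192.0.2.0/24")    # registered, so it is accepted
session.receive("203.0.113.0/24")  # not registered, so it is dropped
```

In this model the filter drops individual unregistered announcements, while
the limit acts as a backstop: once the customer sends more distinct prefixes
than max_prefixes, the session goes down and nothing from it is propagated
further. Either mechanism alone would have contained a leak of the size
described above.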



