<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'>Sure, but I don't care how busy your router is, it shouldn't take hours to withdraw routes.<div><br><div><span name="x"></span><br><br>-----<br>Mike Hammett<br>Intelligent Computing Solutions<br>http://www.ics-il.com<br><br>Midwest-IX<br>http://www.midwest-ix.com<span name="x"></span><br></div><br><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Saku Ytti" &lt;saku@ytti.fi&gt;<br><b>To: </b>"Martijn Schmidt" &lt;martijnschmidt@i3d.net&gt;<br><b>Cc: </b>"Outages" &lt;outages@outages.org&gt;, "North American Network Operators' Group" &lt;nanog@nanog.org&gt;<br><b>Sent: </b>Wednesday, September 2, 2020 2:15:46 AM<br><b>Subject: </b>Re: [outages] Major Level3 (CenturyLink) Issues<br><br>On Wed, 2 Sep 2020 at 10:00, Martijn Schmidt via NANOG &lt;nanog@nanog.org&gt; wrote:<br><br>> I suppose now would be a good time for everyone to re-open their CenturyLink ticket and ask why the RFO doesn't address the most important defect, i.e. the inability to withdraw announcements even by shutting down the session?<br><br>The more work the BGP process has, the longer it takes to complete that<br>work. You could try asking in your RFP/RFQ whether some provider will commit to a<br>specific convergence time, which would improve your position<br>contractually and might make you eligible for some compensation or<br>termination of contract, but realistically every operator can run into<br>a situation where you will see what most would agree are pathologically<br>long convergence times.<br><br>The more BGP sessions and RIB entries, the higher the probability<br>that these issues manifest. Perhaps protocol-level work can be<br>justified as well. 
BGP has no concept of initial convergence. If<br>you have a lot of peers, your initial convergence contains a massive<br>amount of useless work, because you keep changing the best route while<br>you keep receiving new best routes. The higher the scale, the more<br>useless work you do and the longer the stability you require to eventually<br>~converge. Practical devices that operators run may require hours during<br>_normal operation_ to do initial convergence.<br><br>RFC7313 might show us a way to reduce the amount of useless work. You might<br>want to add a signal that initial convergence is done, or you might want to<br>add a signal that no installation or best-path algorithm runs until all<br>routes are loaded. This would massively improve scaled convergence, as<br>you wouldn't do that throwaway work, which ultimately inflates your<br>work queue and pushes your useful work far into the future.<br><br>The main thing I would ask as a customer is how we can fix it faster<br>than 5h in future. Did we lose access to the control plane? Could we<br>reasonably avoid losing it?<br>-- <br>  ++ytti<br></div><br></div></div></body></html>