Telia Not Withdrawing v6 Routes

adamv0025 at netconsultings.com adamv0025 at netconsultings.com
Wed Nov 18 12:58:32 UTC 2020


> Saku Ytti
> Sent: Tuesday, November 17, 2020 6:55 AM
> 
> On Tue, 17 Nov 2020 at 03:40, Sabri Berisha <sabri at cluecentral.net> wrote:
> 
> Hey Sabri,
> 
> > Also, in the case that I described it wasn't a Junos device. Makes me
> > wonder how bugs like that get introduced. One would expect that after
> > 20+ years of writing BGP code, handling a withdrawal would be easy-peasy.
> 
> I don't think this is related to skill, that there was some hard programming
> problem that DE couldn't solve. These are honest mistakes.
> In my tenure I've not seen the frequency of these bugs change at all;
> NOS bugs are as common now as they were in the 90s.
> 
> I put most of the blame on the market, we've modelled commercial router
> market so that poor quality NOS is good for business and good quality NOS is
> bad for business, I don't think this is in anyone's formal business plan or that
> companies even realise they are not even trying to make good NOS. I think it's
> emergent behaviour due to the market and people follow that market demand
> unknowingly.
> If we suddenly had one commercial NOS which is 100% bug free, many of their
> customers would stop buying support, would rely on spare HW and Internet
> forums for configuration help. A lot of us only need contracts to deal with novel
> bugs all of us find on a regular basis, so good NOS would immediately reduce
> revenue. For some reason Windows, macOS or Linux almost never have novel
> bugs that the end user finds and when those are found, it's big news. While we
> don't go a month without hitting a novel bug in one of our NOS, and no one
> cares about it, it's business as usual.
> 
> I also put a lot of blame on C, it was a terrific language when compiling had to
> be fast. Basically macro assembler. Now the utility of being 'close to HW' is
> gone, as the CPU does so much that the C compiler has no control over, it's not really
> even executing the same code as-written anymore. MSFT estimated >70% of
> their bugs are related to memory safety. We could accomplish significant
> improvements in software quality if we'd ditch C and allow the computer to do
> more formal correctness checks at compile time and design languages which
> lend towards this.
> 
> 
> We constantly misattribute problems (like in this post) to config or HW, while
> most common reasons for outages are pilot error and SW defect, and very little
> engineering time is spent on those. And often the time spent improving the two
> first increases the risk of the two latter, reducing mean availability over time.
> 
I agree with everything but the last statement.
From my experience, most SPs spend considerable time testing for SW defects in the features (and combinations of features) that will be used, at the intended scale; that's how you identify most of the bugs. What you're left with afterwards are special packets of death or slow memory leaks (basically the more exotic stuff).
 
adam
 



More information about the NANOG mailing list