BGP Experiment

adamv0025 at netconsultings.com
Thu Jan 31 09:16:55 UTC 2019


> From: Saku Ytti <saku at ytti.fi>
> Sent: Friday, January 25, 2019 7:59 AM
> 
> On Thu, 24 Jan 2019 at 18:43, <adamv0025 at netconsultings.com> wrote:
> 
> > We fight with that all the time,
> > I'd say that from the whole Design->Certify->Deploy->Verify->Monitor
> service lifecycle time budget, the service certification testing is almost half of
> it.
> > That's why I'm so interested in a model driven design and testing approach.
> 
> This shop has 100% automated blackbox testing, and still they have to cherry-
> pick what to test. 
>
Sure, one only tests for the few specific current and near-future use cases.

> Do you have statistics how often you find show-stopper
> issues and how far into the test they were found? 
>
I don't keep those statistics, but running bug scrubs to determine which code to take into regression testing is usually a good starting point for avoiding show-stoppers. Whatever is found later on during the testing is usually patched -so yes, you end up with a brand new code plus several patches related to your use cases (PEs, Ps, etc.).
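
To make that concrete, below is a rough sketch (in Python, with entirely made-up defect records, feature sets and release names) of the kind of bug-scrub filtering I mean: cross-referencing the vendor's defect list against the features each role actually runs before committing a candidate release to regression testing.

    #!/usr/bin/env python3
    # Illustrative bug-scrub helper. All defect records, feature sets and the
    # candidate release below are made up for the example; in practice the
    # defect list would come from the vendor's PR/bug database export.

    CANDIDATE_RELEASE = "19.1R1"  # hypothetical release under evaluation

    # Features in use per device role in the network (assumptions for illustration).
    FEATURES_IN_USE = {
        "PE": {"bgp", "l3vpn", "mpls", "qos"},
        "P":  {"isis", "mpls", "rsvp"},
    }

    # Hypothetical defect records.
    DEFECTS = [
        {"id": "PR1001", "feature": "l3vpn", "affected_releases": {"19.1R1", "18.4R2"}, "severity": "critical"},
        {"id": "PR1002", "feature": "ospf",  "affected_releases": {"19.1R1"}, "severity": "major"},
        {"id": "PR1003", "feature": "mpls",  "affected_releases": {"18.4R1"}, "severity": "major"},
    ]

    def relevant_defects(role, release):
        """Defects affecting the candidate release that touch a feature this role runs."""
        return [
            d for d in DEFECTS
            if release in d["affected_releases"]
            and d["feature"] in FEATURES_IN_USE.get(role, set())
        ]

    for role in FEATURES_IN_USE:
        for defect in relevant_defects(role, CANDIDATE_RELEASE):
            print(f"{role}: {defect['id']} ({defect['severity']}) affects {defect['feature']}")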
   
> I expect this to be
> exponential curve, like upgrading box, getting your signalling protocols up,
> pushing one packet in each service you sell is easy and fast, I wonder will
> massive amount of work increase confidence significantly from that. 
>
Yes it will.

> The
> issues I tend to find in production are issues which are not trivial to recreate
> in lab, once we know what they are, which implies that finding them a-priori
> is bit naive expectation. So, assumptions:
>
That's because you did your due diligence during testing -the straightforward issues were already caught in the lab.
Do you have statistics on the probability of these "complex" bugs occurring?
    
> Hopefully we'll enter NOS future where we download NOS from github and
> compile it to our devices. Allowing whole community to contribute to unit
> testing and use-cases and to run minimal bug surface code in your
> environment.
>
We're not there yet, but you can already compile your own routing protocols and run them on a vendor OS.
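
As one flavour of that, you can already have your own code originate routes, e.g. via the ExaBGP process API, where a script you control just writes announce/withdraw commands to stdout. A minimal sketch, assuming ExaBGP is configured to run the script as an API process (the prefix is illustrative only):

    #!/usr/bin/env python3
    # Minimal ExaBGP "process" sketch: announce one test prefix and keep the
    # pipe open. Assumes ExaBGP is configured to run this script as an API
    # process; the prefix is illustrative only.
    import sys
    import time

    ANNOUNCEMENTS = [
        "announce route 203.0.113.0/24 next-hop self",
    ]

    for line in ANNOUNCEMENTS:
        sys.stdout.write(line + "\n")
        sys.stdout.flush()

    # Exiting would end the API process, so block to keep the announcement up.
    while True:
        time.sleep(60)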

> I see very little future in blackbox testing vendor NOS at operator site,
> beyond quick poke at lab. Seems like poor value. Rather have pessimistic
> deployment plan, lab => staging => 2-3 low risk site =>
> 2-3 high risk site => slow roll up
> 
Yes, that's also a possibility -it's one of the strong arguments for massive disaggregation at the edge: reducing the fallout of a potential critical failure.
It depends on the shop, really.
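
Roughly, a pessimistic rollout like that could be driven by something as simple as the sketch below (phase contents, soak times and the health check are placeholders, not our actual process):

    #!/usr/bin/env python3
    # Illustrative phased-rollout driver: push a release through progressively
    # riskier groups with a soak period and health check between phases. Phase
    # contents, soak times and the health check are placeholders, not a real
    # deployment tool.
    import time

    PHASES = [
        # (phase name, devices, soak time in hours)
        ("lab",             ["lab-pe1", "lab-p1"],        1),
        ("staging",         ["stg-pe1"],                  24),
        ("low-risk-sites",  ["site-a-pe1", "site-b-pe1"], 72),
        ("high-risk-sites", ["site-x-pe1", "site-y-pe1"], 72),
    ]

    def upgrade(device, release):
        print(f"upgrading {device} to {release}")  # stand-in for the real upgrade driver

    def healthy(device):
        # Stand-in for real checks: adjacencies up, RIB/FIB sane, no new syslog errors.
        return True

    def rollout(release):
        for name, devices, soak_hours in PHASES:
            for device in devices:
                upgrade(device, release)
            time.sleep(soak_hours * 3600)  # in practice, monitor throughout the soak window
            if not all(healthy(d) for d in devices):
                raise RuntimeError(f"health check failed in phase {name}; halting rollout")
            print(f"phase {name} passed")

    if __name__ == "__main__":
        rollout("19.1R1")  # hypothetical target release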

> > I really need to have this ever growing library of test cases that the automat
> will churn through with very little human intervention, in order to reduce the
> testing from months to days or weeks at least.
> 
> Lot of vendor, maybe all, accept your configuration and test them for
> releases. I think this is only viable solution vendors have for blackbox, gather
> configs from customers and test those, instead of try to guess what to test.
> I've done that with Cisco in two companies, unfortunately I can't really tell if it
> impacted quality, but I like to think it did.
> 
We did that with Juniper partners and now do it directly with Juniper.
The thing is, though, they are using our test plan...
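
For illustration, the individual test cases in such a library could look something like the pytest/Netmiko sketch below; device details, credentials, the routing-instance name and the expected CLI strings are assumptions, not a reference implementation:

    #!/usr/bin/env python3
    # Illustrative blackbox test cases (pytest style): after a service config
    # has been pushed, verify BGP session state and end-to-end reachability in
    # the customer VRF. Device details, credentials, instance name and the
    # expected CLI strings are assumptions.
    from netmiko import ConnectHandler

    PE1 = {
        "device_type": "juniper_junos",
        "host": "lab-pe1",
        "username": "lab",
        "password": "lab",
    }

    def test_bgp_sessions_established():
        with ConnectHandler(**PE1) as conn:
            output = conn.send_command("show bgp summary")
            assert "Establ" in output, "expected at least one established BGP session"

    def test_customer_vrf_reachability():
        with ConnectHandler(**PE1) as conn:
            output = conn.send_command("ping 198.51.100.1 routing-instance CUST-A count 5")
            # Leading space so "100% packet loss" does not match.
            assert " 0% packet loss" in output, "ping in customer VRF failed"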

adam  



