BGP Experiment

Saku Ytti saku at ytti.fi
Fri Jan 25 07:58:52 UTC 2019


On Thu, 24 Jan 2019 at 18:43, <adamv0025 at netconsultings.com> wrote:

> We fight with that all the time,
> I'd say that from the whole Design->Certify->Deploy->Verify->Monitor service lifecycle time budget, the service certification testing is almost half of it.
> That's why I'm so interested in a model driven design and testing approach.

Even a shop with 100% automated blackbox testing still has to
cherry-pick what to test. Do you have statistics on how often you find
show-stopper issues, and how far into the testing they were found? I
expect this to follow an exponential curve: upgrading the box, getting
your signalling protocols up, and pushing one packet through each
service you sell is easy and fast; I wonder whether a massive amount of
additional work increases confidence significantly beyond that. The
issues I tend to find in production are ones which are not trivial to
recreate in the lab even once we know what they are, which implies that
finding them a priori is a somewhat naive expectation. So, assumptions:

a) blackbox testing has exponentially diminishing returns; you quickly
need to expend massively more effort to gain slightly more confidence
(a toy model of this is sketched after this list)
b) you can never say 'x works', you can only say 'I found a way to
confirm x is not broken in this very specific case'; the way x ends up
being broken may be very complex
c) if recreating issues you know about is hard, then finding issues
you don't know about is massively more difficult
d) testing likely increases your comfort to deploy more than your
probability of success
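
To make (a) concrete, here's a toy model (my own illustration, not
measured data): treat confidence as a saturating function of test
effort, confidence(t) = 1 - exp(-k*t). The constant k and the
functional form are assumptions; the point is just that the marginal
confidence bought by each extra unit of effort decays fast.

import math

K = 0.5  # hypothetical "test effectiveness" constant

def confidence(effort: float, k: float = K) -> float:
    """Confidence in [0, 1) after `effort` units of blackbox testing."""
    return 1.0 - math.exp(-k * effort)

for effort in (1, 2, 4, 8, 16):
    gain = confidence(effort) - confidence(effort - 1)
    print(f"effort={effort:2d}  confidence={confidence(effort):.3f}  "
          f"last unit bought={gain:.3f}")

In this model the final doubling of effort, from 8 to 16 units, buys
less than two percentage points of extra confidence, which is the
shape I'd expect from real certification testing.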

Hopefully we'll enter a NOS future where we download the NOS from
GitHub and compile it for our devices, allowing the whole community to
contribute to unit tests and use cases, and letting you run code with
a minimal bug surface in your environment.
I see very little future in blackbox testing a vendor NOS at the
operator's site, beyond a quick poke in the lab. It seems like poor
value. I'd rather have a pessimistic deployment plan: lab => staging =>
2-3 low-risk sites => 2-3 high-risk sites => slow rollout.
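
A minimal sketch of what that pessimistic plan could look like as a
gated pipeline; the stage names, device names, soak times and the
upgrade/health-check stubs below are all hypothetical, and real gating
would query monitoring rather than a stub:

STAGES = [
    ("lab",             ["lab-rtr1"],                   1),    # soak hours
    ("staging",         ["stg-rtr1", "stg-rtr2"],       24),
    ("low-risk sites",  ["site-a", "site-b", "site-c"], 72),
    ("high-risk sites", ["site-x", "site-y"],           72),
    ("slow rollout",    ["remaining-fleet"],            168),
]

def upgrade(device: str) -> None:
    print(f"  upgrading {device}")   # stub: push image + config here

def healthy(device: str) -> bool:
    return True                      # stub: check alarms/telemetry here

def rollout() -> None:
    for stage, devices, soak_hours in STAGES:
        print(f"stage: {stage}")
        for dev in devices:
            upgrade(dev)
            if not healthy(dev):
                raise SystemExit(f"halting rollout: {dev} unhealthy")
        print(f"  soak {soak_hours}h before the next stage")

rollout()

The important property is the one-way gate: any unhealthy device stops
the whole pipeline, so a bad release never reaches the high-risk sites.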

> I really need to have this ever growing library of test cases that the automation will churn through with very little human intervention, in order to reduce the testing from months to days or weeks at least.

Many vendors, maybe all, accept your configurations and test them
against releases. I think this is the only viable solution vendors have
for blackbox testing: gather configs from customers and test those,
instead of trying to guess what to test.
I've done that with Cisco at two companies; unfortunately I can't
really tell whether it impacted quality, but I like to think it did.
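
As a sketch of what "gather configs from customers and test those" can
look like on the operator side: keep a corpus of real configs and have
the automation replay each one against the device under test on every
candidate release. The directory layout and the load_and_commit helper
below are assumptions; in practice this might drive a lab device over
NETCONF or a vendor emulator.

from pathlib import Path

CONFIG_DIR = Path("customer-configs")  # hypothetical corpus of real configs

def load_and_commit(config_text: str) -> bool:
    """Stub: push config to the device under test, return commit success."""
    return bool(config_text.strip())

def run_regression() -> None:
    failures = []
    for cfg in sorted(CONFIG_DIR.glob("*.conf")):
        ok = load_and_commit(cfg.read_text())
        print(f"{cfg.name}: {'ok' if ok else 'FAIL'}")
        if not ok:
            failures.append(cfg.name)
    if failures:
        raise SystemExit(f"{len(failures)} config(s) failed: {failures}")

run_regression()

The corpus only ever grows, so each release gets exercised against
every configuration that has actually mattered to a customer, instead
of against guesses.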

-- 
  ++ytti