Your opinion on network analysis in the presence of uncertain events

Vanbever Laurent lvanbever at ethz.ch
Thu Jan 17 08:06:54 UTC 2019


Hi Adam/Mel,

Thanks for chiming in!

> My understanding was that the tool will combine historic data with the MTBF data points from all components involved in a given link in order to try and estimate the likelihood of a link failure.

Yep. This could be one way indeed. This likelihood could also take the form of intervals in which you expect the true value to lie (again, based on historical data). This could be done not only for link/device failures but also for external inputs such as BGP announcements (to consider the likelihood that you receive a route for X in, say, NEWY). The tool would then run the deterministic routing protocols (setting aside ‘features’ such as prefer-oldest-route for a sec.) on these probabilistic inputs so as to infer the different possible forwarding outcomes and their relative probabilities. That’s roughly what we have in mind for now.
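
To make the inference step concrete, here is a minimal sketch in Python. The topology, IGP costs, and per-link failure probabilities are invented purely for illustration, and networkx’s shortest-path computation stands in for an IGP’s SPF: enumerate the link up/down scenarios, run the deterministic computation on each surviving graph, and sum the probability of every distinct forwarding outcome.

    from itertools import product
    from collections import defaultdict
    import networkx as nx

    # Hypothetical topology: (u, v, IGP cost, per-link failure probability).
    links = [
        ("A", "B", 10, 0.01),
        ("B", "D", 10, 0.02),
        ("A", "C", 15, 0.01),
        ("C", "D", 15, 0.01),
    ]

    outcomes = defaultdict(float)   # forwarding path -> total probability
    for states in product([True, False], repeat=len(links)):
        p = 1.0                     # probability of this exact up/down combination
        g = nx.Graph()
        for (u, v, cost, p_fail), up in zip(links, states):
            p *= (1 - p_fail) if up else p_fail
            if up:
                g.add_edge(u, v, weight=cost)
        try:
            path = tuple(nx.shortest_path(g, "A", "D", weight="weight"))
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            path = ("unreachable",)
        outcomes[path] += p

    for path, p in sorted(outcomes.items(), key=lambda kv: -kv[1]):
        print(" -> ".join(path), f"{p:.4f}")

Exact enumeration is of course exponential in the number of links, which already hints at why richer models make the inference harder.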

One can of course make the model more and more complex, e.g. by also taking into account data-plane status (to model gray failures). Intuitively though, the more complex the model, the more complex the inference process becomes.

> Heck, I imagine that if one were to stream a heap of data at an ML algorithm it might draw some very interesting conclusions indeed, i.e. uncover unforeseen patterns across huge datasets while trying to understand the overall system (network) behaviour. Such a tool might teach us something new about our networks.
> The next level would be recommendations on how to best address some of the potential pitfalls it found.

Yes. I believe some variants of this already exist. I’m not sure how much they are used in practice though. AFAICT, false positives/negatives are still a big problem. A non-trivial recommendation system will require a model of the network behavior that can somehow be inverted easily, which is probably something academics should spend some time on :-)
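
For what it’s worth, even brute force conveys what “inverting” the model could mean. A toy sketch, reusing the same made-up topology as above, with an equally hypothetical goal: suppose we want A-to-D traffic to steer clear of the (say, flaky) B-D link, and we search over candidate IGP costs for the A-B link until the probability of the undesired outcome is minimized.

    from itertools import product
    import networkx as nx

    links = [  # (u, v, IGP cost, per-link failure probability) -- made up
        ("A", "B", 10, 0.01),
        ("B", "D", 10, 0.02),
        ("A", "C", 15, 0.01),
        ("C", "D", 15, 0.01),
    ]

    def p_bad_outcome(links, src="A", dst="D", bad_edge={"B", "D"}):
        """Probability that the shortest path crosses bad_edge, over all scenarios."""
        p_bad = 0.0
        for states in product([True, False], repeat=len(links)):
            p, g = 1.0, nx.Graph()
            for (u, v, cost, p_fail), up in zip(links, states):
                p *= (1 - p_fail) if up else p_fail
                if up:
                    g.add_edge(u, v, weight=cost)
            try:
                path = nx.shortest_path(g, src, dst, weight="weight")
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                continue  # unreachable: cannot cross bad_edge
            if any(bad_edge == {a, b} for a, b in zip(path, path[1:])):
                p_bad += p
        return p_bad

    # "Invert" by exhaustive search over a single knob: the cost of link A-B.
    best = min(range(1, 50),
               key=lambda c: p_bad_outcome([("A", "B", c, 0.01)] + links[1:]))
    print(f"cost(A-B) = {best}, P(path crosses B-D) = "
          f"{p_bad_outcome([('A', 'B', best, 0.01)] + links[1:]):.4f}")

A real recommendation system would obviously need something far smarter than exhaustive search over one knob, which is exactly the research gap.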

> Maybe in closed systems like IP networks, with the use of streaming telemetry from SFPs/NPUs/LC-CPUs/protocols/etc., we’ll be able to feed the analytics tool with enough data to allow it to make fairly accurate predictions (unlike weather or market prediction tools, where the dataset (or search space, since not all attributes are equally relevant) is virtually endless).

I’m with you. I also believe that better (even programmable) telemetry will unlock powerful analysis tools.
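
As a small flavor of what even simple streaming analytics on such data could look like, here is a sketch of a detector over per-interval CRC error counts. The counter values are synthetic and the EWMA-plus-threshold rule is just one common choice of mine, not any particular tool’s method:

    def ewma_detector(samples, alpha=0.3, threshold=3.0, warmup=5):
        """Yield (index, value) for readings far above the running baseline."""
        mean = None
        var = 0.0
        for i, x in enumerate(samples):
            if mean is None:
                mean = float(x)        # seed the baseline with the first sample
                continue
            if i >= warmup:
                std = max(var ** 0.5, 1e-9)
                if (x - mean) / std > threshold:
                    yield i, x
                    continue           # keep anomalies out of the baseline
            diff = x - mean
            mean += alpha * diff
            var = (1 - alpha) * (var + alpha * diff * diff)

    # Synthetic per-interval CRC error counts on one interface: a stable
    # baseline with a burst, e.g. an optic starting to fail (gray failure).
    stream = [2, 3, 1, 2, 2, 3, 2, 40, 55, 2, 3, 2]
    for i, x in ewma_detector(stream):
        print(f"interval {i}: {x} errors, well above the recent baseline")

Fed with rich (and ideally programmable) telemetry instead of synthetic numbers, even baselines this simple might surface gray failures early.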

Best,
Laurent


PS: Thanks a lot to those who have already answered our survey! For those who haven’t yet: https://goo.gl/forms/HdYNp3DkKkeEcexs2 (it only takes a couple of minutes).