few big monolithic PEs vs many small PEs

adamv0025 at netconsultings.com adamv0025 at netconsultings.com
Fri Jun 21 07:10:17 UTC 2019


Hey Saku,

> From: Saku Ytti <saku at ytti.fi>
> Sent: Thursday, June 20, 2019 7:04 AM
> 
> On Wed, 19 Jun 2019 at 23:25, <adamv0025 at netconsultings.com> wrote:
> 
> > The conclusion I came to was that currently the best approach would
> > be to use several medium to small (fixed) PEs to replace a big
> > monolithic chassis-based system.
> 
> For availability I think the best approach is many small edge devices.
> Because software is terrible, and will always be terrible. People are bad
> at operating the devices and always will be. Hardware is something we
> think about a lot when we think about redundancy, but it's not that
> common a reason for an outage.
> With more, smaller boxes the inevitable human cockup and software defects
> will affect fewer customers. The reason I believe this to be true is that
> the events are sufficiently rare and, once they happen, we find a
> solution or at the very least a workaround rather fast. With full
> inaction (i.e. if we never intervened) you could argue that having A3 and
> B1+B2 gives the same amount of aggregate outage: while an outage in B
> affects fewer customers, there are two B nodes with equal probability of
> outage. But I argue that the events are not independent, they are
> dependent, so the probability calculation isn't straightforward. Once we
> get some rare software defect or operator mistake on B1, we usually solve
> it before it triggers on B2, making the aggregate downtime of the entire
> system lower.
>
Yup, I agree.
Just on the human cockups though: we're putting more and more automation in to help address the problem of human imperfection.
But automation can cut both ways; some say it helps with the small day-to-day problems but occasionally creates a massive one.
So, considering the B1 and B2 correlation: if operations on these are automated then, depending on how the automation system is designed and operated, one might not get the chance to reflect and assess on B1 before B2 is touched - so this might further complicate the aggregate system downtime computation. A toy model of that effect is sketched below.
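To make that concrete, here is a toy Monte Carlo sketch (every rate, duration and count below is an invented assumption for illustration, not data from any network) comparing one big PE against two half-size PEs, with and without a soak time between touching B1 and B2:

import random

# Toy Monte Carlo: aggregate customer-hours of outage per year for one
# big PE vs two half-size PEs (B1, B2). All numbers are illustrative
# assumptions.
CUSTOMERS = 1000        # customers behind the big PE (B1/B2 carry half each)
CHANGES   = 50          # automated changes pushed per year (assumed)
P_DEFECT  = 0.02        # assumed chance a change trips a latent defect
HOURS     = 4.0         # assumed restore time per incident
TRIALS    = 10_000

def big_pe():
    """One chassis: every bad change takes out all customers."""
    return sum(HOURS * CUSTOMERS
               for _ in range(CHANGES) if random.random() < P_DEFECT)

def b1_b2(soak):
    """Two half-size boxes getting the same changes. With soak=True the
    defect caught on B1 is fixed before the change reaches B2 (the
    dependence argument above); with soak=False the automation pushes
    to both before anyone can react."""
    total = 0.0
    for _ in range(CHANGES):
        if random.random() < P_DEFECT:      # same change, same latent bug
            total += HOURS * CUSTOMERS / 2  # B1 takes the hit first
            if not soak:
                total += HOURS * CUSTOMERS / 2  # ...and B2 right after
    return total

def mean(fn):
    return sum(fn() for _ in range(TRIALS)) / TRIALS

random.seed(1)
print(f"big PE          : {mean(big_pe):8.0f} customer-hours/yr")
print(f"B1+B2, no soak  : {mean(lambda: b1_b2(False)):8.0f}")
print(f"B1+B2, soak time: {mean(lambda: b1_b2(True)):8.0f}")

With these inputs the expected result is roughly 4000 / 4000 / 2000 customer-hours: naive independence buys nothing in aggregate, while the soak time halves it - which is the dependence described above, and exactly what a blast-everything automation design throws away.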
  
 
> > Yes it will cost a bit more (router is more expensive than a LC)
> 
> Several of my employers have paid only for an LC. I don't think the CAPEX
> difference is meaningful, but operating two separate devices may have
> significant OPEX implications in electricity, rack space, provisioning,
> maintenance, etc.
> 
> > And yes, there is the "node-slicing" approach from Juniper, where one
> > can offload the CP onto multiple x86 servers and assign LCs to each
> > server (virtual node) - which would solve my chassis-full problem -
> > but honestly, how many of you are running such a setup? Exactly. And
> > that's why I'd be hesitant to deploy this solution in production just
> > yet. I don't know of any other vendor solution like this one, but who
> > knows, maybe in 5 years this is going to be the new standard. Anyway,
> > I need a solution/strategy for the next 3-5 years.
> 
> Node slicing indeed seems like it can be a sufficient compromise here
> between OPEX and availability. I believe (not know) that the shared
> software risks are meaningfully reduced and that bringing down the whole
> system is sufficiently rare to allow an availability upside compared to
> a single large box.
> 
I tend to agree, though as you say it's a compromise nevertheless.
If one needs to switch to a new fabric version in order to support new line cards, or to upgrade code on the base system for that matter, the whole thing (the NFVI layer) needs to be power-cycled. A rough illustration of where that leaves node slicing is sketched below.
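To put rough numbers on that compromise, a back-of-envelope sketch (again, every rate and duration here is an invented assumption purely to illustrate the shape of the trade-off, not vendor data):

# Annual customer-hours of outage, back-of-envelope. All inputs invented.
CUSTOMERS = 1000
SOFT      = 2.0   # per-control-plane software/operator events per year
SHARED    = 0.5   # base-system/fabric events needing a full power-cycle
HOURS     = 4.0   # assumed outage duration per event
AVOIDED   = 0.5   # assumed share of repeat events dodged on the 2nd node
                  # because the 1st one taught us the fix (the dependence
                  # argument earlier in the thread)

def ch(events, share):
    """events/year * duration * affected share of the customer base."""
    return events * HOURS * CUSTOMERS * share

big_chassis  = ch(SOFT, 1.0) + ch(SHARED, 1.0)
# node slicing: per-slice software risk is split across slices, but a
# base-system/fabric upgrade still power-cycles everything at once
node_slicing = (ch(SOFT, 0.5) + ch(SOFT * (1 - AVOIDED), 0.5)
                + ch(SHARED, 1.0))
# fully separate boxes: even base-system work can be staggered with a
# soak time between the two nodes
separate_pes = (ch(SOFT, 0.5) + ch(SOFT * (1 - AVOIDED), 0.5)
                + ch(SHARED, 0.5) + ch(SHARED * (1 - AVOIDED), 0.5))

for name, v in [("big chassis", big_chassis),
                ("node slicing", node_slicing),
                ("separate PEs", separate_pes)]:
    print(f"{name:13s}: {v:6.0f} customer-hours/yr")

With these made-up inputs node slicing lands between the two extremes (here 10000 / 8000 / 7500 customer-hours): it removes most of the shared software risk, but the shared power-cycle for fabric/base-system work keeps it short of fully separate boxes.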

adam 




