shawnl at up.net
Wed Jun 17 23:04:56 UTC 2020
We _always_ have at least one spare, or something that could be (relatively) easily pressed into service as one.
Even in the Midwest, we've had times where 'guaranteed next-day replacement' turned into second or third day due to weather conditions, the carrier routing it strangely, or plain 'the plane didn't come today' issues. We generally laugh when they try to offer us 4-hour contracts -- we know there's zero chance they can meet them, and they never want to refund you when you need it and they can't deliver.
From: "Warren Kumari" <warren at kumari.net>
Sent: Wednesday, June 17, 2020 6:50pm
To: "Owen DeLong" <owen at delong.com>
Cc: nanog at nanog.org
Subject: Re: Router Suggestions
On Tue, Jun 16, 2020 at 5:28 PM Owen DeLong <owen at delong.com> wrote:
> On Jun 16, 2020, at 1:51 PM, Mark Tinka <mark.tinka at seacom.mu> wrote:
> On 16/Jun/20 22:43, Owen DeLong wrote:
>> Covering them all under vendor contract doesn’t necessarily guarantee that
>> the vendor does, either. In general, if you can cover 10% of your hardware
>> failing in the same 3-day period, you’re probably not going to do much better
>> with vendor support.
> In my experience, our vendors have been able to abide by their
> obligations when we've had successive failures in a short period of
> time, as long as our subscription is up-to-date.
> I have yet to be disappointed.
Count your blessings… I once faced a situation where a vendor had shipped a batch of defective power supplies (tens of thousands of them). It wasn’t just my network facing successive failures in this case; they were widespread across the vendor’s entire customer base… By day 2, all of their depots were depleted, and day 3 involved mapping out “how non-redundant can we make the power in our routers to cover the outages we’re seeing without causing more outages than we solve?”
It was a genuine nightmare.
Huh, was this in the early to mid 1990’s?
I had an incident in NYC area where one of the large (at the time) datacenter/IXPs had a power outage, and their transfer switch failed to switch over. Customers were annoyed, so they promised another test, which also failed, dropping power to the facility again... now customers were hopping mad...
The next test was *just* of the generator, but with all of the work they had done they had (somehow) gotten the transfer switch *really* confused / hardwired into an odd state. This resulted in the facility being powered by both the street power and the generator (at least for a few seconds until the generator went “Nope!”)
These were of course not synchronized, and so 120V equipment saw 0V, then 240V, then some weird harmonic, then other surprising values… Most power supplies dealt with this reasonably well, but one of the really common router models, from the largest vendor, up and died. This resulted in a few hundred dead routers and far exceeded the vendor's spares strategy.
A number of customers (myself included) had 4-hour replacement contracts, which the vendor really could not meet -- so we agreed to take a new, much larger/better model as a replacement.
I’ve had other situations involving early failures of just-released line cards and the like as well.
As I said, YMMV, but I’m betting your vendor doesn’t stock a second copy of every piece of covered equipment in the local depot. They’re playing the statistical probabilities just like anyone else stocking their own spares pool. The biggest difference is that they’re spreading the risk across a (potentially) much wider sample size, which may better normalize the failure rate.
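The pooled-risk point above can be sketched with a quick back-of-the-envelope calculation. This is a minimal illustration with made-up numbers, and it assumes failures are independent -- exactly the assumption that the batch-defect and power-surge stories in this thread show can break down:

```python
from math import comb

def p_spares_exhausted(n_units, p_fail, spares):
    """Probability that more than `spares` of n_units fail in the
    same replacement window, assuming independent failures with
    per-unit probability p_fail (illustrative numbers only)."""
    return 1.0 - sum(
        comb(n_units, k) * p_fail**k * (1 - p_fail)**(n_units - k)
        for k in range(spares + 1)
    )

# A small operator: 20 routers, 1 spare on the shelf.
small_pool = p_spares_exhausted(20, 0.01, 1)

# A vendor depot spreading the same per-unit risk across
# 2000 covered routers with 100 spares (same spares ratio, 5%).
vendor_depot = p_spares_exhausted(2000, 0.01, 100)
```

With independent failures the larger pool is far less likely to run dry at the same spares ratio -- but a correlated event (a bad batch of power supplies, one facility taking a surge) hits every unit at once, which is why neither the self-sparing operator nor the vendor depot survived the anecdotes above.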
I don't think the execution is relevant when it was obviously a bad idea in the first place.
This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants.