A case against vendor-locking optical modules

Saku Ytti saku at ytti.fi
Sat Dec 6 09:51:56 UTC 2014


On (2014-11-17 19:11 +0100), Jérôme Nicolle wrote:

> What are other arguments against vendor lock-in ? Is there any argument
> FOR such locks (please spare me the support issues, if you can't read
> specs and SNMP, you shouldn't even try networking) ?
> 
> Did you ever experience a shift in a vendor's position regarding the use
> of compatible modules ?

Your points are valid. I actually prefer 3rd party, even if it's more
expensive than 1st party, just for the simpler sparing, which reduces OPEX.

In RFP responses, all vendors have consistently replied that they won't forbid
using 3rd party optics, and won't deny a support contract because of them. They
may add that if an optic is suspected, we need to replace it with a 1st party
one before TAC continues to work on the case.

1st party does do more testing on the optics than we typically do, so some
obvious problems will be avoided by buying 1st party. But if you are somewhat
careful in sourcing your 3rd party, you should be quite safe.

I have two examples of major problems caused by 3rd party.

a) One particular optic had slow I2C, and the vendor polled it more
aggressively than it could respond. The vendor's polling code didn't handle
errors reading from I2C, but instead crashed the whole linecard control-plane.
The vendor claimed it's not a bug, because it didn't happen with their own
optics. I tried to explain to them that they cannot guarantee I2C reads won't
fail on their own optics, and that it's a serious problem, but I was unable to
convince them to fix it.
Now I am in possession of a good bunch of SFPs that I can stick into your
routers in colo and have them crash, and you won't have any clue why they
crashed.
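
To illustrate the point in a), here is a minimal sketch of polling code that
treats a failed read as an expected event and backs off instead of crashing.
It is not the vendor's code; SlowOptic, poll and simulate are hypothetical
names, and the I2C bus is simulated with a fake clock so the example runs
anywhere.

```python
class SlowOptic:
    """Hypothetical stand-in for an SFP whose I2C interface needs a
    minimum gap between reads (simulated, no real bus involved)."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_read = None

    def read(self, now):
        # Fail when polled sooner than the optic can respond.
        if self.last_read is not None and now - self.last_read < self.min_interval:
            raise IOError("I2C read timed out")
        self.last_read = now
        return 42  # placeholder register value


def poll(optic, now, interval, max_interval=8.0):
    """Return (value, next_interval). A failed read never propagates."""
    try:
        value = optic.read(now)
    except IOError:
        # Keep the linecard alive: report "no data" and poll less often.
        return None, min(interval * 2, max_interval)
    return value, interval


def simulate(optic, start_interval, steps):
    """Run the poller against a simulated clock and collect the results."""
    now, interval, results = 0.0, start_interval, []
    for _ in range(steps):
        value, interval = poll(optic, now, interval)
        results.append(value)
        now += interval
    return results


print(simulate(SlowOptic(min_interval=1.0), start_interval=0.25, steps=6))
# [42, None, None, 42, 42, 42]
```

The two None entries are the reads that came too early; the poller doubles its
interval until the optic keeps up, and nothing else is affected.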

b) A particular vendor had a bug in their SFP microcontroller where, after
2**31 hundredths of a second (roughly 248 days) had passed, it started to
write its uptime to the location from which DDM temperature measurements are
read. This was obvious from graphs, because the temperature went linearly from
-127 to 127, then jumped back to -127.
These optics caused no problems when seated in Vendor1 gear; when seated in
Vendor2 gear they caused link flapping, even two boxes away! (A-B-C: with A
having the problematic optic, B-C might flap.) Coincidentally, Vendor2 is the
same vendor as in case a), and they didn't consider this a bug in their code.
This was particularly funny: if you rebooted 100 boxes in a maintenance
window, the bug would trigger on all of them at about the same moment, 2**31
hundredths of a second later, potentially causing a major outage.
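
The arithmetic behind b) can be sketched as follows. It assumes the counter
starts at zero when the optic is powered up by the reboot; the maintenance
window dates and the trigger_times helper are made up for illustration.

```python
from datetime import datetime, timedelta

TICKS_PER_SECOND = 100  # counter advances once per 1/100 s
TRIGGER_TICKS = 2**31   # tick count after which the bug appeared

# Time from power-on until the bug shows up.
TIME_TO_TRIGGER = timedelta(seconds=TRIGGER_TICKS / TICKS_PER_SECOND)


def trigger_times(power_on_times):
    """Map each power-on time to the moment its counter reaches 2**31."""
    return [t + TIME_TO_TRIGGER for t in power_on_times]


# Hypothetical maintenance window: three boxes rebooted 10 minutes apart.
window = [datetime(2014, 1, 1, 2, 0) + timedelta(minutes=10 * i) for i in range(3)]
hits = trigger_times(window)
spread = max(hits) - min(hits)

print(TIME_TO_TRIGGER.days)  # 248
print(spread)                # 0:20:00
```

Whatever spread the reboots had in the window is exactly the spread of the
failures some 248 days later, which is why a mass reboot lines them all up.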


If you source from a bad broker and you hit issue b), you're screwed, because
the broker can't tell you which of your optics are impacted: bad brokers don't
keep track of where they source optics, what the serial numbers are, what
parts are inside the optics, etc.
If you source from 1st party and you hit issue b), the 1st party will be able
to tell you exactly which serial number range is impacted. But this is also
true if you use a professional 3rd party.

-- 
  ++ytti


