A case against vendor-locking optical modules

Chuck Anderson cra at WPI.EDU
Sat Dec 6 13:37:01 UTC 2014


On Sat, Dec 06, 2014 at 11:51:56AM +0200, Saku Ytti wrote:
> a) One particular optic had slow I2C, and the vendor polled it more
> aggressively than it could respond. The vendor's polling code didn't handle
> I2C read errors; instead, it crashed the whole linecard control plane.
> The vendor claimed it's not a bug, because it didn't happen with their own
> optic. I tried to explain that they cannot guarantee I2C reads won't fail on
> their own optics, and that it's a serious problem, but I was unable to
> convince them to fix it.
> Now I have a good bunch of SFPs I can stick into your routers in colo, have
> them crash, and you won't have any clue why they crashed.
> 
> b) A particular vendor had a bug in their SFP microcontroller where, after
> 2**31 hundredths of a second had passed, it started to write its uptime to
> the location where DDM temperature measurements are read. This was obvious
> from graphs, because the temperature went linearly from -127 ... 127, then
> jumped back to -127.
> These optics caused no problems when seated in Vendor1, but when seated in
> Vendor2 they caused link flapping, even two boxes away! (A-B-C, A having the
> problematic optic, B-C might flap). Coincidentally, Vendor2 is the same as in
> case a); they didn't consider this a bug in their code.
> This was particularly funny: if you rebooted 100 boxes in a maintenance
> window, the bug would trigger at the same moment, 2**31 hundredths of a
> second later, causing a potentially major outage.
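The arithmetic behind case b) is easy to check: 2**31 ticks at 1/100 s each comes to roughly 248.5 days of uptime. Since the trigger depends only on uptime, optics powered up in the same maintenance window all reach it together. The sketch below is hypothetical. It assumes an uptime counter in 1/100 s ticks, and that one byte of that counter ends up in the DDM temperature MSB. SFF-8472 defines that MSB as signed two's-complement whole degrees, so a steadily increasing byte would graph as a sawtooth climbing to 127 and then dropping to the bottom of the range. The helper names and the `byte_index` parameter are illustrative only, not any vendor's firmware.

```python
# Hypothetical model of case b). Assumptions (not from the post): uptime is
# kept in 1/100 s ticks, and one byte of that counter is written into the DDM
# temperature MSB, which SFF-8472 encodes as a signed (two's-complement) value.

TICKS_PER_SECOND = 100   # 1/100 s resolution, as described in the post
TRIGGER_TICKS = 2 ** 31  # uptime at which the bug reportedly started


def seconds_until_trigger() -> float:
    """Seconds of uptime before the counter reaches 2**31 ticks."""
    return TRIGGER_TICKS / TICKS_PER_SECOND


def days_until_trigger() -> float:
    """Same interval expressed in days."""
    return seconds_until_trigger() / 86_400


def as_signed_byte(value: int) -> int:
    """Interpret the low 8 bits of value as a two's-complement int8."""
    b = value & 0xFF
    return b - 256 if b >= 128 else b


def fake_temperature(counter: int, byte_index: int = 0) -> int:
    """Whole-degree 'temperature' a poller would read if the chosen byte of
    the uptime counter landed in the temperature MSB (hypothetical mapping)."""
    return as_signed_byte(counter >> (8 * byte_index))


if __name__ == "__main__":
    print(f"{seconds_until_trigger():.2f} s = {days_until_trigger():.2f} days")
    # A steadily increasing byte ramps up to 127, then wraps to -128:
    # the sawtooth seen on the DDM temperature graphs.
    print([fake_temperature(c) for c in range(124, 132)])
```

Which byte of the counter is exposed only changes how fast the sawtooth repeats. The lowest byte of a 1/100 s counter would wrap every 2.56 s, far too fast to show up as a clean ramp on typical polling intervals, so a higher byte is the more plausible fit for the graphs described.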

Who is Vendor2?
