Whats so difficult about ISSU

Jimmy Hess mysidia at gmail.com
Sun Nov 11 16:33:37 UTC 2012


On 11/8/12, Mikael Abrahamsson <swmike at swm.pp.se> wrote:
> On Thu, 8 Nov 2012, Phil wrote:
> NSR isn't ISSU.
The equipment vendors call an upgrade done via NSR failover "ISSU" if
their marketing people feel that a 0.5 or 6 second hit is "good
enough". If you care about the 0.5 seconds, it's important that you
speak their language, and require that vague expressions such as
"In-Service Software Upgrade" be clarified.
Personally, I don't trust any of it; routers should have regular
maintenance windows, period, with a minimum duration of 30 minutes.
And software updates to fix known bugs should be done regularly, and
during those windows.


Using NSR for ISSU, that is, an upgrade with a small hit that still
gets called ISSU, is likely inexpensive for the network equipment
vendors, because they have already invested hundreds of thousands of
developer hours in implementing and validating NSR functionality to
provide redundancy against device failure.

The process of replacing code on a hot device, and restructuring any
stored data to match the expectations of the new code, without
suspending or delaying execution of any code during that process, is
possible, but it is a non-trivial problem whose solutions add
complexity (and therefore a higher risk of bugs and unexpected
results) to the upgrade process.
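
To make the "restructuring stored data" part concrete, here is a toy
sketch of migrating a live table to a new layout in small batches. The
RouteV1/RouteV2 types, their fields, and the batching scheme are all
hypothetical and only for illustration; no real router OS is claimed
to work this way.

```python
# Toy model of migrating live state to a new layout in small batches.
# RouteV1 / RouteV2 and their fields are hypothetical, for illustration only.
from dataclasses import dataclass


@dataclass
class RouteV1:
    prefix: str
    next_hop: str


@dataclass
class RouteV2:
    prefix: str
    next_hops: list  # the "new code" expects a list of next hops


def upgrade(entry):
    """Convert one old-layout entry into the new layout."""
    return RouteV2(prefix=entry.prefix, next_hops=[entry.next_hop])


def lookup(table, prefix):
    """Lookup that must cope with both layouts while migration is running."""
    entry = table[prefix]
    if isinstance(entry, RouteV1):
        return [entry.next_hop]
    return entry.next_hops


def migrate_in_batches(table, batch_size):
    """Rewrite the table in place, yielding between batches so that
    other code can keep running against a mixed-version table."""
    pending = [p for p, e in table.items() if isinstance(e, RouteV1)]
    for start in range(0, len(pending), batch_size):
        for prefix in pending[start:start + batch_size]:
            table[prefix] = upgrade(table[prefix])
        yield min(start + batch_size, len(pending))


table = {f"10.0.{i}.0/24": RouteV1(f"10.0.{i}.0/24", "192.0.2.1") for i in range(4)}
for done in migrate_in_batches(table, batch_size=2):
    # Mid-migration the table holds both layouts; lookups still have to work.
    assert lookup(table, "10.0.3.0/24") == ["192.0.2.1"]
```

Even in this toy, every reader of the table needs a dual-layout code
path for the duration of the migration, which is the kind of added
complexity described above.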

You might reduce the hit from 0.5 seconds to 0.01 seconds by
implementing true in-place upgrade, and that works 90% of the time;
but 10% of the time, either the online upgrade fails because of an
issue with the online patch, or unexpected interactions between
partially patched functional units result in a period of incorrect
device operation until the patching finishes, and continued use of bad
data even after patching has finished.
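
As a rough back-of-the-envelope sketch of that trade-off: the 0.5 s,
0.01 s and 90% figures are the ones used above, while the 5-second
cost of a failed online patch is purely an assumed number for
illustration, not a measurement.

```python
# Expected traffic hit per upgrade, comparing NSR-failover "ISSU" against
# a true in-place upgrade that sometimes goes wrong.
# FAILED_PATCH_HIT_S is a made-up illustrative figure, not vendor data.

NSR_FAILOVER_HIT_S = 0.5     # hit with NSR failover, from the text above
IN_PLACE_HIT_S = 0.01        # hit when the in-place upgrade works
IN_PLACE_SUCCESS_RATE = 0.9  # "90% of the time"
FAILED_PATCH_HIT_S = 5.0     # ASSUMPTION: cost of a failed online patch


def expected_hit(success_rate, good_hit_s, bad_hit_s):
    """Probability-weighted average hit, in seconds, for one upgrade."""
    return success_rate * good_hit_s + (1.0 - success_rate) * bad_hit_s


def break_even_bad_hit(success_rate, good_hit_s, baseline_hit_s):
    """Largest failure cost at which in-place still beats the baseline."""
    return (baseline_hit_s - success_rate * good_hit_s) / (1.0 - success_rate)


in_place = expected_hit(IN_PLACE_SUCCESS_RATE, IN_PLACE_HIT_S, FAILED_PATCH_HIT_S)
break_even = break_even_bad_hit(IN_PLACE_SUCCESS_RATE, IN_PLACE_HIT_S, NSR_FAILOVER_HIT_S)
print(f"expected in-place hit: {in_place:.3f} s vs NSR failover: {NSR_FAILOVER_HIT_S} s")
print(f"break-even cost of a failed patch: {break_even:.2f} s")
```

Under those assumptions, the in-place approach only wins on average if
a failed patch costs less than roughly 4.9 seconds, and that still
ignores the "continued use of bad data" case, which has no clean
duration to plug in.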


> ISSU contains the wording "in service". 6 seconds of outage isn't "in
> service". 0.5 seconds of outage isn't "in service". I could accept a few
> microseconds of outage as being "ISSU", but tenths of seconds isn't in
> service.

What is the maximum percentage more that your organization would be
able to justify paying the network equipment vendor for
routers/switches, to reduce the ISSU hit from 0.5 seconds to a few
microseconds?  :)


>> The main remaining hurdle is updating microcode on linecards, they still
>> need to be rebooted after an upgrade.

> --
> Mikael Abrahamsson    email: swmike at swm.pp.se
--
-JH



