Mitigating human error in the SP

David Hiers hiersd at
Thu Feb 4 06:08:57 UTC 2010

You can completely implement Vijay's most impressive stuff and simply
move the problem to a different level of abstraction.

No matter what you do, it still comes down to some geek banging on
some plastic thingy.  I'm as likely to screw up an "Extensible
Entity-Attribute-Relationship" as I am an ACL.


On Wed, Feb 3, 2010 at 8:14 AM, Ross Vandegrift <ross at> wrote:
> On Mon, Feb 01, 2010 at 09:46:07PM -0500, Stefan Fouant wrote:
>> Vijay Gill had some real interesting insights into this in a
>> presentation he gave back at NANOG 44:
>>
>> His Blog article on "Infrastructure is Software" further expounds
>> upon the benefits of such an approach -
>>
>> That stuff is light years ahead of anything anybody is doing today
>> (well, apart from maybe Vijay himself ;) ... but IMO it's where we
>> need to start heading.
>
> Vijay's stuff is fascinating.  The vision is great.  But in my
> experience, the vendors and implementations basically ruin the dream
> for anyone who doesn't have his pull.
>
> I'm sure my software is nowhere close to being as sophisticated as
> his, but my plans are pretty much in line with his suggestions.  Some
> problems I've run into that I don't see any kind of solution for:
>
> 1) Forwarding-impacting bugs: IOS bugs that are triggered by SNMP are
> easily the #1 cause of our accidental service impact.  Most seem to be
> race conditions that require real-world config and forwarding load -
> not something a small shop can afford to build a lab to reproduce.  If
> we had stuck to manual deployment, we might have made a few mistakes,
> but would it have been worse?  Maybe - but honestly, it could be a wash.
>
> 2) Vendor support is highly suspicious of automation: anytime I open a
> ticket, even unrelated to an automated software process, the first
> thing the vendor support demands is to disable all automation.
> Juniper is by far the best about this, and they *still* don't actually
> believe their own automation tools work.  Cisco TAC's answer has
> always been "don't ever use SNMP if it causes crashes!"  Procurve
> doesn't even bother to respond to tickets related to automation bugs,
> even if they are remotely triggerable crashes in the default config.
>
> 3) Automation interfaces are largely unsupported: I imagine vendor
> software development having one or two guys that are the masterminds
> for SNMP/NETCONF/whatever - and that's it.  When I have a question on
> how to find a particular tool, or find a bug in an automation
> function, I can often go months on a ticket with people that have no
> idea what I'm talking about.  What documentation exists is typically
> incomplete or inconsistent across versions and product lines.
>
> 4) Related tools prevent reliable error reporting: as far as I can
> tell, Net-SNMP returns random values if a request fails; if there's a
> pattern, I've failed to discern it.  expect is similar.  ScreenOS's
> SSH implementation always returns that a file copy failed.  Procurve
> only this year implemented ssh key-based auth in combination with
> remote authentication.  The best-of-breed seems to be an oft-pathetic
> collection of tools.
>
> 5) Management support: developing automation software is hard - network
> devices aren't nearly as easy to deal with as they should be.  When I
> spend weeks developing features that later cause IOS to spontaneously
> reload, people that don't understand the relation to operational
> impact start to advocate dismantling the automation just like the
> vendors above.
>
> I'm sure we'll continue to build automated policy and configuration
> tools.  I'm just not convinced it's the panacea that everyone thinks.
> Unless you're one of the biggest, it puts your network at someone
> else's mercy - and that someone else doesn't care about your
> operational expenses.
>
> Ross
> --
> Ross Vandegrift
> ross at
> "If the fight gets hot, the songs get hotter.  If the going gets tough,
> the songs get tougher."
>        --Woody Guthrie

More information about the NANOG mailing list