Revisiting the Aviation Safety vs. Networking discussion

Frank Bulk frnkblk at
Fri Dec 25 14:38:08 CST 2009

Shops where engineering and operations function separately can suffer from
reduced efficiencies.  A recent example comes to mind.  Vendor X was onsite
turning up some equipment, including a small VPN concentrator for remote
access.  It was a new model of VPN concentrator that the installers hadn't
worked with before.  They used "scripts", a set of CLI commands with
field-replaceable variables for site-specific parameters, to configure the
device.  But connections to the VPN were failing.  After trying different
versions of the scripts (for similar models) they "broke down" and called
their internal tech support department for help.  Total turn-up time for the
concentrator: 8+ hours.  There wasn't that much wrong with the script that
kept it from working, but the ops folks lacked the training to understand
the problems and fix them.  On the other hand, the engineering folks should
probably have produced a more robust set of scripts.
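
As a rough illustration of what "more robust" could mean here, below is a
minimal Python sketch of a script template that refuses to render until
every site-specific variable has been supplied.  The command lines, the
field names, and the helper functions are all hypothetical and do not
reflect Vendor X's actual scripts or any real device syntax.

```python
from string import Formatter

# Hypothetical turn-up script: generic CLI-like lines, not any vendor's syntax.
VPN_TEMPLATE = """\
hostname {hostname}
interface outside address {outside_ip}
vpn pool start {pool_start} end {pool_end}
vpn shared-secret {shared_secret}
"""


def required_fields(template):
    """Return the set of placeholder names used in the template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template, site):
    """Fill in the template, or raise if a site parameter is missing or empty."""
    missing = sorted(f for f in required_fields(template) if not site.get(f))
    if missing:
        raise ValueError("missing site parameters: " + ", ".join(missing))
    return template.format(**site)
```

Calling render() with an incomplete set of site parameters raises an error
that names the missing fields, so the gap shows up before anything is
pasted into the device rather than as a failed VPN connection afterwards.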

While I have no experience with this myself, it would seem good practice
for every project, including the actual turn-up, to include representation
from engineering.  This automatically creates a liaison between the two
groups and keeps the engineer abreast of "real world" issues.


-----Original Message-----
From: Michael Dillon [mailto:wavetossed at] 
Sent: Thursday, December 24, 2009 6:02 PM
To: NANOG list
Subject: Re: Revisiting the Aviation Safety vs. Networking discussion

>> imagine a network engineering culture where the concept of 'attempt to
>> deviate' just does not occur.
> Are you trying to suggest that this is something horrible, or that it's
> the future of network engineering? :)

The model of network engineering that grew up during the 1990s is forever
gone unless you work in a smaller organization where people have to wear
many hats. In the big ISPs, now identical to the big telcos, operations and
engineering design duties are separated. The operations folks do not
deviate from the written plans that they work with. If the slightest thing
happens that is not in the plan, they roll back the changes as specified in
the plan. They don't fix anything unless it is officially broken, with
trouble tickets filed and escalations up to senior management. That is
about the only time that operations people can get away with taking
shortcuts and using creative solutions.

On the other hand, the engineering design folks should spend a good part of
their day trying out things, thinking up new ideas, poking around equipment
and software to see how far it can be pushed. Then, when they have learned
something and are ready to implement it in the network, they write a
detailed plan for operations. Then some other engineering folks test the
heck out of that design to try and find fault with it. After all the faults
are fixed, it goes to operations and the engineering design folks move on
to something else unless serious problems occur and operations needs a
design engineer to approve some sensible action to be taken. The operations
folk can't take the sensible action because that would deviate from their
plans, but getting engineering design folks involved gives them an out for
real emergencies.

So the term "network engineering" is ambiguous because a lot of people use
it to mean the '90s-style job where engineering design activity and
operational activity were all jumbled together.

In some companies, taking the engineering design track not only means that
you lose enable on the routers, but you lose all TACACS access and have to
get authorisation from a VP just to ask for a copy of the running config on
a production router. Some people like ops because they see a lot of stuff
go by and learn from it, get their CCIE and move into design engineering.
Others like ops because they are scared of the responsibility for thinking
up what to do next, and of making a mistake.

As far as I can see, the only way to get a job that mixes ops and design is
to be in 3rd- or 4th-level support, which is the top of the technical
escalation chain, where a few excellent design engineers do have enable on
the routers because they fix important problems in near realtime. I suspect
that it would be advantageous to have a career in which you worked for a
while in ops before moving into design engineering if you want to get into
top-level support.

Take all this with a grain of salt. Every company does things a bit
differently, and the terminology that is used is ambiguous. It would be
interesting to see what others have to say about this.

--Michael Dillon
