DevOps workflow for networking

James Bensley jwbensley at gmail.com
Tue Aug 22 08:18:02 UTC 2017


On 10 August 2017 at 01:52, Kasper Adel <karim.adel at gmail.com> wrote:
> We are pretty new to those new-age network orchestrators and automation,
>
> I am curious to ask what everyone in the community is doing? Sorry for such
> a long and broad question.
>
> What is your workflow? What tools are your teams using? What is working
> what is not? What do you really like and what do you need to improve? How
> mature do you think your process is? etc etc

The wheels here move extremely slowly, so it's slowly, slowly catchy
monkey for us. So far we have been using Ansible and GitLab CI, and
the current plan is to gradually fold the existing network, device by
device, into the process/toolset.

> Wanted to ask and see what approaches the many different teams here are
> taking!
>
> We are going to start working from a GitLab based workflow.
>
> Projects are created, issues entered and developed with a gitflow branching
> strategy.
>
> GitLab CI pipelines run package loadings and run tests inside a lab.

Yes, that is the "joy" of GitLab; see below for a more detailed
breakdown, but we use Docker images to run the CI processes, and we
can branch and make merge requests which trigger the CI and CD
processes. It's not very complicated and it just works. I must admit
I didn't compare it with the likes of BitBucket; I just looked at
GitLab, saw that it worked, tried it, stuck with it, and have had no
problems so far.

> Tests are usually python unit tests that are run to do both functional and
> service creation, modification and removal tests.
>
> For unit testing we typically use python libraries to open transactions to
> do the service modifications (along with functional tests) against physical
> lab devices.

Again, see below: physical and virtual devices, and also some custom
Python scripts for unit tests, e.g. checking that IPv4/IPv6 addresses
are valid (not 999.1.2.3 or AA:BB:HH::1), that AS numbers are valid
integers of the right size, etc.
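
For illustration only, a minimal sketch of the kind of value checks
those scripts do (standard library only; the function names are my
assumption, not our actual code):

import ipaddress

def valid_ip(value):
    # Reject junk like 999.1.2.3 or AA:BB:HH::1
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

def valid_asn(value):
    # 32-bit ASNs (RFC 6793): 0 to 4294967295
    try:
        asn = int(value)
    except (TypeError, ValueError):
        return False
    return 0 <= asn <= 4294967295

assert not valid_ip("999.1.2.3")
assert not valid_ip("AA:BB:HH::1")
assert valid_asn(64512) and not valid_asn(4294967296)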

> For our prod deployment we leverage 'push on green' and gating to push
> package changes to prod devices.
>
> Thanks

Yeah, that is pretty much my approach too. Device configs are in YAML
files (actually multiple files). One git repo stores the constituent
YAML files; when you update a file and push to the repo, the CI
process starts, which runs syntax checks and semantic checks against
the YAML files (some custom Python scripts, basically).
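
To give a flavour of what those checks look like (a hypothetical
sketch, not our actual scripts; the required key names are made up):
the syntax check is basically "does it parse", and a semantic check
asserts the keys we expect are present.

import sys
import yaml  # PyYAML

REQUIRED_KEYS = {"hostname", "interfaces", "bgp"}  # hypothetical schema

def check_yaml(path):
    with open(path) as f:
        data = yaml.safe_load(f)              # syntax check: raises on bad YAML
    missing = REQUIRED_KEYS - set(data or {}) # semantic check: required keys
    if missing:
        raise ValueError("%s: missing keys %s" % (path, sorted(missing)))

if __name__ == "__main__":
    for path in sys.argv[1:]:
        check_yaml(path)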

As Saku mentioned, we also follow the “replace entire device config”
approach to guarantee the configuration state (or at least “try”,
when it comes to crazy old IOS). This means we have Jinja2 templates
that render the YAML files into device-specific CLI config files.
They live in a separate repo, and again many constituent Jinja2 files
make up one entire device template. Any push to this Jinja2 repo
triggers a separate CI workflow which performs syntax checking and
semantic checking of the Jinja2 templates (again, custom Python
scripts).
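
The syntax-check half of that job can be as simple as asking Jinja2
to parse each template; a rough sketch (the file handling and output
here are my own assumptions):

import sys
from jinja2 import Environment, FileSystemLoader, TemplateSyntaxError

env = Environment(loader=FileSystemLoader("."))
rc = 0
for name in sys.argv[1:]:
    try:
        with open(name) as f:
            env.parse(f.read())  # raises TemplateSyntaxError on bad syntax
    except TemplateSyntaxError as e:
        print("%s:%s: %s" % (name, e.lineno, e.message))
        rc = 1
sys.exit(rc)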

When one pushes to the YAML repo to update a device config, the
syntax and semantic checks are made against the YAML files; they are
then “glued” together into a single file per device, the Jinja2 repo
is checked out, the combined YAML is fed to the Jinja2 templates, the
configs are built, and the resulting vendor-specific config then
needs to be syntax checked.
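
The render step itself is essentially the following (the paths,
filenames and top-level template name are placeholders, not our real
layout):

import yaml
from jinja2 import Environment, FileSystemLoader, StrictUndefined

with open("build/router1.yml") as f:        # the "glued" per-device YAML
    device_vars = yaml.safe_load(f)

env = Environment(loader=FileSystemLoader("templates"),
                  undefined=StrictUndefined)  # fail on missing variables
config = env.get_template("ios.j2").render(device_vars)

with open("build/router1.conf", "w") as f:  # vendor config, to be syntax checked
    f.write(config)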

The CD part of the process (to a testing area) is still a WIP: for
Junos we can push to a device and use “commit check”, but for IOS and
others we can’t. So right now I’m working on a mixture of pushing the
config to virtual IOS devices and to physical kit in the lab, but
this also causes problems in that interface / line-card slot numbers
and names will change, so we need to run a few regex statements
against the config to jimmy it onto a lab device (pretty ugly, and
temporary I hope).
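
For what it's worth, the regex munging is nothing clever; it is
roughly along these lines (the slot mapping is entirely
hypothetical):

import re

# Map production slot/port numbering onto what the lab box actually has.
SLOT_MAP = {
    r"TenGigE0/0/0/(\d+)": r"TenGigE0/1/0/\1",
    r"GigabitEthernet0/0/(\d+)": r"GigabitEthernet0/2/\1",
}

def remap_interfaces(config_text):
    for pattern, replacement in SLOT_MAP.items():
        config_text = re.sub(pattern, replacement, config_text)
    return config_text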

When the CD to “testing” passes, the CD to “production” can be
manually triggered. Another repo stores the running config of all
devices (from the previous push). So we can push the candidate config
to a live device (using Ansible with NAPALM [1]), get a diff against
the running config, make the “config replace” action, then download
the running config and put that back into the repo. This gives us a
locally stored copy of device configs, so we can see off-line the
diffs between pushes. It also provides a record that the process of
going from YAML > Jinja2 > device produces the config we expected
(although prior to this one will have had to make a branch and then a
merge request, which is peer reviewed, to get the CD part to run and
push to the device, so there shouldn’t be any surprises this late in
the process!).
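
We drive that step from Ansible, but roughly speaking the NAPALM side
of it looks like the following (hostname, credentials and paths are
placeholders):

from napalm import get_network_driver

driver = get_network_driver("junos")
device = driver(hostname="router1", username="cicd", password="secret")
device.open()

device.load_replace_candidate(filename="build/router1.conf")
diff = device.compare_config()     # diff against the running config
print(diff)
if diff:
    device.commit_config()         # the "config replace" action
else:
    device.discard_config()

running = device.get_config(retrieve="running")["running"]
device.close()

with open("configs/router1.conf", "w") as f:
    f.write(running)               # stored back in the configs repo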

Is it foolproof? No. It is a young system still being designed and
developed. Is it better than before? Hell yes.

Cheers,
James.

[1] Ansible and NAPALM here might seem like overkill, but we use
Ansible for other stuff like x86 box management, so configuring a
server or a router is abstracted through one single tool for the
operator (i.e. playbooks are used irrespective of device type, rather
than, say, playbooks for servers but Python scripts for firewalls).
We also use YAML files as config files for the x86 boxes, also living
in GitLab with a CI/CD process, so again, one set of tools for all.


