Operations task management software?

Lee ler762 at gmail.com
Wed Jul 27 23:19:45 UTC 2016


On 7/27/16, David Hubbard <dhubbard at dino.hostasaurus.com> wrote:
> Hi all, curious if anyone has recommendations on software that helps manage
> routine duties assigned to operations staff?

Have computers do the routine scut work - not people.

> For example, let’s say we have a P&P that says someone from the netops group
> must check that Rancid is successfully backing up all router configs
> bi-weekly.

You've got the source code for rancid, so change rancid-run to set the
log file name once and export it, something like
  LOGFILE=$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S`; export LOGFILE
then change the
  ) >$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S` 2>&1
to
  ) >$LOGFILE 2>&1

and then in control_rancid do something like
  grep "clogin error:" $LOGFILE | sort | uniq -c >$TMP.fail
  if [ -s $TMP.fail ]; then
     # got some output, mail the report
     ...
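
Filled out a bit, the reporting piece might look something like the
sketch below. It is not rancid code: summarize_failures, report_failures,
the MAILER variable and the recipient address are all made-up names for
illustration, and it assumes a mail command that takes "-s subject
recipient" and reads the body on stdin.

```shell
#!/bin/sh
# Sketch only: summarize "clogin error:" lines from a rancid log and mail them.
# summarize_failures and report_failures are hypothetical helper names;
# MAILER and the recipient address are assumptions, not rancid settings.

MAILER=${MAILER:-mail}

# Print one line per distinct error message, prefixed with its count.
summarize_failures() {
    grep "clogin error:" "$1" | sort | uniq -c
}

# Mail the summary to $2 if the log in $1 contains any login errors.
# Returns 0 if a report was sent, 1 if there was nothing to report.
report_failures() {
    tmp=$(mktemp) || return 2
    summarize_failures "$1" >"$tmp"
    if [ -s "$tmp" ]; then
        # got some output, mail the report
        "$MAILER" -s "rancid login failures" "$2" <"$tmp"
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}
```

Called as report_failures "$LOGFILE" netops@example.com (the address is
a placeholder), it stays quiet when the log has no login errors.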

Do the same type of thing for checking on
> backup failures, backup internet circuit status, out of band interfaces, etc.

Automate the checks, put the scripts in crontab & mail out an
"OhNoes!" or "all clear" msg at the end. At that point you're left
with the problem of making sure the managers are looking at the emails
& making sure whatever problems are found actually get fixed :)
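
A rough sketch of that kind of wrapper, assuming each check is a command
or shell function that exits non-zero when something is wrong.
run_checks, mail_summary and MAILER are hypothetical names, and the mail
command is again assumed to take "-s subject recipient":

```shell
#!/bin/sh
# Sketch only: run a list of checks and produce one summary message.
# run_checks, mail_summary and MAILER are hypothetical names for illustration.

MAILER=${MAILER:-mail}

# Run each command named in the arguments; a non-zero exit counts as a failure.
# Prints a headline first, then the output of every failing check.
run_checks() {
    failed=0
    details=""
    for check in "$@"; do
        if ! output=$("$check" 2>&1); then
            failed=$((failed + 1))
            details="${details}FAILED: ${check}
${output}
"
        fi
    done
    if [ "$failed" -gt 0 ]; then
        printf 'OhNoes! %d check(s) failed\n%s' "$failed" "$details"
    else
        printf 'all clear\n'
    fi
}

# Mail the summary to $1, using the headline as the subject.
# The remaining arguments are the checks to run.
mail_summary() {
    rcpt=$1
    shift
    summary=$(run_checks "$@")
    subject=$(printf '%s\n' "$summary" | head -n 1)
    printf '%s\n' "$summary" | "$MAILER" -s "$subject" "$rcpt"
}
```

End the script with a call such as
mail_summary netops@example.com check_rancid check_backups
(placeholder address and check names) and run it from crontab. Mailing
"all clear" as well as failures means a missing email is itself a sign
that the job did not run.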

Regards,
Lee



>  Ideally, it would send an email reminder to this pre-defined
> group of people saying hey, it’s Monday, someone needs to check this and
> come acknowledge the task as having been completed.  If that doesn’t occur,
> pre-defined manager X is notified on Tuesday.  If manager X doesn’t get
> someone to complete the task, director Y is notified, so on and so forth.
> Then, perhaps periodically it emails manager X anyway and says hey, it’s
> been three months, you need to audit netops to ensure they’re actually doing
> the Rancid audit and not just checking that it was done.  This could be
> applied to the staff who check on backup failures, backup internet circuit
> status, out of band interfaces, etc.
>
> A data center I looked at recently had QR code stickers on all of their
> infrastructure stuff and there were staff assigned to check and log certain
> displayed values each day.  The software would at least ensure they actually
> visited the equipment by requiring they scan the relevant QR code when in
> front of it.  So I figure something that does what I’m looking for properly
> already exists.
>
> Thanks,
>
> David
>


