cost of misconfigurations
mysidia at gmail.com
Wed Aug 1 20:14:39 CDT 2012
On 8/1/12, Diogo Montagner <diogo.montagner at gmail.com> wrote:
I think it's more complicated than that; the cost of misconfiguration
is almost inseparable in some cases from the cost of configuration in
general. Not all misconfigs are equal, so you might want to concentrate
on a specific kind of misconfiguration, or a specific misconfig impact,
e.g. "an erroneous filter is applied, causing routes to be accepted
from an EGP peer without restriction". Especially with misconfigurations
that might not have an immediately discovered impact, the business
impact beyond the cost to discover and resolve may not be apparent. It
depends on details of the misconfig, such as how trivial or 'obvious'
the error should be, and how consistent the problems it causes are.
At least if you concentrate on a certain specific type of misconfig and
specific impact, you can have a basis for comparison and
approximation, for just that type though.
The "fix" to some types of misconfigs might sometimes be to update the
design documentation, so the "misconfig" is no longer a
misconfiguration; so then you can start asking about how you
define "misconfig" in the first place, and the costs of having
erroneous or missing documentation.
Which is hard, because the "costs" of updating documentation and
finding errors, less than best/optimal practices, or improvements
possible in configurations, are affected by long-term "costs" or
loss of efficiencies resulting from failing to correct
documentation, and failing to review and improve arguably
Some misconfigs or suboptimal configs are discovered by review or
other measures before there is any operational impact. Some
misconfigs are "safe" or "harmless" by coincidence, but can cause
issues later when the network is expanded further according to a design
that does not anticipate the misconfig, so the cost there is
Not all possible misconfigurations of a network cause an outage; some
misconfigurations are actually design errors, not operator errors.
Not all network issues are outages; some configuration errors are
just things like
"Some entries in an access-list that are dead-weight, e.g. can never
be reached, or are not necessary"; and the impact of this error is
wasted memory resources, or increased complexity / more unnecessary
stuff for humans to look at.
(The entry might not have been dead-weight when originally added.)
Correcting the deadweight ACL entry situation then is an improvement
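As a rough illustration of the dead-weight ACL case, here is a minimal
Python sketch that flags entries which can never match. It assumes a
deliberately simplified, hypothetical ACL model (source prefix only,
evaluated top-down, first match wins); real access-lists also match on
destination, protocol, and ports, so treat it as a starting point, not
a vendor-accurate check. The names find_shadowed and acl are made up
for this example.

```python
import ipaddress


def find_shadowed(entries):
    """Return indexes of entries that can never be reached.

    entries: list of (action, source_cidr) tuples, evaluated top-down
    with first-match-wins semantics (an assumption of this sketch).
    An entry is dead weight if an earlier entry already covers its
    entire source range.
    """
    nets = [ipaddress.ip_network(cidr) for _, cidr in entries]
    shadowed = []
    for i, net in enumerate(nets):
        for earlier in nets[:i]:
            # subnet_of() raises TypeError on mixed IPv4/IPv6, so
            # compare address families first.
            if net.version == earlier.version and net.subnet_of(earlier):
                shadowed.append(i)
                break
    return shadowed


# Hypothetical ACL using documentation and private address ranges.
acl = [
    ("permit", "10.0.0.0/8"),
    ("deny", "10.1.2.0/24"),        # covered by entry 0
    ("permit", "192.0.2.0/24"),
    ("deny", "0.0.0.0/0"),
    ("permit", "198.51.100.0/24"),  # follows a match-everything entry
]

print(find_shadowed(acl))
```

Note that this only finds entries that are fully covered by a single
earlier entry; an entry covered by a combination of earlier entries
would be missed.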
Not all misconfigurations are detected, either; sometimes not even
misconfigs that caused issues.
An example of a misconfiguration that would occur frequently in some
kinds of environments, and might not break an uptime SLA, would be
one causing suboptimal performance or reduced cost-effectiveness (e.g.
an early upgrade required due to an unrecognized misconfiguration).
Or configuration deadweight utilizing so much memory, that hardware
upgrades become needed. On some networks, there might not be a
formal SLA, and the end user might not notice or take issue with it.
Another is loss of fault resilience (e.g. the failover path won't
work). No SLA is violated if the fault tolerance wasn't required by
the SLA, and the configuration error might go undetected for years if
regular failover testing was not performed.
It might be corrected before there is an issue... then the cost of
"increased risk" during the period in which the misconfig wasn't
service-affecting could be quite nebulous.
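One rough way to put a number on that "increased risk", purely as a
sketch under strong assumptions: treat it as an expected value. The
Python below assumes a fixed, independent yearly probability that the
primary path fails, and a single outage cost that is incurred if it
fails while failover is broken. The function name, parameters, and the
figures in the usage line are all hypothetical; none of them come from
a real network.

```python
def expected_risk_cost(annual_fault_probability, years_exposed, outage_cost):
    """Expected cost of running with a broken failover path.

    Assumes each year independently has the same probability of a
    primary-path fault, and that any such fault during the exposure
    window costs outage_cost because failover would not work.
    """
    # Probability that at least one fault occurs during the window.
    p_any_fault = 1 - (1 - annual_fault_probability) ** years_exposed
    return p_any_fault * outage_cost


# Hypothetical figures: 10% yearly fault chance, undetected for 3 years,
# and an outage that would cost 50000 in whatever currency applies.
print(round(expected_risk_cost(0.10, 3, 50_000), 2))
```

The hard part is not the arithmetic but the inputs: the fault
probability and the outage cost are usually guesses, which is much of
why the figure stays nebulous.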
> I never saw any literature about this topic. But I think it is not too
> difficult to calculate (or estimate).
> A misconfiguration will, at least, impact on two points: network
> outage and re-work. For the network outage, you have to use the SLAs
> to calculate the cost (how much you lost from the customers' revenue)
> due to that outage. On the other hand, there is the time efforts spent
> to fix the misconfiguration. Under the fix, it could be removing the