Traffic Engineering [was Chanukah [was Re: Hezbollah]]
pkavi at pcmail.casc.com
Wed Sep 17 14:59:43 UTC 1997
Kent,
As a former network design guy who's done traffic engineering and
design (and redesign) on many networks (Internet and otherwise), I
disagree that traffic engineering doesn't work for the Internet.
I've seen many people go with the "throw bandwidth at the problem"
approach as a cure-all. While it tends to work, it tends to be the most
expensive method of solving the problem.
Doing traffic engineering right is hard. The telcos have it down pat
for their voice networks, and telco-based ISPs often have applied
this design expertise to their ISP network. Having a person do
traffic engineering can save the ISP big bucks.
The traffic engineering techniques I'm talking about can't handle wildly
dynamic situations. For example, a news event like Princess Di's
death greatly increases traffic to/from England which plays temporary
havoc with forecasted traffic projections. However, outside of these
anomalies, traffic projections work pretty well.
I've outlined a basic technique below which works for many types of
networks, and has some ISP specific steps. The key to this analysis is
that it takes into account the underlying traffic flows and then
determines the appropriate physical backbone topology, or the changes to
be made to an existing topology. This is directly in contrast to the
"throw bandwidth at the problem" case that patches a backbone topology
which might be sub-optimal in the first place.
Here's the overall outline:
1. Divide your network into a small number of geographic areas
(between five and ten). Each geographic area you choose probably
has a large city that serves as a major traffic source for that
area. These cities are usually the natural cities for backbone
connectivity.
Create an NxN matrix, where N is the total number of areas
in your network. Each cell in the matrix will represent the total
traffic demand between each source/destination geographic area.
There are several factors which affect this matrix, each of which
will be discussed below.
1. The locality of traffic.
2. The typical utilization of customers.
3. The entry/exit points of traffic from your network.
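As a rough illustration of the matrix described in step 1, here is a minimal Python sketch. The area names and the 12.5 Mbps figure are made-up placeholders, not values from any real network, and a dict of dicts is only one of several reasonable ways to hold the matrix.

```python
# Hypothetical area names; substitute your own five to ten areas.
AREAS = ["West", "Mountain", "Central", "Southeast", "Northeast"]

def empty_traffic_matrix(areas):
    """Return matrix[src][dst] = 0.0 (Mbps) for every ordered pair of areas."""
    return {src: {dst: 0.0 for dst in areas} for src in areas}

matrix = empty_traffic_matrix(AREAS)

# Each cell holds the demand from a source area to a destination area.
# The value below is invented purely to show how a cell is filled in.
matrix["West"]["Northeast"] += 12.5
```

The matrix is directional: traffic from West to Northeast is tracked separately from traffic going the other way, which matters because send and receive volumes are rarely equal.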
2. Identify what percentage of traffic, if any, has regional locality.
For pure Internet traffic, the probability that the source and
destination of traffic are within the same metropolitan area
tends to be low (10% or lower for metros within the US).
However, there are exceptions. Telecommuting applications
tend to have very high locality. People close to work dial
into work through an ISP, so both the source and destination
of traffic tend to stay local (70% or higher).
Places like the Bay Area in California also tend to have higher
traffic locality. This is because the Bay Area has lots of
Internet users (which tend to be traffic sinks), and lots of
web sites (which tend to be traffic sources).
ISPs outside of the US tend to have a higher percentage of
traffic staying within the country, especially in non-English-speaking
countries.
3. Measure/estimate the typical utilization of customers.
Utilization needs to be measured/estimated in both send
and receive directions. Dial-up users typically receive
almost seven times as much as they send. Corporate customers
not doing telecommuting applications tend to receive about
four times as much as they send (less because corporations
have web sites that others access). Web server farms have
the opposite characteristics of dial-up users.
Percentage utilization tends to increase with bandwidth.
In the U.S., a T1 customer connection typically has a
peak receive utilization of 20% or less. However, a DS3
customer can easily have a receive utilization of 50% or
more. Simple explanation is that someone paying big bucks
for a DS3 wants to make sure it is justified.
So, take the total number of users in each area, the
connection speeds and customer types, multiply by the
appropriate factors, and you get the total demand you are
trying to serve out of each area.
Take this traffic demand, and multiply it by the non-local
fraction of traffic. This represents the total traffic that you need
to get either in/out of the network, or in/out of this
particular part of your network.
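The arithmetic in step 3 can be sketched as follows. This is an illustration under stated assumptions, not a definitive model: the customer counts are invented, the function name `area_demand` is hypothetical, and the utilization and receive-to-send figures are simply the rough numbers quoted above (20% for T1, 50% for DS3, a 4:1 ratio for corporate customers). The T1 and DS3 line rates are the standard 1.544 and 44.736 Mbps.

```python
def area_demand(customers, local_fraction):
    """Estimate non-local demand for one geographic area.

    customers: list of (count, line_rate_mbps, peak_recv_util, recv_to_send_ratio)
    local_fraction: share of traffic that stays inside the area (0.0 to 1.0)
    Returns (recv_mbps, send_mbps) that must cross the area boundary.
    """
    recv = sum(n * rate * util for n, rate, util, _ in customers)
    send = sum(n * rate * util / ratio for n, rate, util, ratio in customers)
    non_local = 1.0 - local_fraction
    return recv * non_local, send * non_local

# Invented customer mix for one area: 100 T1 customers and 2 DS3 customers,
# all treated as corporate (receive about four times what they send).
customers = [
    (100, 1.544, 0.20, 4.0),   # T1, 20% peak receive utilization
    (2, 44.736, 0.50, 4.0),    # DS3, 50% peak receive utilization
]

# Assume 10% of the traffic stays within the area.
recv_mbps, send_mbps = area_demand(customers, local_fraction=0.10)
```

Receive and send are kept separate because they feed different cells of the traffic matrix: receive demand is traffic flowing into the area, send demand is traffic flowing out of it.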
4. Determine the entry/exit points for traffic with your network,
and its effect upon the traffic matrix.
How do you set up your routing policies? Many ISPs use nearest
exit. If the nearest exit is in the same geographic area, the
traffic sent by your customers does not affect any other part
of the overall traffic matrix. If the nearest exit is not
within the same geographic area, determine the area where this
traffic will be sent. Enter this value in the appropriate
source/destination box of the traffic matrix.
It gets harder when peering with many other ISPs, some of whom
you connect to in the same area, and others in remote areas.
In this case, determine which percentage of the traffic goes
into each particular region, and
The main traffic sources into your network (excluding your
customers) are your peering points (both public and private).
The amount of traffic from each peering point is measurable.
You can generally estimate that this traffic is to be
distributed proportional to the overall traffic demand in each
geographic area.
This is a significant amount of matrix math, but the overall
concept is simple. Determine the overall flow from one
part of your network to another.
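The proportional distribution of peering traffic described in step 4 might look like the sketch below. The area names, demand figures, and the function name `spread_peering_traffic` are all hypothetical; the only thing taken from the text is the rule that inbound peering traffic is spread in proportion to each area's demand.

```python
def spread_peering_traffic(matrix, peering_area, inbound_mbps, recv_demand):
    """Add traffic entering at a peering point to the traffic matrix.

    The inbound traffic is split across destination areas in proportion
    to each area's receive demand (recv_demand[area], in Mbps).
    """
    total = sum(recv_demand.values())
    for dst, demand in recv_demand.items():
        matrix[peering_area][dst] += inbound_mbps * demand / total
    return matrix

# Three invented areas with invented receive demand (Mbps).
areas = ["A", "B", "C"]
recv_demand = {"A": 50.0, "B": 30.0, "C": 20.0}
matrix = {src: {dst: 0.0 for dst in areas} for src in areas}

# Suppose 200 Mbps is measured coming in at a peering point located in area A.
spread_peering_traffic(matrix, "A", 200.0, recv_demand)
```

The diagonal cell (here A to A) is traffic that enters and is delivered in the same area, so it never touches the backbone. Only the off-diagonal cells turn into backbone demand.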
5. With me so far? Good, now it's time to design your backbone to
handle your demands. You can use dedicated lines or layer two
services such as Frame Relay or ATM.
The simplicity of using Frame Relay or ATM is that the circuits
you need between each geographic area have been defined by your traffic
matrix. This is part of the appeal of using public L2 services for a
backbone.
Designing your own backbone is a bit harder. The actual topology
tends to be straightforward--you need to connect up the major cities
in each of the geographic areas. For five areas, a simple ring
suffices. For up to 10 areas, this tends to be rings bisected once or
twice.
The real work in designing your own backbone is in satisfying the
traffic demands going across your network. Remember that geographic
areas in the center of your network have to carry the traffic demands
going across your network. This imposes a heavier burden in the
center of the network than the traffic matrix would indicate. You
also have to worry about resiliency, having sufficient bandwidth
when the backhoes go fiber hunting, etc.
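To see why the center of the network carries more than the traffic matrix alone suggests, here is a small sketch that routes every demand over a fewest-hop path and adds up the load on each link. It assumes a connected topology and simple hop-count routing, which is a simplification of real routing policy; the three-area chain and all demand figures are invented.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """Breadth-first search; returns the node list of a fewest-hop path.

    Assumes dst is reachable from src.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def link_loads(adj, matrix):
    """Sum the directed load (Mbps) on each link over all routed demands."""
    loads = {}
    for src, row in matrix.items():
        for dst, mbps in row.items():
            if src == dst or mbps == 0:
                continue
            path = shortest_path(adj, src, dst)
            for a, b in zip(path, path[1:]):
                loads[(a, b)] = loads.get((a, b), 0.0) + mbps
    return loads

# Invented example: three areas in a line, West - Central - East.
adj = {"West": ["Central"], "Central": ["West", "East"], "East": ["Central"]}
matrix = {
    "West":    {"West": 0.0, "Central": 5.0, "East": 10.0},
    "Central": {"West": 5.0, "Central": 0.0, "East": 5.0},
    "East":    {"West": 10.0, "Central": 5.0, "East": 0.0},
}
loads = link_loads(adj, matrix)
```

In this example the matrix says West sends only 5 Mbps to Central, but the West to Central link has to carry 15 Mbps, because the 10 Mbps bound for East transits Central as well. Running the same calculation with a link removed from `adj` gives a first look at the resiliency question.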
6. Design the network within each geographic area.
The steps for designing the network within each geographic area tend
to be similar to that of designing the overall network. Breaking the
overall design process into a regional network and backbone network
makes the problem more tractable.
7. Measure data from a real network.
This is really important. You've made lots of assumptions. Regularly
check the overall traffic to see if it matches the assumptions.
Refine the traffic matrix to see if it still represents reality.
Create trendlines which show the overall traffic changes to/from each
area, and project these trendlines into the future. You will tend to
have pretty good certainty about 4 months into the future, with the
value of the information decaying after that.
Use this data to determine where to add additional peering points.
Estimate what impact this would have on the traffic matrix.
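A trendline for one area can be sketched with an ordinary least-squares fit. The straight-line model is an assumption made here for simplicity (real growth may be closer to exponential, in which case fitting the logarithm of the traffic would be more appropriate), and the monthly samples are invented.

```python
def fit_trend(samples):
    """Least-squares line through (month_index, mbps) samples.

    Returns (slope, intercept). Needs at least two distinct month indexes.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def project(samples, months_ahead):
    """Extend the fitted line months_ahead past the last measured month."""
    slope, intercept = fit_trend(samples)
    last_month = max(x for x, _ in samples)
    return slope * (last_month + months_ahead) + intercept

# Invented monthly peak traffic (Mbps) out of one area.
samples = [(0, 100.0), (1, 110.0), (2, 120.0), (3, 130.0)]

# Project about four months out, the horizon suggested above.
forecast_mbps = project(samples, months_ahead=4)
```

Comparing each month's measurement against the previous projection is a cheap way to notice when the assumptions behind the traffic matrix have stopped matching reality.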
8. Factor the measured and projected data into the next network backbone
design.
This next backbone design gives you the optimum backbone given the
underlying flows in your network. See what changes you need to
make to your backbone to get to this new optimum backbone, and
order the circuits.
Phew! Like I said earlier, it is hard to do right, and I've left out
quite a few details in the above outline. But having been there, done
that, (quite a few times) I can say it really works. And it saves
ISPs money!
Question for NANOG members. How important is traffic engineering
given that it is fairly hard to do properly and you folks have enough
other things to think about?
Prabhu Kavi
IP Business Marketing Manager
Ascend Communications
prabhu.kavi at ascend.com
______________________________ Reply Separator _________________________________
Subject: Chanukah [was Re: Hezbollah]
Author: "Kent W. England" <kwe at geo.net> at smtplink
Date: 9/16/97 2:09 PM
At 05:03 PM 14-09-97 -0400, Dorian R. Kim wrote:
>... One of the
>things that needs to be engineered into building and maintaining
>national/international backbones is traffic accounting to an arbitrary
>granularity that paves the way for better traffic engineering and
>bandwidth projections. There are already ample tools to do per-prefix
>matrix of traffic right now. Tying this in with good sales projections
>will alleviate much of the last minute fire fighting.
>
>This will most likely never be 100% accurate and precise, but there is
>no reason why we can't get a better handle on bandwidth forecasts. (say to
>95% percentile)
Dorian;
I don't want to throw cold water on the value of planning and foresight,
but in terms of predicting traffic patterns it has never worked on the
Internet. It sounds good and that was the argument that all the mainframe
networkers made to us early Internet networkers -- Why can't you tell me
upfront what your bandwidth requirements are going to be? Don't you know
exactly how many terminals you have and where they are and what application
keystrokes are going to be pressed at any given time? How else can you
guarantee response time in your network? This Internet stuff is stupid.
It'll never work.
Somehow with the way that HTTP/HTML caught fire and Internet-CB (aka
VocalTec and CUSeeMe) took off, I would be loath to think I could project
my backbone needs with any reliability based on *historical* projections.
>
>Furthermore, with the deployment of WDM and Internet core devices moving
>closer to the transmission gear, if you have access to fiber, getting more
>bandwidth may become as straightforward as using an additional wavelength
>on the ADM that your router's plugged into.
>
>-dorian
>
>
>
This I like a lot better as a design technique. Throwing more bandwidth at
the problem almost always works (unless the transport protocol is broken).
Like Peter Kline said, Turn up the speed dial upon onset of congestion.
Simple. Effective.
Then again, creating a data architecture for the web (a problem that has
been recognized, but not addressed in the last five years) would eliminate
much of the backbone bandwidth demand. What would happen if -- presto -- a
data architecture for the web showed up one day? A lot of backbone
bandwidth would become surplus and a lot more edge bandwidth would be
needed ASAP. What does that do to historical projections?
--Kent