2009.10.19 NANOG47 Monday notes, second half
mpetach at netflight.com
Mon Oct 19 16:32:01 CDT 2009
Here's my notes from second half of NANOG today.
Now off to Beer and Gear. :)
2009.10.19 NANOG 47 Monday notes, part 2
Mike Hughes starts things off after lunch at
1436 hours Eastern time.
Few bits of administrivia still.
If you want to submit a lightning talk, you can do
it up until 7pm today.
Please vote for the committee members!
PC nominations close this evening as well;
if you'd like to be on it, do that as well,
as much help as possible is needed.
3 lightning talks next up.
First up is Ernest McCracken
NetViews: real time visualization of
Internet Path Dynamics for Network Management
Started doing this as part of his undergrad work.
Goal was to help researchers visualize network paths.
Topology mapping typically try to represent
Scatter, skitter, Rocket Fuel, CAIDA,
why graph in realtime?
monitor realtime reachability
spot anomalous depeering
identify route hijacking and misconfigurations
developing next-gen routing monitor system.
BGPMon -- realtime lightweight BGP monitor
with over 70 peers--allows for fast updates
NetViews - visualizes both control plane paths
(via BGP updates) and forwarding paths (via
BGPMon is running, you connect to it, get the
routing updates; data broker sends BGP updates.
Prober probes target network from BGP peers to
get path updates.
GeoCoder and IP crawler get geographic info,
and traceable IPs for probing.
Slide showing data pathway
They probe during routing events; a timeline
showing BGP updates during the timeline. They
keep probing until they see no additional updates.
Visualization filters to show networks based on the
number of ASes an AS connects to.
You can see the updates scroll in realtime on the
live map as the updates come into the system.
Blue is path additions/changes, Red for changes.
They can also visualize forwarding paths, but
there are challenges in inferring forwarding paths
based on traceroutes.
correlate forwarding and routing dynamics to create
a classification model for internet paths
add scalability by having clients run traceroute jobs
in a P2P fashion
Give client users the ability to communicate with each other
Funded by NSF, and collaborating with UCLA, ColoState
and UofO on BGPMon system.
Q: Dave Meyer--can it be run internally? What
infrastructure do you need? Server portal runs
in lab, clients can run on any java client.
Synch up with him afterwards if you have any
summer internships available.
Jim Cowie from Renesys
The recession and the routing table
Reading the tea leaves
They dig into the routing tables to see what's happening.
Tough times, tough questions
We know that internet transit purchases are sensitive
to business conditions (2000 crash)
is the 2008-2009 recession affecting growth in
the global/regional routing tables
Should be some sign of pullback in the routing tables
like in 2000.
3 years of North American routing--it's still going up,
there's no depression visible.
Why did the table keep growing?
Enterprises don't cut costs by leaving the internet,
they cut costs by reducing diversity
cheap transit getting cheaper acts like "easy money"
prospect of v4 runout may result in "use it or lose
it" addition of routes into table.
Half the table is just hanging out with 1 provider.
Number of prefixes with 4 or more providers is going
The 1 provider networks either go to no-longer-advertised
or shift to 2 or 3 providers.
More go to the "no-longer-seen" pool; fewer upgrade
to the next category up.
People postpone getting to multihoming.
Triple-homing seems to be sweet spot.
4 or more provider pool is getting larger
and more stable over time; you don't tend
to decrease over time.
Global recession might give more of a break
before v4 exhaustion
Cheap transit killed that theory
some evidence of single- and dual- homed customers
putting off the move to higher order multihoming
in 2007 and 2008
"obviously practicing for IPv6 transition, after which
apparently multihoming becomes unnecessary"
Otherwise, growth continues apace
Bring on the post-IPv4 marketplace!
Q: Randy Bush--BGP is a great data hiding system;
it doesn't tell you much about the real topology
of the internet. How do you determine that a prefix
has a single upstream?
A: ask him afterwards.
Q: is this transit AS?
A: You have to have seen the AS through another AS,
that's how you can count the upstreams.
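A minimal sketch of the upstream counting described in
that last answer, assuming AS paths are available as
lists of ASNs with the origin last. The function name
and the sample ASNs (documentation range) are made up
for illustration; this is not Renesys's actual method.

```python
from collections import defaultdict


def upstreams_by_origin(as_paths):
    """Map each origin AS to the set of ASes seen directly in front of it."""
    upstreams = defaultdict(set)
    for path in as_paths:
        # Collapse AS-path prepending (the same ASN repeated back to back).
        dedup = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        if len(dedup) >= 2:
            origin, upstream = dedup[-1], dedup[-2]
            upstreams[origin].add(upstream)
    return dict(upstreams)


# Hypothetical AS paths as seen from several vantage points.
paths = [
    [64500, 64510, 64496],
    [64501, 64510, 64496, 64496],  # origin prepends itself
    [64500, 64511, 64497],
    [64502, 64512, 64497],
]
counts = {origin: len(ups) for origin, ups in upstreams_by_origin(paths).items()}
print(counts)
```

An origin only ever seen behind one other AS lands in
the "1 provider" bucket; as Randy's question points
out, BGP may simply be hiding the other upstreams.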
Joe Abley up to the front from ICANN
DNSSec for the Root Zone
Matt Larson, from VeriSign.
Info update for those who care about DNSSec
collaboration between ICANN and VeriSign with DoC
ICANN is IANA functions operator
Manages the Key Signing Key
Accepts DS records from TLD operators
Verifies and processes request
Sends update requests to DoC for authorization
and to VeriSign for implementation
Authorizes changes to the root zone
Root key sets
manages the zone signing key
Proposed Approach to protect the KSK
CPS--certificate practice statement
DPS, DNSSEC policy and practice statement
basically, to assure people the practices are
adequate to protect it.
proposal that community representatives have an active
role in management of the KSK
as crypto officers needed to activate the KSK
as backup key share holders protecting shares of the
symmetric keys in case of disaster recovery
Auditing and Transparency
Third-party auditors check that ICANN...
webcast of sessions
KSK is 2048 bit RSA key
rolled every 2-5 years
RFC5011 for automatic key rollovers
propose using signatures based on SHA-256
but there's no shipping code based on this
Zone signing Key (held by VeriSign)
ZSK is 1024-bit RSA
rolled once a quarter
Signature validity
RRSIG validity 15 days
resign every 10 days
Other RRSIG validity 7 days
resign twice a day
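The validity and re-sign numbers above imply a safety
margin: how much validity is left on the old signature
when the replacement is generated. A quick sketch of
that arithmetic, assuming "twice a day" means every 12
hours (the helper name is made up):

```python
from datetime import timedelta


def validity_margin(validity, resign_interval):
    """Validity remaining on the old signature when the next one is made."""
    return validity - resign_interval


# 15-day signatures re-signed every 10 days.
first_margin = validity_margin(timedelta(days=15), timedelta(days=10))
# 7-day signatures re-signed twice a day (assumed: every 12 hours).
other_margin = validity_margin(timedelta(days=7), timedelta(hours=12))
print(first_margin, other_margin)
```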
Generation of new KSK
Every 2-5 years
Processing of ZSK signing request (KSR)
signing ZSK for the next upcoming quarter
Root Trust Anchor
published on a web site by ICANN as
XML-wrapped and plain DS record
to facilitate automatic processing
PKCS#10 certificate signing request
incremental roll out of the signed root
groups of root server "letters" at a time
watch the query profile to all root servers
as roll out progresses
Listen to community feedback for any issues
Real keys will be replaced by dummy keys
while rolling out the signed root
signatures not valid during roll out
actual keys will be published at end of rollout
December 1, 2009
root zone signed
initially signed zone stays internal to ICANN and VeriSign
incremental roll out of signed root
July 1, 2010
KSK rolled out
root trust anchor
ISP Security BOF later today will talk about it.
Full architectural documents around the process will
be published in the next few weeks.
Next speaker is Paul Francis, talking about
Reducing FIB Size with Virtual Aggregation (VA)
ISPs often want to extend the life of old routers
Routers that have inadequate FIB but otherwise are
A common approach--use old routers as customer PE,
default to core
Other FIB/RIB shrinking tips
Filter out more specific routes
For lower-tier ISPs, default to transit ISPs
ie use 0/0 and load balance among transit ISPs
leads to non-optimal routes
lots of configuration (peer routes, "important" routes)
Can't be used by transit ISPs themselves
Mitigating non-optimal default routes
Use more-specific "semi-defaults"
AS3303 Swisscom IP-Plus
point 62/8, 80/7, 21/7, etc. to EU transit ISP
ARIN space to US transit
class B 128/3, 160/5, 168/6 to US transit
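A rough sketch of how semi-defaults behave under
longest-prefix match, assuming a tiny routing table
held in a dict. The next-hop labels are invented; the
prefixes are the ones from the list above, except 21/7,
which is left out because it is not a validly aligned
/7 as written.

```python
import ipaddress

# Prefix -> hypothetical next-hop label.
SEMI_DEFAULTS = {
    "0.0.0.0/0": "default",
    "62.0.0.0/8": "eu-transit",
    "80.0.0.0/7": "eu-transit",
    "128.0.0.0/3": "us-transit",
    "160.0.0.0/5": "us-transit",
    "168.0.0.0/6": "us-transit",
}
ROUTES = {ipaddress.ip_network(p): nh for p, nh in SEMI_DEFAULTS.items()}


def lookup(addr):
    """Return the next hop of the longest matching prefix."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ROUTES if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(lookup("81.2.3.4"), lookup("130.1.1.1"), lookup("10.0.0.1"))
```

Anything not covered by a semi-default still falls
through to 0/0, so the table stays tiny.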
IETF working on a more general solution: virtual agg
GROW working group
VA is a way to control FIB size in routers
DFZ FIB, not VPN tables
does not shrink RIB size
Tight control of FIB size for any or all routers
no coordination between ISPs
works with legacy routers
Important today--possibly critical tomorrow?
looking forward, BGP RIB growth rate could increase
exhaustion of v4 erodes aggregation
because of pressure to shrink default prefix size
uptake of v6
VA can help ease these pressures
VA not perfect
Requires configuration of its own
Entails a traffic load/FIB size tradeoff
which can be quite good
academic study on large transit ISP
10x fib reduction with negligible latency/load
But in general we don't know how easy this is to achieve
Why this talk?
You can help us define VA
certain protocols or configuration details
alternative ways to deploy
or tell us that VA is useless
encourage your vendor to implement VA
current implementations from Huawei and ??
VA Basic Idea
Define "Virtual Prefixes" (VP)
These are shorter (bigger) than real prefixes
think of /6s, /7s, /8s
Assign different routers to be "responsible" for
different virtual prefixes
ie, they need to know how to route everything in the VP
BGP runs as normal
all routers have full RIB
important to not muck with BGP operation per se
suppress updates to FIB for more specifics of
APR (aggregation point router) for 22/8
originate route to 22/8 with nexthop being itself
it FIB-installs all sub-prefixes within 22/8
other routers FIB-suppress all prefixes within 22/8
This just tunnel-maps from one router to another
out to the egress point.
The only router with the need to know how to route
that packet was APR1 (well, that, and the ingress
The packet takes a bit of a longer path to do
this with simple aggregates.
You can add "popular prefixes" to routers to point
them along "better" paths.
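A minimal sketch of the FIB selection described above,
assuming the RIB is a dict of prefix -> next hop and
each virtual prefix maps to the APR responsible for it.
The function, the "tunnel:" labels and the sample
routes are illustrative only, not the GROW draft's
actual mechanics.

```python
import ipaddress


def build_fib(rib, virtual_prefixes, my_vps, popular=()):
    """Return the FIB one router would install under VA.

    rib:              prefix string -> next hop (the full RIB)
    virtual_prefixes: VP string -> name of the APR for that VP
    my_vps:           set of VP strings this router is the APR for
    popular:          prefixes to install even if they would be suppressed
    """
    vps = {ipaddress.ip_network(vp): apr for vp, apr in virtual_prefixes.items()}
    fib = {}
    # Every router tunnels each VP it is not responsible for to that VP's APR.
    for vp, apr in vps.items():
        if str(vp) not in my_vps:
            fib[str(vp)] = f"tunnel:{apr}"
    for prefix, nexthop in rib.items():
        net = ipaddress.ip_network(prefix)
        covering = [vp for vp in vps if net.subnet_of(vp)]
        # Suppress only if a VP covers the prefix and we are not its APR.
        suppressed = bool(covering) and all(str(vp) not in my_vps for vp in covering)
        if not suppressed or prefix in popular:
            fib[prefix] = nexthop
    return fib


rib = {"22.1.0.0/16": "egress-A", "22.2.0.0/16": "egress-B", "23.5.0.0/16": "egress-C"}
vps = {"22.0.0.0/8": "APR1", "23.0.0.0/8": "APR2"}
apr1_fib = build_fib(rib, vps, my_vps={"22.0.0.0/8"})
pe_fib = build_fib(rib, vps, my_vps=set(), popular={"22.2.0.0/16"})
print(apr1_fib)
print(pe_fib)
```

The RIB is untouched in both cases; only what gets
pushed to the FIB changes.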
Types of tunnels defined
MPLS (using LDP)
A deployment example
Robert Raszuk at Cisco
shows a POP site with 4 PE customer agg routers,
2 Rs, 2 RRs;
core can use tunnels between them already.
Use RRs as APRs -- can optionally
FIB-install routes for which PE is egress
If you do FIB suppression at the RR layer
Then need to install popular prefixes at the PE
layer--GROW looking to automate that part.
VA from our point of view
Figure out where you need FIB reduction
Based on this, design your deployment
assign routers as APRs, configure
New IETF GROW WG work item for FIB suppression
Q: Patrick, Akamai--this seems very complex; couldn't
we just take prefixes out of the FIB that are covered
by a shorter prefix with same next-hop; wouldn't that
be much easier to do, and save FIB space? Could we
maybe ask vendors to look into doing that?
A: Lixia may have done some looking into that; she
says that two people on her team, they found out
that you can compress your FIB between 10 and 50%
by simply suppressing more specifics with same
She was going to give a talk at GROW at the next
meeting that would do this.
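A sketch of the simpler idea raised in this exchange:
drop any FIB entry whose nearest covering prefix
already points at the same next hop. It assumes an
IPv4-only FIB held in a dict; the function name and
routes are made up, and it is not the algorithm from
Lixia's team.

```python
import ipaddress


def compress_fib(fib):
    """Drop prefixes whose nearest kept covering prefix has the same next hop."""
    nets = {ipaddress.ip_network(p): nh for p, nh in fib.items()}
    kept = {}
    # Shortest prefixes first, so parents are decided before their children.
    for net in sorted(nets, key=lambda n: n.prefixlen):
        covers = [k for k in kept if net.subnet_of(k)]
        if covers:
            nearest = max(covers, key=lambda k: k.prefixlen)
            if kept[nearest] == nets[net]:
                continue  # lookups fall through to the same next hop anyway
        kept[net] = nets[net]
    return {str(n): nh for n, nh in kept.items()}


fib = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "A",   # redundant: covered by 10/8 via A
    "10.1.2.0/24": "B",   # needed: differs from its cover
    "10.2.0.0/16": "B",
    "10.2.3.0/24": "B",   # redundant: covered by 10.2/16 via B
}
small = compress_fib(fib)
print(small)
```

Forwarding is unchanged, because every dropped prefix
resolves to the same next hop through its cover.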
Q: Doni from PeakWeb, was asking vendors for this
around the 200,000 routes in the FIB; the vendors
were wanting to simply sell more hardware.
Which routers need the full FIB in the drawing?
A: None of them need full routes. Generally got
about 10X saving in all of them.
Q: Owen deLong--if you already have all the
routers everywhere, it might make sense; if you
have just 2 routers in a POP, this looks like
a distributed CAM load, to have multiple routers
pretend to look like one router
A: yes, it's like that.
Q: RAS--remember the 8k Foundry boxes? They had 8k
CAM table, and their solution was to either have
just default, or break it up into /12s; this is
similar, it just limits based on number of next-hops
they have. Could we get benefits from doing more
simple aggregation like that?
Q: Igor notes you can probably just upgrade for
cheaper than transferring all sorts of routes
back and forth and paying for additional interconnect
Q: Anton Kapela, have they considered looking at
Auto-TE QoS stuff internally?
If packets are being redirected around internally,
it does mean something for link-loading; how will
this interact with QoS, since this will transport
packets along links not originally planned for it?
In what they saw, very few packets used the
We'll do coffee break at 1615, BOFs at 1645
BGP# - a system for dynamic route control
in data centers
tenants and landlords
owner and manager of the datacenter
search, email, gaming
utility computing customers
empower tenants to control routing decisions
tenants have different goals
tenant goal--spread traffic
or migrate traffic from one server to another
current system, tenant submits tickets to get routing
whole ticket flow is shown
Tenants have limited control over routing
A better system
allow for automated route control
allow tenants independent and safe route control
allow for maintenance changes
simple speakers (multispeaker)
peer with BGP routers
send route announcements/withdrawals (ECMP capable)
Stateful controller (controller)
controls and coordinates speakers
custom API ("applications")
Application runs on tenant box; speaks to
controller via API; controller speaks to
multispeaker which peers with router to
send the update
to spread traffic, similar thing;
application uses API to ask controller which
asks multispeaker; it has 2 sessions to router, with
2 next hops for prefix.
Automated route control
controller API allows for custom applications
Application can automatically manage routes
Independent and safe route control
only allow a tenant to change their own prefixes.
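A toy sketch of the "tenants may only touch their own
prefixes" check, assuming the controller keeps a table
of which prefixes belong to which tenant. The class,
method names and addresses are hypothetical; the real
BGP# API was not shown in detail.

```python
import ipaddress


class Controller:
    """Toy controller: validates tenant requests before announcing."""

    def __init__(self, tenant_prefixes):
        self.allowed = {
            tenant: [ipaddress.ip_network(p) for p in prefixes]
            for tenant, prefixes in tenant_prefixes.items()
        }
        self.announced = {}  # prefix string -> set of next hops

    def _authorized(self, tenant, net):
        return any(net.subnet_of(own) for own in self.allowed.get(tenant, []))

    def announce(self, tenant, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        if not self._authorized(tenant, net):
            raise PermissionError(f"{tenant} does not own {prefix}")
        # Several next hops for one prefix models spreading traffic (ECMP).
        self.announced.setdefault(str(net), set()).add(next_hop)

    def withdraw(self, tenant, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        if not self._authorized(tenant, net):
            raise PermissionError(f"{tenant} does not own {prefix}")
        self.announced.get(str(net), set()).discard(next_hop)


ctl = Controller({"search": ["192.0.2.0/24"], "gaming": ["198.51.100.0/24"]})
ctl.announce("search", "192.0.2.0/24", "10.0.0.1")
ctl.announce("search", "192.0.2.0/24", "10.0.0.2")  # spread over 2 next hops
print(ctl.announced)
```

A request from "gaming" for any part of 192.0.2.0/24
raises PermissionError instead of reaching the router.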
Multispeaker and controller not placed in machines
handling user traffic
eliminates need for one policy controller per machine
reduces peering sessions to router
eliminate per-ticket manual intervention
ensure system continues operating
instantiate multiple multispeakers
single multispeaker failure doesn't affect other MS ability
separate multispeaker and controller
prefix resiliency -- ensure prefix stays available
announce same prefix from multiple multispeakers
router retains prefix even if one MS fails
Automation service could deploy a new multispeaker
with same config if one dies.
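The prefix resiliency point can be pictured with a very
small model, assuming the router keeps a prefix as long
as at least one live multispeaker still announces it.
Names and prefixes are invented for illustration.

```python
def router_prefixes(announcements, alive):
    """Prefixes the router still holds, given which multispeakers are up.

    announcements: multispeaker name -> set of prefixes it announces
    alive:         set of multispeaker names with a live session
    """
    held = set()
    for speaker, prefixes in announcements.items():
        if speaker in alive:
            held |= prefixes
    return held


announcements = {
    "ms1": {"192.0.2.0/24", "198.51.100.0/24"},
    "ms2": {"192.0.2.0/24"},
}
all_up = router_prefixes(announcements, alive={"ms1", "ms2"})
ms1_down = router_prefixes(announcements, alive={"ms2"})
print(all_up, ms1_down)
```

Only the prefix announced from both multispeakers
survives the loss of ms1.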
No inconsistency with multiple Multispeakers
suppose some multispeakers become unresponsive
BGP# listening tool detects the lack of router
suppose multispeaker reboots and is in different state?
get config and state from persistent store
each tenant sets up its own BGP instances
needs one session per machine
landlord may need to deal with many BGP peers
Tenants have more power
Landlord retains responsibility for validation of routes.
system achieves stability and resiliency
Q: Francis asks if BGP is an awfully coarse-grained
tool to use for something like this--what about
using MPLS for setting up flows?
A: BGP finite state machine is much simpler to follow
We'll go into coffee break now; BOFs start at 1645
hours Eastern time.
SC elections, JUST DO IT!!
PC nominations open for 3 more hours!
1800 hours in Regency for Beer and Gear.
BOFs, Mobile Data Track, ISP Security BOF,
and DNS BOF will be upstairs in DeSoto room.
Tuesday we start at 0830 again with breakfast.
For now, I head over to the DNS thingie BoF
IPv6 and resolvers; how do we make it less
For most people, rolling out IPv6 can't break
IPv4, and separate hostnames isn't scalable
Per Google, 0.078% of users are impacted by
enabling quad A on machines.
Assuming a user base of 600M that's 470K users that
get broken, which isn't acceptable.
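The arithmetic behind that number, as a quick check
(the 600M user base is the assumption stated above):

```python
broken_percent = 0.078          # percent of users impacted, per Google
user_base = 600_000_000         # assumed user base

broken_users = round(broken_percent / 100 * user_base)
print(broken_users)
```

That works out to 468,000, which the talk rounds to
470K.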
Right now, in browsers, IPv4 fallback is on the
order of 21 seconds to 181 seconds, which is for
most SLA numbers considered "broken"
Don't roll out IPv6
prefer A over AAAA
accept the breakage
what about checking for working IPv6 connectivity
before sending back AAAA record.
Only way to know if user has working v6 transport
is if the AAAA request came via IPv6 instead of IPv4
Recursive servers need to be set to return AAAA
records *only* when the request came in via IPv6;
otherwise, return the A record only.
Now, auth DNS server only has to worry about IPv6
reachability to the recursive server.
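A minimal sketch of that policy, assuming the resolver
knows the client's source address and has the answer as
a list of (type, data) pairs. The function and
parameter names are made up; they are not the BIND 9.7
option names.

```python
import ipaddress


def filter_answers(client_addr, records, filter_enabled=True):
    """Strip AAAA records unless the query itself arrived over IPv6.

    records: list of (rtype, rdata) tuples.
    """
    over_v6 = ipaddress.ip_address(client_addr).version == 6
    if filter_enabled and not over_v6:
        return [r for r in records if r[0] != "AAAA"]
    return list(records)


answer = [("A", "192.0.2.10"), ("AAAA", "2001:db8::10")]
via_v4 = filter_answers("198.51.100.7", answer)
via_v6 = filter_answers("2001:db8:1::7", answer)
print(via_v4, via_v6)
```

With the knob off (filter_enabled=False) every client
gets the full answer, matching the opt-in default.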
We've asked if ISC can write this; ISC has done
this, it'll be in BIND 9.7; it'll be in a second
beta coming out in early November; if you want to
check it out, if you're on user list or beta list,
you'll get notification; otherwise, check ISC
site in early november and it should be there.
Feature will be a knob you have to turn on.
There's an additional check put in; if DNSSEC
is set, it won't forge DNS answers unless there's
a knob set "BREAKDNSSEC" that you can turn on,
the knob is going to be very well documented.
But if you've gone through the work of setting
up DNSSEC, you should know how to troubleshoot
This should be set up for resolvers facing
customers, not for internal services that have
What about having an ACL for controlling
behaviour for different subnets?
If they fit in a view statement, you already
have that capability.
Will this be available within a view? Yes,
you can do it there.
But the ACL idea is interesting, and could be
better than pushing people towards views.
This really goes on the recursive side.
We need to try to convince ISPs to use these
If a request comes from a 6to4 address; the
source is a 6to4 address; do you respond with
AAAA or not.
How about ACLs with flexibility to see if it's
over v4 but from a 6to4 address to send different
A simple default policy is good, but the flexibility
is good for more experienced sites.
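One way the 6to4 question could be expressed as a
policy, sketched with Python's ipaddress module (its
sixtofour attribute is set for addresses in 2002::/16).
The function and the answer_6to4 knob are hypothetical,
not anything ISC has committed to.

```python
import ipaddress


def wants_aaaa(client_addr, answer_6to4=False):
    """Decide whether a client should be sent AAAA records."""
    ip = ipaddress.ip_address(client_addr)
    if ip.version == 4:
        return False            # query came in over IPv4
    if ip.sixtofour is not None:
        return answer_6to4      # 6to4 source: leave it to site policy
    return True                 # native IPv6 source


print(wants_aaaa("192.0.2.1"),
      wants_aaaa("2001:db8::1"),
      wants_aaaa("2002:c000:204::1"),
      wants_aaaa("2002:c000:204::1", answer_6to4=True))
```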
Simple on-or-off knob is in 9.7b2; more granular
control will be needed for later versions...
what about 6rd? They would get no AAAA results
in that scenario. We might need a DNSv6 option
for DHCPv4 which would be able to give back
the v6 DNS servers.
Should we put together an information draft
for IETF; we can draft one up, so the three
of them should talk;
Igor Gashinsky, Yahoo,
Larissa Shapiro, ISC,
Alain Durand, Comcast
OARC meeting, Beijing, Jason Fesler will be there
to talk about it.
ISOC meeting in Paris next week, we'll be there
to talk about it as well.
Internet2 joint techs talked about it as well.
General consensus is that this is a necessary
If we can get it working with 6rd, it'll be an
interesting working solution to a common problem.
to report DNS lookups back via REST infrastructure
to get an idea of the types of breakage.
OS, browser, IP, and which test cases break or not.
We could do a series of test queries, and see which
ones break or not.
Result of the query comes into the beacon server, so
we can see if they saw the reply or not.
with what *it* saw, as well as see what the server
collecting this data.
If we can at least break it down to 3 buckets
it would help really pin down where the breakage is.
Do note that the percentage shown wasn't Yahoo
data, that was Google's data, so we don't have
that breakdown ourselves.
Would going to AS level be too specific for people?
We'd need to consider carefully privacy issues
and anonymizing the data as per our privacy
What about running an experiment in partnerships
for specific ASNs?
Point is, this is coming out, do share the data,
this will be going into mainline code release.
It's opt-in, defaulted off.
The *actual* names for the config options are...
This will apply to just the RR set, not the
glue set; if glue returns AAAA, it'll still
come back intact.
The tests are really testing recursive lookup
server to the last proxy device in front of
But what if the recursive server to auth server
If the recursive server lookup side, can we
turn the knob on in the other direction?
This is an interesting challenge; we'll have to
see how much additional work this will need, and
how much additional funding will be needed to
cover for it.
ISC will...look into the feasibility of doing that.
The IETF draft cutoff is tonight for Paris, so
maybe it'll be done for the Anaheim one, at which
point we'll have working code out there, and a bit
more time for writing the draft.
We wrap up the BOF at 1724 hours Eastern time.
(what about a switch for auth servers that allows
for turning off "don't send AAAA records to ZZZZZ"?)