Deepak Jain deepak at ai.net
Mon Apr 20 16:23:51 CDT 2009

So here is an idea that I hope someone shoots down.

We've been talking about pseudo-wires, and the high level of expertise a shared-fabric IXP needs
to diagnose weird switch oddities, etc.

As far as I can tell, the principal reason to use a shared fabric is to allow multiple connections to networks that may not justify their own dedicated ($$$$) router port. Once they do, they can move over to a PNI. However, an IXP is (at the hardware level at least) trying to achieve any-to-any connectivity without concern for capacity, up to the port size of each port, on every flow. Scaling this across multiple pieces of hardware has posed interesting challenges when the connection speed to participants is of the same order as the interconnection between IXP switches.

So here is a hybrid idea. I'm not sure if it has been tried or seriously considered before.

Since the primary justification for a shared fabric is cost savings....

What if everyone who participated at an IXP brought their own switch? For argument's sake, a Nexus 5xxx. It has 20+ ports of L2, wire-speed 10G.

You connect 1-2 ports to your router and order 18 cross-connects to your favorite peers. The IXP becomes a cross-connect provider (there is a business model bump here that TelX and others could address). As you need more ports, you add them. A Nexus 5K runs about $500 per port.
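
To put rough numbers on the paragraph above, here is a minimal sketch of the per-switch arithmetic. The port count and per-port price are the ballpark figures from this post. The function name and the idea of dividing hardware cost across peer ports are illustrative assumptions, and cross-connect fees are deliberately left out because the post gives no figure for them.

```python
# Ballpark figures from the post: about 20 ports per switch, about $500 per port.
PORTS_PER_SWITCH = 20
COST_PER_PORT_USD = 500


def switch_budget(router_uplinks: int) -> dict:
    """Hypothetical helper: split one switch between router uplinks and peers."""
    peer_ports = PORTS_PER_SWITCH - router_uplinks
    hardware_cost = PORTS_PER_SWITCH * COST_PER_PORT_USD
    return {
        "peer_ports": peer_ports,
        "hardware_cost_usd": hardware_cost,
        # Hardware only; cross-connect charges are not modeled here.
        "hardware_cost_per_peer_usd": hardware_cost / peer_ports,
    }


if __name__ == "__main__":
    print(switch_budget(router_uplinks=2))
```

With 2 router uplinks this leaves 18 peer ports on a switch that costs about $10,000 in hardware, which matches the "order 18 cross-connects" figure above.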

Here are some advantages. If you have 300 participants, yes, you have a lot of ports/switches. However, as "interconnectivity" increases, so does the total fabric capacity. Each additional switch does NOT add significant complexity for the participants, but it does bring significant backplane and buffering capabilities. Each participant could then configure their own pVLANs, VLANs, or whatever on *their* switch. If they screw something up, it doesn't take everyone down. A non-offending participant that interconnects with an offender can shut down 1 port (automatically or manually) without affecting the rest of their services.
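
The claim that total fabric capacity grows with interconnectivity can be sketched with some simple counting. This is only an illustration under stated assumptions: 20-port 10G switches with 2 ports reserved for router uplinks, every peering being a single symmetric cross-connect, and "aggregate capacity" taken crudely as the sum of all switch port speeds. The function and field names are hypothetical.

```python
import math

PORT_SPEED_GBPS = 10
PORTS_PER_SWITCH = 20
ROUTER_UPLINKS = 2  # assumed uplinks per switch, per the 1-2 ports mentioned above


def fabric_totals(participants: int, peers_each: int) -> dict:
    """Count switches, cross-connects, and summed port capacity."""
    peer_ports_per_switch = PORTS_PER_SWITCH - ROUTER_UPLINKS
    switches_each = math.ceil(peers_each / peer_ports_per_switch)
    total_switches = participants * switches_each
    return {
        "switches_per_participant": switches_each,
        "total_switches": total_switches,
        # Each cross-connect is shared by two participants.
        "cross_connects": participants * peers_each // 2,
        # Crude proxy: the sum of every port's line rate across all switches.
        "aggregate_port_capacity_gbps": (
            total_switches * PORTS_PER_SWITCH * PORT_SPEED_GBPS
        ),
    }


if __name__ == "__main__":
    print(fabric_totals(participants=300, peers_each=18))   # one switch each
    print(fabric_totals(participants=300, peers_each=299))  # full mesh, worst case
```

A full mesh of 300 participants is the extreme case; in practice most participants would presumably peer with a subset, and the switch count grows only as each participant adds peers.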

This also prevents the requirement of very complicated security features in the L2/L3 gray area.  If you don't want your peer to have multiple MACs, don't accept them. If you're cool with it, you can let it slide. 

If you want to move someone over to a PNI, the IXP needs to do zilch. You just move your cross-connect over to a new port in your service window; your peer can do it at the same or a different time, no big deal. If you *keep* it on a switch, however, you can use LACP uplinks from the switches you have to provide, say, 40G uplinks to your router so large peers don't affect your ability to process traffic. I doubt, however, that if this model is applied there is much purpose for PNIs -- the cost savings model mostly vanishes.
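
As a rough illustration of the LACP uplink point, the sketch below compares the uplink bundle against the peer-facing ports on one switch. The 4 x 10G bundle corresponds to the 40G example above; the split of 16 peer ports and the function name are assumptions made for the example. Note that LACP typically balances per flow, so a single flow is still limited to one member link.

```python
def uplink_headroom(lag_members: int, peer_ports: int, port_gbps: int = 10) -> dict:
    """Hypothetical helper: compare a LACP uplink bundle with peer-facing ports."""
    uplink_gbps = lag_members * port_gbps
    peer_gbps = peer_ports * port_gbps
    return {
        "uplink_gbps": uplink_gbps,
        # Ratio of total peer-facing capacity to uplink capacity.
        "oversubscription": peer_gbps / uplink_gbps,
        # Fraction of the bundle one peer consumes when running at line rate.
        "share_of_one_full_peer": port_gbps / uplink_gbps,
    }


if __name__ == "__main__":
    print(uplink_headroom(lag_members=4, peer_ports=16))
    print(uplink_headroom(lag_members=1, peer_ports=19))
```

With a 4-member bundle, one peer running at a full 10G takes a quarter of the uplink rather than all of it, which is the sense in which large peers stop affecting everything else.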

As you want to move to higher speeds (40G and 100G), the IXP has to do zilch. You can switch your ports or peers at any time you choose, or set up a separate fabric for your 100G peers. An upgrade in port density or capacity for a peer, or set of peers, does not require a forklift of the whole IXP or some strange speed shifting (other than in the affected parties).

Disadvantages. It's probably cheaper on a per-participant basis than a shared fabric once it gets to be a certain size. It's a different model (many-to-many vs one-to-many) than many are used to. It requires the cost of interconnects to other participants (en masse) to be about the same as the per-port cost of a shared fabric (this is probably achievable given what many places charge for 10G ports). Each participant is managing an additional type of gear. Theoretically, if someone uses an Extreme and another uses a Cisco, there might be issues, but at a pure vanilla-L2/VLAN level, I'm pretty sure even 2nd and 3rd tier vendors can interconnect just fine.

I think this addresses the "keep it as simple as possible without oversimplifying" principle. There is nothing new to this model except (perhaps) as it's applied to an IXP. People have been aggregating traffic by ports into trunks by capacity for a long time. I haven't figured out why it hasn't really been done to scale at the IXP level.


Deepak Jain

> -----Original Message-----
> From: vijay gill [mailto:vgill at vijaygill.com]
> Sent: Monday, April 20, 2009 12:35 AM
> To: Jeff Young; Nick Hilliard; Paul Vixie; nanog at merit.edu
> Subject: Re: IXP
> If you are unfortunate enough to have to peer at a public exchange
> point, put your public ports into a vrf that has your routes. Default
> will be suboptimal to debug.
> I must say stephen and vixie and (how hard this is to type) even
> richard steenbergens methodology makes the most sense going forward.
> Mostly to prevent self-inflicted harm on parts of the exchange
> participants. Will it work? Doubtful in todays internet clue level
> /vijay
> On 4/18/09, Jeff Young <young at jsyoung.net> wrote:
> > Best solution I ever saw to an 'unintended' third-party
> > peering was devised by a pretty brilliant guy (who can
> > pipe up if he's listening).  When he discovered traffic
> > loads coming from non-peers he'd drop in an ACL that
> > blocked everything except ICMP - then tell the NOC to
> > route the call to his desk with the third party finally gave
> > up troubleshooting and called in...
> >
> > fun memories of the NAPs...
> >
> > jy
> >
> >
> > On Apr 18, 2009, at 11:35 AM, Nick Hilliard wrote:
> >
> >> On 18/04/2009 01:08, Paul Vixie wrote:
> >>> i've spent more than several late nights and long weekends dealing with
> >>> the problems of shared multiaccess IXP networks.  broadcast storms,
> >>> poisoned ARP, pointing default, unintended third party BGP, unintended
> >>> spanning tree, semitranslucent loops, unauthorized IXP LAN extension...
> >>> all to watch the largest flows move off to PNI as soon as somebody's
> >>> port was getting full.
> >>
> >
> >
> --
> Sent from my mobile device
