Does anyone multihome anymore?

Wed Aug 22 19:26:47 UTC 2007

On Wed, 22 Aug 2007, Mike Tancsa wrote:

>> > Multihoming is great for when there is a total outage.  In the case of
>> > Cogent on Monday, it wasn't "down"... In this case, there is only so much
>> > you can do to influence how packets come back at you as BGP doesn't know
>> > anything about "lossy" or slow connections.
>> >
>> >         ---Mike
>> 
>> Take the carrier that is causing you issues out of your eBGP setup and
>> all's well....
>
> Hi,
> In my case, I have 6453 and 174 for transit.  I want to get to 577 which is 
> directly connected to 6453 and 174. 577 has a higher local pref on paths via 
> 174.  Short of shutting my 174 session (or some deaggregation), I don't have a 
> way to influence how 577 gets back to me.  I can easily exit out 6453, but it 
> does nothing for the return packets.  I have enough capacity on 6453 to 
> handle all my traffic, but it's a Draconian step to take and some traffic via 
> 174 is fine and would be worse if I fully shut the session. (i.e., peers of 174 
> in Toronto)

I'm posting too much this week and should stop, but...

Again, this is a matter of thinking about design goals.  What were you 
trying to accomplish when you bought redundant connections?  It probably 
wasn't "redundancy," but rather something that redundancy would give you. 
What redundancy gives you is a better statistical probability that not all 
of the redundant components will be broken at once.
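
As a rough illustration of that statistical point, here is a minimal sketch. 
It assumes the components fail independently of each other, which real 
transit links often don't, and the outage probabilities are made-up numbers, 
not measurements of any provider.

```python
import math

def prob_all_down(outage_probs):
    """Chance that every redundant component is down at the same moment.

    Assumes failures are independent (an assumption, often optimistic).
    Each entry is the fraction of time that one component is broken.
    """
    return math.prod(outage_probs)

# Hypothetical: two transit links, each broken 1% of the time.
single = 0.01
both = prob_all_down([single, single])
print(f"one link down: {single:.2%}, both down at once: {both:.4%}")
```

The product gets small quickly, but it never reaches zero, which is why this 
is a better probability and not a guarantee.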

It should be noted that multi-homing is just one of many areas of possible 
redundancy.  Anything else that can break -- routers, switches, cables, 
etc. -- can all be set up redundantly.  No amount of redundancy in any of 
those components guarantees reliability.  What it does mean is that your 
network can keep functioning if some components break, as long as you 
still have enough of whatever component it is to keep running.

So, in a redundant setup, what happens when a component breaks?  In an 
ideal situation, it breaks cleanly, fail-over happens automatically, and 
nobody notices.  Then you just have to hope your monitoring system is good 
enough that you know there's something to fix.  But, in an ideal 
situation, things wouldn't break at all, so designing your procedures 
around "ideal" failure scenarios doesn't make much sense.  What redundancy 
really gives you is the ability to have outages not turn into major 
disruptions; the ability, when you see that a component is malfunctioning, 
to turn it off and go back to sleep.  You can then do the real fix later, 
when it's more convenient or less disruptive.

Thought about that way, there's nothing "Draconian" about turning off a 
connection (or a switch, or a router, or any other redundant component) 
that's not doing what you want it to.  Instead, you're taking advantage of 
a main feature of your design.  If your other providers are doing 95th 
percentile billing, the top 5% of their usage samples are discarded, so you 
even have about a day and a half per month that you can leave a connection 
down and shift its traffic to them at no financial cost.  The alternative, as you 
seem to have noticed, is to spend your day stressing out about your 
network not working properly, and complaining about being helpless.  You 
don't need redundancy for that.
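
The "day and a half" figure can be checked with a short sketch.  It assumes 
the common arrangement of 5-minute samples with the top 5% discarded; the 
function names, the interval, and the traffic numbers are illustrative 
assumptions, so check your own contract for the actual terms.

```python
SAMPLE_MINUTES = 5  # assumed sampling interval, not taken from any contract

def kept_samples(total, percentile=95):
    """Number of samples that survive after the top (100 - percentile)% are dropped."""
    return -(-total * percentile // 100)  # integer ceiling division

def billable_rate(samples_mbps, percentile=95):
    """Billed rate: the highest sample left once the top ones are discarded."""
    ordered = sorted(samples_mbps)
    return ordered[kept_samples(len(ordered), percentile) - 1]

def free_burst_hours(days_in_month, percentile=95):
    """Hours per month whose samples are discarded and so never affect the bill."""
    total = days_in_month * 24 * 60 // SAMPLE_MINUTES
    discarded = total - kept_samples(total, percentile)
    return discarded * SAMPLE_MINUTES / 60

print(free_burst_hours(30))  # 36.0 hours in a 30-day month

# Hypothetical month: 100 Mbps baseline, 250 Mbps while carrying the other link's load.
within_allowance = [100] * 8208 + [250] * 432   # exactly 36 hours of burst
over_allowance = [100] * 8207 + [250] * 433     # one sample too many
print(billable_rate(within_allowance), billable_rate(over_allowance))
```

The allowance is shared with every other burst on that link during the month, 
so the full 36 hours is only available if nothing else has used it.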

-Steve