Is multihoming hard? [was: DNS amplification]

Sat Mar 23 19:52:30 UTC 2013

On Mar 23, 2013, at 12:12 , Jimmy Hess <mysidia at gmail.com> wrote:

> On 3/23/13, Owen DeLong <owen at delong.com> wrote:
>> A reliable cost-effective means for FTL signaling is a hard problem without
>> a known solution.
> 
> Faster than light signalling is not merely a hard problem.
> Special relativity doesn't provide that information may travel faster
> than the maximum speed C.  If you want to signal faster than light,
> then slow down the light.
> 
>> An idiot-proof simple BGP configuration is a well known solution. Automating
>> it would be relatively simple if there were the will to do so.
> 
> Logistical problems...  if it's a multihomed connection, which of the
> two or three providers manages it,  and gets to blame the other
> provider(s) when anything goes wrong: or are you gonna rely on the
> customer to manage it?
> 

The box could (pretty easily) be built with a "Primary" and "Secondary" port.

The cable plugged into the primary port would go to the ISP that sets the
configuration. The cable plugged into the other port would go to an ISP
expected to accept the announcements of the prefix provided by the ISP
on the primary port.

BFD could be used to illuminate a tri-color LED on the box for each port,
which would be green if BFD state is good and red if BFD state is bad.

At that point, whichever one is red gets the blame. If they're both green,
then traffic is going via the primary and the primary gets the blame.

If you absolutely have to troubleshoot which provider is broken, then
start by unplugging the secondary. If it doesn't start working in 5 minutes,
then clearly there's a problem with the primary regardless of what else
is happening.

Lather, rinse, repeat for the secondary.
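
A minimal sketch of the blame rule above, in Python. The names Bfd,
led_color, and blame are made up for illustration; a real box would read
BFD session state from its routing daemon rather than take it as an
argument.

```python
from enum import Enum


class Bfd(Enum):
    """BFD session state for one port (hypothetical, simplified to two states)."""
    UP = "up"
    DOWN = "down"


def led_color(state):
    """Green if the BFD state is good, red if it is bad."""
    return "green" if state is Bfd.UP else "red"


def blame(primary, secondary):
    """Return the list of ports that get the blame.

    Whichever port is red gets the blame. If both are green, traffic is
    going via the primary, so the primary gets the blame.
    """
    red = [
        name
        for name, state in (("primary", primary), ("secondary", secondary))
        if state is Bfd.DOWN
    ]
    return red if red else ["primary"]
```

So blame(Bfd.UP, Bfd.DOWN) points at the secondary, and blame(Bfd.UP, Bfd.UP)
points at the primary because that is where the traffic is.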

> Someone might be able to make a protocol that lets this happen, which
> would need to detect on a per-route basis any performance/connectivity
> issues, but I would say it's not any known implementation of BGP.

A few additional options to DHCP could actually cover it from the primary
provider's perspective.

For the secondary provider, it's a little more complicated, but could be
mostly automated so long as the customer identifies the primary provider
and/or provides an LOA for the authorized prefix from the primary to
the secondary.

The only complexity in the secondary case is properly filtering the announcement
of the prefix assigned by the primary.
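
As a rough illustration of that filter, assuming the secondary keeps a
per-customer table of prefixes authorized by LOA: the table, the customer
ID, and the function name below are hypothetical, and the prefix is from
documentation address space.

```python
import ipaddress

# Hypothetical per-customer table built from LOAs. 2001:db8::/32 is
# documentation address space, not a real assignment.
AUTHORIZED = {
    "customer-42": {ipaddress.ip_network("2001:db8:1234::/48")},
}


def accept_announcement(customer, announced):
    """Accept an announcement only if it exactly matches an authorized prefix.

    More-specifics and covering prefixes are rejected, as is anything
    from a customer with no entry in the table.
    """
    prefix = ipaddress.ip_network(announced)
    return prefix in AUTHORIZED.get(customer, set())
```

An exact-match rule is the strictest choice; a provider that wants to
allow more-specifics would have to loosen it deliberately.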

>> 1.	ISPs are actually motivated to prevent customer mobility, not enable it.
> 
>> 2.	ISPs are motivated to reduce, not increase the number of multi-homed
>> 	sites occupying slots in routing tables.
> 
>    This is not some insignificant thing.  The ISPs have to maintain
>    routing tables as well; ultimately the ISP's customers are in bad
>    shape if too many slots are consumed.
> 

I never said it was insignificant. I said that solving the multihoming problem
in this manner was trivial if there were the will to do so. I also said that the above
were contributing factors in the lack of will to do so.

> How about
>   3.  Increased troubleshooting complexity when there are potential
> issues or complaints.
> 

I do not buy that it is harder to troubleshoot a basic BGP configuration
than a multi-carrier NAT-based solution that goes woefully awry.

I'm sorry, I've done the troubleshooting on both scenarios and I have
to say that if you think NAT makes this easier, you live in a different
world than I do.

> The concept of a "fool proof"  BGP configuration is clearly a new sort of myth.

Not really.

Customer router accepts default from primary and secondary providers.
So long as default remains, primary is preferred. If primary default goes
away, secondary is preferred.

Customer box gets its prefix (via DHCP-PD, static config, or whatever)
from either the primary or an RIR, and advertises that prefix to both
primary and secondary.

All configuration of the BGP sessions is automated within the box
other than static configuration of customer prefix (if static is desired).

Primary/Secondary choice is made by plugging providers into the
Primary or Secondary port on the box.
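
The selection rule above is small enough to sketch in a few lines of
Python. The port names and function names are illustrative only; this is
not how any particular router implements it. In a real BGP configuration
the preference would typically be expressed as a higher local preference
on the primary session.

```python
PORTS = ("primary", "secondary")  # preference order follows the port labels


def pick_exit(defaults_learned):
    """Return the port to send outbound traffic through, or None.

    defaults_learned is the set of port names from which a default route
    is currently being received. The primary wins whenever its default
    is present; the secondary is used only when the primary's is gone.
    """
    for port in PORTS:
        if port in defaults_learned:
            return port
    return None


def announcements(prefix):
    """The customer prefix is advertised on both ports."""
    return {port: prefix for port in PORTS}
```

With both defaults present, pick_exit returns "primary"; if the primary's
default is withdrawn it returns "secondary", and it switches back as soon
as the primary's default reappears.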

> The idea that the protocol on its own, with a very basic config, does
> not ever require
> any additional attention,  to achieve expected results;  where
> expected results include isolation from any faults with the path from
> one of the user's two, three, or four providers,  and  balancing
> for optimal throughput and best latency/loss to every destination.

I have installed these configurations at customer sites for several of
my consulting clients that wanted to multihome their SMBs.

Some of them have been running for more than 8 years without a
single issue.

For all of the above requirements, no. You can't do that even with the
most advanced manual BGP configurations today.

However, if we reduce it to:

1.	The internet connection stays up so long as one of the two
	providers is up.

2.	Traffic prefers the primary provider so long as the primary provider
	is up.

3.	My addressing remains stable so long as I remain connected to
	the primary provider (or if I use RIR based addressing, longer).

Then what I have proposed actually is achievable, does work, and
does actually meet the needs of 99+% of organizations that wish to
multihome.

> BGP multihoming doesn't  prevent users from having issues because:
> 
>      o Connectivity issues that are the responsibility of one of their
>         providers, which they might have expected multihoming to protect
>         them against (latency, packet loss).

Correct. However, this is true of ANY multihoming solution. The dual-
provider NAT solution certainly does NOT improve this.

>      o Very poor performance of one of their links, or poor performance
>         of one of their links to their favorite destination.

See above.

>      o Asymmetric paths;  which means that when latency or loss is poor,
>         the customer doesn't necessarily know which provider to blame,
>         or if both are at fault,  and  the providers can spend a lot of time
>         blaming each other.

See above.

> These are all solvable problems,   but at cost, and therefore not for
> massmarket lowest cost ISP service.

My point is that the automated simple BGP solution I propose can provide
a better customer experience than the currently popular NAT-based
multihoming with simpler troubleshooting and lower costs.

> It's not as if they can have
>    "Hello, DSL technical support...  did you try shutting off your
> other peers and retesting?"

ROFL.

> The average end user won't have a clue -- they will need one of the
> providers, or someone else to be managing that for them,  and
> understand  how each provider is connected.

Again, you're setting a much higher goal than I was.

My goal was to do something better than what is currently being done.
(Connect a router to two providers and use NAT to choose between them).

> I don't see large ISPs  training up their support reps for
> $60/month DSL services, to handle BGP troubleshooting, and multihoming
> management/repair.

But they already get stuck with this in the current NAT-based solution which
is even harder to troubleshoot and creates even more problems.

Owen