Google's peering, GGC, and congestion management

Baldur Norddahl baldur.norddahl at
Thu Oct 15 21:13:17 UTC 2015

On 15 October 2015 at 22:00, Patrick W. Gilmore <patrick at> wrote:

> The reason routers do not do that is what you suggest would not work.

Of course it will work, and it is in fact exactly the same as your own
suggestion, just implemented in the network. Besides, it _is already_ a
standard feature: it is called equal-cost multipath routing. The only
difference is that the weights between the paths change dynamically.
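
To illustrate what weighted multipath means here, a minimal Python sketch
follows. It is not how any particular router implements it; the function
name pick_path, the layout of the flow tuple and the integer weights are
assumptions made up for the example. Each flow is hashed to a bucket, and
the buckets are divided between the paths in proportion to their weights,
so all packets of one flow stay on one path as long as the weights do not
change.

```python
import hashlib


def pick_path(flow, weights):
    """Map a flow to a path in proportion to integer weights.

    flow    -- hypothetical 5-tuple (src, dst, proto, sport, dport)
    weights -- dict of path name -> non-negative integer weight (assumed)
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the flow so every packet of the same flow gets the same bucket.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk the paths in a fixed order; each path owns `weight` buckets.
    for path, weight in sorted(weights.items()):
        if bucket < weight:
            return path
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below total")
```

With weights of 70 and 30, roughly 70% of a large number of flows end up
on the first path. Changing the weights is the only dynamic part.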

> First, you make the incorrect assumption that inbound will never exceed
> outbound. Almost all CDN nodes have far more capacity between the servers
> and the router than the router has to the rest of the world. And CDN nodes
> are probably the least complicated example in large networks. The only way
> to ensure A < B is to control A or B - and usually A.

I make absolutely no assumptions about ingress (towards the ASN), as we have
no control over that. There is no requirement that routing be symmetric, and
it is the responsibility of whoever controls the ingress to do something if
the port is overloaded in that direction. In the case of a CDN, however,
there will be very little ingress. Netflix does not take in much data from
its customers; it is all egress traffic towards the customers, and the CDN
is in control of that. The same goes for Google.

Two non-CDN peers could use the system too, but if the traffic level is
symmetric then both of them had better do it.

> Second, the router has no idea how much traffic is coming in at any
> particular moment. Unless you are willing to move streams mid-flow, you
> can’t guarantee this will work even if sum(in) < sum(out). Your idea would
> put Flow N on Port X when the SYN (or SYN/ACK) hits. How do you know how
> many Mbps that flow will be? You do not, therefore you cannot do it right.
> And do not say you’ll wait for the first few packets and move then. Flows
> are not static.

Flows can move at any time in a BGP network. As we are talking about CDNs,
we can assume that we have many, many flows that are small compared to the
port size. We can be fairly sure that traffic will not make huge jumps from
one second to the next - you will have a nice curve here. You know exactly
how much traffic you had in the last time period, both out through the
contested port and through the alternative paths. Recalculating the weights
is just a matter of assuming that the next time period will be the same, or
that the delta will be the same. It is a classic control loop problem. TCP
is trying to do much the same, btw.

You can adjust how close to 100% port utilization you want the algorithm to
aim for. If it performs badly, give it a little more headroom.
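
The loop described above could look roughly like this in Python. This is a
sketch under assumptions, not a tested algorithm: the function name
recalc_share, the "same delta as last period" prediction and the
target_util headroom parameter are all invented for illustration.

```python
def recalc_share(port_bps, alt_bps, prev_port_bps, prev_alt_bps,
                 port_capacity_bps, target_util=0.95):
    """Return (port_share, alt_share) of traffic for the next period.

    port_bps, alt_bps           -- traffic measured in the last period
    prev_port_bps, prev_alt_bps -- traffic measured in the period before
    target_util                 -- how close to 100% the port may run (assumed)
    """
    total = port_bps + alt_bps
    prev_total = prev_port_bps + prev_alt_bps
    # Predict the next period: last total plus the last observed delta.
    predicted = max(total + (total - prev_total), 0.0)
    if predicted == 0:
        # Nothing to send, so everything may use the contested port.
        return 1.0, 0.0
    # The port may carry at most this much; the rest goes elsewhere.
    budget = port_capacity_bps * target_util
    port_share = min(1.0, budget / predicted)
    return port_share, 1.0 - port_share
```

For example, with a 10 Gbps port and offered traffic that grew from 10 Gbps
to 12 Gbps, the sketch predicts 14 Gbps for the next period and keeps about
68% of it on the port. Lowering target_util is the "more headroom" knob.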

If the time period is one second, flows can move at most once a second, and
very few flows are likely to move at all. A flow that does move could see a
few out-of-order packets, which is not a big issue when it happens rarely.
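
A small sketch of why few flows move, again with made-up names (bucket_of,
moved_flows) and an assumed table of 1000 hash buckets: if the buckets below
a threshold use the contested port, lowering the threshold from 950 to 930
only moves the flows that hash into those 20 buckets, roughly 2% of them.

```python
import hashlib

BUCKETS = 1000  # assumed size of the hash bucket table


def bucket_of(flow_id):
    """Hash a flow identifier (any string) to a bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def moved_flows(flow_ids, old_threshold, new_threshold):
    """List the flows that change path when the threshold changes.

    A flow uses the contested port when its bucket is below the threshold,
    so only buckets between the two thresholds are affected.
    """
    lo, hi = sorted((old_threshold, new_threshold))
    return [f for f in flow_ids if lo <= bucket_of(f) < hi]
```

All other flows keep their path, so they see no reordering at all.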

> Third…. Actually, since 1 & 2 are each sufficient to show why it doesn’t
> work, not sure I need to go through the next N reasons. But there are
> plenty more.

There are more reasons why this problem is hard to do on the servers :-).


