100G - Whitebox

Nick Hilliard nick at foobar.org
Mon Aug 21 11:10:17 UTC 2017


Mikael Abrahamsson wrote:
> On Sun, 20 Aug 2017, Nick Hilliard wrote:
>> Mostly you can engineer around this, but it's not as simple as saying
>> that small-buffer switches aren't suitable for an IXP.
> 
> Could you please elaborate on this?
> 
> How do you engineer around having basically no buffers at all, and
> especially if these very small buffers are shared between ports.

You assess and measure, then choose the set of tools that meets your
requirements at a cost your financials can bear, i.e. the same as in
any engineering situation.

At an IXP, it comes down to the maximum size of TCP stream you expect to
transport.  This will vary depending on the stakeholders at the IXP,
which usually depends on the size of the IXP.  Larger IXPs will have a
wider traffic remit and probably a much larger variance in this regard.
Smaller IXPs typically carry traffic from content networks to access
networks, which is usually well behaved.

Traffic drops on the core need to be kept to a minimum, particularly
during normal operation.  Eliminating traffic drops entirely is
unnecessary and unwanted because of how IP works (TCP relies on packet
loss as a congestion signal), so in your core you need to aim for
either link overengineering or else enough buffering to keep
site-to-site latency under X ms and packet loss under Y%.  Each option
has a cost implication.
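
As a rough illustration of the buffering side of that trade-off, the
sketch below relates buffer depth to worst-case queuing delay on a
single port.  The function names are made up for this example, and it
assumes a simple FIFO queue draining at line rate; it ignores
propagation delay, serialisation delay and any shared-buffer behaviour.

```python
def queue_delay_ms(buffer_bytes: float, link_bps: float) -> float:
    """Worst-case delay added by a full FIFO buffer draining at link_bps."""
    return buffer_bytes * 8 / link_bps * 1000.0


def buffer_for_delay_bytes(max_delay_ms: float, link_bps: float) -> float:
    """Largest buffer that keeps queuing delay at or below max_delay_ms."""
    return max_delay_ms / 1000.0 * link_bps / 8


if __name__ == "__main__":
    # Hypothetical figures: a 10G port with 12.5 MB of queued traffic.
    print(queue_delay_ms(12_500_000, 10e9), "ms")
    # Buffer that would hold 1 ms of traffic on a 100G port.
    print(buffer_for_delay_bytes(1, 100e9), "bytes")
```

Under these assumptions a 10G port has added roughly 10 ms of delay by
the time about 12.5 MB is queued on it, which is the kind of number you
would compare against your X ms target.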

At the IXP participant edge, there is a different set of constraints
which will depend on what's downstream of the participant, where the
traffic flows are, what size they are, etc.  In general, traffic loss at
the IXP handoff will tend only to be a problem if there is a disparity
between the bandwidth left available on the egress direction and the
maximum link speed downstream of the IXP participant.

For example, a content network has servers which inject content at 10G
and which connect through a 100G IXP port.  The egress IXP port is a
mid-loaded 1G link which connects through to 10 Mbit/s WISP customers.
In this case, the IXP will end up doing negligible buffering because
most of the buffering load will be handled on the WISP's internal
infrastructure, specifically at the core-to-10 Mbit/s handoff.  The IXP
port might end up dropping a packet or two during the initial TCP burst,
but that is likely to depend on latency and won't particularly harm
overall performance because of TCP slow start.

On the other hand, if it were a mid-loaded 1G link with 500 Mbit/s
access customers on the other side (e.g. DOCSIS / GPON / FTTH), then the
IXP would end up being the primary buffering point between the content
source and destination, and this would cause problems on a small-buffer
port.  The remedy here is either for the IXP to move the customer to a
buffered port (e.g. different switch), or for the access customer to
upgrade their link.
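
The two cases can be sketched numerically.  This is only a toy fluid
model, not a description of any particular switch: the function names
are hypothetical, "mid-loaded" is assumed to mean 50% utilisation, and a
burst is treated as arriving at a constant rate.

```python
def available_bps(link_bps: float, load_fraction: float) -> float:
    """Bandwidth left on a link already carrying load_fraction of capacity."""
    return link_bps * (1.0 - load_fraction)


def primary_buffering_point(ixp_egress_bps: float, egress_load: float,
                            access_bps: float) -> str:
    """Queues build at the narrowest hop: compare the bandwidth left on
    the IXP egress port with the downstream access speed."""
    free = available_bps(ixp_egress_bps, egress_load)
    if access_bps >= free:
        return "ixp egress port"
    return "access network handoff"


def burst_backlog_bytes(burst_bytes: float, arrival_bps: float,
                        drain_bps: float) -> float:
    """Peak queue depth when a burst arrives faster than the port drains."""
    if arrival_bps <= drain_bps:
        return 0.0
    return burst_bytes * (1.0 - drain_bps / arrival_bps)


if __name__ == "__main__":
    # 1G egress at 50% load, 10 Mbit/s customers behind it.
    print(primary_buffering_point(1e9, 0.5, 10e6))
    # Same port, 500 Mbit/s access customers behind it.
    print(primary_buffering_point(1e9, 0.5, 500e6))
    # 1 MB burst arriving at 10G into 500 Mbit/s of free egress capacity.
    print(burst_backlog_bytes(1_000_000, 10e9, 500e6))
```

With these assumed figures the first case leaves the queue at the
access handoff and the second puts it on the IXP egress port, where
nearly all of a 10G burst has to sit in the buffer or be dropped.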

If you want to push 50G-80G streams through an IXP, I'd argue that you
really shouldn't, not just because of cost but also because this is very
expensive to engineer properly and you're also certainly better off with
a PNI.

This approach works better on some networks than others.  The larger the
IXP, the more difficult it is to manage this, both in terms of core and
edge provisioning, i.e. the greater the requirement for buffering in
both situations because you have a wider variety of stream sizes per
network.  So although this isn't going to work as well for top-10 IXPs
as for mid- or smaller-scale IXPs, where it works, it can provide
similar quality of service at a significantly lower cost base.

IOW, know your requirements and choose your tools to match.  Same as
with all engineering.

Nick


