100G - Whitebox

Mike Hammett nanog at ics-il.net
Mon Dec 4 19:45:45 UTC 2017


For 10G-to-1G speed steps, it looks like UCSC has already done some of that homework. 


https://people.ucsc.edu/~warner/Bufs/summary 


"Ability to buffer 6 Mbytes is sufficient for a 10 Gb/s sender and a 1 Gb/s receiver." I'd suspect 10x that (about 60 Mbytes) would be appropriate for a 100G sender and a 10G receiver, though testing would certainly give a more accurate figure. 
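
A rough sketch of that arithmetic, in case it helps. It assumes "6 Mbytes" means 6 * 10**6 bytes and that the buffer requirement scales linearly with egress speed when the sender-to-receiver ratio stays at 10:1. Both are assumptions for illustration, not results from the UCSC testing, and the function names are made up for this example.

```python
# Baseline from the UCSC summary: 6 Mbytes for a 10 Gb/s sender and 1 Gb/s receiver.
# Reading "Mbytes" as 10**6 bytes is an assumption.
BASELINE_BUFFER_BYTES = 6_000_000
BASELINE_EGRESS_BPS = 1_000_000_000


def scaled_buffer_bytes(egress_bps):
    """Scale the baseline buffer linearly with egress speed (assumed, untested)."""
    return BASELINE_BUFFER_BYTES * egress_bps // BASELINE_EGRESS_BPS


def drain_time_ms(buffer_bytes, egress_bps):
    """Time needed to empty a full buffer at the egress line rate, in milliseconds."""
    return buffer_bytes * 8 * 1000 / egress_bps


ten_gig = 10 * 10**9
print(scaled_buffer_bytes(ten_gig))                          # 60000000
print(drain_time_ms(BASELINE_BUFFER_BYTES, 10**9))           # 48.0
print(drain_time_ms(scaled_buffer_bytes(ten_gig), ten_gig))  # 48.0
```

Under that linear assumption both cases hold the same 48 ms of traffic at egress line rate, which is one way to see why a straight 10x is a plausible starting guess. 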


http://people.ucsc.edu/~warner/I2-techs.ppt 



Looking through their table ( https://people.ucsc.edu/~warner/buffer.html ), it looks like most switches below the 100G class have just enough buffer to handle one, possibly two, speed mismatches at a time. Some fall just short, and others are woefully inadequate. 



----- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

----- Original Message -----

From: "Nick Hilliard" <nick at foobar.org> 
To: "Mikael Abrahamsson" <swmike at swm.pp.se> 
Cc: "NANOG list" <nanog at nanog.org> 
Sent: Monday, August 21, 2017 6:10:17 AM 
Subject: Re: 100G - Whitebox 

Mikael Abrahamsson wrote: 
> On Sun, 20 Aug 2017, Nick Hilliard wrote: 
>> Mostly you can engineer around this, but it's not as simple as saying 
>> that small-buffer switches aren't suitable for an IXP. 
> 
> Could you please elaborate on this? 
> 
> How do you engineer around having basically no buffers at all, and 
> especially if these very small buffers are shared between ports. 

you assess and measure, then choose the appropriate set of tools to deal 
with your requirements and which is cost appropriate for your 
financials, i.e. the same as in any engineering situation. 

At an IXP, it comes down to the maximum size of tcp stream you expect to 
transport. This will vary depending on the stakeholders at the IXP, 
which usually depends on the size of the IXP. Larger IXPs will have a 
wider traffic remit and probably a much larger variance in this regard. 
Smaller IXPs typically transport content to access network data, which 
is usually well behaved traffic. 

Traffic drops on the core need to be kept to the minimum, particularly 
during normal operation. Eliminating traffic drops is unnecessary and 
unwanted because of how IP works, so in your core you need to aim for 
either link overengineering or else enough buffering to ensure that 
site-to-site latency does not exceed X ms and Y% packet loss. Each 
option has a cost implication. 

At the IXP participant edge, there is a different set of constraints 
which will depend on what's downstream of the participant, where the 
traffic flows are, what size they are, etc. In general, traffic loss at 
the IXP handoff will tend only to be a problem if there is a disparity 
between the bandwidth left available on the egress direction and the 
maximum link speed downstream of the IXP participant. 

For example, a content network has servers which inject content at 10G, 
which connects through a 100G IXP port. The egress IXP port is a 
mid-loaded 1G link which connects through to 10mbit WISP customers. In 
this case, the ixp will end up doing negligible buffering because most 
of the buffering load will be handled on the WISP's internal 
infrastructure, specifically at the core-to-10mbit handoff. The IXP 
port might end up dropping a packet or two during the initial tcp burst, 
but that is likely to be latency specific and won't particularly harm 
overall performance because of tcp slow start. 

On the other hand, if it were a mid-loaded 1G link with 500mbit access 
customers on the other side (e.g. docsis / gpon / ftth), then the IXP 
would end up being the primary buffering point between the content 
source and destination and this would cause problems. The remedy here 
is either for the ixp to move the customer to a buffered port (e.g. 
different switch), or for the access customer to upgrade their link. 
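
The two examples above can be put as a rough rule of thumb. The sketch below is only an illustration: it reads "mid-loaded" as 50% utilisation, and treats the IXP port as the primary buffering point whenever the fastest link downstream of the participant can take at least as much as the headroom left on the IXP egress port. The function names and the 50% figure are assumptions, not anything stated in the thread.

```python
MBIT = 10**6
GBIT = 10**9


def available_egress_bps(port_bps, load_pct):
    """Headroom left on the IXP egress port, given its current load in percent."""
    return port_bps * (100 - load_pct) // 100


def ixp_is_primary_buffer(port_bps, load_pct, downstream_bps):
    """True if a single downstream link can fill the remaining headroom on the
    IXP port, so the port (not the access handoff) is the bottleneck."""
    return downstream_bps >= available_egress_bps(port_bps, load_pct)


# Mid-loaded 1G port, 10mbit WISP customers: buffering happens downstream.
print(ixp_is_primary_buffer(GBIT, 50, 10 * MBIT))   # False

# Mid-loaded 1G port, 500mbit access customers: the IXP port is the bottleneck.
print(ixp_is_primary_buffer(GBIT, 50, 500 * MBIT))  # True
```

This ignores burst behaviour and the number of concurrent flows, so it only says where the bottleneck sits, not how much buffer is needed there.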

If you want to push 50G-80G streams through an IXP, I'd argue that you 
really shouldn't, not just because of cost but also because this is very 
expensive to engineer properly and you're also certainly better off with 
a pni. 

This approach works better on some networks than others. The larger the 
IXP, the more difficult it is to manage this, both in terms of core and 
edge provisioning, i.e. the greater the requirement for buffering in 
both situations because you have a greater variety of streaming scales 
per network. So although this isn't going to work as well for top-10 
ixps as for mid- or smaller-scale ixps, where it works, it can provide 
similar quality of service at a significantly lower cost base. 

IOW, know your requirements and choose your tools to match. Same as 
with all engineering. 

Nick 



