100G - Whitebox

Stephen Fulton sf at lists.esoteric.ca
Mon Dec 4 20:30:32 UTC 2017


Mike,

Whether it becomes a practical problem depends on the use case; by that
I mean buffers can cut both ways.  If buffers are too small, traffic can
be dropped and, even worse, other traffic can be affected depending on
factors like ASIC design and head-of-line blocking (HOLB).  If they are
too large, latency- or order-sensitive traffic can be adversely
affected.
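
As a rough illustration of both edges of that trade-off, the sketch
below treats a buffer depth two ways: as the largest burst an egress
port can absorb without dropping, and as an upper bound on the queuing
delay it can add.  The port speed and buffer sizes are assumed,
illustrative figures, not measurements of any particular switch.

# Illustrative sketch of the buffer trade-off.  Port speed and buffer
# depths are assumed figures, not measurements of any ASIC.

def max_queuing_delay_ms(buffer_bytes, port_bps):
    """Worst-case delay added when the egress buffer is completely full."""
    return buffer_bytes * 8 / port_bps * 1000

for buf_mb in (0.5, 6, 64):        # shallow on-chip, mid-size, deep-buffer
    buf_bytes = buf_mb * 1_000_000
    print(f"{buf_mb:>4} MB on a 1 Gb/s port: absorbs up to a {buf_mb} MB burst, "
          f"adds up to {max_queuing_delay_ms(buf_bytes, 1e9):.0f} ms of queuing delay")

The 0.5 MB case is the one that drops under a burst; the 48 ms and
512 ms figures for the larger buffers are the kind of added latency that
order- or latency-sensitive traffic notices.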

We're still dealing with the same limitations of switching that were
identified 30+ years ago as the technology was developed.  Sure, we have
better chips, the option of deeper buffers and years of experience to
help minimize those limitations, but they still exist and likely always
will with switching.

Honestly, at this point it comes down to understanding what the use
case is, understanding the nuances of each vendor's offerings and
determining where things line up.  Then test, test, test.
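
For a rough sense of where figures like the 6 Mbyte number quoted below
come from, a common back-of-the-envelope is to size the buffer at the
slower side's bandwidth-delay product.  The 50 ms RTT in this sketch is
purely an assumed value for illustration.

# Back-of-the-envelope buffer sizing at a speed mismatch: bandwidth-delay
# product of the slower (receiving) side.  The 50 ms RTT is an assumption.

def bdp_bytes(receiver_bps, rtt_s):
    return receiver_bps * rtt_s / 8

for rx_gbps in (1, 10):            # the 10G->1G and 100G->10G mismatches
    print(f"{rx_gbps:>2} Gb/s receiver, 50 ms RTT: "
          f"~{bdp_bytes(rx_gbps * 1e9, 0.05) / 1e6:.2f} MB of buffer")

That lines up with the quoted 6 Mbyte figure for 10G-to-1G and with the
roughly 10x guess for 100G-to-10G.  With many concurrent flows, the
classic "Sizing Router Buffers" result suggests the requirement can drop
towards the bandwidth-delay product divided by the square root of the
flow count, so treat this as a single-flow worst case.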

-- Stephen



On 2017-12-04 2:45 PM, Mike Hammett wrote:
> In terms of 1G - 10G steps, it looks like UCSC has done some of that homework already. 
> 
> 
> https://people.ucsc.edu/~warner/Bufs/summary 
> 
> 
> "Ability to buffer 6 Mbytes is sufficient for a 10 Gb/s sender and a 1 Gb/s receiver." I'd suspect 10x would be appropriate for 100G - 10G (certainly made more accurate by testing). 
> 
> 
> http://people.ucsc.edu/~warner/I2-techs.ppt 
> 
> 
> 
> Looking through their table ( https://people.ucsc.edu/~warner/buffer.html ), it looks like more switches than not in the not-100G realm have just enough buffer to handle one, possibly two, mismatches at a time. Some fall just short and others are woefully inadequate. 
> 
> 
> 
> ----- 
> Mike Hammett 
> Intelligent Computing Solutions 
> http://www.ics-il.com 
> 
> Midwest-IX 
> http://www.midwest-ix.com 
> 
> ----- Original Message -----
> 
> From: "Nick Hilliard" <nick at foobar.org> 
> To: "Mikael Abrahamsson" <swmike at swm.pp.se> 
> Cc: "NANOG list" <nanog at nanog.org> 
> Sent: Monday, August 21, 2017 6:10:17 AM 
> Subject: Re: 100G - Whitebox 
> 
> Mikael Abrahamsson wrote: 
>> On Sun, 20 Aug 2017, Nick Hilliard wrote: 
>>> Mostly you can engineer around this, but it's not as simple as saying 
>>> that small-buffer switches aren't suitable for an IXP. 
>>
>> Could you please elaborate on this? 
>>
>> How do you engineer around having basically no buffers at all, and 
>> especially if these very small buffers are shared between ports. 
> 
> you assess and measure, then choose the set of tools that deals with 
> your requirements and is cost-appropriate for your financials, i.e. 
> the same as in any engineering situation. 
> 
> At an IXP, it comes down to the maximum size of tcp stream you expect to 
> transport. This will vary depending on the stakeholders at the IXP, 
> which usually depends on the size of the IXP. Larger IXPs will have a 
> wider traffic remit and probably a much larger variance in this regard. 
> Smaller IXPs typically transport content to access network data, which 
> is usually well behaved traffic. 
> 
> Traffic drops on the core need to be kept to a minimum, particularly 
> during normal operation. Eliminating traffic drops is unnecessary and 
> unwanted because of how IP works, so in your core you need to aim for 
> either link overengineering or else enough buffering to ensure that 
> site-to-site latency does not exceed X ms and Y% packet loss. Each 
> option has a cost implication. 
> 
> At the IXP participant edge, there is a different set of constraints 
> which will depend on what's downstream of the participant, where the 
> traffic flows are, what size they are, etc. In general, traffic loss at 
> the IXP handoff will tend only to be a problem if there is a disparity 
> between the bandwidth left available on the egress direction and the 
> maximum link speed downstream of the IXP participant. 
> 
> For example, a content network has servers which inject content at 10G, 
> which connects through a 100G IXP port. The egress IXP port is a 
> mid-loaded 1G link which connects through to 10mbit WISP customers. In 
> this case, the ixp will end up doing negligible buffering because most 
> of the buffering load will be handled on the WISP's internal 
> infrastructure, specifically at the core-to-10mbit handoff. The IXP 
> port might end up dropping a packet or two during the initial tcp burst, 
> but that is likely to be latency specific and won't particularly harm 
> overall performance because of tcp slow start. 
> 
> On the other hand, if it were a mid-loaded 1G link with 500mbit access 
> customers on the other side (e.g. docsis / gpon / ftth), then the IXP 
> would end up being the primary buffering point between the content 
> source and destination and this would cause problems. The remedy here 
> is either for the ixp to move the customer to a buffered port (e.g. 
> different switch), or for the access customer to upgrade their link. 
> 
> If you want to push 50G-80G streams through an IXP, I'd argue that you 
> really shouldn't, not just because of cost but also because this is very 
> expensive to engineer properly and you're also certainly better off with 
> a pni. 
> 
> This approach works better on some networks than others. The larger the 
> IXP, the more difficult it is to manage this, both in terms of core and 
> edge provisioning, i.e. the greater the requirement for buffering in 
> both situations because you have a greater variety of streaming scales 
> per network. So although this isn't going to work as well for top-10 
> ixps as for mid- or smaller-scale ixps, where it works, it can provide 
> similar quality of service at a significantly lower cost base. 
> 
> IOW, know your requirements and choose your tools to match. Same as 
> with all engineering. 
> 
> Nick 
> 
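
To make the two edge scenarios in Nick's mail above concrete, here is a
small sketch of the rule he describes: the standing queue forms at the
narrowest point on the path, so the IXP port only has to supply
meaningful buffering when its available egress bandwidth is the
bottleneck.  The hop rates mirror his examples and the 50 ms RTT is an
assumed value for illustration.

# Sketch of the two IXP-edge scenarios from the mail above.  Rule of
# thumb modelled: the standing queue forms at the narrowest hop, so the
# IXP egress port only buffers heavily when it is the bottleneck.

RTT_S = 0.05  # assumed round-trip time

def buffer_demand_mb(bottleneck_bps, rtt_s=RTT_S):
    """Rough buffer demand at the bottleneck: bandwidth x RTT, in MB."""
    return bottleneck_bps * rtt_s / 8 / 1e6

def describe(label, ixp_egress_avail_bps, downstream_bps):
    if downstream_bps < ixp_egress_avail_bps:
        where, rate = "downstream of the IXP", downstream_bps
    else:
        where, rate = "on the IXP egress port", ixp_egress_avail_bps
    print(f"{label}: queue forms {where}, "
          f"~{buffer_demand_mb(rate):.2f} MB at 50 ms RTT")

# Scenario 1: mid-loaded 1G egress (~500 Mbit/s free), 10 Mbit WISP customers.
describe("10 Mbit WISP customers", ixp_egress_avail_bps=500e6, downstream_bps=10e6)

# Scenario 2: same egress port, 500 Mbit access customers (docsis/gpon/ftth).
describe("500 Mbit access customers", ixp_egress_avail_bps=500e6, downstream_bps=500e6)

Whether the few megabytes in the second case are actually available
depends on whether the port has dedicated buffer or draws from a small
shared on-chip pool, which is the remedy Nick points at (a buffered port
or a link upgrade) and the vendor nuance Stephen raises above.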


