Lossy cogent p2p experiences?

David Hubbard dhubbard at dino.hostasaurus.com
Mon Sep 11 16:14:16 UTC 2023


Some interesting new developments on this, independent of the divergent network equipment discussion. 😊

Cogent had a field engineer at the east coast location where my local loop (a 10 Gbps wave) meets their equipment, i.e. me – patch cable to the loop provider’s wave equipment – wave – patch cable to Cogent’s equipment.  At the other end, the geographically distant west coast direction, it’s Cogent’s equipment to my equipment in the same facility, with just a patch cable between them.  They connected some model of EXFO’s NetBlazer FTBx 8880-series testing device to a port on their east coast network device, without disconnecting my circuit.  Originally they were planning to have someone physically loop at their equipment at the other end, but I volunteered that my Arista gear supports a provider-facing loop at the transceiver level if they wanted to try that, so my loop, cabling, and transceiver could be part of the testing.

One direction at a time, they interrupted the point-to-point config to create a point-to-point between my gear at one end, set to loopback mode, and the NetBlazer device.  The device was set to use five parallel streams.  In the close direction, where the third-party wave is involved, they ran at the full 5 x 2 Gbps for thirty minutes with zero packets lost and no issues.  My monitoring confirmed this rate of port input was occurring, although oddly not output; perhaps Arista doesn’t “see”/count the packets it sends back out in phy loopback mode.
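
For a sense of scale, here’s a back-of-the-envelope count of how many frames that zero-loss run represents (the frame size is my assumption; the report doesn’t state it):

# Rough frame count for the close-direction run: 5 streams x 2 Gbps for 30 minutes.
# Frame size is assumed (~1,500 bytes); the EXFO report doesn't state it.
RATE_BPS = 5 * 2e9          # aggregate test rate, bits/sec
DURATION_S = 30 * 60        # thirty minutes
FRAME_BYTES = 1500          # assumed frame size

frames = RATE_BPS * DURATION_S / (FRAME_BYTES * 8)
print(f"~{frames / 1e9:.1f} billion frames with zero loss")   # ~1.5 billion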

In the distant direction, across their backbone, their equipment at the remote end, and the fiber patch cable to me, they tested at 9.5 Gbps (five 1.9 Gbps streams) for thirty minutes through my device in loopback mode.  The result: of 2.6B packets sent, only 334 packets were lost.  Across the five streams, the report has a “frame loss” and out-of-sequence section.  Zero packets were out of sequence, but among the five streams, the loss seconds / count figures were 3 / 26, 3 / 48, 1 / 5, 13 / 221, and 1 / 34.  I’m not familiar with this testing device, but to me that suggests it’s stating how many of the total seconds experienced loss, and the counted packet loss.  So the only stream that stands out is the one with thirteen seconds where loss occurred, but the packet counts we’re talking about are minuscule.  Again, my monitoring at the interface level showed this 9.5 Gbps of testing occurring for the thirty minutes the report covers.
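
To put those counts in perspective, a quick calculation of the loss ratios (the per-stream sent counts assume the 2.6B total was split roughly evenly across the five streams, which the report implies but doesn’t state):

# Loss ratios from the NetBlazer report: 2.6B packets total, per-stream losses below.
total_sent = 2.6e9
losses = [26, 48, 5, 221, 34]          # per-stream lost-packet counts from the report
per_stream_sent = total_sent / 5       # assumes a roughly even split across the 5 streams

for i, lost in enumerate(losses, 1):
    print(f"stream {i}: {lost / per_stream_sent:.2e} loss ratio")
print(f"aggregate: {sum(losses) / total_sent:.2e}")   # ~1.3e-7, i.e. ~0.000013%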

So now I’m just completely confused.  How is this device, traversing the same equipment, ports, and cables, able to achieve far greater average throughput, with almost no loss, across a very long duration?  There are times I’ll achieve nearly the same, but never for a test longer than ten seconds; it just falls off from there.  For example, I did a five-parallel-stream TCP test with iperf just now and achieved a net throughput of 8.16 Gbps with about 1,200 retransmits.  Running the same five-stream test for a half hour like theirs, I got no better than 2.64 Gbps and 183,000 retransmits.
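
For reference, this is roughly how I’m driving that comparison, scripted around iperf3’s JSON output; the server name is a placeholder and the exact flags may differ slightly from what I actually ran:

# Sketch of the five-stream TCP comparison using iperf3's JSON output.
# "remote.example.net" is a placeholder for the far-end test host.
import json, subprocess

def run_tcp_test(server, seconds, streams=5):
    out = subprocess.run(
        ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "--json"],
        capture_output=True, text=True, check=True,
    )
    result = json.loads(out.stdout)
    sent = result["end"]["sum_sent"]            # aggregate across all streams
    gbps = sent["bits_per_second"] / 1e9
    retrans = sent.get("retransmits", 0)
    print(f"{seconds}s test: {gbps:.2f} Gbps, {retrans} retransmits")

run_tcp_test("remote.example.net", 10)      # short test: throughput holds up
run_tcp_test("remote.example.net", 1800)    # 30-minute test: where it falls apart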

With iperf and UDP I can see loss at any transmit rate exceeding ~140 Mbps within seconds, not a half hour.  To rule out my gear, I’m also able to perform the same tests from the same systems (both VM and physical) using public addresses and traversing the internet, as these are publicly connected systems.  I get far lower loss and much greater throughput on the internet path.  For example, a simple ten-second test of a single 400 Mbps UDP stream: 5 packets lost across the internet, 491 across the P2P.  A single TCP stream across the internet for ten seconds: 3.47 Gbps, 162 retransmits.  Across the P2P, this time at least: 637 Mbps, 3,633 retransmits.
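
Rough loss percentages for that UDP comparison, assuming iperf3 sized its datagrams to roughly fit a 1500-byte MTU (the actual datagram count wasn’t captured):

# Rough loss percentages for the 400 Mbps / 10 second UDP comparison.
# Datagram size is an assumption (~1460-byte payloads on a 1500-MTU path).
rate_bps = 400e6
duration_s = 10
datagram_bytes = 1460

sent = rate_bps * duration_s / (datagram_bytes * 8)     # ~342,000 datagrams
for path, lost in [("internet", 5), ("Cogent P2P", 491)]:
    print(f"{path}: {lost} / {sent:.0f} lost = {100 * lost / sent:.4f}%")
# internet: ~0.0015% loss; P2P: ~0.14% loss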

David



From: David Hubbard <dhubbard at dino.hostasaurus.com>
Date: Friday, September 1, 2023 at 10:19 AM
To: Nanog at nanog.org <nanog at nanog.org>
Subject: Re: Lossy cogent p2p experiences?
The initial and recurring packet loss occurs on any flow of more than ~140 Mbps.  The fact that it’s loss-free under that rate is what reinforces my opinion that it’s config-based somewhere, even though they say it isn’t.

From: NANOG <nanog-bounces+dhubbard=dino.hostasaurus.com at nanog.org> on behalf of Mark Tinka <mark at tinka.africa>
Date: Friday, September 1, 2023 at 10:13 AM
To: Mike Hammett <nanog at ics-il.net>, Saku Ytti <saku at ytti.fi>
Cc: nanog at nanog.org <nanog at nanog.org>
Subject: Re: Lossy cogent p2p experiences?

On 9/1/23 15:44, Mike Hammett wrote:
and I would say the OP wasn't even about elephant flows, just about a network that can't deliver anything acceptable.

Unless Cogent are not trying to accept (and by extension, may not be able to guarantee) large Ethernet flows because they can't balance them across their various core links, end-to-end...

Pure conjecture...

Mark.