[Nanog] Cogent Router dropping packets

Joe Greco jgreco at ns.sol.net
Mon Apr 21 15:41:21 UTC 2008


> On Sat, Apr 19, 2008 at 7:26 PM, manolo <mhernand1 at comcast.net> wrote:
> > Some things just never change at Cogent.. fought them for months way
> >  back when to get me off their infamous 2 BGP peer setup after many an
> >  outage due to this setup, they finally put us on a single BGP session
> >  but it took forever. Let's just say Cogent didn't last long at the
> >  company I worked for.
> 
> Could you provide additional details on the failure mode experienced
> resultant from this "two tiered" configuration?  How did moving to a
> "conventional" configuration with a single directly-connected neighbor
> solve things?

For those unfamiliar, Cogent has a system where you set up an EBGP peering
with the Cogent router you're connected to, for the purpose of announcing
your routes into Cogent.  However, these are typically smaller aggregation
class routers that do not handle full tables, so you don't receive a full
table from that router.  To get a full table FROM Cogent, you need to set
up an EBGP multihop session with them, to their nearest full-table router.
I believe they actually do all their BGP connections in that manner.
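
To make the split of roles concrete, here is a minimal Python sketch of the
two sessions as described above.  It is only a model, not router
configuration: the class, the field names, and the mapping of the labels
"peerA"/"peerB" onto the direct and multihop sessions are my assumptions
for illustration, not anything Cogent publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpSession:
    """Illustrative model of one customer-side EBGP session (hypothetical)."""
    name: str                  # label used in this sketch only
    multihop: bool             # True if the neighbor is not directly connected
    announces_routes: bool     # we send our prefixes over this session
    receives_full_table: bool  # we learn the full table over this session


# Assumed mapping: peerA is the directly connected aggregation router,
# peerB is the nearest full-table router reached via EBGP multihop.
PEER_A = BgpSession("peerA", multihop=False,
                    announces_routes=True, receives_full_table=False)
PEER_B = BgpSession("peerB", multihop=True,
                    announces_routes=False, receives_full_table=True)


def describe(session: BgpSession) -> str:
    """Return a one-line summary of what a session is used for."""
    reach = "multihop" if session.multihop else "directly connected"
    roles = []
    if session.announces_routes:
        roles.append("announce our prefixes")
    if session.receives_full_table:
        roles.append("receive full table")
    return f"{session.name} ({reach}): {', '.join(roles)}"


if __name__ == "__main__":
    for s in (PEER_A, PEER_B):
        print(describe(s))
```

The point of the model is simply that announcing and learning are carried
by two different sessions, so each can fail independently of the other.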

This probably makes a lot of sense from an engineering point of view, and
could be construed as a BGP competence test.  On the other hand, it does
have the potential to make things more complex in the event of a failure.

I'm not aware of any flaws with such a design that would cause "many an
outage," and connections that we've managed for customers with Cogent
suggest that it works well.  However, if there are problems within the
local Cogent node, I could easily see hard-to-identify failures resulting.
That would seem to me to be an equipment, capacity, or possibly
configuration issue, but not something which discredits the
overall strategy.  Given that they're providing inexpensive bandwidth, it
isn't likely that they'll be sticking large routers everywhere for the
customers who want a full table and a simpler BGP configuration.
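
As a rough illustration of why partial failures in such a setup could be
hard to identify, the sketch below enumerates the four up/down combinations
of the two sessions.  The outcome descriptions are my own simplified
assumptions about a multihomed customer (no default route toward Cogent,
other upstreams available); they are not observed Cogent behavior.

```python
from itertools import product


def outcome(announce_up: bool, full_table_up: bool) -> str:
    """Simplified, assumed effect of each session state on Cogent traffic."""
    if announce_up and full_table_up:
        return "normal: inbound and outbound via Cogent"
    if announce_up:
        # Prefixes still announced, but no routes learned from Cogent.
        return "partial: inbound via Cogent, outbound via other upstreams"
    if full_table_up:
        # Routes still learned, but our prefixes are no longer announced.
        return "partial: outbound via Cogent, inbound via other upstreams"
    return "down: all traffic via other upstreams"


def partial_failures() -> list[tuple[bool, bool]]:
    """Return the states where exactly one of the two sessions is up."""
    return [(a, b) for a, b in product((True, False), repeat=2) if a != b]


if __name__ == "__main__":
    for a, b in product((True, False), repeat=2):
        print(f"announce_up={a!s:5} full_table_up={b!s:5} -> {outcome(a, b)}")
```

The two "partial" states are the ones a single-session setup cannot
produce: the link looks up and one session is established, yet traffic
flows in only one direction through Cogent.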

There are many things that you can realistically criticize Cogent for, but
I'm not sure the peerA/peerB thing should be one of them.  It is certainly
more complex, but seems to serve a purpose.

> What steps were taken during your postmortem and subsequent lab
> simulations to verify that the outages were not with the customer-side
> implementation, or perhaps a simple typographical error?
> 
> Here in H-town, we are deploying a metro/BLEC network comprised of
> 1000s of small L3 boxes not carrying full tables (Cisco 3560 and
> similar), and would like very much to learn from these major
> architectural mistakes, so that we can avoid similar outage scenarios.
>  Any information you could provide would be excellent.

Interesting :-)

> >   You get what you pay for....
> 
> Not passing any judgment on quality, Cogent is more towards the middle
> of the road for price, these days, on larger commits.

Or in places like Ashburn.  I've been wondering what their future strategy
will be.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.



