Service provider story about tracking down TCP RSTs

William Herrin bill at
Sun Sep 2 00:38:51 UTC 2018

On Sat, Sep 1, 2018 at 6:11 PM, Lee <ler762 at> wrote:
> On 9/1/18, William Herrin <bill at> wrote:
>> On Sat, Sep 1, 2018 at 4:00 PM, William Herrin <bill at> wrote:
>>> Better yet, do the job right and build an anycast TCP stack as
>>> described here:
> An explosion in state management would be the least of my worries :)
> I got as far as your Third hook: and thought of this

Hi Lee,

On a brief tangent: Geographic routing would drastically simplify the
Internet core, reducing both cost and complexity. You'd need to carry
only nearby specific routes and a few broad aggregates for
destinations far away. It will never be implemented, never, because no
cross-ocean carriers are willing to have their bandwidth stolen when
the algorithm decides it likes their path better than a paid one. Even
though the algorithm gets the packets where they're going, and does so
simply, it does so in a way that's too often incorrect.

Then again, I don't really understand the MIT/New Jersey argument in
Richard's worse-is-better story. The MIT guy says that a routine
should handle a common non-fatal exception. The Jersey guy says that
it's ok for the routine to return a try-again error and expect the
caller to handle it. Since it's trivial to build another layer that
calls the routine in a loop until it returns success or a fatal error,
it's more a philosophical argument than a practical one. As long as a
correct result is consistently achieved in both cases, what's the
difference?
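
To make the "another layer" point concrete, here is a minimal sketch in
Python. The names TryAgain, FatalError, retry_until_done and
make_flaky_read are hypothetical, invented for this illustration; they
do not come from the worse-is-better essay or from any real API. The
inner routine follows the Jersey style and raises a try-again error;
the wrapper gives the caller the MIT-style behavior.

```python
class TryAgain(Exception):
    """Non-fatal: the routine was interrupted and may simply be retried."""


class FatalError(Exception):
    """Fatal: retrying will not help."""


def retry_until_done(routine, *args):
    """Call routine until it returns or raises something other than TryAgain.

    Hypothetical wrapper layer: it hides the try-again case from the
    caller and lets every other exception (including FatalError) through.
    """
    while True:
        try:
            return routine(*args)
        except TryAgain:
            continue  # common non-fatal exception: just call again


def make_flaky_read(failures):
    """Build a toy routine that raises TryAgain `failures` times, then succeeds."""
    remaining = {"count": failures}

    def flaky_read(data):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TryAgain("interrupted")
        return data

    return flaky_read


def always_fatal(data):
    """Toy routine that fails in a way retrying cannot fix."""
    raise FatalError("device gone")


result = retry_until_done(make_flaky_read(3), "payload")
```

Either way the caller ends up with a correct result. The only real
choice is which layer owns the loop.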

Richard characterized the Jersey argument as, "It is slightly better
to be simple than correct." I just don't see that in the Jersey
argument. Every component must be correct. The system of components as
a whole must be complete. It's slightly better for a component to be
simple than complete. That's the argument I read and it makes sense to
me.

Honestly, the idea that software is good enough even with known corner
cases that do something incorrect... I don't know how that survives in
a world where security-conscious programming is not optional.

> I had it much easier with anycast in an enterprise setting.  With
> anycast servers in data centers A & B, just make sure no site has an
> equal cost path to A and B.  Any link/ router/ whatever failure & the
> user can just re-try.

You've delicately balanced your network to achieve the principle that
even when routing around failures the anycast sites are not
equidistant from any other site. That isn't simplicity. It's
complexity hidden in the expert selection of magic numbers. Even were
that achievable in a network as chaotic as the Internet, is it simpler
than four trivial tweaks to the TCP stack plus a modestly complex but
fully automatic user-space program that correctly reroutes the small
percentage of packets that went astray?
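
As a rough illustration of how much is hiding in those magic numbers,
the "never equidistant, even when routing around failures" property can
be checked mechanically. The sketch below is an assumption-laden toy,
not a description of anyone's network: it assumes symmetric link costs,
plain shortest-path routing, and single-link failures only, and every
name in it (shortest_costs, equidistant_sites, ties_under_single_failures,
the example topology) is hypothetical.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra over graph = {node: {neighbor: cost}}; returns {node: cost}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def equidistant_sites(graph, a, b):
    """Sites that can reach both anycast nodes at exactly equal cost."""
    da = shortest_costs(graph, a)
    db = shortest_costs(graph, b)
    return sorted(
        n for n in graph
        if n not in (a, b) and n in da and n in db and da[n] == db[n]
    )


def without_link(graph, u, v):
    """Copy of graph with the bidirectional link u-v removed."""
    return {
        n: {m: c for m, c in nbrs.items() if {n, m} != {u, v}}
        for n, nbrs in graph.items()
    }


def ties_under_single_failures(graph, a, b):
    """Map failed link (None = no failure) to the sites left equidistant."""
    results = {}
    ties = equidistant_sites(graph, a, b)
    if ties:
        results[None] = ties
    links = {tuple(sorted((n, m))) for n, nbrs in graph.items() for m in nbrs}
    for u, v in sorted(links):
        ties = equidistant_sites(without_link(graph, u, v), a, b)
        if ties:
            results[(u, v)] = ties
    return results


# Hypothetical topology: anycast nodes A and B, user sites S and R.
example = {
    "A": {"S": 10, "R": 10},
    "B": {"S": 20},
    "S": {"A": 10, "B": 20, "R": 10},
    "R": {"A": 10, "S": 10},
}
ties = ties_under_single_failures(example, "A", "B")
```

In this toy topology nothing is equidistant while all links are up, but
losing the A-S link leaves site S at cost 20 from both A and B. Keeping
that from happening across every failure case is the balancing act
described above.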

Bill Herrin

William Herrin
Dirtside Systems
