packet loss question

Mel Beckman mel at beckman.org
Thu Jul 7 21:50:02 UTC 2016


Ken,

I should have made clear I wasn't replying to you. I was replying to Brielle's comment:

>  Is it bad that the first thing that came to mind is "Oh FFS, another troll"?

 -mel beckman

> On Jul 7, 2016, at 2:35 PM, Ken Chase <math at sizone.org> wrote:
> 
> On Thu, Jul 07, 2016 at 08:32:19PM +0000, Mel Beckman said:
>> Yes. It indicates that there was never a time when you did not know everything :)
>> 
>> -mel beckman
> 
> The issue isn't knowing everything; it's making accusations about problems while
> you still don't know how much you don't know (~D. Rumsfeld) -- my customers in a
> nutshell (they pay to be able to yell about random stuff, I guess, and I provide
> that service!).
> 
> The OP didn't make any accusations, however, and just asked what was going on
> (sorry if I sounded harsh in my reply). Once, Google having an 8.8.8.8 failure
> locally on its (anycast?) DNS servers resulted in dozens of calls to us: "Your
> server hosting our site must be down!! Our website isn't working! People are
> calling us!"
> 
> Most of my work in these situations is spent proving it's not our fault. mtr
> makes that very hard, because it's a very subtle tool and only gives partial
> information. (I still think mtr is a killer app, though!)
> 
> Consider this (fake, example) trace:
> 
>  6. 100ge13-1.core1.chi1.he.net            0.0%    10 
>  7. 100ge14-1.core2.chi1.he.net            0.0%    10 
>  8. 100ge3-1.core1.sjc2.he.net            30.0%    10 
>  9. ???
> 10. UNKNOWN-216-115-101-X.yahoo.com       10.0%    10 
> 11. routerer-ext.ysv.freebsd.org          20.0%    10 
> 12. wfe0.ysv.freebsd.org                  30.0%    10 
> 
> First off, the OP may have asked "whose fault is hop 9, Yahoo or HE?" and seen
> it as an issue. Ignoring that for now, the rest of the packet loss is an issue --
> but where is the problem?
> 
> This is very tricky - it looks like hop 8 is at fault, of course - or is it
> just dropping ICMP, as it's allowed to? How did hop 10 get only 10% loss, then,
> if hop 8 has 30%? Is hop 8 then dropping ~20% (not statistically correct..) of
> ICMP just cuz it can, and then having a 'real' 10% loss on top of that?
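A common rule of thumb for reading mtr output can sharpen the hop 8 versus hop 10 puzzle, although the original post does not state it. If a later hop answers a probe, that probe must have been forwarded through every earlier hop. So the real forwarding loss up to a given hop can be no higher than the lowest loss seen at that hop or at any hop after it. The Python sketch below applies that bound to the fake trace. The function name is hypothetical. The sketch ignores return-path loss, and it ignores the large sampling error that comes with only 10 probes per hop.

```python
# Per-hop loss from the fake trace above (hop 9 never answered, so it is skipped).
TRACE = [(6, 0.00), (7, 0.00), (8, 0.30), (10, 0.10), (11, 0.20), (12, 0.30)]


def forwarding_loss_bounds(trace):
    """Return (hop, bound) pairs, where bound is the lowest loss seen at that
    hop or any later hop. A probe answered by a later hop was forwarded
    through this one, so real forwarding loss up to here cannot exceed it
    (ignoring return-path loss and sampling noise)."""
    bounds = []
    running_min = 1.0
    # Walk from the far end back toward the source, keeping a suffix minimum.
    for hop, loss in reversed(trace):
        running_min = min(running_min, loss)
        bounds.append((hop, running_min))
    bounds.reverse()
    return bounds


if __name__ == "__main__":
    for hop, bound in forwarding_loss_bounds(TRACE):
        print(f"hop {hop}: at most {bound:.0%} real forwarding loss")
```

On this trace the bound for hop 8 comes out at 10%. That means most of the 30% shown at hop 8 must come from replies it did not send, or from replies lost on the way back. The bound does not say which hop from 8 onward is actually to blame.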
> 
> Or is it hop 11? But hop 12 has more PL (packet loss); perhaps hop 12 is the
> issue all along and hops 8, 10 and 11 are just dropping ICMP? Or are 8, 11 and
> 12 each doing ~10%? (Not statistically correct.)
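Both "not statistically correct" caveats come from the way independent losses combine. A probe has to survive every lossy hop, so the survival probabilities multiply rather than the losses adding. Below is a minimal Python sketch of that arithmetic. The helper names are hypothetical. It assumes that each lossy hop drops packets independently and that no other hop loses anything.

```python
from math import prod


def end_to_end_loss(per_hop_losses):
    """Loss seen past a chain of independent lossy hops: survival
    probabilities multiply, so losses do not simply add."""
    return 1.0 - prod(1.0 - p for p in per_hop_losses)


def extra_reply_drop(observed, forwarded):
    """Extra reply-only drop rate x such that
    1 - observed == (1 - forwarded) * (1 - x)."""
    return 1.0 - (1.0 - observed) / (1.0 - forwarded)


if __name__ == "__main__":
    # Hypothetical: hops 8, 11 and 12 each lose an independent 10%.
    for n, hop in enumerate((8, 11, 12), start=1):
        print(f"loss seen at hop {hop}: {end_to_end_loss([0.10] * n):.1%}")
    # Hypothetical: hop 8 forwards with 10% real loss but shows 30%.
    print(f"extra reply drop at hop 8: {extra_reply_drop(0.30, 0.10):.1%}")
```

Three hops at 10% each show up as about 27.1% at hop 12 (1 - 0.9^3), not 30%. Likewise, a hop that shows 30% while having 10% real loss is dropping about 22% of the remaining replies (2/9), not 20%.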
> 
> Can't say for sure - it's a probabilities game - and, to be completely correct
> about it, hop 6 isn't blameless either (just statistically very unlikely to be
> at fault, though not impossible with only 10 pings per hop - a statistician can
> calculate the odds for us).
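The odds for hop 6 are easy to estimate under one simplifying assumption: each probe is lost independently with the same probability p. Under that assumption, all n probes get through with probability (1 - p)^n. Here is a short Python sketch; the function name is illustrative.

```python
def prob_no_loss_seen(true_loss, probes):
    """Probability that every one of `probes` independent probes is answered
    when each is lost with probability `true_loss`."""
    return (1.0 - true_loss) ** probes


if __name__ == "__main__":
    for probes in (10, 100):
        p = prob_no_loss_seen(0.10, probes)
        print(f"{probes} probes, 10% real loss, 0% shown: {p:.4%}")
```

A hop with a real 10% loss still shows a clean 0% over 10 probes about 35% of the time (0.9^10 is about 0.349). Over 100 probes, that chance falls below 0.01%.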
> 
> This is why more pings are required to be sure of the situation. I like to run
> mtr with -i 0.1 -c 100 (100 probes per hop, 0.1 s apart) so it completes quickly,
> before conditions change. Then you can make a statistically valid pronouncement
> of where the problem MIGHT BE within a useful confidence interval. However,
> without the return route we're still largely in the dark as to the actual
> location of the issue. You can't be '100% sure' with this stuff - technically
> speaking, it's all 'luck of the draw'.
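The post does not name a method for the "useful confidence interval". One standard choice for a loss rate estimated from a handful of probes is the Wilson score interval, and that is the one used in the Python sketch below. The sketch assumes independent probes and roughly 95% confidence (z = 1.96). The 100-probe counts are hypothetical; they keep the same loss rates as the fake trace.

```python
from math import sqrt


def wilson_interval(lost, sent, z=1.96):
    """Wilson score interval for a loss rate; z=1.96 gives roughly 95% confidence."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    p_hat = lost / sent
    denom = 1.0 + z * z / sent
    center = (p_hat + z * z / (2 * sent)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / sent + z * z / (4 * sent * sent)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    for lost, sent in ((1, 10), (3, 10), (10, 100), (30, 100)):
        lo, hi = wilson_interval(lost, sent)
        print(f"{lost}/{sent} lost: true loss likely {lo:.1%} to {hi:.1%}")
```

With 10 probes, the intervals for 1/10 (hop 10) and 3/10 (hop 8) overlap, so the fake trace cannot even show that those two hops differ. At 10/100 and 30/100 the intervals no longer overlap. Like mtr itself, this only describes what the probes saw. It does not say where on the forward or return path the loss happened.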
> 
> (Beware: this one time, at band camp, some EtherChannel or equivalent at HE was
> showing PL only for specific IPs in any target subnet -- because they were
> XORing the source and target IPs to load-balance, and one member link was
> wonky. Fun times debugging that one: "WFM from here, what's your issue?")
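The EtherChannel story can be reproduced in miniature. The Python sketch below assumes a deliberately simplified hash: source IP XOR destination IP, modulo the number of member links. Real switches hash vendor-specific fields and bits. The addresses are documentation ranges, and the link count and bad-link index are hypothetical.

```python
import ipaddress


def member_link(src, dst, n_links):
    """Simplified src/dst-IP hash: XOR the two addresses and reduce modulo
    the number of bundle members (real hardware hashes differ by vendor)."""
    a = int(ipaddress.ip_address(src))
    b = int(ipaddress.ip_address(dst))
    return (a ^ b) % n_links


def affected_targets(src, subnet, n_links, bad_link):
    """Addresses in `subnet` whose traffic from `src` would ride `bad_link`."""
    return [
        str(dst)
        for dst in ipaddress.ip_network(subnet)
        if member_link(src, str(dst), n_links) == bad_link
    ]


if __name__ == "__main__":
    # Hypothetical: 4-member bundle, member 2 is the wonky one.
    hit = affected_targets("192.0.2.10", "198.51.100.0/24", 4, 2)
    print(f"{len(hit)} of 256 targets see loss, e.g. {hit[:3]}")
```

From 192.0.2.10, only every fourth target (.0, .4, .8, ...) lands on member 2. A tester whose source address differs in its low two bits gets a different quarter of the subnet on that member, hence "WFM from here."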
> 
> /kc
> -- 
> Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
> Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.


