Consistent asymmetric latency on monitoring?

Thu Oct 22 11:25:54 UTC 2009

Lots of good info, and a nice mind-dump that gives me a whole host of other
things that need to be looked at... Umm. "thanks" :)

On Wed, Oct 21, 2009 at 11:10 PM, Perry Lorier <perry at coders.net> wrote:

> Rick Ernst wrote:
>
>> Resent, since I responded from the wrong address:
>> ---
>> The basic operation of IP SLA is as surmised; payload with timestamps
>> and other telemetry data is sent to a 'responder' which manipulates
>> the payload, including adding its own timestamps, and returns the
>> altered payload.
>>
>>
>
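A minimal sketch of the timestamp exchange described above, in Python. This is not Cisco's IP SLA implementation; the function names, the four-timestamp layout (t1 to t4) and the millisecond units are assumptions made for illustration. It shows that a clock offset between generator and responder shifts the two one-way figures in opposite directions, while the round-trip figure is unaffected.

```python
def simulate_probe(send_time, fwd_delay, rev_delay, responder_offset, processing=0):
    """Return the four timestamps of one probe (hypothetical layout).

    t1: generator transmit time, read from the generator clock
    t2: responder receive time, read from the responder clock
    t3: responder transmit time, read from the responder clock
    t4: generator receive time, read from the generator clock
    responder_offset is how far the responder clock is ahead of the generator.
    """
    t1 = send_time
    t2 = t1 + fwd_delay + responder_offset
    t3 = t2 + processing
    t4 = (t3 - responder_offset) + rev_delay
    return t1, t2, t3, t4


def one_way_delays(t1, t2, t3, t4):
    """Compute apparent forward delay, reverse delay and round-trip time."""
    forward = t2 - t1                 # includes +offset
    reverse = t4 - t3                 # includes -offset
    rtt = (t4 - t1) - (t3 - t2)       # offset cancels out
    return forward, reverse, rtt


if __name__ == "__main__":
    # Truly symmetric 10 ms path, responder clock 3 ms ahead, 1 ms processing.
    stamps = simulate_probe(0, 10, 10, responder_offset=3, processing=1)
    print(one_way_delays(*stamps))
```

With a symmetric path and a responder clock that is ahead, the forward delay reads high and the reverse delay reads low by the same amount, which is one way a consistent asymmetry can show up in the graphs.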
> Yup :) It's the obvious way to do it :)
>
>> I had to do a mental walk-through, but I think I see how drift can
>> cause this. I'm going to generate some artificial data, graph it, and
>> see if it matches the general waveshape I'm seeing.
>>
>> I purposefully have the traffic generators ntp syncing against the
>> responders. I thought that would keep the clocks more closely in sync.
>> I don't necessarily care if the time is 'right', just that it's the
>> same.
>>
>
> This causes major problems.  What you're actually measuring here is how
> well ntp can keep the clocks sync'd under asymmetric latency.  ntp is trying
> to make its own measurements of one-way delay, and of clock drift as well,
> without a trusted clock to help.  As you can see from your graphs, ntp is
> not coping[1].
>
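To make the point above concrete, here is a small sketch of the basic NTP on-wire calculation, which estimates the offset as ((t2 - t1) + (t3 - t4)) / 2 on the assumption that the path is symmetric. The helper names and the millisecond values are made up for illustration, and real ntpd adds filtering and clock discipline on top of this.

```python
def exchange(true_offset, fwd_delay, rev_delay, t1=0.0):
    """Simulate one client/server exchange (hypothetical helper).

    true_offset is how far the server clock is ahead of the client clock.
    Server processing time is taken as zero to keep the arithmetic simple.
    """
    t2 = t1 + fwd_delay + true_offset   # server receive, server clock
    t3 = t2                             # server transmit, server clock
    t4 = t3 - true_offset + rev_delay   # client receive, client clock
    return t1, t2, t3, t4


def ntp_offset_and_delay(t1, t2, t3, t4):
    """Basic NTP estimates of clock offset and round-trip delay."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def offset_error_from_asymmetry(fwd_delay, rev_delay):
    """Error in the offset estimate caused purely by path asymmetry."""
    return (fwd_delay - rev_delay) / 2


if __name__ == "__main__":
    # Clocks actually agree (true offset 0), but the path is 30 ms out, 10 ms back.
    stamps = exchange(true_offset=0.0, fwd_delay=30.0, rev_delay=10.0)
    print(ntp_offset_and_delay(*stamps))
    print(offset_error_from_asymmetry(30.0, 10.0))
```

In this example the clocks agree, yet the estimate says the server is 10 ms ahead, which is half the difference between the two one-way delays. If the generator's clock is then steered by that amount, the asymmetry of the link under test ends up folded into the clock rather than showing up in the measurements.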
> You are far better off having each end sync to a local stratum 1 or
> stratum 2 ntp source, preferably one over a different link from the one
> under test.  If you don't have a local stratum 1/2 time source at each end,
> you might be able to find one over a local exchange or other less congested
> link.  If this is very important to you then you should consider running
> your own stratum 1 clocks at each end, synchronised off something like GPS,
> CDMA or a T1 clock.
>
>> What kind of difference should I expect if I sync both
>> generators and responders against the same source, or not sync the
>> responder? I'm thinking that having one source with constant drift may
>> be better than both devices trying to walk/correct the time.
>>
>>
>
> Most hardware clocks in PCs/routers/switches etc. have pretty atrocious
> amounts of drift if left to free run[2], sometimes on the order of seconds
> or occasionally minutes per week.  To get useful numbers you really do need
> to synchronise them to /something/.  Synchronising them to each other causes
> problems, as ntp I think (I could be wrong) assumes mostly symmetrical
> latency, and if the latency isn't symmetric it assumes that's because one
> clock is running fast/slow and will alter the clock's speed to account for
> it.  The great thing about ntp stratum 1 servers is that by definition they
> have more or less the same time no matter where they are, so synchronising
> each end against a local ntp server will be a much, much better solution.
> If possible you should consider peering with at least 3 upstreams,
> preferably 4(!)[3] other ntp servers.
>
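For a rough sense of scale on the drift figures above, the arithmetic can be sketched as follows. The numbers are illustrative only, not measurements from any particular device, and the function names are made up for this example.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def drift_ppm(seconds_gained, interval_seconds=SECONDS_PER_WEEK):
    """Convert 'seconds gained over an interval' into parts per million."""
    return seconds_gained / interval_seconds * 1e6


def accumulated_offset_ms(ppm, elapsed_seconds):
    """Offset, in milliseconds, built up by a free-running clock with constant drift."""
    return ppm * 1e-6 * elapsed_seconds * 1000.0


if __name__ == "__main__":
    # One second per week expressed in ppm.
    print(round(drift_ppm(1), 2))
    # Hypothetical 50 ppm clock left alone for one hour.
    print(round(accumulated_offset_ms(50, 3600), 1))
```

Even a clock drifting by only a second per week builds up several milliseconds of offset per hour, which is already comparable to the one-way delays being measured. That is why a free-running responder with "constant drift" does not help: the error grows without bound between corrections.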
> [1]: To be fair it's a hard problem.  Anything that involves time just gets
> more and more complicated the more you look at it.  ntp is extremely clever
> and probably knows more about time than I'd ever want to know, but you're
> making its job hard.
>
> [2]: http://vancouver-webpages.com/time/ /
> http://vancouver-webpages.com/time/ltmhist.png
>
> [3]:
> http://twiki.ntp.org/bin/view/Support/SelectingOffsiteNTPServers#Section_5.3.3
> .
>