Cost-effectiveness of highly-accurate clocks for NTP

Fri May 13 19:39:27 UTC 2016

Mel Beckman <mel at beckman.org>:
>Finally, do you want to weigh in on the necessity for highly accurate local RT
>clocks in NTP servers? That seems to be the big bugaboo in cost limiting right
>now.

Yes, this is a topic on which I have some well-developed judgments
due to having collected (and, where practical, tested) a pretty
comprehensive set of figures on components of the NTP error budget.
I've even invented some hardware that simplifies the problem.

The background to my findings is laid out in my "Introduction to Time
Service" HOWTO:

    http://www.catb.org/gpsd/time-service-intro.html

I find that an effective way to learn my way into a new application domain
is to first do knowledge capture on the assumptions its experts are
using and then document those. "Introduction to Time Service" was
written to do that and as a white paper for my project management.
Criticism and corrections are, of course, welcome.

In order to discuss the value of accurate clocks intelligently, we need
to split apart two issues: accuracy and availability. Of course we
want the most accurate time our networks can deliver; we also want to
hedge against sporadic or systemic failure of single time sources.

The most important simplification of either issue is that clock
accuracy worth paying for is bounded both by user expectations and
the noise floor defined by our network jitter.

According to RFC 5905, the expected accuracy of NTP time is "several tens
of milliseconds." User expectations seem to have evolved to be on the close
order of 10ms.  I think it's not by coincidence this is pretty close
to the jitter in ping times I see when I bounce ICMP off a
well-provisioned site like (say) google.com through my Verizon FIOS
connection.

It's good rule-of-thumb engineering that if you want to be
metrologically accurate you should pay for precision an order of
magnitude below your feature size *and not more than that*.  Thus,
with a feature size of 10ms the economic sweet spot is a clock with
accuracy about 1ms.
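
To make that rule of thumb concrete, here is a minimal Python sketch.
The function names and the integer-microsecond units are illustrative
choices for this example, not part of NTP or any library; the factor of
ten is just the order-of-magnitude rule stated above.

```python
FEATURE_SIZE_US = 10_000  # 10 ms user expectation, in microseconds


def sweet_spot_us(feature_size_us, ratio=10):
    """Clock accuracy worth paying for: one order of magnitude below feature size."""
    return feature_size_us // ratio


def excess_factor(clock_accuracy_us, feature_size_us):
    """How many times finer than the sweet spot a clock is.

    Values above 1 are precision the rule says not to pay for;
    values below 1 mean the clock is too coarse.
    """
    return sweet_spot_us(feature_size_us) / clock_accuracy_us


if __name__ == "__main__":
    print(sweet_spot_us(FEATURE_SIZE_US))     # 1000 us, i.e. 1 ms
    print(excess_factor(1, FEATURE_SIZE_US))  # 1000.0 for a 1 us clock
```

A clock good to 1 microsecond comes out a factor of 1000 finer than
the sweet spot, which is the kind of surplus the rule warns against.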

One reason discussions of how to budget for WAN timeservice clocks have
tended to become heated in the past is that nothing inexpensive hit
this sweet spot.  The world was largely divided between cheap time
sources with too much jitter (e.g. GPS in-band data with a wander of
100ms or worse) and expensive high-precision clocks designed for PTP
over Ethernet that deliver three or more orders of magnitude more
accuracy than WAN time service can actually use.

However, if you also use the 1PPS signal your GPS engine ships (top of
UTC second accurate to about 50ns), the picture changes
completely. With that over RS232 your delivered accuracy rises to
single-digit numbers of microseconds, which is two orders of magnitude
below what you need for your 1ms goal.

Now we have a historical problem, though: RS232 and the handshake
lines you could use to signal 1PPS are dying, being replaced by USB,
which doesn't normally bring 1PPS out to where the timeserver OS
can see it.

In 2012, nearly three years before being recruited for NTPsec, I
solved this problem as part of my work on GPSD.  The key to this
solution is an obscure feature of USB, and a one-wire
patch to the bog-standard design for generic USB GPSes that exploits
it.  Technical details on request, but what it comes down to is
that with this one weird trick(!) you can mass-produce primary time
sources with a jitter bounded by the USB polling interval for
about $20 a pop.

The USB 1 polling interval is 1ms. Bingo.  We're done.  If we're only
weighing accuracy and not availability, a USB GPS is as much clock as
you need for WAN timeservice *provided it exposes 1PPS*.  These
devices exist, because I designed them and found a shop in Shenzhen
to build them. They're called the Navisys GR-601W, GR-701W, and
GR-801W.
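
A rough way to line up the figures quoted so far against the 1ms goal
is sketched below in Python. The dictionary keys and helper names are
made up for illustration, and the 5us RS232 entry is an assumed
stand-in for "single-digit numbers of microseconds", not a measurement.

```python
import math

TARGET_US = 1_000  # the 1 ms goal, in microseconds

# Figures quoted in this post, in microseconds.
SOURCES_US = {
    "GPS in-band data": 100_000,       # wander of 100 ms or worse
    "1PPS over RS232": 5,              # assumption: mid single digits
    "1PPS over USB 1 polling": 1_000,  # bounded by the 1 ms polling interval
}


def meets_target(accuracy_us, target_us=TARGET_US):
    """True if the source is at least as accurate as the target."""
    return accuracy_us <= target_us


def margin_orders(accuracy_us, target_us=TARGET_US):
    """Orders of magnitude by which a source beats (+) or misses (-) the target."""
    return math.log10(target_us / accuracy_us)


if __name__ == "__main__":
    for name, acc in SOURCES_US.items():
        verdict = "ok" if meets_target(acc) else "too coarse"
        print(f"{name:24s} {acc:>7d} us  margin {margin_orders(acc):+.1f}  {verdict}")
```

In-band GPS data misses the goal by about two orders of magnitude,
1PPS over RS232 clears it by about two, and 1PPS seen through USB 1
polling lands right on it.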

(A viable, only slightly more expensive alternative is to mate a GPS
daughterboard to a hackerboard like the Raspberry Pi and run NTP
service on that.  I'll have much, much more to say about that in a
future post.)

Of course, now we have to talk about availability.  GPS sometimes
loses lock.  There are sporadic and systemic availability risks due to
jamming and system failures like the 2012 hiccup, and extreme
scenarios like a giant meteorite hitting GPS ground control in
Colorado.

What you should be willing to pay for a hedge against this is
proportional to your 95% confidence estimate of the maximum
outage interval. At the low end, this is simple; put your
antenna on a mast that guarantees unobstructed skyview.  At the high
end it's effectively impossible, in that anything that takes down GPS
and Galileo permanently (giant meteor impact, war in space, collapse
of the U.S. and Europe) is likely to be in the
you-have-much-bigger-problems-than-inaccurate-time department.

Traditionally, dedicated time-source hardware like rubidium-oscillator
GPSDOs is sold on accuracy, but for WAN time service its real draw
is long holdover time, with lower frequency drift than you get from the
cheap, non-temperature-compensated quartz crystals in your PC.
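
The link between drift and holdover can be sketched with simple
arithmetic: 1 ppm of frequency error accumulates 1 microsecond of time
error per second. The Python below assumes a constant drift rate, which
is a simplification, and the function name and the ppm values in the
example are hypothetical placeholders rather than measurements of any
particular oscillator.

```python
def holdover_seconds(error_budget_us, drift_ppm):
    """Seconds a free-running clock stays inside the error budget.

    Assumes a constant frequency error of drift_ppm parts per million,
    i.e. drift_ppm microseconds of accumulated error per second.
    """
    if drift_ppm <= 0:
        raise ValueError("drift_ppm must be positive")
    return error_budget_us / drift_ppm


if __name__ == "__main__":
    budget_us = 1_000  # the 1 ms accuracy goal
    for drift_ppm in (10.0, 0.001):  # hypothetical drift rates
        secs = holdover_seconds(budget_us, drift_ppm)
        print(f"{drift_ppm} ppm -> about {secs:.0f} s of holdover")
```

Under this model, cutting drift by a given factor stretches holdover by
the same factor, which is the quantity the extra hardware money buys.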

There is room for debate about how much holdover you should pay for,
but you'll at least be thinking more clearly about the problem if
you recognize that you *should not* buy expensive hardware for
accuracy.  For WAN time service, in that price range, you're either
buying holdover and knowing you're doing so or wasting your money.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Everything you know is wrong.  But some of it is a useful first approximation.