NTP Sync Issue Across Tata (Europe)

Mel Beckman mel at beckman.org
Wed Aug 16 14:51:28 UTC 2023


So, probably not a failure "caused by GPS", rather one caused by poor
design (only two clock sources) combined with unsupported and buggy
devices.

Interesting software bug, but not really germane to this discussion, other than as a cautionary tale about time distribution architectures.


 -mel beckman

On Aug 16, 2023, at 3:51 AM, Matthew Richardson <matthew-l at itconsult.co.uk> wrote:

Mel Beckman wrote:-

Do you have a citation for your Jersey event? I doubt GPS caused the problem, but I'd like to see the documentation.

The event took place on the evening of Sunday 12 July 2020, and seems NOT
to have been due to an issue caused directly by GPS, but rather to
misbehaviour of a GPS NTP server relating to week numbers.  Our regulator
subsequently issued the following comprehensive document:-

https://www.jcra.je/media/598397/t-027-jt-july-2020-outage-decision-directions.pdf

By way of summary, JT operated two GPS derived NTP servers, with all of
their routers pointing to both.  On the evening in question, one of
the two reset its clock back to 27 November 2000.

Their interior routing protocol used amongst their mesh of routers was
IS-IS which was using authentication.  The authentication [section 4.19]
was described as having a "password validity start date" of 01 July 2012.
Thus, any routers which had picked up the time from the faulty source no
longer had valid IS-IS authentication and were thus isolated.
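
As a rough illustration of why a clock set back to 2000 invalidates the
key, here is a minimal Python sketch.  It is not taken from the report
and does not model any vendor's implementation: the function name and
the time of day on the healthy clock are assumptions; only the two
dates come from the text above.

```python
from datetime import datetime, timezone

# "Password validity start date" quoted from the report (section 4.19).
KEY_VALID_FROM = datetime(2012, 7, 1, tzinfo=timezone.utc)


def key_is_valid(router_clock: datetime,
                 valid_from: datetime = KEY_VALID_FROM) -> bool:
    """Hypothetical check: a key is usable only once the router's own
    clock has reached the key's validity start date."""
    return router_clock >= valid_from


# A router that kept the correct time (time of day is an assumption).
good_clock = datetime(2020, 7, 12, 20, 0, tzinfo=timezone.utc)
# A router that followed the faulty NTP server back to 27 November 2000.
bad_clock = datetime(2000, 11, 27, tzinfo=timezone.utc)

print(key_is_valid(good_clock))  # True
print(key_is_valid(bad_clock))   # False
```

A router in the second state would treat its own key as not yet valid
and so could not authenticate to its neighbours.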

Whilst only 15% of their routers were affected, this was enough to cause an
almost total failure in their network, affecting telephony (fixed & mobile)
and Internet.  For foreign readers (this is NANOG!) "999" calls refer to
the emergency services in these parts, where any failures attract the
attention of our regulator.

The details of why the clock "failed" start at section 4.23, and seem to
relate to a GPS week number rollover.
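
The arithmetic is consistent with a rollover.  The legacy GPS
navigation message carries the week number in 10 bits, so it wraps
every 1024 weeks.  The Python sketch below simply subtracts one such
period from the incident date; it is an illustration, not a
reconstruction of the device's firmware logic, and the exact time of
day and time zone of the reset are not given here.

```python
from datetime import date, timedelta

# The 10-bit GPS week counter wraps every 1024 weeks (7168 days).
ROLLOVER = timedelta(weeks=1024)

incident = date(2020, 7, 12)      # date of the outage, from the text above
wrapped = incident - ROLLOVER     # where a clock lands if it is one period behind

print(wrapped)                              # 2000-11-26
print((date(2000, 11, 27) - wrapped).days)  # 1
```

That is within a day of the 27 November 2000 date to which the faulty
server reset.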

So, probably not a failure "caused by GPS", rather one caused by poor
design (only two clock sources) combined with unsupported and buggy
devices.

One curious aspect is that some routers followed the "bad" time, which is
alluded to in section 4.31.

Something not discussed in that report is that JT's email failed during the
incident despite its being hosted on Office365.  The reason was that the
two authoritative DNS servers for jtglobal.com were hosted in Jersey inside
their network.  As that network was wholly disconnected, there was no DNS
and hence no email.  Despite my having raised this since with their senior
management, their DNS remains hosted in this way:-

matthew at m88:~$ dig +norec +noedns +nocmd +nostats -t ns jtglobal.com @ns1.jtibs.net
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20462
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 4

;; QUESTION SECTION:
;jtglobal.com.            IN    NS

;; ANSWER SECTION:
jtglobal.com.        60    IN    NS    ns2.jtibs.net.
jtglobal.com.        60    IN    NS    ns1.jtibs.net.

;; ADDITIONAL SECTION:
ns1.jtibs.net.        60    IN    A    212.9.0.135
ns2.jtibs.net.        60    IN    A    212.9.0.136
ns1.jtibs.net.        60    IN    AAAA    2a02:c28::d1
ns2.jtibs.net.        60    IN    AAAA    2a02:c28::d2

Ridiculously (and again despite my agitation to their management) our
government domain gov.je has similar DNS fragility:-

matthew at m88:~$ dig +norec +noedns +nocmd +nostats -t ns gov.je @ns1.gov.je
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4249
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2

;; QUESTION SECTION:
;gov.je.                IN    NS

;; ANSWER SECTION:
gov.je.            3600    IN    NS    ns2.gov.je.
gov.je.            3600    IN    NS    ns1.gov.je.

;; ADDITIONAL SECTION:
ns2.gov.je.        3600    IN    A    212.9.21.137
ns1.gov.je.        3600    IN    A    212.9.21.9
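
One crude way to flag this kind of fragility is to check whether all of
a zone's nameserver addresses sit in the same small prefix.  The Python
sketch below does that for the A records shown in the two dig outputs
above.  Sharing a /24 is only a proxy chosen for illustration (the
function name and the /24 threshold are assumptions); it says nothing
definitive about routing or hosting.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(addresses, prefix_len=24):
    """Group IPv4 addresses by their enclosing network of the given length."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(addr)
    return dict(groups)


# A records copied from the dig outputs above.
zones = {
    "jtglobal.com": ["212.9.0.135", "212.9.0.136"],
    "gov.je": ["212.9.21.137", "212.9.21.9"],
}

for zone, addrs in zones.items():
    groups = group_by_prefix(addrs)
    verdict = "single /24" if len(groups) == 1 else "several /24s"
    print(zone, verdict, sorted(groups))
```

Both zones come out as "single /24", i.e. each has both of its
nameservers in one /24.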

--
Best wishes,
Matthew

------
From: Mel Beckman <mel at beckman.org>
To: Matthew Richardson <matthew-l at itconsult.co.uk>
Cc: Nanog <nanog at nanog.org>
Date: Tue, 8 Aug 2023 15:12:29 +0000
Subject: Re: NTP Sync Issue Across Tata (Europe)

Until the Internet NTP network can be made secure, no. Do you have a citation for your Jersey event? I doubt GPS caused the problem, but I'd like to see the documentation.

Using GPS for time sync is simple risk management: the risk of Internet NTP with known, well documented vulnerabilities and many security incidents, versus the risk of some theoretical GPS-based vulnerability, for which mitigations such as geographic diversity are readily available. Sure, you could use Internet NTP as a last resort should GPS fail globally (perhaps due to a theoretical - but conceivable - meteor storm). But that would be a fall-back. I would not mix the systems.

-mel

On Aug 8, 2023, at 1:36 AM, Matthew Richardson <matthew-l at itconsult.co.uk> wrote:

Mel Beckman wrote:-

It's a problem that has received a lot of attention in both NTP and
aviation navigation circles. What is hard to defend against is total signal
suppression via high powered jamming. But that you can do with a
geographically diverse GPS NTP network.

Whilst looking forward to being corrected, GPS (even across multiple
locations) seems to be a SINGLE source of time.  You seem (have I
misunderstood?) to be a proponent of using GPS exclusively as the external
clock source.

Might it be preferable to have a mixture of GPS (perhaps with another GNSS)
together with carefully selected Internet-based NTP servers?

I recall an incident over here in Jersey (the one they named New Jersey
after!) where our primary telco had a substantial time shift on one of
their two GPS synced servers.  This managed to adjust the clock on enough
of their routers that the certificate-based OSPF authentication considered
the certificates invalid, and caused a failure of almost their whole
network.

This is, of course, not to say that GPS is not a very good clock source,
but rather to wonder whether more diversity would be preferable to using
it as a single source.

--
Best wishes,
Matthew

------
From: Mel Beckman <mel at beckman.org>
To: "Forrest Christian (List Account)" <lists at packetflux.com>
Cc: Nanog <nanog at nanog.org>
Date: Mon, 7 Aug 2023 14:03:30 +0000
Subject: Re: NTP Sync Issue Across Tata (Europe)

Forrest,

GPS spoofing may work with a primitive Raspberry Pi-based NTP server, but commercial industrial NTP servers have specific anti-spoofing mitigations. There are also antenna diversity strategies that vendors support to ensure the signal being relied upon is coming from the right direction. It's a problem that has received a lot of attention in both NTP and aviation navigation circles. What is hard to defend against is total signal suppression via high powered jamming. But that you can do with a geographically diverse GPS NTP network.

-mel

On Aug 7, 2023, at 1:39 AM, Forrest Christian (List Account) <lists at packetflux.com> wrote:

The problem with relying exclusively on GPS to do time distribution is the ease with which one can spoof the GPS signals.

With a budget of around $1K, not including a laptop, anyone with decent technical skills could convince a typical GPS receiver it was at any position and was at any time in the world.   All it takes is a decent directional antenna, some SDR hardware, and depending on the location and directivity of your antenna maybe a smallish amplifier.   There is much discussion right now in the PNT (Position, Navigation and Timing) community as to how best to secure the GNSS network, but right now one should consider the data from GPS to be no more trustworthy than some random NTP server on the internet.

In order to build a resilient NTP server infrastructure you need multiple sources of time distributed by multiple methods - typically both via satellite (GPS) and by terrestrial (NTP) methods.   NTP does a pretty good job of sorting out multiple time servers and discarding sources that are lying.  But to do this you need multiple time sources.  A common recommendation is to run a couple/few NTP servers which only get time from a GPS receiver and only serve time to a second tier of servers that pull from both those in-house GPS-timed-NTP servers and other trusted NTP servers.   I'd recommend selecting the time servers to gain geographic diversity, e.g. poll NIST servers in Maryland or Colorado, and possibly both.
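
The idea of discarding lying sources can be sketched in a few lines of
Python.  This is a much simplified interval-intersection vote in the
spirit of NTP's selection step, not the actual ntpd or chrony code; the
source names, offsets and error bounds are made up for illustration.

```python
def select_truechimers(sources):
    """sources maps name -> (offset_seconds, max_error_seconds).

    Finds the point covered by the most [offset - err, offset + err]
    intervals and returns the sources covering it, or [] if they are
    not a strict majority of all sources.
    """
    events = []
    for offset, err in sources.values():
        events.append((offset - err, 0))  # interval start (sorts before an end)
        events.append((offset + err, 1))  # interval end
    events.sort()

    best = count = 0
    best_point = None
    for point, kind in events:
        if kind == 0:
            count += 1
            if count > best:
                best, best_point = count, point
        else:
            count -= 1

    agreeing = sorted(
        name for name, (offset, err) in sources.items()
        if offset - err <= best_point <= offset + err
    )
    return agreeing if 2 * len(agreeing) > len(sources) else []


# Hypothetical readings: one GPS server is 1024 weeks (619,315,200 s) behind.
four = {
    "gps-a": (0.002, 0.010),
    "gps-b": (-619_315_200.0, 0.010),
    "nist-md": (-0.001, 0.020),
    "nist-co": (0.004, 0.020),
}
two = {name: four[name] for name in ("gps-a", "gps-b")}

print(select_truechimers(four))  # ['gps-a', 'nist-co', 'nist-md']
print(select_truechimers(two))   # []
```

With four sources the bad one is outvoted; with only two there is no
majority, so a client has no principled way to tell which is wrong.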

Note that NIST will exchange (via mail) a set of keys with you so that you can use authenticated NTP with their servers.   See https://www.nist.gov/pml/time-and-frequency-division/time-services/nist-authenticated-ntp-service .



On Sun, Aug 6, 2023 at 8:36 PM Mel Beckman <mel at beckman.org> wrote:
GPS Selective Availability did not disrupt the timing chain of GPS, only the ephemeris (position information).  But a government-disrupted timebase scenario has never occurred, while hackers are a documented threat.

DNS has DNSSec, which while not deployed as broadly as we might like, at least lets us know which servers we can trust.

Your own atomic clocks still have to be synced to a common standard to be useful. To what are they sync'd? GPS, I'll wager.

I sense hand-waving :)

-mel via cell

On Aug 6, 2023, at 7:04 PM, Rubens Kuhl <rubensk at gmail.com> wrote:

On Sun, Aug 6, 2023 at 8:20 PM Mel Beckman <mel at beckman.org> wrote:
Or one can read recent research papers that thoroughly document the incredible fragility of the existing NTP hierarchy and soberly consider their recommendations for remediation:

The paper suggests the compromise of critical infrastructure. So, besides not using NTP, why not stop using DNS ? Just populate a hosts file with all you need.

BTW, the stratum-0 source you suggested is known to have been manipulated in the past (https://www.gps.gov/systems/gps/modernization/sa/), so you need to bet on that specific state actor not returning to old habits.

OTOH, 4 of the 5 servers I suggested have their own atomic clock, and you can keep using GPS as well. If GPS goes bananas on timing, that source will just be disregarded (one of the features of the NTP architecture that has been pointed out over and over in this thread and you keep ignoring it).

Rubens

