NTP Sync Issue Across Tata (Europe)

Tom Beecher beecher at beecher.cc
Wed Aug 16 14:55:59 UTC 2023


>
> So, probably not a failure "caused by GPS", rather one caused by poor
> design (only two clock sources) combined with unsupported and buggy
> devices.


100% correct. From the PDF:

> 4.31 JT summarised its findings in relation to the ‘Panic Timer’ on the
> Cisco IOS XR NTP Client, namely that: JT’s efforts in understanding the
> root cause, and mitigation steps to take to avoid future incidents have
> focused on the Cisco NTP Client behaviour, and notably Cisco’s decision to
> not implement the ‘Panic Timer’ on their IOS XR operating system. Arguably,
> whilst the NTP server injected an invalid time into the network, it is the
> NTP Clients filtering and selection algorithms which are responsible for
> detecting and disregarding falsetickers, and it was the Cisco NTP Clients
> failure to appropriately handle this which triggered the network incident.
> […] Further detailed soak testing, log analysis and debug analysis
> corroborated that the Cisco IOS XR NTP Client did not implement the ‘Panic
> Timer’ that would normally cause an NTP Client to ignore an NTP Server
> exceeding 1000 seconds variance.
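
To make the missing check concrete, here is a minimal Python sketch of a panic-threshold test. The 1000-second figure comes from the report text above; the function name and the sample offsets are hypothetical, and this is an illustration rather than Cisco's or ntpd's actual code.

```python
from datetime import datetime

PANIC_THRESHOLD_S = 1000.0  # figure taken from the report text quoted above

def accept_offset(offset_s: float, panic_s: float = PANIC_THRESHOLD_S) -> bool:
    """Return True if a server's reported offset is small enough to act on."""
    return abs(offset_s) <= panic_s

# Hypothetical samples: a healthy server, and one that has jumped back to 2000.
healthy_offset = 0.25
bad_offset = (datetime(2000, 11, 27) - datetime(2020, 7, 12)).total_seconds()

print(accept_offset(healthy_offset))
print(accept_offset(bad_offset))
```

A client applying a test like this would refuse to follow a server that is suddenly almost twenty years out, which is the behaviour the report says was absent.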


On Wed, Aug 16, 2023 at 10:50 AM Mel Beckman <mel at beckman.org> wrote:

> So, probably not a failure "caused by GPS", rather one caused by poor
> design (only two clock sources) combined with unsupported and buggy
> devices.
>
>
>
>  -mel beckman
>
> On Aug 16, 2023, at 3:51 AM, Matthew Richardson <matthew-l at itconsult.co.uk>
> wrote:
>
> Mel Beckman wrote:-
>
> Do you have a citation for your Jersey event? I doubt GPS caused the
> problem, but I'd like to see the documentation.
>
>
> The event took place on the evening of Sunday 12 July 2020, and seems NOT
> to have been due to an issue caused directly by GPS, but rather to
> misbehaviour of a GPS NTP server relating to week numbers.  Our regulator
> subsequently issued the following comprehensive document:-
>
>
> https://www.jcra.je/media/598397/t-027-jt-july-2020-outage-decision-directions.pdf
>
> By way of summary, JT operated two GPS-derived NTP servers, with all of
> their routers pointing to both.  On the evening in question, one of
> the two reset its clock back to 27 November 2000.
>
> Their interior routing protocol used amongst their mesh of routers was
> IS-IS which was using authentication.  The authentication [section 4.19]
> was described as having a "password validity start date" of 01 July 2012.
> Thus, any routers which had picked up the time from the faulty source no
> longer had valid IS-IS authentication and were thus isolated.
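
The isolation mechanism described above can be sketched in a few lines of Python. This is a simplified illustration, not how any router implements key chains: the start date is the one given in the summary of section 4.19, while the function name and the time of day used for the "good" clock are assumptions.

```python
from datetime import datetime, timezone

# Start of validity as summarised above (section 4.19 of the report).
KEY_VALID_FROM = datetime(2012, 7, 1, tzinfo=timezone.utc)

def key_is_valid(router_clock: datetime, valid_from: datetime = KEY_VALID_FROM) -> bool:
    """A key is usable only once the router's own clock has reached its start date."""
    return router_clock >= valid_from

# Hypothetical clocks: one router with correct time, one that followed the bad server.
good_clock = datetime(2020, 7, 12, 20, 0, tzinfo=timezone.utc)
bad_clock = datetime(2000, 11, 27, tzinfo=timezone.utc)

print(key_is_valid(good_clock))
print(key_is_valid(bad_clock))
```

A router whose clock reads November 2000 sees a key that does not become valid for another eleven years, so it cannot authenticate its adjacencies.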
>
> Whilst only 15% of their routers were affected, this was enough to cause an
> almost total failure in their network, affecting telephony (fixed & mobile)
> and Internet.  For foreign readers (this is NANOG!) "999" calls refer to
> the emergency services in these parts, where any failures attract the
> attention of our regulator.
>
> The details of why the clock "failed" start at section 4.23, and seem to
> relate to a GPS week number rollover.
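
A quick arithmetic check supports the rollover explanation. The legacy GPS navigation message carries a 10-bit week number, so it wraps every 1024 weeks. The sketch below (Python, with a hypothetical helper name) subtracts one such period from the incident date and compares the result with the date the server is reported to have reset to.

```python
from datetime import date, timedelta

WEEK_ROLLOVER = timedelta(weeks=1024)  # a 10-bit week counter wraps every 1024 weeks

def misread_date(true_date: date) -> date:
    """Date seen by a receiver that is exactly one week-number period behind."""
    return true_date - WEEK_ROLLOVER

incident = date(2020, 7, 12)   # evening of the outage, as stated above
reported = date(2000, 11, 27)  # date the faulty server is said to have reset to

rolled = misread_date(incident)
print(rolled)
print(abs((rolled - reported).days))
```

The computed date falls within one day of the reported one, and the time of day and time zone of the reset could account for the remainder.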
>
> So, probably not a failure "caused by GPS", rather one caused by poor
> design (only two clock sources) combined with unsupported and buggy
> devices.
>
> One curious aspect is that some routers followed the "bad" time, which is
> alluded to in section 4.31.
>
> Something not discussed in that report is that JT's email failed during the
> incident despite its being hosted on Office365.  The reason was that the
> two authoritative DNS servers for jtglobal.com were hosted in Jersey inside
> their network.  As that network was wholly disconnected, there was no DNS
> and hence no email.  Despite my having raised this since with their senior
> management, their DNS remains hosted in this way:-
>
> matthew@m88:~$ dig +norec +noedns +nocmd +nostats -t ns jtglobal.com @ns1.jtibs.net
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20462
> ;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 4
>
> ;; QUESTION SECTION:
> ;jtglobal.com.            IN    NS
>
> ;; ANSWER SECTION:
> jtglobal.com.        60    IN    NS    ns2.jtibs.net.
> jtglobal.com.        60    IN    NS    ns1.jtibs.net.
>
> ;; ADDITIONAL SECTION:
> ns1.jtibs.net.        60    IN    A    212.9.0.135
> ns2.jtibs.net.        60    IN    A    212.9.0.136
> ns1.jtibs.net.        60    IN    AAAA    2a02:c28::d1
> ns2.jtibs.net.        60    IN    AAAA    2a02:c28::d2
>
>
> Ridiculously (and again despite my agitation to their management) our
> government domain gov.je has similar DNS fragility:-
>
> matthew@m88:~$ dig +norec +noedns +nocmd +nostats -t ns gov.je @ns1.gov.je
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4249
> ;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
>
> ;; QUESTION SECTION:
> ;gov.je.                IN    NS
>
> ;; ANSWER SECTION:
> gov.je.            3600    IN    NS    ns2.gov.je.
> gov.je.            3600    IN    NS    ns1.gov.je.
>
> ;; ADDITIONAL SECTION:
> ns2.gov.je.        3600    IN    A    212.9.21.137
> ns1.gov.je.        3600    IN    A    212.9.21.9
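
One rough way to spot this kind of fragility is to check whether all of a zone's name servers sit in the same address block. The Python sketch below uses the IPv4 addresses from the two dig outputs above. The /24 prefix length and the function name are assumptions, and sharing a prefix only suggests a shared network; it does not prove it.

```python
import ipaddress

def share_prefix(addresses: list[str], prefix_len: int = 24) -> bool:
    """Return True if every address falls inside the same IPv4 prefix."""
    nets = {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    return len(nets) == 1

jtglobal_ns = ["212.9.0.135", "212.9.0.136"]  # ns1/ns2.jtibs.net, from the output above
gov_je_ns = ["212.9.21.137", "212.9.21.9"]    # ns2/ns1.gov.je, from the output above

print(share_prefix(jtglobal_ns))
print(share_prefix(gov_je_ns))
```

Both pairs share a /24 under this test, which is consistent with each zone depending on a single network for all of its authoritative DNS.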
>
>
> --
> Best wishes,
> Matthew
>
> ------
> From: Mel Beckman <mel at beckman.org>
> To: Matthew Richardson <matthew-l at itconsult.co.uk>
> Cc: Nanog <nanog at nanog.org>
> Date: Tue, 8 Aug 2023 15:12:29 +0000
> Subject: Re: NTP Sync Issue Across Tata (Europe)
>
>
> Until the Internet NTP network can be made secure, no. Do you have a
> citation for your Jersey event? I doubt GPS caused the problem, but I'd
> like to see the documentation.
>
>
> Using GPS for time sync is simple risk management: the risk of Internet
> NTP with known, well documented vulnerabilities and many security
> incidents, versus the risk of some theoretical GPS-based vulnerability, for
> which mitigations such as geographic diversity are readily available. Sure,
> you could use Internet NTP as a last resort should GPS fail globally
> (perhaps due to a theoretical - but conceivable - meteor storm). But that
> would be a fall-back. I would not mix the systems.
>
>
> -mel
>
>
> On Aug 8, 2023, at 1:36 AM, Matthew Richardson <matthew-l at itconsult.co.uk>
> wrote:
>
>
> Mel Beckman wrote:-
>
> It's a problem that has received a lot of attention in both NTP and
> aviation navigation circles. What is hard to defend against is total signal
> suppression via high powered jamming. But that you can do with a
> geographically diverse GPS NTP network.
>
> Whilst looking forward to being corrected, GPS (even across multiple
> locations) seems to be a SINGLE source of time.  You seem (have I
> misunderstood?) to be a proponent of using GPS exclusively as the external
> clock source.
>
> Might it be preferable to have a mixture of GPS (perhaps with another GNSS)
> together with carefully selected Internet-based NTP servers?
>
> I recall an incident over here in Jersey (the one they named New Jersey
> after!) where our primary telco had a substantial time shift on one of
> their two GPS synced servers.  This managed to adjust the clock on enough
> of their routers that the certificate-based OSPF authentication considered
> the certificates invalid, and caused a failure of almost their whole
> network.
>
> This is, of course, not to say that GPS is not a very good clock source,
> but rather to wonder whether more diversity would be preferable to using
> it as a single source.
>
> --
> Best wishes,
> Matthew
>
>
> ------
> From: Mel Beckman <mel at beckman.org>
> To: "Forrest Christian (List Account)" <lists at packetflux.com>
> Cc: Nanog <nanog at nanog.org>
> Date: Mon, 7 Aug 2023 14:03:30 +0000
> Subject: Re: NTP Sync Issue Across Tata (Europe)
>
>
> Forrest,
>
>
> GPS spoofing may work with a primitive Raspberry Pi-based NTP server, but
> commercial industrial NTP servers have specific anti-spoofing mitigations.
> There are also antenna diversity strategies that vendors support to ensure
> the signal being relied upon is coming from the right direction. It's a
> problem that has received a lot of attention in both NTP and aviation
> navigation circles. What is hard to defend against is total signal
> suppression via high powered jamming. But that you can do with a
> geographically diverse GPS NTP network.
>
>
> -mel
>
>
> On Aug 7, 2023, at 1:39 AM, Forrest Christian (List Account) <
> lists at packetflux.com> wrote:
>
>
>
> The problem with relying exclusively on GPS to do time distribution is the
> ease with which one can spoof the GPS signals.
>
>
> With a budget of around $1K, not including a laptop, anyone with decent
> technical skills could convince a typical GPS receiver that it was at any
> position in the world, at any time.   All it takes is a decent
> directional antenna, some SDR hardware, and depending on the location and
> directivity of your antenna maybe a smallish amplifier.   There is much
> discussion right now in the PNT (Position, Navigation and Timing) community
> as to how best to secure the GNSS network, but right now one should
> consider the data from GPS to be no more trustworthy than some random NTP
> server on the internet.
>
>
> In order to build a resilient NTP server infrastructure you need multiple
> sources of time distributed by multiple methods - typically both via
> satellite (GPS) and by terrestrial (NTP) methods.   NTP does a pretty good
> job of sorting out multiple time servers and discarding sources that are
> lying.  But to do this you need multiple time sources.  A common
> recommendation is to run a couple/few NTP servers which only get time from
> a GPS receiver and only serve time to a second tier of servers that pull
> from both those in-house GPS-timed-NTP servers and other trusted NTP
> servers.   I'd recommend selecting the time servers to gain geographic
> diversity, e.g. poll NIST servers in Maryland or Colorado, and possibly
> both.
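
The need for more than two sources can be shown with a deliberately simplified Python sketch. Real NTP uses an interval-based intersection algorithm plus clustering; the version below only looks for a strict majority of sources whose offsets agree within a tolerance. The source names, the offsets, and the tolerance value are all hypothetical.

```python
def agreeing_majority(offsets: dict[str, float], tolerance_s: float = 0.1):
    """Return the largest group of sources within tolerance of one reference
    source, but only if that group is a strict majority; otherwise None."""
    best: list[str] = []
    for ref in offsets.values():
        group = [name for name, off in offsets.items() if abs(off - ref) <= tolerance_s]
        if len(group) > len(best):
            best = group
    return sorted(best) if len(best) * 2 > len(offsets) else None

# Two sources that disagree: no way to tell which one is lying.
two = {"gps-a": 0.0, "gps-b": -619228800.0}

# Four sources, one of them wildly wrong: the other three outvote it.
four = {"gps-a": 0.002, "gps-b": -619228800.0, "nist-md": 0.004, "nist-co": -0.003}

print(agreeing_majority(two))
print(agreeing_majority(four))
```

With two sources the function returns None because neither side has a majority; with four it returns the three that agree and drops the outlier.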
>
>
> Note that NIST will exchange (via mail) a set of keys with you so that you
> can use authenticated NTP with their servers.   See
> https://www.nist.gov/pml/time-and-frequency-division/time-services/nist-authenticated-ntp-service
>
>
>
>
> On Sun, Aug 6, 2023 at 8:36 PM Mel Beckman <mel at beckman.org> wrote:
>
> GPS Selective Availability did not disrupt the timing chain of GPS, only
> the ephemeris (position information).  But a government-disrupted timebase
> scenario has never occurred, while hackers are a documented threat.
>
>
> DNS has DNSSEC, which while not deployed as broadly as we might like, at
> least lets us know which servers we can trust.
>
>
> Your own atomic clocks still have to be synced to a common standard to be
> useful. To what are they sync'd? GPS, I'll wager.
>
>
> I sense hand-waving :)
>
>
> -mel via cell
>
>
> On Aug 6, 2023, at 7:04 PM, Rubens Kuhl <rubensk at gmail.com> wrote:
>
>
>
>
>
> On Sun, Aug 6, 2023 at 8:20 PM Mel Beckman <mel at beckman.org> wrote:
>
> Or one can read recent research papers that thoroughly document the
> incredible fragility of the existing NTP hierarchy and soberly consider
> their recommendations for remediation:
>
>
> The paper suggests the compromise of critical infrastructure. So, besides
> not using NTP, why not stop using DNS? Just populate a hosts file with all
> you need.
>
>
> BTW, the stratum-0 source you suggested is known to have been manipulated
> in the past (https://www.gps.gov/systems/gps/modernization/sa/), so you
> need to bet on that specific state actor not returning to old habits.
>
>
> OTOH, 4 of the 5 servers I suggested have their own atomic clock, and you
> can keep using GPS as well. If GPS goes bananas on timing, that source will
> just be disregarded (one of the features of the NTP architecture that has
> been pointed out over and over in this thread and you keep ignoring it).
>
>
> Rubens
>
>
>
>