FW: Reliability of looking glass sites / rviews

Matthew Huff mhuff at ox.com
Sat Sep 16 09:48:32 UTC 2017


ASN 14607, and 129.77.0.0/16

Slightly over an hour after our power event, during which 100% of our equipment was down, this is what I saw at routeviews:

BGP routing table entry for 129.77.0.0/16, version 24978989
Paths: (7 available, best #7, table default)
  Not advertised to any peer
  Refresh Epoch 1
  134708 3491 6939 46887 14607
    103.197.104.1 from 103.197.104.1 (123.108.254.70)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  3333 1273 6939 46887 14607
    193.0.0.56 from 193.0.0.56 (193.0.0.56)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  8283 57866 6762 6939 46887 14607
    94.142.247.3 from 94.142.247.3 (94.142.247.3)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 6762:33 6762:16500 8283:15 57866:105
      unknown transitive attribute: flag 0xE0 type 0x20 length 0xC
        value 0000 205B 0000 0006 0000 000F 
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  24441 3491 3491 6939 46887 14607
    202.93.8.242 from 202.93.8.242 (202.93.8.242)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  20912 1267 1273 6939 46887 14607
    212.66.96.126 from 212.66.96.126 (212.66.96.126)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000 9035:50 9035:100 20912:65001
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  1221 4637 6939 46887 14607
    203.62.252.83 from 203.62.252.83 (203.62.252.83)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  2497 6939 46887 14607
    202.232.0.2 from 202.232.0.2 (202.232.0.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
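
One way to read the output above is to look at what the seven paths have in common at the origin end. The sketch below is only an illustration: the AS paths are copied from the output, while the helper name `common_tail` is made up here. A shared tail may suggest the stale route was held somewhere along one upstream chain rather than by route-views itself, but that is an inference and nothing in this thread confirms it.

```python
# AS paths copied from the routeviews output above (one per available path).
AS_PATHS = [
    "134708 3491 6939 46887 14607",
    "3333 1273 6939 46887 14607",
    "8283 57866 6762 6939 46887 14607",
    "24441 3491 3491 6939 46887 14607",
    "20912 1267 1273 6939 46887 14607",
    "1221 4637 6939 46887 14607",
    "2497 6939 46887 14607",
]


def common_tail(paths):
    """Return the longest AS sequence shared at the origin end of every path.

    Hypothetical helper for illustration; not part of any routeviews tooling.
    """
    split = [p.split() for p in paths]
    tail = []
    # Walk backwards from the origin AS while all paths still agree.
    for hops in zip(*(reversed(s) for s in split)):
        if len(set(hops)) != 1:
            break
        tail.append(hops[0])
    return list(reversed(tail))


print(" ".join(common_tail(AS_PATHS)))  # 6939 46887 14607
```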


From: Tim Evens [mailto:tim at snas.io] 
Sent: Friday, September 15, 2017 10:45 AM
To: Matthew Huff <mhuff at ox.com>
Cc: morrowc.lists at gmail.com; nanog at nanog.org
Subject: Re: FW: Reliability of looking glass sites / rviews

You didn't mention details about which ASN or prefixes you were checking.  Are you referring to ASN 14607, which advertises only two prefixes, 129.77.0.0/16 and 2620:0:2810::/48?

Based on what we saw over the weekend (using routeviews data):

Event Start Time: 2017-09-09 11:29:23 UTC (2017-09-09 07:29:23 EDT)
Event End Time: 2017-09-09 13:31:30 UTC (2017-09-09 09:31:30 EDT)

Are the above times correct?
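
For reference, the length of that window can be checked with a few lines of Python. This is just arithmetic on the two UTC timestamps quoted above; the variable names are arbitrary.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

# Event start and end in UTC, as quoted above.
start = datetime.strptime("2017-09-09 11:29:23", FMT).replace(tzinfo=timezone.utc)
end = datetime.strptime("2017-09-09 13:31:30", FMT).replace(tzinfo=timezone.utc)

outage = end - start
print(outage)  # 2:02:07
```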

We see the routes withdrawn and then come back.  For example: http://demo-rv.snas.io:3000/dashboard/db/prefix-history?orgId=2&var-prefix=129.77.0.0&var-prefix_len=16&var-asn_num=All&var-router_name=All&var-peer_name=All&from=1504908000000&to=1505203200000

When you checked routeviews, which router and peer were you looking at?  When you did a "show ip bgp ...", did you include the prefix length? If not, it would then have shown you 0/0 or 128/5, depending on which router you were on.
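
To illustrate why the prefix length matters: a lookup by address alone returns the longest covering prefix, so once the /16 is withdrawn a less specific route answers instead. The sketch below mimics that with Python's `ipaddress` module. The two tables are hypothetical, "128/5" is assumed to mean 128.0.0.0/5, and `longest_match` is a made-up helper, not how any router actually implements the lookup.

```python
import ipaddress


def longest_match(address, table):
    """Return the most specific prefix in `table` covering `address`, or None."""
    addr = ipaddress.ip_address(address)
    covering = [n for n in map(ipaddress.ip_network, table) if addr in n]
    return max(covering, key=lambda n: n.prefixlen, default=None)


# Hypothetical tables: before and after 129.77.0.0/16 is withdrawn.
before = ["0.0.0.0/0", "128.0.0.0/5", "129.77.0.0/16"]
after = ["0.0.0.0/0", "128.0.0.0/5"]

print(longest_match("129.77.0.0", before))  # 129.77.0.0/16
print(longest_match("129.77.0.0", after))   # 128.0.0.0/5
```

Asking for the exact prefix and length avoids this fallback, since a missing /16 then shows up as "not in table" rather than as a covering route.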


--Tim 





On 9/13/17, 8:43 AM, "NANOG on behalf of Matthew Huff" <nanog-bounces at nanog.org on behalf of mhuff at ox.com> wrote:

    Both should have been similar.
    
    In the first case, we lost power to all of our BGP border routers that are peered with the upstream providers.
    In the second case, I did an explicit “shut” on the interface connected to the upstream provider that still appeared “stuck” an hour after the outage.
    
    From: <christopher.morrow at gmail.com> on behalf of Christopher Morrow <morrowc.lists at gmail.com>
    Date: Wednesday, September 13, 2017 at 10:58 AM
    To: Matthew Huff <mhuff at ox.com>
    Cc: nanog2 <nanog at nanog.org>
    Subject: Re: Reliability of looking glass sites / rviews
    
    
    
    On Wed, Sep 13, 2017 at 5:30 AM, Matthew Huff <mhuff at ox.com<mailto:mhuff at ox.com>> wrote:
    This weekend our uninterruptible power supply became interruptible and we lost all circuits. While I was doing initial debugging of the problem and waiting on site power verification, I noticed that there were still paths being shown in rviews for the circuits that were down. This was over an hour after we went hard down, and it took hours before we were back up.
    
    explicit vs implicit withdrawals causing different handling of the problem routes?
    
    I worked with our providers last night to verify there weren't any hanging static routes, etc. We shut the upstream circuit down, watched the convergence, and saw that eventually all the paths disappeared. Given what we saw on Saturday, what would cause route-views to cache the paths that long?  Some looking glass sites only show what they are peered with, or at most what their peers are peered with; that's why I've always used route-views.
    
    What looking glass sites other than route-views would people recommend?
    
    ripe ris.
    

 
 

