TWC (AS11351) blocking all NTP?

Christopher Morrow morrowc.lists at gmail.com
Mon Feb 3 22:58:21 UTC 2014


wait, so the whole of the thread is about stopping participants in the
attack, and you're suggesting that removing/changing end-system
switch/routing gear and doing something more complex than:
  deny udp any eq 123 any
  deny udp any eq 123 any eq 123
  permit ip any any

is a good plan?
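(For context, a minimal sketch of how those three lines might be applied in practice; the ACL number and interface name are hypothetical, and Cisco IOS syntax is assumed:)

```
! hypothetical ACL number and interface; Cisco IOS syntax assumed
access-list 100 deny udp any eq 123 any
access-list 100 deny udp any eq 123 any eq 123
access-list 100 permit ip any any
!
interface GigabitEthernet0/1
 ip access-group 100 in
```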

I'd direct you at:
  <https://www.nanog.org/resources/tutorials>

and particularly at:
 "Tutorial: ISP Security - Real World Techniques II"
 <https://www.nanog.org/meetings/nanog23/presentations/greene.pdf>

On Mon, Feb 3, 2014 at 5:16 PM, Peter Phaal <peter.phaal at gmail.com> wrote:
> On Mon, Feb 3, 2014 at 12:38 PM, Christopher Morrow
> <morrowc.lists at gmail.com> wrote:
>> On Mon, Feb 3, 2014 at 2:42 PM, Peter Phaal <peter.phaal at gmail.com> wrote:
>>> On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow
>>> <morrowc.lists at gmail.com> wrote:
>>>> On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal at gmail.com> wrote:
>>
>>>> There's certainly the case that you could drop acls/something on
>>>> equipment to selectively block the traffic that matters... I suspect
>>>> in some cases the choice was: "50% of the edge box customers on this
>>>> location are a problem, block it across the board here instead of X00
>>>> times" (see concern about tcam/etc problems)
>>>
>>> I agree that managing limited TCAM space is critical to the
>>> scalability of any mitigation solution. However, tying up TCAM space
>>> on every edge device with filters to prevent each new threat is likely
>>
>> yup, there's a tradeoff, today it's being made one way, tomorrow
>> perhaps a different way. My point was that today the percentage of sdn
>> capable devices is small enough that you still need a decimal point to
>> measure it. (I bet, based on total devices deployed) The percentage of
>> oss backend work done to do what you want is likely smaller...
>>
>> the folk in NZ-land (Citylink, reannz ... others - find josh baily /
>> cardigan) are making some strides, but only in the exchange areas so
>> far. fun stuff... but not the deployed gear as an L2/L3 device in
>> TWC/Comcast/Verizon.
>
> I agree that today most networks aren't SDN ready, but there are
> inexpensive switches on the market that can perform these functions
> and for providers that have them in their network, this is an option
> today. In some environments, it could also make sense to drop in a
> layer of switches to monitor and control traffic entering / exiting
> the network.

it's probably not a good plan to forklift your edge, for dos targets
where all you really need is a 3 line acl.

>
>>> The current 10G upgrade cycle provides an opportunity to deploy
>>
>> by 'current 10g upgrade cycle' you mean the one that happened 2-5 yrs
>> ago? or something newer? did you mean 100G?
>
> I was referring to the current upgrade cycle in data centers, with
> servers connected with 10G rather than 1G adapters. The high volumes
> are driving down the cost of 10/40/100G switches.

again, lots of cost and churn for 3 lines of acl... I'm not sold.

>>> With integrated hybrid OpenFlow, there is very little activity on the
>>> OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes
>>> handles forwarding of packets. OpenFlow is only used to selectively
>>> override specific FIB entries.
>>
>> that didn't really answer the question :) if I have 10k customers
>> behind the edge box and some of them NOW start being abused, then more
>> later and that mix changes... if it changes a bunch because the
>> attacker is really many attackers. how fast do I change before I can't do
>> normal ops anymore?
>
> Good point - the proposed solution is most effective for protecting
> customers that are targeted by DDoS attacks. While trying to prevent

Oh, so the 3 line acl is not an option? or (for a lot of customers a
fine answer) null route? Some things have changed in the world of dos
mitigation, but a bunch of the basics still apply. I do know that in
the unfortunate event that your network is the transit or terminus of
a dos attack at high volume you want to do the least configuration
that'll satisfy the 2 parties involved (you and your customer)...
doing a bunch of hardware replacement and/or sdn things when you can
get the job done with some acls or routing changes is really going to
be risky.
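(A sketch of the null-route option mentioned above; the victim address is a placeholder from the RFC 5737 documentation range, and Cisco IOS static-route syntax is assumed:)

```
! hypothetical victim address; Cisco IOS syntax assumed
ip route 192.0.2.10 255.255.255.255 Null0
```

In practice this is often distributed network-wide via remote-triggered blackhole (RTBH) routing -- announcing the victim prefix in BGP with a blackhole community -- rather than configured box-by-box.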

> attackers entering the network is good citizenship, the value and
> effectiveness of the mitigation service increases as you get closer to
> the target of the attack. In this case there typically aren't very
> many targets and so a single rule filtering on destination IP address
> and protocol would typically be effective (and less disruptive to the
> victim than null routing).
>
>>
>>> Typical networks probably only see a few DDoS attacks an hour at the
>>> most, so pushing a few rules an hour to mitigate them should have
>>> little impact on the switch control plane.
>>
>> based on what math did you get 'few per hour?' As an endpoint (focal
>> point) or as a contributor? The problem that started this discussion
>> was being a contributor...which I bet happens a lot more often than
>> /few an hour/.
>
> I am sorry, I should have been clearer, the SDN solution I was
> describing is aimed at protecting the target's links, rather than
> mitigating the botnet and amplification layers.

and i'd say that today sdn is out of reach for most deployments, and
that the simplest answer is already available.

> The number of attacks was from the perspective of DDoS targets and
> their service providers.  If you are considering each participant in
> the attack the number goes up considerably.

I bet roland has some good round-numbers on number of dos attacks per
day... I bet it's higher than a few per hour globally, for the ones
that get noticed.

>>> A good working definition of a large flow is 10% of a link's
>>> bandwidth. If you only trigger actions for large flows then in the
>>> worst case you would only require 10 rules per port to change how
>>> these flows are treated.
>>
>> 10% of a 1g link is 100mbps, For contributors to ntp attacks, many of
>> the contributors are sending ONLY 300x the input, so less than
>> 100mbps. On a 10g link it's 1G... even more hidden.
>>
>> This math and detection aren't HARD, but tuning it can be a bit challenging.
>
> Agreed - the technique is less effective for addressing the
> contributors to the attack. RPF and other edge controls should be

note that the focus of the original thread was on the contributors. I
think the target part of the problem has been solved since before the
slides in the pdf link at the top...
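(The detection math being discussed can be sketched roughly as follows. The packet size and required sample count are my assumptions, not numbers from the thread -- the point is just that modest sampling rates suffice to spot a flow at 10% of link bandwidth within a second:)

```python
# Back-of-envelope sampling-rate estimate for large-flow detection.
# Assumptions (mine): ~1000-byte packets, and needing roughly 5 samples
# from the flow within a 1-second detection window to call it "large".

def sampling_rate(link_bps, frac=0.10, pkt_bytes=1000,
                  samples_needed=5, window_s=1.0):
    flow_bps = link_bps * frac              # "large flow" = 10% of link bandwidth
    flow_pps = flow_bps / (pkt_bytes * 8)   # packets per second in that flow
    # 1-in-N sampling such that the flow yields samples_needed samples per window
    return int(flow_pps * window_s / samples_needed)

for gbps in (1, 10, 100):
    n = sampling_rate(gbps * 1e9)
    print(f"{gbps}G link: large flow = {gbps * 0.1:g}G, sample ~1-in-{n}")
```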

> applied, but until everyone participates and eliminates attacks at
> source, there is still a value in filtering close to the target of the
> attack.
>
>>
>>>>> http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
>>>>>
>>>>> The example can be modified to target NTP mon_getlist requests and
>>>>> responses using the following sFlow-RT flow definition:
>>>>>
>>>>> {keys:'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
>>>>>
>>>>> or to target DNS ANY requests:
>>>>>
>>>>> {keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
>>>>>
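(As a sketch, the two flow definitions quoted above could be serialized and pushed to sFlow-RT's REST API. The endpoint URL and flow names below are my assumptions, not from the thread:)

```python
import json

# Assumed sFlow-RT endpoint -- adjust host/port for your deployment.
SFLOW_RT = "http://localhost:8008"

# The two flow definitions from the thread as sFlow-RT flow specs:
# NTP private-mode (monlist) traffic, and DNS ANY queries.
flows = {
    "ntp-monlist": {
        "keys": "ipdestination,udpsourceport",
        "value": "ntppvtbytes",
        "filter": "ntppvtreq=20,42",
    },
    "dns-any": {
        "keys": "ipdestination,udpsourceport",
        "value": "frames",
        "filter": "dnsqr=true&dnsqtype=255",
    },
}

for name, spec in flows.items():
    body = json.dumps(spec)
    # A live deployment would PUT each spec to sFlow-RT, roughly:
    #   PUT {SFLOW_RT}/flow/{name}/json  (Content-Type: application/json)
    print(f"PUT {SFLOW_RT}/flow/{name}/json {body}")
```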
>>>>
>>>> this also assumes almost 1:1 sampling... which might not be feasible
>>>> either...otherwise you'll be seeing fairly lossy results, right?
>>>
>>> Actually, to detect large flows (defined as 10% of link bandwidth)
>>> within a second, you would only require the following sampling rates:
>>
>> your example requires seeing the 1st packet in a cycle, and seeing
>> into the first packet. that's going to require either acceptance of
>> loss (and gathering the loss in another rule/fashion) or 1:1 sampling
>> to be assured of getting ALL of the DNS packets and seeing what was
>> queried.
>
> The flow analysis is stateless - based on a random sample of 1 in N
> packets, you can decode the packet headers and determine the amount of
> traffic associated with specific DNS queries. If you are looking at

you're getting pretty complicated for the target side:
  access-list 150 permit ip any any log

(note this is basically taken verbatim from the slides)

view logs, see the overwhelming majority are to hostX port Y proto
Z... filter, done.
you can do that in about 5 mins time, quicker if you care to rush a bit.
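(That "scan the logs, find the dominant destination" step can be sketched like so. The log lines are fabricated examples in the Cisco %SEC-6-IPACCESSLOGP style, with RFC 5737 documentation addresses:)

```python
import collections
import re

# Fabricated example log lines from an ACL with the "log" keyword.
logs = [
    "%SEC-6-IPACCESSLOGP: list 150 permitted udp 198.51.100.7(123) -> 192.0.2.10(80), 1 packet",
    "%SEC-6-IPACCESSLOGP: list 150 permitted udp 203.0.113.9(123) -> 192.0.2.10(80), 1 packet",
    "%SEC-6-IPACCESSLOGP: list 150 permitted tcp 198.51.100.8(44321) -> 192.0.2.20(443), 1 packet",
]

# Tally (destination, port, protocol) -- the overwhelming majority
# points at the victim host/port/proto to filter on.
hits = collections.Counter()
for line in logs:
    m = re.search(r"(udp|tcp) \S+ -> (\S+)\((\d+)\)", line)
    if m:
        hits[(m.group(2), m.group(3), m.group(1))] += 1

(dst, port, proto), count = hits.most_common(1)[0]
print(f"top target: {dst} port {port}/{proto} ({count} hits)")
```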

> the traffic close to the target, there may be hundreds of thousands of
> DNS responses per second and so you very quickly determine the target
> IP address and can apply a filter to remove DNS traffic to that
> target.
>
>> provided your device does sflow and can export to more than one
>> destination, sure.
>
> This brings up an interesting use case for an OpenFlow capable
> switch - replicating sFlow, NetFlow, IPFIX, Syslog, SNMP traps, etc.
> Many top of rack switches can also forward the traffic through a
> GRE/VxLAN tunnel.

yes, more complexity seems like a great plan... in the words of
someone else: "I encourage my competitors to do this"

I think roland's other point that not very many people actually even
use sflow is not to be taken lightly here either.

-chris

> http://blog.sflow.com/2013/11/udp-packet-replication-using-open.html

Domain Name: SFLOW.COM
<snip>
Registry Registrant ID:
Registrant Name: PHAAL, PETER
Registrant Organization: InMon Corp.
<snip>
