TWC (AS11351) blocking all NTP?

Peter Phaal peter.phaal at gmail.com
Mon Feb 3 22:16:36 UTC 2014


On Mon, Feb 3, 2014 at 12:38 PM, Christopher Morrow
<morrowc.lists at gmail.com> wrote:
> On Mon, Feb 3, 2014 at 2:42 PM, Peter Phaal <peter.phaal at gmail.com> wrote:
>> On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow
>> <morrowc.lists at gmail.com> wrote:
>>> On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal at gmail.com> wrote:
>
>>> There's certainly the case that you could drop acls/something on
>>> equipment to selectively block the traffic that matters... I suspect
>>> in some cases the choice was: "50% of the edge box customers on this
>>> location are a problem, block it across the board here instead of X00
>>> times" (see concern about tcam/etc problems)
>>
>> I agree that managing limited TCAM space is critical to the
>> scaleability of any mitigation solution. However, tying up TCAM space
>> on every edge device with filters to prevent each new threat is likely
>
> yup, there's a tradeoff, today it's being made one way, tomorrow
> perhaps a different way. My point was that today the percentage of sdn
> capable devices is small enough that you still need a decimal point to
> measure it. (I bet, based on total devices deployed) The percentage of
> oss backend work done to do what you want is likely smaller...
>
> the folk in NZ-land (Citylink, reannz ... others - find josh baily /
> cardigan) are making some strides, but only in the exchange areas so
> far. fun stuff... but not the deployed gear as an L2/L3 device in
> TWC/Comcast/Verizon.

I agree that today most networks aren't SDN ready, but there are
inexpensive switches on the market that can perform these functions,
and for providers that already have them in their networks this is an
option today. In some environments it could also make sense to drop in
a layer of switches to monitor and control traffic entering and
exiting the network.

>> The current 10G upgrade cycle provides an opportunity to deploy
>
> by 'current 10g upgrade cycle' you mean the one that happened 2-5 yrs
> ago? or something newer? did you mean 100G?

I was referring to the current upgrade cycle in data centers, with
servers connected with 10G rather than 1G adapters. The high volumes
are driving down the cost of 10/40/100G switches.

>
>> equipment that is SDN capable. The functionality required for this use
>> case is supported by current generation merchant silicon and is widely
>> available right now in inexpensive switches.
>>
>
> right... and everyone is removing their vendor supported gear and
> replacing it with pica8 boxes? The reality is that as speeds/feeds
> have increased over the last while basic operations techniques really
> haven't. Should they? maybe? will they? probably? is that going to
> happen on a dime? nope. Again, I suspect you'll see smaller
> deployments of sdn-like stuff 'soon' and larger deployments when
> people are more comfortable with the operations/failure modes that
> change.

Not just Pica8 - most vendors (branded or white box) use the same
Broadcom merchant silicon, including Cisco, Juniper, Arista,
Dell/Force10, Extreme, etc.:

http://blog.sflow.com/2014/01/drivers-for-growth.html

>
>>>> Specifically looking at sFlow, large flood attacks can be detected
>>>> within a second. The following article describes a simple example
>>>> using integrated hybrid OpenFlow in a 10/40G ToR switch:
>>>
>>> hopefully there's some clamp on how much change per device/port you
>>> plan too? :) I'd hate to see the RP/RE/etc get so busy programming
>>> tcam that bgp/isis/ospf/etc flaps :(
>>
>> With integrated hybrid OpenFlow, there is very little activity on the
>> OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes
>> handle forwarding of packets. OpenFlow is only used to selectively
>> override specific FIB entries.
>
> that didn't really answer the question :) if I have 10k customers
> behind the edge box and some of them NOW start being abused, then more
> later and that mix changes... if it changes a bunch because the
> attacker is really attackers. how fast do I change before I can't do
> normal ops anymore?

Good point - the proposed solution is most effective for protecting
customers that are targeted by DDoS attacks. While trying to prevent
attack traffic from entering the network is good citizenship, the value
and effectiveness of the mitigation service increases as you get closer
to the target of the attack. In that case there typically aren't very
many targets, so a single rule filtering on destination IP address and
protocol is usually effective (and less disruptive to the victim than
null routing).
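
To make this concrete, a rule like that could be pushed from a
controller roughly as follows. This is only a sketch: the controller
URL, the JSON layout and the example victim address 192.0.2.10 are
placeholders, not any specific vendor's API.

import requests

CONTROLLER = "http://controller.example.com:8080/flows"  # placeholder endpoint

rule = {
    "priority": 1000,          # override the normal FIB entry for this match only
    "match": {
        "eth_type": 0x0800,    # IPv4
        "ipv4_dst": "192.0.2.10",  # the targeted customer (example address)
        "ip_proto": 17,        # UDP
        "udp_src": 123,        # NTP responses from the amplifiers
    },
    "actions": [],             # empty action list == drop
}

requests.post(CONTROLLER, json=rule, timeout=5).raise_for_status()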

>
>> Typical networks probably only see a few DDoS attacks an hour at the
>> most, so pushing a few rules an hour to mitigate them should have
>> little impact on the switch control plane.
>
> based on what math did you get 'few per hour?' As an endpoint (focal
> point) or as a contributor? The problem that started this discussion
> was being a contributor...which I bet happens a lot more often than
> /few an hour/.

I'm sorry, I should have been clearer: the SDN solution I was
describing is aimed at protecting the target's links, rather than
mitigating the botnet and amplification layers.

The number of attacks was estimated from the perspective of DDoS
targets and their service providers. If you count each participant in
the attack, the number goes up considerably.

>
>> A good working definition of a large flow is 10% of a link's
>> bandwidth. If you only trigger actions for large flows then in the
>> worst case you would only require 10 rules per port to change how
>> these flows are treated.
>
> 10% of a 1g link is 100mbps, For contributors to ntp attacks, many of
> the contributors are sending ONLY 300x the input, so less than
> 100mbps. On a 10g link it's 1G... even more hidden.
>
> This math and detection aren't HARD, but tuning it can be a bit challenging.

Agreed - the technique is less effective for addressing the
contributors to the attack. RPF and other edge controls should be
applied, but until everyone participates and eliminates attacks at the
source, there is still value in filtering close to the target of the
attack.

>
>>>> http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
>>>>
>>>> The example can be modified to target NTP mon_getlist requests and
>>>> responses using the following sFlow-RT flow definition:
>>>>
>>>> {keys:'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
>>>>
>>>> or to target DNS ANY requests:
>>>>
>>>> {keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
>>>>
>>>
>>> this also assumes almost 1:1 sampling... which might not be feasible
>>> either...otherwise you'll be seeing fairly lossy results, right?
>>
>> Actually, to detect large flows (defined as 10% of link bandwidth)
>> within a second, you would only require the following sampling rates:
>
> your example requires seeing the 1st packet in a cycle, and seeing
> into the first packet. that's going to require either acceptance of
> loss (and gathering the loss in another rule/fashion) or 1:1 sampling
> to be assured of getting ALL of the DNS packets and seeing what was
> queried.

The flow analysis is stateless - based on a random sample of 1 in N
packets, you can decode the packet headers and determine the amount of
traffic associated with specific DNS queries. If you are looking at
traffic close to the target, there may be hundreds of thousands of DNS
responses per second, so you can very quickly determine the target IP
address and apply a filter to remove DNS traffic to that target.
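
As a rough illustration of the arithmetic, here is a minimal sketch of
stateless large-flow estimation from 1-in-N sampled packets, assuming
the 10% "large flow" threshold discussed above (the sample tuple format
is made up for the example):

from collections import defaultdict

SAMPLING_RATE = 1000             # 1-in-N packet sampling on the monitored link
LINK_BPS = 10 * 10**9            # 10G link
LARGE_FLOW_BPS = LINK_BPS // 10  # "large" = 10% of link bandwidth

def large_flows(samples, interval=1.0):
    # samples: iterable of (ipdestination, udpsourceport, frame_bytes)
    # collected during `interval` seconds of sampling
    estimated_bps = defaultdict(float)
    for dst, sport, frame_bytes in samples:
        # each sampled packet stands in for roughly SAMPLING_RATE real packets
        estimated_bps[(dst, sport)] += frame_bytes * 8 * SAMPLING_RATE / interval
    return {flow: bps for flow, bps in estimated_bps.items()
            if bps >= LARGE_FLOW_BPS}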

> provided your device does sflow and can export to more than one
> destination, sure.

This brings up an interesting use case for an OpenFlow capable
switch - replicating sFlow, NetFlow, IPFIX, syslog, SNMP traps, etc.
Many top of rack switches can also forward the traffic through a
GRE/VxLAN tunnel.

http://blog.sflow.com/2013/11/udp-packet-replication-using-open.html
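
A sketch of what that replication rule might look like, using the same
placeholder controller interface as the earlier example (the endpoint,
port numbers and JSON layout are illustrative, not a real product API):
match sFlow datagrams on UDP port 6343 and copy them out two
collector-facing ports.

import requests

CONTROLLER = "http://controller.example.com:8080/flows"  # placeholder endpoint

replicate_sflow = {
    "priority": 500,
    "match": {
        "eth_type": 0x0800,   # IPv4
        "ip_proto": 17,       # UDP
        "udp_dst": 6343,      # standard sFlow collector port
    },
    # two OUTPUT actions: the switch copies each matching datagram to both ports
    "actions": [
        {"type": "OUTPUT", "port": 10},
        {"type": "OUTPUT", "port": 11},
    ],
}

requests.post(CONTROLLER, json=replicate_sflow, timeout=5).raise_for_status()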



