TWC (AS11351) blocking all NTP?

Mon Feb 3 20:38:49 UTC 2014

On Mon, Feb 3, 2014 at 2:42 PM, Peter Phaal <peter.phaal at gmail.com> wrote:
> On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow
> <morrowc.lists at gmail.com> wrote:
>> On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal at gmail.com> wrote:

>> There's certainly the case that you could drop acls/something on
>> equipment to selectively block the traffic that matters... I suspect
>> in some cases the choice was: "50% of the edge box customers on this
>> location are a problem, block it across the board here instead of X00
>> times" (see concern about tcam/etc problems)
>
> I agree that managing limited TCAM space is critical to the
> scalability of any mitigation solution. However, tying up TCAM space
> on every edge device with filters to prevent each new threat is likely

yup, there's a tradeoff: today it's being made one way, tomorrow
perhaps a different way. My point was that today the percentage of
SDN-capable devices is small enough that you still need a decimal point
to measure it (I bet, based on total devices deployed). The percentage
of OSS backend work done to do what you want is likely smaller...

the folk in NZ-land (Citylink, REANNZ ... others - find Josh Bailey /
Cardigan) are making some strides, but only in the exchange areas so
far. fun stuff... but not the deployed gear as an L2/L3 device in
TWC/Comcast/Verizon.

>>> Real-time analytics based on measurements from switches/routers
>>> (sFlow/PSAMP/IPFIX) can identify large UDP flows and integrated hybrid
>>> OpenFlow, I2RS, REST, NETCONF APIs, etc. can be used to program the
>>> switches/routers to selectively filter traffic based on UDP port and
>>> IP source / destination. By deploying a DDoS mitigation SDN
>>> application, providers can use their existing infrastructure to
>>> protect their own and their customers' networks from flood attacks, and
>>> generate additional revenue by delivering flood protection as a value
>>> added service.
>>
>> yup, that sounds wondrous... and I'm sure that in the future utopian
>> world (like 7-10 years from now, based on age-out of gear and OSS IT
>> change requirements) we'll see more of this. I don't think you'll see
>> much (in terms of edge ports on the network today) of this happening
>> 'right now' though.
>
> The current 10G upgrade cycle provides an opportunity to deploy

by 'current 10G upgrade cycle' you mean the one that happened 2-5 yrs
ago? or something newer? did you mean 100G?

> equipment that is SDN capable. The functionality required for this use
> case is supported by current generation merchant silicon and is widely
> available right now in inexpensive switches.
>

right... and everyone is removing their vendor supported gear and
replacing it with pica8 boxes? The reality is that as speeds/feeds
have increased over the last while, basic operations techniques really
haven't changed. Should they? maybe? will they? probably? is that going
to happen on a dime? nope. Again, I suspect you'll see smaller
deployments of sdn-like stuff 'soon' and larger deployments when
people are more comfortable with the operations/failure modes that
change.

>>> Specifically looking at sFlow, large flood attacks can be detected
>>> within a second. The following article describes a simple example
>>> using integrated hybrid OpenFlow in a 10/40G ToR switch:
>>
>> hopefully there's some clamp on how much change per device/port you
>> plan, too? :) I'd hate to see the RP/RE/etc get so busy programming
>> tcam that bgp/isis/ospf/etc flaps :(
>
> With integrated hybrid OpenFlow, there is very little activity on the
> OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes
> handle forwarding of packets. OpenFlow is only used to selectively
> override specific FIB entries.

that didn't really answer the question :) if I have 10k customers
behind the edge box and some of them NOW start being abused, then more
later, and that mix changes... if it changes a bunch because the
attacker is really attackers, how fast can I change before I can't do
normal ops anymore?
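
One way to picture the clamp being asked about is a per-device token
bucket on rule pushes. This is only a sketch under assumptions: the
class name, the rate and burst numbers, and the choice to refuse (rather
than queue) excess changes are all hypothetical, and nothing here
reflects what any particular controller or switch actually does.

```python
class RuleChangeClamp:
    """Hypothetical per-device limiter on filter-rule pushes (token bucket)."""

    def __init__(self, max_per_second, burst):
        self.rate = float(max_per_second)   # sustained rule changes per second
        self.burst = float(burst)           # largest back-to-back batch allowed
        self.tokens = float(burst)          # start with a full bucket
        self.last = None                    # timestamp of the previous call

    def allow(self, now):
        """Return True if one more rule change may be pushed at time `now`."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Assumed numbers: 1 change/second sustained, bursts of 2, for one edge box.
clamp = RuleChangeClamp(max_per_second=1, burst=2)
decisions = [clamp.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.0)]
```

With those made-up numbers the third push at t=0 is refused, one more is
allowed a second later, and the next is refused again. The open question
in the thread is what rate is actually safe for the RP/RE, which this
sketch does not answer.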

> Typical networks probably only see a few DDoS attacks an hour at the
> most, so pushing a few rules an hour to mitigate them should have
> little impact on the switch control plane.

based on what math did you get 'few per hour?' As an endpoint (focal
point) or as a contributor? The problem that started this discussion
was being a contributor...which I bet happens a lot more often than
/few an hour/.

> A good working definition of a large flow is 10% of a link's
> bandwidth. If you only trigger actions for large flows then in the
> worst case you would only require 10 rules per port to change how
> these flows are treated.

10% of a 1G link is 100Mbps. For NTP attacks, many of the contributors
are sending ONLY 300x the input, so less than 100Mbps. On a 10G link
the threshold is 1G... even more hidden.

This math and detection aren't HARD, but tuning it can be a bit challenging.
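
The arithmetic above can be sketched as follows. The 10% threshold, the
link speeds, and the 300x factor come from the thread; the function
names and the 100 kbit/s request rate are made up for illustration.

```python
LARGE_FLOW_PERCENT = 10  # "large flow" = 10% of link bandwidth (from the thread)


def large_flow_threshold_bps(link_bps, percent=LARGE_FLOW_PERCENT):
    """Bits per second a flow must reach to count as 'large' on this link."""
    return link_bps * percent // 100


def max_large_flows(percent=LARGE_FLOW_PERCENT):
    """Worst-case number of simultaneous large flows (hence rules) per port."""
    return 100 // percent


def amplified_bps(request_bps, amplification=300):
    """Outbound rate from a reflector given inbound request rate and gain."""
    return request_bps * amplification


def is_large_flow(flow_bps, link_bps):
    return flow_bps >= large_flow_threshold_bps(link_bps)


ONE_GIG = 1_000_000_000
TEN_GIG = 10_000_000_000

# Hypothetical contributor: 100 kbit/s of spoofed requests, amplified 300x.
contributor_bps = amplified_bps(100_000)             # 30 Mbit/s outbound
seen_on_1g = is_large_flow(contributor_bps, ONE_GIG)
seen_on_10g = is_large_flow(contributor_bps, TEN_GIG)
```

Under those assumed numbers the contributor sends 30 Mbit/s, which stays
below the 100 Mbit/s trigger on a 1G port and far below the 1 Gbit/s
trigger on a 10G port, so a pure 10%-of-link rule would not fire.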

>>> http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
>>>
>>> The example can be modified to target NTP mon_getlist requests and
>>> responses using the following sFlow-RT flow definition:
>>>
>>> {keys:'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
>>>
>>> or to target DNS ANY requests:
>>>
>>> {keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
>>>
>>
>> this also assumes almost 1:1 sampling... which might not be feasible
>> either...otherwise you'll be seeing fairly lossy results, right?
>
> Actually, to detect large flows (defined as 10% of link bandwidth)
> within a second, you would only require the following sampling rates:

your example requires seeing the 1st packet in a cycle, and seeing
into the first packet. that's going to require either acceptance of
loss (and gathering the loss in another rule/fashion) or 1:1 sampling
to be assured of getting ALL of the DNS packets and seeing what was
queried.
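
To put rough numbers on the sampling concern: assuming each packet is
sampled independently with probability 1/N (an idealization of
sFlow-style random sampling), the chance of catching at least one of k
matching packets is 1 - (1 - 1/N)^k. The function name and the 1-in-1000
rate below are hypothetical, not figures from the thread.

```python
def prob_at_least_one_sample(matching_packets, sampling_n):
    """P(at least one of `matching_packets` is sampled) at 1-in-`sampling_n`.

    Assumes independent per-packet sampling with probability 1/sampling_n.
    """
    if sampling_n < 1:
        raise ValueError("sampling_n must be >= 1")
    if matching_packets < 0:
        raise ValueError("matching_packets must be >= 0")
    miss_one = 1.0 - 1.0 / sampling_n        # chance a given packet is skipped
    return 1.0 - miss_one ** matching_packets


# Hypothetical 1-in-1000 sampling rate.
p_single = prob_at_least_one_sample(1, 1000)     # one specific packet
p_thousand = prob_at_least_one_sample(1000, 1000)  # any of 1000 matching packets
p_full = prob_at_least_one_sample(1, 1)          # 1:1 sampling
```

At 1-in-1000 a specific packet is seen about 0.1% of the time, and even
1000 matching packets leave a bit over a one-in-three chance of seeing
none of them. Only 1:1 sampling guarantees the first packet is seen,
which is the point being made above.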

I wonder also about privacy concerns with this.

>>
>>> The OpenFlow block control can be modified to selectively filter UDP
>>> traffic based on the identified UDP source port and destination IP
>>> address.
>>>
>>
>> hopefully your OSS and netflow/sflow collection isn't also being used
>> for traffic engineering/capacity planning purposes? else... you might
>> get odd results from that infrastructure with such changes to the
>> sflow/netflow sender platform.
>
> This use case might be more problematic for NetFlow since obtaining
> the measurements may affect the router configuration (flow cache
> definitions) and other applications that depend on them (like capacity
> planning). In the case of sFlow monitoring, the flow cache is built
> externally and you can feed the sFlow to multiple independent analysis
> tools without risk of interference.
>
> http://blog.sflow.com/2013/05/software-defined-analytics.html

provided your device does sflow and can export to more than one
destination, sure.