Spitballing IoT Security

Ronald F. Guilmette rfg at tristatelogic.com
Thu Oct 27 05:28:48 UTC 2016


In message <58112F9F.6060705 at vaxination.ca>, 
Jean-Francois Mezei <jfmezei_nanog at vaxination.ca> wrote:

>A camera showing the baby in 4K resolution along with sounds of him
>crying on dolby surround to the mother who is at work would likely
>saturate upload just as much as the virus sending DNS requests. This
>falls into the tonne of feathers weighing as much as a tonne of lead
>category.

Agreed.

So the "solution" is either to define such devices out of the problem
set (i.e. to say "that is not really an IoT type device") or else to
find some other solution.

Questions:

Does the 4K baby monitor (or the 4K 7-11 parking lot cam) need to be
sending its high-bandwidth outbound feed, arbitrarily, to absolutely
anything?  Could it instead reasonably be limited to sending its high-
bitrate data _only_ back to just those clients which themselves had
first made an inbound TCP connection to the thing?  (Note: The video
data itself wouldn't necessarily have to travel back to the client
via the TCP session.  It could be sent back to the client via a separate
and parallel UDP data link directed back to the same IP that initiated,
and that is currently holding open the TCP session.  I think that this
is kinda/sorta how FTP works, actually, although FTP's separate data
channel is itself TCP rather than UDP.)

IoT devices that need to send a *lot* of data out can be programmed
to only send such high-rate/high-bulk data to client IP addresses that
currently have live TCP sessions open... ones which the clients
themselves initiated...  and the kernel can be made to enforce this simple
restriction.  Problem solved.  No DDoSing of arbitrary IP addresses here!
Move along.
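
A rough user-space sketch of that rule, in Python, just to make the
bookkeeping concrete.  Real enforcement would have to live in the kernel;
the class name BulkSendGate and all of its methods are hypothetical, and
the UDP socket is assumed to be an ordinary datagram socket owned by the
device.

```python
class BulkSendGate:
    """Tracks which peer IPs currently hold an inbound TCP session open."""

    def __init__(self):
        # peer IP -> number of live TCP sessions that peer initiated
        self._sessions = {}

    def session_opened(self, peer_ip):
        """Call when a client completes an inbound TCP handshake."""
        self._sessions[peer_ip] = self._sessions.get(peer_ip, 0) + 1

    def session_closed(self, peer_ip):
        """Call when one of that client's TCP sessions goes away."""
        remaining = self._sessions.get(peer_ip, 0) - 1
        if remaining > 0:
            self._sessions[peer_ip] = remaining
        else:
            self._sessions.pop(peer_ip, None)

    def may_send_bulk(self, dest_ip):
        """Bulk data is allowed only toward IPs with a live session."""
        return dest_ip in self._sessions

    def send_bulk(self, udp_sock, payload, dest):
        """Send payload over UDP only if dest's IP passes the gate.

        dest is an (ip, port) tuple; returns True if the datagram was sent.
        """
        if not self.may_send_bulk(dest[0]):
            return False
        udp_sock.sendto(payload, dest)
        return True
```

The point of counting sessions per IP is that the permission disappears
as soon as the last client-initiated session from that address closes.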

Alternatively, in the model where the security camera needs, for whatever
reason, silly or otherwise... to be the one that initiates an outbound
connection (and then just blasts its data up through that) it seems to
me that it should not be too awfully hard to minimally enforce, in the
kernel, just step 1 of a kind of SMTP-ish protocol... one where the IoT
device initiates the outbound connection, to an IP address of its choosing
(perhaps after having done a DNS lookup to find it) and where the IoT
device then does nothing unless and until it gets some kind of affirmative
signal from the other side...  like a 2xx banner/greeting which effectively
says to the IoT device "I'm here, and you are Clear-To-Send."  (Again, it
isn't necessary for the IoT device to send the actual data stream up through
this TCP connection.  It could be sent via UDP, but only to the pre-verified
"cloud" IP address that we have already established is willing to accept
the bulk data.)
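
The device side of that "step 1" might look something like the Python
sketch below.  No such protocol has been standardized, so the greeting
format (an SMTP-style line starting with a three-digit 2xx code), the
function names, and the timeout and length limits are all assumptions
made up for illustration.

```python
import socket


def is_clear_to_send(greeting):
    """True only if the greeting line starts with a 2xx status code."""
    code = greeting[:3]
    if not (len(code) == 3 and code.isdigit() and code.startswith(b"2")):
        return False
    # The code must stand alone or be followed by a separator, so that
    # something like b"2200..." is not mistaken for a 2xx reply.
    return len(greeting) == 3 or greeting[3:4] in (b" ", b"-", b"\r", b"\n")


def await_clear_to_send(tcp_sock, timeout=5.0, max_len=512):
    """Wait for one greeting line on an already-connected TCP socket.

    Returns True only if a complete line arrives in time and it is a
    2xx greeting.  Silence, a closed connection, an overlong line, or
    any other reply all mean "do not send".
    """
    tcp_sock.settimeout(timeout)
    buf = b""
    try:
        while b"\n" not in buf and len(buf) < max_len:
            chunk = tcp_sock.recv(max_len - len(buf))
            if not chunk:
                break  # peer closed the connection
            buf += chunk
    except OSError:
        # Covers timeouts as well as connection errors.
        return False
    if b"\n" not in buf:
        return False
    return is_clear_to_send(buf.split(b"\n", 1)[0])
```

Only after await_clear_to_send() returns True would the device start
pushing bulk data, and then only to the IP address at the other end of
that same TCP connection.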

Of course, it would be best if there were some sort of a standardized
port number and protocol for this specific kind of IoT-to-Cloud interaction.
It would surely cause problems to try to overload these semantics on top
of, say, port 25.

So, in summary, it isn't even necessary to define video cameras out of the
"IoT" problem set.  The problem is excessive outbound data flowing to
perfectly arbitrary "victim" IP addresses.  (And remember, even attack
reflectors are victims too.)  Given the problem statement, the solution
is obvious:  You gotta start building boxes that have kernel-enforced
restrictions that fit one or the other of these two models:

     1) Ordinary (non-cloud-oriented) things MUST either:

        1a)  be prevented from sending large amounts of data outbound AT ALL,
             ever, or else

        1b)  be prevented from sending large amounts of data outbound *except*
             when explicitly requested to do so by some verified IP address...
             which is to say an IP address that has initiated and completed an
             inbound TCP handshake

     2)  Cloud-Oriented things MUST be prevented from sending "unsolicited"
         bulk data to any IP address other than one which has very explicitly
         consented to receive it, i.e. by accepting and completing an inbound
         TCP handshake on some specially reserved port, and perhaps also via
         some additional layered trivial RTS/CTS protocol.

That's it.  Simple, no?  Implementation is, of course, completely trivial,
just as it is for all things that I myself don't actually have to write
the code for.
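
For what it's worth, the decision logic behind the two models can be
sketched in a few lines of Python.  This is only an illustration of the
policy, not of a kernel implementation: the class name OutboundPolicy,
the token-bucket budget standing in for "large amounts of data", and the
numbers used below are all assumptions.  A destination becomes "verified"
when it completes the relevant TCP handshake (1b or 2); everything else
shares one small byte budget (1a).

```python
class OutboundPolicy:
    """Unverified destinations share a small byte budget; verified
    destinations may receive bulk data."""

    def __init__(self, budget_bytes, refill_per_sec):
        self.capacity = float(budget_bytes)    # max burst to unverified IPs
        self.refill = float(refill_per_sec)    # sustained bytes per second
        self.tokens = float(budget_bytes)
        self.last = None                       # time of last budget check
        self.verified = set()

    def verify(self, ip):
        """Mark an IP that has completed the required TCP handshake."""
        self.verified.add(ip)

    def revoke(self, ip):
        """Drop verification, e.g. when the TCP session closes."""
        self.verified.discard(ip)

    def allow(self, dest_ip, nbytes, now):
        """Decide whether nbytes may be sent to dest_ip at time now.

        now is a timestamp in seconds, e.g. from time.monotonic().
        """
        if dest_ip in self.verified:
            return True
        # Token bucket: refill according to elapsed time, capped at capacity.
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

With a budget of, say, 1000 bytes refilled at 100 bytes/second, a
compromised device could still do DNS lookups and the like, but could
not sustain any meaningful flood toward an address that never asked
for the data.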


Regards,
rfg


