TCP session disconnection caused by Code Red?

Mon Aug 6 19:50:09 UTC 2001

> I've been told (but not given permission to forward details of
> who/how/what) that some major sites with a single router
> and relatively flat network topology are dying due to the ARP
> request flood that is being generated by Code Red scans on the
> inside of their border router choking the router.  Check the
> rate of ARP requests coming off your border router and see if
> it seems excessive; if so, that may be it.

2 points:

1. RFC826 appears to mandate only positive ARP caching. I can't
   see a reason why negative ARP caching shouldn't work this
   way:

   Keep only one ARP request in flight at a time. Retry ARPs
   a maximum of [5] times, separated by at least [1] second.
   After that, cache non-existence of a h/w address for that
   IP address for the normal positive caching time. If you see any
   IP traffic inbound on that interface with that IP address,
   remove the negative cache. However, to get a positive cache
   entry you still need a valid ARP response (promiscuous or not).

   More formally, when address resolution is required:

   a) Look up IP address in ARP table
      i)   If entry is PRESENT (i.e. h/w address OK)
           return this value.
      ii)  If entry is NEXIST return ARP failure
           immediately (i.e. as a router, drop into
           the code where no route is found - on Cisco
           this would be rate-limited unreachables)
      iii) If entry is INCOMPLETE[n] go to (b), transmitting
           a further ARP packet ONLY if the entry is fully
           aged (i.e. otherwise perform your RFC826
           compatible / current operation without
           transmitting another ARP packet)
      iv)  If entry is absent, transmit ARP packet
           as normal, set entry to INCOMPLETE[0] and go to (b)
   b) [this is the action we perform if we don't yet
      know the h/w address]. RFC826 suggests returning
      and allowing a higher layer to retransmit, though I
      suppose blocking is theoretically possible

   If a valid ARP response is received (promiscuous or
   otherwise), remove any existing entry, and generate
   a PRESENT entry.

   If /any/ packet is received with a valid source IP
   address, remove an NEXIST entry for that address if present
   (on the ARP table for the interface on which it was received only)
   [this check is arguably too thorough as it will remove
   valid NEXIST entries for IP addresses that exist, but behind
   a router on the current subnet, rather than on it directly,
   though this is (a) better than nothing, and (b) required
   to support proxy ARP properly; note that you can't rely
   on the MAC address being that of the IP though - still have
   to ARP]

   Age INCOMPLETE[n] states to INCOMPLETE[n+1] states after
   [t1] seconds (probably about 1 second), for n<N, and to
   NEXIST for n>=N (N is probably about 5)

   Age NEXIST state to deleted after about [t2] seconds (where
   t2 is probably close to the arp timeout - i.e. about 300)

   INCOMPLETE essentially means PENDING
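
The lookup and aging rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested implementation: the names `ArpTable`, `lookup`, `arp_reply`, `ip_seen` and the `send_arp` callback are hypothetical, aging is done lazily at lookup time rather than by timers, positive (PRESENT) entry aging is left out, and N, t1 and t2 use the "probably about" values suggested above.

```python
import enum
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

MAX_RETRIES = 5         # N: retries before giving up (assumed value)
RETRY_INTERVAL = 1.0    # t1: seconds between ARP requests (assumed value)
NEXIST_TIMEOUT = 300.0  # t2: lifetime of a negative entry (assumed value)


class State(enum.Enum):
    PRESENT = "present"
    INCOMPLETE = "incomplete"
    NEXIST = "nexist"


@dataclass
class Entry:
    state: State
    stamp: float                  # time the entry last changed
    retries: int = 0              # the n in INCOMPLETE[n]
    hwaddr: Optional[str] = None  # only set for PRESENT


class ArpTable:
    """Per-interface table; send_arp is a hypothetical transmit hook."""

    def __init__(self, send_arp: Callable[[str], None]) -> None:
        self.entries: Dict[str, Entry] = {}
        self.send_arp = send_arp

    def lookup(self, ip: str, now: float) -> Tuple[State, Optional[str]]:
        e = self.entries.get(ip)
        # Age NEXIST to deleted after t2 seconds.
        if e and e.state is State.NEXIST and now - e.stamp >= NEXIST_TIMEOUT:
            del self.entries[ip]
            e = None
        if e is None:
            # (a)(iv): absent, so transmit and record INCOMPLETE[0].
            self.entries[ip] = Entry(State.INCOMPLETE, now)
            self.send_arp(ip)
            return State.INCOMPLETE, None
        if e.state is State.PRESENT:
            return State.PRESENT, e.hwaddr       # (a)(i)
        if e.state is State.NEXIST:
            return State.NEXIST, None            # (a)(ii): fail immediately
        # (a)(iii): INCOMPLETE[n]; retransmit only if fully aged.
        if now - e.stamp >= RETRY_INTERVAL:
            if e.retries < MAX_RETRIES:
                e.retries += 1
                e.stamp = now
                self.send_arp(ip)
            else:
                e.state, e.stamp = State.NEXIST, now
                return State.NEXIST, None
        return State.INCOMPLETE, None

    def arp_reply(self, ip: str, hwaddr: str, now: float) -> None:
        # A valid ARP response replaces any existing entry with PRESENT.
        self.entries[ip] = Entry(State.PRESENT, now, hwaddr=hwaddr)

    def ip_seen(self, ip: str) -> None:
        # Inbound IP traffic from this address clears a negative entry,
        # but a positive entry still needs a real ARP response.
        e = self.entries.get(ip)
        if e is not None and e.state is State.NEXIST:
            del self.entries[ip]
```

With these values a scan of a dead address costs at most six ARP requests (the first plus five retries) per t2 period, however many packets arrive for it.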

2. It has been observed that Cisco products in particular do not
   handle ARP storms well. Even worse is the Catalyst 5[50]00. This
   may have been fixed since I saw it. The application in which I
   saw it seriously merited having a Linux box or similar 'proxy'-arp
   all non-existent addresses to null. You can probably achieve the
   same result with static arp entries to a non-existent h/w address.
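
As a rough sketch of the static-entry idea, the list of (IP, h/w address) pairs to install could be generated like this. The function name `unused_static_entries`, the placeholder MAC and the example addresses are all assumptions made for illustration; the command that actually installs a static entry depends on the platform and is not shown.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Assumption: a locally administered placeholder address that no real
# host on the segment uses.
NULL_MAC = "02:00:00:00:00:01"


def unused_static_entries(
    subnet: str, live_hosts: Iterable[str], mac: str = NULL_MAC
) -> List[Tuple[str, str]]:
    """Pair every unused host address in `subnet` with a dummy MAC."""
    net = ipaddress.ip_network(subnet)
    live = {ipaddress.ip_address(h) for h in live_hosts}
    return [(str(ip), mac) for ip in net.hosts() if ip not in live]


if __name__ == "__main__":
    # Example with a documentation prefix and two known-live hosts.
    for ip, hw in unused_static_entries(
        "192.0.2.0/29", ["192.0.2.1", "192.0.2.2"]
    ):
        print(ip, hw)
```

The list has to be regenerated whenever a host is added, otherwise traffic to the new address goes to the dummy h/w address.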

Alex Bligh
Personal Capacity