Thousands of hosts on a gigabit LAN, maybe not

Sun May 10 16:16:42 UTC 2015

On 10/05/2015 00:33, Karl Auer wrote:
> Would be interesting to see how IPv6 performed, since is one of the
> things it was supposed to be able to deliver - massively scalable links
> (equivalent to an IPv4 broadcast domain) via massively reduced protocol
> chatter (IPv6 multicast groups vs IPv4 broadcast), plus fully automated
> L3 address assignment.

It will perform badly because putting large numbers of hosts in a single
broadcast domain is a bad idea, no matter what the protocol.

If you have a very large L2 domain and if you use router advertisements to
handle your default gateway announcement, you'll probably end up trashing
your routers due to periodic neighbor solicitation messages.  If you don't
use tight timers, your failover convergence time will be trash.  On the
other hand, the tighter the timers, the more you'll trash your routers,
particularly if there is a failover event - in other words, exactly when
you don't want to stress the network.
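
To put a rough number on the timer trade-off, here is a back-of-envelope
sketch.  It assumes every host re-confirms its gateway about once per
reachability interval and gets no reachability hints from upper layers,
so treat it as a pessimistic estimate, not a measurement.  The function
name and the 10,000-host figure are made up for illustration.

```python
def ns_rate_per_second(hosts: int, reachable_time_s: float) -> float:
    """Rough steady-state rate of NUD probes arriving at the gateway.

    Assumption: each host probes the gateway once per reachability
    interval, with no confirmation from TCP or other upper layers.
    """
    if reachable_time_s <= 0:
        raise ValueError("reachable_time_s must be positive")
    return hosts / reachable_time_s


if __name__ == "__main__":
    # Hypothetical 10,000-host L2 domain, progressively tighter timers.
    for reachable in (30.0, 10.0, 5.0):
        rate = ns_rate_per_second(10_000, reachable)
        print(f"reachable time {reachable:>4.0f}s -> ~{rate:,.0f} NS/s")
```

Under this model the probe load grows in inverse proportion to the
timer, and that is before any failover event makes every host probe at
roughly the same moment.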

In the best case, the gateway unavailability MTTR will be around 5-10
seconds and it will be non-deterministic.  This means that if you want
router failover which actually works, you will need to use a first-hop
redundancy protocol like VRRP or similar.
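
One plausible way to arrive at a figure in that range is to add up the
default Neighbor Unreachability Detection timers from RFC 4861.  The
sketch below does that arithmetic.  The constants are the RFC defaults;
the function name and the best/worst framing are assumptions of mine,
and real stacks may use different values.

```python
# Protocol constants from RFC 4861, section 10 (defaults).
REACHABLE_TIME_S = 30.0
MAX_RANDOM_FACTOR = 1.5
DELAY_FIRST_PROBE_TIME_S = 5.0
MAX_UNICAST_SOLICIT = 3
RETRANS_TIMER_S = 1.0


def nud_detection_window(reachable_time_s: float = REACHABLE_TIME_S):
    """Return (best, worst) seconds for a host to give up on a dead gateway.

    Assumes the host keeps sending traffic towards the gateway.
    best:  the neighbor cache entry is already STALE when the router dies,
           so only the DELAY and PROBE phases remain.
    worst: reachability was confirmed just before the failure and the
           randomized reachable time came out at its maximum.
    """
    probe_phase = DELAY_FIRST_PROBE_TIME_S + MAX_UNICAST_SOLICIT * RETRANS_TIMER_S
    best = probe_phase
    worst = reachable_time_s * MAX_RANDOM_FACTOR + probe_phase
    return best, worst


if __name__ == "__main__":
    best, worst = nud_detection_window()
    print(f"best case {best:.0f}s, worst case {worst:.0f}s")
```

The spread between the two numbers is the non-deterministic part: which
one a given host sees depends on where its neighbor cache entry was when
the router went away.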

You will probably want to disable all multicast snooping on your network
because of IPv6 chatter.  Pushing state requirements into the L2 forwarding
mechanism turns out not to be a good idea, especially at scale - see the
bimajority.org URL that someone else posted on this thread, which is as
much about poor switch implementation as it is about poor protocol design
and solving problems that are a lot less relevant on today's networks.
With snooping disabled, you will also need to manually prune the scope of
your dot1q network domain, because otherwise the multicast chatter will be
flooded network-wide, everywhere the VLAN is defined.
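
To show where the L2 state comes from: every IPv6 address a host
configures maps to a solicited-node multicast group, formed from
ff02::1:ff00:0/104 plus the low 24 bits of the address (RFC 4291).  A
snooping switch has to track those groups.  The sketch below computes the
group for an address and counts the distinct groups for a hypothetical
population of sequentially numbered hosts; the 2001:db8:: addresses and
the host count are illustrative only.

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for a unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


def distinct_groups(host_count: int) -> int:
    """Count groups for hosts numbered 2001:db8::1 upwards (made-up plan)."""
    base = int(ipaddress.IPv6Address("2001:db8::"))
    groups = {
        solicited_node_multicast(str(ipaddress.IPv6Address(base + n)))
        for n in range(1, host_count + 1)
    }
    return len(groups)


if __name__ == "__main__":
    print(solicited_node_multicast("2001:db8::1234:5678"))
    print(distinct_groups(5000))
```

With one global and one link-local address per host, the number of
groups a snooping switch may need to hold is on the order of the number
of hosts, or more.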

RA gives the operator no way of controlling which IP address is assigned to
which host, which means that the operator of the large L2 domain is likely
to want to disable SLAAC if they plan to have any input on what IP address
is assigned to what host.  This may or may not be important to the
operator.  If it's hosts on a hot-seated corporate LAN, probably it doesn't
matter too much.  If it's a service provider selling IPv6 services, it
matters a lot.
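
As an illustration of why the operator has no say: with classic SLAAC the
host builds its own address from the advertised /64 and an interface
identifier it picks itself.  The sketch below shows the modified EUI-64
method from RFC 4291, where the identifier comes from the MAC address.
The prefix and MAC are made-up examples, and many current stacks use
RFC 4941 temporary or RFC 7217 stable-privacy identifiers instead, which
are even harder to predict from the network side.

```python
import ipaddress


def slaac_eui64(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 SLAAC address for a MAC on a /64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 needs a /64 prefix")

    octets = bytearray(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")

    # Flip the universal/local bit, then insert ff:fe in the middle.
    octets[0] ^= 0x02
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

    return ipaddress.IPv6Address(
        int(net.network_address) | int.from_bytes(iid, "big")
    )


if __name__ == "__main__":
    print(slaac_eui64("2001:db8:0:1::/64", "00:11:22:33:44:55"))
```

Nothing in this derivation involves the router beyond the prefix itself,
which is the point: the address is whatever the host decides it is.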

Regardless of whether this is the case, RA guard on each end-point is a
necessity; if you don't have it, your control plane will be compromised.
RA guard is more complicated than ARP / DHCP guard and is not well
supported on a lot of hardware.

Finally, if you have a connectivity problem with your large L2 domain, your
problem surface area is much greater than if you had segmented your network
into smaller chunks, so the scope of any outage can be a lot larger.

Nick