MPLS in the campus Network?

Thu Oct 20 13:43:26 UTC 2016

Dear NANOG members,

We operate a campus network reaching more than 100 buildings on 5 campuses.
We also operate a regional backbone and the interconnection to our NREN.
The current architecture is made of an L2 backbone and a few routers.
Most of the buildings are connected with a 1 Gb/s link over our own
optical fiber (only a few buildings are connected at 10 Gb/s).
In a smaller number of buildings (a few dozen), we also operate the
internal network, made of Ethernet switches (in a multi-vendor
environment).
In each building, we provide at least an edge switch, marking the
boundary between us and the customer, where we deliver the different
services on Ethernet ports.

The services we currently offer:
- L2 interconnections (400 VLANs are present in 2 buildings or more;
only a few VLANs are present in more than 30 buildings),
- IPv4 and IPv6 routing (hundreds of subnets) and Internet access,
- specific interconnections (e.g. terminating a VPN to the customer,
say a national private infrastructure delivered by the NREN through
MPLS L2/L3VPN and stitched to the customer network using a specific VLAN),
- routing isolation using routing instances (~ VRF Lite): only 5
instances, but we could have more,
- routing and filtering using open source firewalls running on servers
in our DCs (fewer than 15 platforms, as most customers operate their
own firewall),
- user authentication,
- shared VPN platform allowing direct access for an identified user into
the customer network (based on RADIUS attributes); this platform uses
VLANs to interconnect to the rest of the network,
- wireless LAN, also allowing direct access for an identified user into
the customer network; the platform is a centralized controller, and
it uses VLANs to interconnect to the rest of the network.
(These last two services could use just a VLAN or a dedicated subnet
delivered on a port of the edge switch, which is then connected to the
customer firewall.)

We are not satisfied with the current backbone design; we had our share
of problems in the past:
- high CPU load on the core switches due to multiple instances of spanning
tree converging slowly when a topology change happens (more or less
fixed with a few instances of MSTP),
- spanning tree interoperability problems and spurious port blocking
(fixed by BPDU filtering),
- loops at the edge and broadcast/multicast storms (fixed with traffic
limits and port blocking based on thresholds),
- some small switches at the edge overloaded with large numbers of
MAC addresses (fixed by reducing broadcast domain size and subnetting).

This architecture doesn't feel very solid.
Even if service provisioning seems easy from an operational point
of view (create a VLAN and it is immediately available at any point of the
L2 backbone), we feel the configuration is not always consistent.
We have to rely on scripts pushing configuration elements and human
discipline (and lots of duct-tape, especially for QoS and VRFs).
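
To illustrate the kind of consistency check such scripts end up doing,
here is a minimal sketch in Python. Everything in it is hypothetical:
the audit_vlans function, the switch names and the two dictionaries
(where each VLAN is supposed to exist, and what each switch actually
has configured) are assumptions made for the example, not our real
tooling or inventory.

```python
def audit_vlans(expected, configured):
    """Compare intended VLAN placement with per-switch configuration.

    expected:   {vlan_id: set of switch names that should carry it}
    configured: {switch name: set of vlan_ids actually configured}
    Returns (missing, unexpected):
      missing    = {vlan_id: [switches that should carry it but do not]}
      unexpected = {switch: [vlan_ids configured but not intended there]}
    """
    missing = {}
    unexpected = {}
    for vlan, switches in expected.items():
        absent = sorted(
            sw for sw in switches if vlan not in configured.get(sw, set())
        )
        if absent:
            missing[vlan] = absent
    for sw, vlans in configured.items():
        extra = sorted(v for v in vlans if sw not in expected.get(v, set()))
        if extra:
            unexpected[sw] = extra
    return missing, unexpected


# Hypothetical inventory: three switches, two intended VLANs.
expected = {
    100: {"core-1", "edge-a", "edge-b"},
    200: {"core-1", "edge-b"},
}
configured = {
    "core-1": {100, 200},
    "edge-a": {100, 300},
    "edge-b": {100},
}

missing, unexpected = audit_vlans(expected, configured)
print(missing)     # {200: ['edge-b']}
print(unexpected)  # {'edge-a': [300]}
```

A check like this only reports drift; it says nothing about QoS or VRF
settings, which is where most of the duct-tape is.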

We are re-designing our network architecture.
We have enough fiber to imagine many ways to link the core network
devices.
We find MPLS has merit as a platform to carry all the network services
we currently provide (L2, L3 VPN, VPLS, and soon EVPN).
However, we also want to upgrade the infrastructure to allow for future
traffic growth. Some labs, especially in physics, could need more than
10 Gb/s in the coming years. Our evolution cycles are long (we keep a
backbone technology for 8 years). MPLS is definitely not cheap
considering the price of a 10G or 100G router interface.

Compared to MPLS, an L2 solution with 100 Gb/s interfaces between
core switches and a 10G connection for each building looks so much
cheaper. But we worry about future trouble using TRILL, SPB, or other
technologies, not only the "open" ones, but specifically the proprietary
ones based on a central controller and lots of magic (some colleagues
feel that debugging nightmares are guaranteed).

If you had to make such a choice recently, did you choose an MPLS design
even at a lower speed?
How would you convince your management that MPLS is the best solution for
your campus network? How would you justify the cost or speed difference?

Thanks for your insights!