Linux BNG

Sat Jul 14 17:09:37 UTC 2018

Comments are interspersed below.

On 07/14/2018 06:13 AM, Baldur Norddahl wrote:
> I am investigating Linux as a BNG. The BNG (Broadband Network Gateway) 
> being the thing that acts as default gateway for our customers.
> 
> The setup is one VLAN per customer. Because 4095 VLANs is not enough, we 
> have QinQ with double VLAN tagging on the customers. The customers can 
> use DHCP or static configuration. DHCP packets need to be option82 
> tagged and forwarded to a DHCP server. Every customer has one or more 
> static IP addresses.

Where does the tagging happen?  Do you have aggregation switches doing
this?  Are those already in place, or still being planned?  I ask
because I would suggest a different way to do the aggregation.

> IPv4 subnets need to be shared among multiple customers to conserve 
> address space. We are currently using /26 IPv4 subnets with 60 customers 
> sharing the same default gateway and netmask. In Linux terms this means 
> 60 VLAN interfaces per bridge interface.

I suppose it could be made to work, but forcing a layer 3 boundary over
a bunch of layer 2 boundaries seems like a lot of work.  It would be the
brute force and ignorance approach, given the mechanisms you would be
using.
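
As a quick check on the numbers, here is a minimal Python sketch of the
address budget in one shared /26.  The prefix 192.0.2.0/26 is a
documentation range picked for illustration, and putting the gateway on
the first host address is my assumption, not something taken from your
setup.

```python
import ipaddress

# Hypothetical shared subnet (documentation range), not a real allocation.
subnet = ipaddress.ip_network("192.0.2.0/26")

hosts = list(subnet.hosts())      # excludes the network and broadcast addresses
gateway = hosts[0]                # assumption: gateway takes the first host address
customer_addrs = hosts[1:]        # what is left to hand out to customers

print(subnet.num_addresses)       # 64 addresses in a /26
print(len(customer_addrs))        # 61 assignable, so 60 customers fit with one spare
print(gateway, subnet.netmask)    # every customer shares this gateway and netmask
```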

> However Linux is not quite ready for the task. The primary problem being 
> that the system does not scale to thousands of VLAN interfaces.

It probably depends upon which Linux-based tooling you wish to use.
There are other ways of approaching this which scale better.

> We do not want customers to be able to send non routed packets directly 
> to each other (needs proxy arp). Also customers should not be able to 
> steal another customer's IP address. We want to hard code the relation 
> between IP address and VLAN tagging. This can be implemented using 
> ebtables, but we are unsure that it could scale to thousands of customers.

I would suggest looking at VXLAN (kernel plus FRR and/or Open vSwitch)
or OpenFlow (kernel plus Open vSwitch).

VXLAN scales to about 16 million VLAN equivalents (the VNI is 24 bits
wide), which is why I ask about your aggregation layers.  Rather than
trying to do all the addressing across all the QinQ VLANs in the core
boxes, the VLANs/VXLANs and addressing are best dealt with at the edge.
Then, rather than running a bunch of VLANs through your
aggregation/distribution links, you can keep those links resilient with
a layer-3-only strategy.
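
To illustrate why the two ID spaces line up, here is a small Python
sketch that packs a QinQ tag pair into a single VNI.  The packing scheme
and the function names are hypothetical, made up for this example; real
deployments may well assign VNIs differently.

```python
from typing import Tuple

VNI_BITS = 24        # VXLAN network identifier width
VLAN_ID_BITS = 12    # 802.1Q VLAN ID width

def qinq_to_vni(outer: int, inner: int) -> int:
    """Pack an (outer, inner) VLAN pair into one 24-bit VNI (hypothetical scheme)."""
    for tag in (outer, inner):
        if not 1 <= tag <= 4094:          # IDs 0 and 4095 are reserved
            raise ValueError(f"VLAN ID out of range: {tag}")
    return (outer << VLAN_ID_BITS) | inner

def vni_to_qinq(vni: int) -> Tuple[int, int]:
    """Recover the (outer, inner) pair from a VNI built by qinq_to_vni."""
    return vni >> VLAN_ID_BITS, vni & 0xFFF

print(2 ** VNI_BITS)           # 16777216 possible VNIs
print(4094 * 4094)             # 16760836 usable QinQ combinations
print(qinq_to_vni(100, 200))   # 409800
```

Since two 12-bit tags fit exactly into 24 bits, every customer's QinQ
pair can have its own VNI with nothing left over to share.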

> I am considering writing a small program or kernel module. This would 
> create two TAP devices (tap0 and tap1). Traffic received on tap0 with 
> VLAN tagging, will be stripped of VLAN tagging and delivered on tap1. 
> Traffic received on tap1 without VLAN tagging, will be tagged according 
> to a lookup table using the destination IP address and then delivered on 
> tap0. ARP and DHCP would need some special handling.

I don't think this would be needed.  I think all the tools are already
available and are robust from daily use.  Free Range Routing (FRR) with
EVPN over VXLAN or MPLS gives you a traditional routing mix, or you can
use the OpenFlow tooling in Open vSwitch to handle the layer 2 and
layer 3 rule definitions you have in mind.

Open vSwitch can be programmed via command line rules or can be hooked
up to a controller of some sort.  So rather than writing your own kernel
program, you would write rules for a controller, or a script, which
drives the engines already resident in the kernel.
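
As a rough idea of what such a script could look like, here is a Python
sketch that prints ovs-ofctl commands binding one source IP to one VLAN.
The bridge name br0, the port numbers, the addresses and the helper
names are all assumptions for illustration.  It uses a single VLAN tag
only (QinQ needs additional handling), covers only the customer-to-
uplink direction, and has not been tested against a live switch.

```python
from typing import List, NamedTuple

class Customer(NamedTuple):
    vlan: int   # customer VLAN ID (single tag here; QinQ needs more work)
    ip: str     # the static IPv4 address bound to that VLAN

def binding_flows(c: Customer, access_port: int, uplink_port: int) -> List[str]:
    """Return OpenFlow rules that tie one source IP to one VLAN."""
    match = f"in_port={access_port},dl_vlan={c.vlan}"
    return [
        # Accept IPv4 only when the source address matches the binding.
        f"priority=200,{match},ip,nw_src={c.ip},actions=strip_vlan,output:{uplink_port}",
        # Accept ARP only when the sender address matches the binding.
        f"priority=200,{match},arp,arp_spa={c.ip},actions=strip_vlan,output:{uplink_port}",
        # Anything else arriving on this VLAN is dropped.
        f"priority=100,{match},actions=drop",
    ]

# Hypothetical customer table; in practice this would come from provisioning.
customers = [Customer(101, "192.0.2.11"), Customer(102, "192.0.2.12")]
for cust in customers:
    for flow in binding_flows(cust, access_port=1, uplink_port=2):
        print(f'ovs-ofctl add-flow br0 "{flow}"')
```

That is three rules per customer, generated from a table, which is the
same IP-to-VLAN relation you were planning to hard code with ebtables.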

> This would be completely stateless for the IPv4 implementation. The IPv6 
> implementation would be harder, because Link Local addressing needs to 
> be supported and that can not be stateless. The customer CPE will make 
> up its own Link Local address based on its MAC address and we do not 
> know what that is in advance.

Both FRR and OVS are IPv4 and IPv6 aware.  How the dynamically learned
CPE MAC is handled depends on which tooling you choose.
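
For reference, once a CPE's MAC has been learned, the link-local address
it would derive under the classic modified EUI-64 scheme can be computed
as in the Python sketch below.  The function name and the sample MAC are
made up for the example, and a CPE may instead use a random or stable
opaque interface identifier, in which case this prediction does not
apply.

```python
import ipaddress

def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 link-local address for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC, got {mac!r}")
    octets[0] ^= 0x02                        # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to form the 64-bit ID.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix: fe80 followed by 48 zero bits, then the interface ID.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)

print(eui64_link_local("00:11:22:33:44:55"))   # fe80::211:22ff:fe33:4455
```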

> The goal is to support traffic of minimum of 10 Gbit/s per server. 
> Ideally I would have a server with 4x 10 Gbit/s interfaces combined into 
> two 20 Gbit/s channels using bonding (LACP). One channel each for 
> upstream and downstream (customer facing). The upstream would be layer 3 
> untagged and routed traffic to our transit routers.

As mentioned earlier, why make the core boxes do all of the work?  Why 
not distribute the functionality out to the edge?  Rather than using 
traditional switch gear at the edge, use smaller Linux boxes to handle 
all that complicated edge manipulation, and then keep your high 
bandwidth core boxes pushing packets only.

> I am looking for comments, ideas or alternatives. Right now I am 
> considering what kind of CPU would be best for this. Unless I take steps 
> to mitigate, the workload would probably go to one CPU core only and be 
> limited to things like CPU cache and PCI bus bandwidth.

There is much more to write about, but those writings would depend upon
what you already have in place, what you would like to put in place, and
how you wish to segment your network.
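
One thing that can be estimated regardless of design is the worst-case
packet rate behind the 10 Gbit/s target.  The Python sketch below works
it out for minimum-size Ethernet frames; the 3 GHz clock is an assumed
figure for illustration, not a recommendation for a particular CPU.

```python
LINE_RATE_BPS = 10e9    # 10 Gbit/s, the stated minimum per server
FRAME_BYTES = 64        # minimum Ethernet frame, worst case for packet rate
OVERHEAD_BYTES = 20     # preamble (7) + start delimiter (1) + inter-frame gap (12)
CPU_HZ = 3.0e9          # assumption: one core running at 3 GHz

# Packets per second when the link is full of minimum-size frames.
pps = LINE_RATE_BPS / ((FRAME_BYTES + OVERHEAD_BYTES) * 8)

# Cycle budget per packet if a single core had to handle all of them.
cycles_per_packet = CPU_HZ / pps

print(f"{pps / 1e6:.2f} Mpps")                        # 14.88 Mpps
print(f"{cycles_per_packet:.0f} cycles per packet")   # 202 cycles per packet
```

Real traffic has larger average frames, so the practical rate is lower,
but it shows why spreading the load over several cores matters.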

Hope this helps.

> Baldur

-- 
Raymond Burkholder
ray at oneunified.net
https://blog.raymond.burkholder.net