Anycast but for egress

Glenn McGurrin nanog at cloudoptimizedsmb.com
Wed Jul 28 18:06:34 UTC 2021


I'd had a similar thought/question, though keeping the geo diversity.
You manage the crawlers, and from what you have stated you are
contacting these sites individually (so you don't need a
one-size-fits-all list for public posting). So why not have a
restricted subset of the crawlers handle sites with these issues? The
subset can be unique per site, which keeps even load balancing from
becoming overly complex or limiting, especially as you are using NAT
anyway, so multiple servers can sit behind each IP and that number can
vary. That lets you keep geo diversity (or even multi-cloud diversity)
for every site, while each site that needs IP whitelisting only has to
allow 3-5 IPs, yet you can still distribute load over a much larger
overall set of machines and NAT gateways.
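
To make the per-site subset idea concrete, here is a minimal sketch
using rendezvous (highest-random-weight) hashing, which is my choice of
technique, not something from the thread. Each site deterministically
gets k egress IPs, at most one per region, so the list you hand a site
owner stays stable. The pool, region names, and function names are all
hypothetical; the addresses are RFC 5737 documentation ranges.

```python
import hashlib

# Hypothetical pool of NAT gateway egress IPs -> region (RFC 5737 doc addresses).
EGRESS_POOL = {
    "203.0.113.10": "us",
    "203.0.113.11": "us",
    "198.51.100.20": "eu",
    "198.51.100.21": "eu",
    "192.0.2.30": "ap",
    "192.0.2.31": "ap",
}


def _score(site: str, ip: str) -> int:
    """Rendezvous score: a stable pseudo-random weight for a (site, ip) pair."""
    digest = hashlib.sha256(f"{site}|{ip}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def egress_subset(site: str, k: int = 3) -> list[str]:
    """Pick up to k egress IPs for a site, taking at most one per region."""
    # Rank every gateway by its score for this site, highest first.
    ranked = sorted(EGRESS_POOL, key=lambda ip: _score(site, ip), reverse=True)
    chosen: list[str] = []
    regions: set[str] = set()
    for ip in ranked:
        if EGRESS_POOL[ip] not in regions:
            chosen.append(ip)
            regions.add(EGRESS_POOL[ip])
        if len(chosen) == k:
            break
    # May return fewer than k if there are fewer regions than k.
    return chosen


if __name__ == "__main__":
    for site in ("example.com", "example.org"):
        print(site, egress_subset(site))
```

The appeal of rendezvous hashing here is that adding a gateway to the
pool only changes the assignment for sites where the new gateway ranks
highly, so most sites' whitelists don't need to be re-sent. The
one-per-region rule only approximately preserves that property.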

As I understand it, even CDNs that anycast TCP (externally, or
internally via load balancing across routers and multipath) do
something similar, spreading load over multiple IPs at the DNS layer
first.

As the transition to IPv6 happens this may get easier. On the v6 side
it is much easier to get an allocation large enough to split into
multiple subnets advertised from different locations without providers
dropping the routes as too-long prefixes, so you could give out one
/36, /40, or even /44 to be whitelisted but announce a /48 at each
location. For sites with IPv6 support that may help now, but it won't
help all sites for quite some time, though the number that support v6
is slowly improving. For the foreseeable future you still need to
handle the v4 side one way or another.
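
A quick sketch of the v6 arithmetic with Python's standard ipaddress
module: a single /40 aggregate contains 2^(48-40) = 256 /48s, so one
whitelist entry can cover a /48 announced from each location. The
prefix and location names are hypothetical (2001:db8::/32 is the
RFC 3849 documentation range), and whether a given provider accepts a
/48 announcement is a routing-policy question this doesn't address.

```python
import ipaddress

# Hypothetical aggregate given to site owners to whitelist (documentation space).
AGGREGATE = ipaddress.ip_network("2001:db8:100::/40")

# Hypothetical crawl locations, each announcing its own /48.
LOCATIONS = ["us-east", "eu-west", "ap-south"]

# Carve the /40 into /48s and assign one to each location, in order.
per_location = dict(zip(LOCATIONS, AGGREGATE.subnets(new_prefix=48)))


def allowed(src: str) -> bool:
    """Site-owner side: the one aggregate entry covers every location's /48."""
    return ipaddress.ip_address(src) in AGGREGATE


if __name__ == "__main__":
    for loc, net in per_location.items():
        print(loc, net)
```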

On 7/28/2021 10:21 AM, William Herrin wrote:
> On Wed, Jul 28, 2021 at 6:04 AM Vimal <j.vimal at gmail.com> wrote:
>> My intention is to run a web-crawling service on a public cloud. This service
>> is geographically distributed, and therefore will run in multiple regions
>> around the world inside AWS... this means there will be multiple AWS VPCs,
>> each with their own NAT gateway, and traffic destined to websites
>> that we crawl will appear to come from this NAT gateway's IP address.
> 
> Hello,
> 
> AWS does not provide the ability to attach anycasted IP addresses to a
> NAT gateway, regardless of whether it would work, so that's the end of
> your quest.
> 
>> The reason I want a predictable IP is to communicate this IP to website
>> owners so they can allow access from these IPs into their networks.
>> I chose IP as an example; it can also be a subnet, but what I don't want to
>> provide is a list of 100 different IP addresses without any predictability.
> 
> If you bring your own IP addresses, you can attach a separate /24 of
> them to your VPCs in each region, providing you with a single
> predictable range of source addresses. You will find it difficult and
> expensive to acquire that many IP addresses from the regional
> registries for the purpose you describe.
> 
> 
> Silly question, but: for a web crawler, why do you care whether it has
> the limited geographic distribution that a cloud service provides?
> It's a parallel batch task. It doesn't exactly matter whether you have
> minimum latency.
> 
> Regards,
> Bill Herrin