Anycast but for egress
j.vimal at gmail.com
Thu Jul 29 16:37:16 UTC 2021
Great point. We don't need geo-diversity for websites with the IP address
issue, so we could design for that case specially on a one-off basis.
Our location shouldn't matter for throughput, but we often
find websites serving different content based on the source IP of the
traffic, so having a presence closer to the user is useful. But then
again, this is a different concern that's orthogonal to the original
question, because geo-ip doesn't make much sense with an anycast IP. For
those websites that need a stable IP for NACLs *and* serve different
content based on source IP, we have to use the predictable 3-5 IPs per site
suggestion of yours.
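
To make the "3-5 IPs per site" idea concrete, here is a rough sketch of one way to pick a small, stable subset of NAT gateway IPs for each site. It uses rendezvous (highest-random-weight) hashing, which is my own choice for illustration and not something proposed in this thread. The gateway inventory, the region labels, and the function names are all hypothetical, and the addresses come from the documentation ranges.

```python
import hashlib

# Hypothetical inventory of NAT gateway IPs and the regions they live in.
# Addresses are from the RFC 5737 documentation ranges, not real gateways.
GATEWAYS = {
    "192.0.2.10": "us-east-1",
    "192.0.2.11": "us-west-2",
    "198.51.100.10": "eu-west-1",
    "198.51.100.11": "eu-central-1",
    "203.0.113.10": "ap-south-1",
    "203.0.113.11": "ap-northeast-1",
    "203.0.113.12": "sa-east-1",
    "203.0.113.13": "ap-southeast-2",
}


def score(site: str, ip: str) -> int:
    """Deterministic pseudo-random weight for a (site, gateway) pair."""
    digest = hashlib.sha256(f"{site}|{ip}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def egress_ips_for(site: str, gateways=GATEWAYS, k: int = 3) -> list[str]:
    """Return the k gateway IPs with the highest weight for this site.

    The same site always maps to the same IPs, and different sites
    tend to map to different subsets, which spreads the load.
    """
    ranked = sorted(gateways, key=lambda ip: score(site, ip), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for site in ("example.com", "example.org"):
        print(site, egress_ips_for(site))
```

The list returned for a site is what we would hand to that site's owner for their allow list. Removing a gateway that a site was not using leaves that site's list unchanged, so most owners would not need to be contacted again when the fleet changes. This sketch ignores region when choosing, so a site that also needs geo-specific content would need the candidate set filtered by region first.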
On Wed, Jul 28, 2021 at 11:27 AM Glenn McGurrin via NANOG <nanog at nanog.org>
wrote:
> I'd had a similar thought/question, though keeping the geo diversity.
> You manage the crawlers, and are making contact individually with these
> sites from what you have stated (and so don't need a one-size-fits-all
> list for public posting), so why not have a restricted subset of the
> crawlers handle sites with these issues? That subset may be unique per
> site, which keeps maintaining even load balancing from being overly
> complex or limiting, especially as you are using NAT anyway, so multiple
> servers can be behind each IP and that number can vary. That lets you
> have geo diversity (or even multi-cloud diversity) for every site, yet
> each site that needs this IP whitelisting only needs 3-5 IPs, and you
> can still distribute load over a much larger overall set of machines
> and NAT gateways.
> As I understand it, even CDNs that anycast TCP (externally or internally
> [load balancing via routers and multipath]) do something similar by
> spreading load over multiple IPs at the DNS layer first.
> As the transition to IPv6 happens you may have it easier as getting a
> large enough allocation to allow for splitting it out into multiple
> subnets advertised from different locations without providers dropping
> the route as too long a prefix is much easier on the v6 side, so you
> could give one /36 or /40 or even /44 out to whitelist but have /48s at
> each location. For sites with IPv6 support that may help now, but it
> won't help all sites for quite some time, though the number that support
> v6 is slowly getting better. For the foreseeable future you still need
> to handle the v4 side one way or another though.
> On 7/28/2021 10:21 AM, William Herrin wrote:
> > On Wed, Jul 28, 2021 at 6:04 AM Vimal <j.vimal at gmail.com> wrote:
> >> My intention is to run a web-crawling service on a public cloud. This
> >> is geographically distributed, and therefore will run in multiple
> >> regions around the world inside AWS... this means there will be multiple AWS
> >> VPCs, each with their own NAT gateway, and traffic destined to websites
> >> that we crawl will appear to come from this NAT gateway's IP address.
> > Hello,
> > AWS does not provide the ability to attach anycasted IP addresses to a
> > NAT gateway, regardless of whether it would work, so that's the end of
> > your quest.
> >> The reason I want a predictable IP is to communicate this IP to website
> >> owners so they can allow access from these IPs into their networks.
> >> I chose IP as an example; it can also be a subnet, but what I don't want to
> >> provide is a list of 100 different IP addresses without any
> > If you bring your own IP addresses, you can attach separate /24s of
> > them to your VPCs in each region, providing you with a single
> > predictable range of source addresses. You will find it difficult and
> > expensive to acquire that many IP addresses from the regional
> > registries for the purpose you describe.
> > Silly question but: for a web crawler, why do you care whether it has
> > the limited geographic distribution that a cloud service provides?
> > It's a parallel batch task. It doesn't exactly matter whether you have
> > minimum latency.
> > Regards,
> > Bill Herrin