AWS S3 DNS load balancer

Deepak Jain deepak at ai.net
Tue Jun 15 16:38:03 UTC 2021




I've just taken a squiz at an S3-based website we have, and via the S3 URL it is a CNAME with a 60-second TTL pointing at a set of A records with 5-second TTLs.

Any one dig returns the CNAME and a single IP address:

dig our-domain.s3-website-ap-southeast-2.amazonaws.com.
our-domain.s3-website-ap-southeast-2.amazonaws.com.	14 IN CNAME s3-website-ap-southeast-2.amazonaws.com.
s3-website-ap-southeast-2.amazonaws.com. 5 IN A	52.95.134.145

If the query is repeated, the returned IP address changes roughly every five seconds.
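The rotation can be observed with a short script rather than by re-running dig by hand. This is only a minimal sketch, assuming Python's standard-library resolver; the names `system_resolve` and `collect_addresses` and their parameters are made up for illustration, and a caching local resolver may hide some of the changes.

```python
import socket
import time


def system_resolve(name):
    """Return the IPv4 addresses the system resolver reports for name."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


def collect_addresses(name, rounds=6, delay=5.0,
                      resolve=system_resolve, sleep=time.sleep):
    """Resolve name `rounds` times, pausing `delay` seconds between queries.

    Returns the distinct addresses seen, in order of first appearance.
    `resolve` and `sleep` are injectable (an assumption of this sketch)
    so the loop can be exercised without touching the network.
    """
    seen = []
    for i in range(rounds):
        for address in resolve(name):
            if address not in seen:
                seen.append(address)
        # No need to wait after the final query.
        if i < rounds - 1:
            sleep(delay)
    return seen
```

Calling `collect_addresses("s3-website-ap-southeast-2.amazonaws.com")` with the defaults takes about 25 seconds and should list each address handed out during that window, assuming the local resolver honours the 5-second TTL.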

What's interesting is the name attached to the A records, which does not include "our-domain". It seems to be a record pointing to ALL S3 websites in the region. And all of the addresses I saw reverse-resolve to that one name. So there is definitely some under-the-bonnet magic discrimination going on.
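The reverse-resolution observation can be checked the same way. Again a rough sketch under assumptions: `reverse_names` is a hypothetical helper, it relies on `socket.gethostbyaddr` from the standard library, and what it returns depends on whatever PTR records are published at the time.

```python
import socket


def reverse_names(addresses, lookup=socket.gethostbyaddr):
    """Map each address to its reverse (PTR) name, or None if lookup fails.

    `lookup` is injectable (an assumption of this sketch) so the function
    can be tested without network access.
    """
    names = {}
    for address in addresses:
        try:
            hostname, _aliases, _addrs = lookup(address)
            names[address] = hostname
        except OSError:
            # socket.herror and socket.gaierror are both OSError subclasses.
            names[address] = None
    return names
```

Feeding it the addresses collected from repeated queries shows whether they all map back to the one regional name or to something else.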

In Route53 the picture is very different, with the published website host name (think "our-domain.com.au") resolving to four IP addresses that are all returned in the response to a single dig query. There is an A-ALIAS (a non-standard AWS record type) that points to a CloudFront distribution that has the relevant S3 bucket as its origin.

Using the CNAME bypasses the CloudFront distribution unless steps are taken to forbid direct access to the bucket. It would be usual to use (and enforce) access via CloudFront, if for no other reason than to provide for HTTPS access. 

---

So, depending on what query you make, you get very different answers. For example, if you try s3.amazon.com you get a CNAME to rewrite.amazon.com, which seems reasonable for any subdomain request that they would have a better response for.

I don't remember the details, but they may be moving to deterministic subdomains as you've shown above, with only "legacy" uses going to s3.amazonaws.com. I remember hearing a big uproar about it. Perhaps an AWS person will chime in with some color on this.

So deterministic subdomain to a group of relatively deterministic endpoints, even round-robin, makes sense to me as in... "usual in the practice of the art." Even if those systems end up being load balancers for other systems behind them.

The s3.amazonaws.com behavior is different from that. I'm guessing that no one (else) uses this sort of single-IP-from-a-pool trick, and therefore it's not standard. Further, given that AWS appears to be moving *back* to the traditional way of doing things, there must be undesirable limitations to this model.

[just spitballing here]

Deepak

