Using Cloudfront, S3, & Route 53 for hosting a web site

2019-07-19

This article was originally published on dev.to.

In the past few months, I’ve retired our nginx server and replaced it with Cloudfront, S3, Route 53, and Amazon-issued certificates to simplify hosting www.inbot.io. This has been a bit more painful than I liked and there have been a few issues. This is mainly because the AWS documentation sucks: it is fragmented over different products and generally represents a maze of misdirection.

Anyone from Amazon reading this: a guide for somebody looking to combine your products for a basic use case like “Given a completely bog standard one page javascript website, this is how you host it on AWS” would be extremely helpful. Instead you have bits and pieces of documentation for each of your products, leaving all the other bits and pieces as an exercise to the reader. I had to refer to Stackoverflow because every documentation page you land on seems to be lacking some crucial details. I wasted some serious amounts of time figuring out this most basic of uses for these products.

The good news is that it does work once you figure out all the workarounds in each of their products. The benefit is that it simplifies your infrastructure for a simple one page app. We have no self hosted bits and pieces. Additionally, a CDN ensures that your users have a good experience downloading your website from a nearby edge location instead of hitting your poor web server on the wrong side of the globe. The bad news is that the AWS UI is a bit lacking in usability and flexibility, and some common things that you would do in e.g. nginx are not that easy.

So, to avoid me having to google this together again, I’m documenting what I had to do to make this work. Also, others may find this useful.

Basic setup

First, thanks to Tiberiu Oprea for this extremely helpful overview. This will get you pretty far. I’m not going to repeat everything that he says there and will instead focus on some extra stuff I had to figure out separately. AWS people reading this, use this website as a reference on how to properly document your product.

Assuming you followed his instructions to the letter, you would end up with a Cloudfront + s3 setup (just replace inbot.io with your own domain):

  • https certificates for inbot.io and www.inbot.io in the AWS certificate manager. One gotcha here is that it takes ages for Cloudfront to actually see those. Allow a few hours for this to be picked up. If you get it wrong, delete and try again. I lost half a day over this where I thought I was doing it wrong when in fact Cloudfront just is a bit stupid when it comes to actually asking for the current list of certificates. Eventually they showed up in the relevant drop down. Give it some time.
  • An s3 bucket with the domain www.inbot.io.s3-website-eu-west-1.amazonaws.com. This is where you deploy your static content. I suggest using AWS cli for this.
  • Another s3 bucket with the domain inbot.io.s3-website-eu-west-1.amazonaws.com. This bucket is configured to redirect to www.inbot.io. It has no content.
  • Two matching Cloudfront setups for both buckets. Make sure you redirect http to https in Cloudfront.
  • Two A records in Route 53, one for the domain with www and one for the domain without, each pointing at the matching Cloudfront setup.

Note that the bucket names are important for getting Route 53 to do the right things. So match the domain name in the bucket name and don’t get creative here.
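For reference, here is a minimal sketch of how the certificate and the two buckets from the list above could be created with the AWS cli instead of the UI. I clicked mine together by hand, so treat this as an untested-in-production outline: the function names are made up, and the single certificate covering both names is an assumption. One thing that is not an assumption: Cloudfront only accepts certificates from the certificate manager in the us-east-1 region, regardless of where your buckets live.

```shell
#!/bin/sh
# Hypothetical helpers; names and argument order are illustrative only.

# Request one certificate covering both the apex and the www domain.
# CloudFront only accepts ACM certificates issued in us-east-1.
request_certificate() {
  apex="$1"                      # e.g. inbot.io
  aws acm request-certificate \
    --region us-east-1 \
    --domain-name "$apex" \
    --subject-alternative-names "www.${apex}" \
    --validation-method DNS
}

# Create the content bucket (www.<apex>) and the redirect bucket (<apex>).
# The bucket names match the domain names exactly.
setup_buckets() {
  apex="$1"                      # e.g. inbot.io
  region="$2"                    # e.g. eu-west-1
  www="www.${apex}"

  for bucket in "$www" "$apex"; do
    # Note: for us-east-1 the --create-bucket-configuration option
    # has to be left out.
    aws s3api create-bucket \
      --bucket "$bucket" \
      --region "$region" \
      --create-bucket-configuration "LocationConstraint=${region}" || return 1
  done

  # Content bucket: enable static website hosting with index.html.
  aws s3 website "s3://${www}/" --index-document index.html || return 1

  # Redirect bucket: no content, every request is sent to https://www.<apex>.
  aws s3api put-bucket-website \
    --bucket "$apex" \
    --website-configuration \
    "{\"RedirectAllRequestsTo\":{\"HostName\":\"${www}\",\"Protocol\":\"https\"}}"
}
```

You would call this as request_certificate inbot.io followed by setup_buckets inbot.io eu-west-1. The Cloudfront distributions and Route 53 records are not covered here.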

You can verify that everything redirects as required with a few simple curl commands:

# Cloudfront redirects to https
curl -v http://inbot.io 
# Cloudfront hits the S3 bucket and S3 redirects to https://www.inbot.io
curl -v https://inbot.io

If all this works alright, your browser will redirect http://inbot.io/whatever/path to https://www.inbot.io/whatever/path via two permanent redirects. So users typing in inbot.io end up on your website instead of a blank page, or worse, an S3 403 XML page.
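If you want this check in a script rather than eyeballing curl -v output, something along these lines should do. It is a sketch: expect_redirect is a name I made up, and it assumes both hops answer with a 301, which is what Cloudfront's http to https redirect and S3's redirect-all setting return.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if URL answers with a 301 whose
# Location header starts with the expected prefix.
expect_redirect() {
  url="$1"
  expected_prefix="$2"
  # -I sends a HEAD request; -w prints the status code and redirect target.
  result="$(curl -s -o /dev/null -I -w '%{http_code} %{redirect_url}' "$url")"
  case "$result" in
    "301 ${expected_prefix}"*)
      echo "ok: $url -> ${result#301 }"
      ;;
    *)
      echo "FAIL: $url returned '$result'" >&2
      return 1
      ;;
  esac
}

# Example calls (replace inbot.io with your own domain):
#   expect_redirect http://inbot.io/whatever/path  https://inbot.io/
#   expect_redirect https://inbot.io/whatever/path https://www.inbot.io/
```

Matching on a prefix rather than the full URL keeps the check from tripping over trailing slashes.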

It seems the AWS product people need to get spanked a little. Hosting a simple website is literally one of the most basic use cases a user of their products might have, and it totally sucks. There’s literally no website in this world that would not want https redirects and handling of domains with and without www out of the box. This should not require two separate buckets and Cloudfront setups. This is madness. But at least it works.

Updating content

One problem with Cloudfront is that TTLs are pretty long. This is helpful for caching things in a globally distributed CDN, but annoying when you need to fix a bug in your website and it takes hours or days for your users to actually get the fix. Addressing this requires some planning.

If you use something like Webpack, you should ensure the file names are hashed so this does not matter for most files. This leaves you one file that is still a problem: index.html. Since in our case this file is small, I ended up disabling TTL for this.

In our CI build we use the AWS cli to interact with AWS. We upload our index.html like this:

aws s3 cp ./build/index.html ${S3_BUCKET} \
  --acl public-read \
  --cache-control max-age=0,no-cache,no-store,must-revalidate

S3 will respect these headers and start serving your new index.html as soon as you fix something. Cloudfront passes these headers to the browser as well.

For the other files, we use a reasonable TTL. We have a bunch of static files that rarely change and lots of webpack hashed artifacts that generate new files.

One gotcha here is to not use aws s3 sync --delete. If you delete a file from S3 and the user still has an old index.html pointing to the old file hash, they will run into 404s until they force reload the page in their browser. And that assumes they even know how to do this; it poses some unique challenges on mobile.
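Putting the pieces of this section together, a deploy step could look roughly like the sketch below. The function name, the one year max-age for hashed assets, and the assumption that there is a single index.html at the root of the build directory are mine; pick a TTL that fits your own site, and give any other HTML entry points the same uncached treatment.

```shell
#!/bin/sh
# Hypothetical deploy step. bucket is expected to look like s3://www.inbot.io
deploy_site() {
  build_dir="$1"                 # e.g. ./build
  bucket="$2"                    # e.g. s3://www.inbot.io

  # 1. Hashed assets first. A long TTL is safe because every change
  #    produces a new file name. No --delete, so files referenced by an
  #    old index.html keep working.
  aws s3 sync "$build_dir" "$bucket" \
    --exclude index.html \
    --acl public-read \
    --cache-control "public,max-age=31536000" || return 1

  # 2. index.html last and uncached, so it never points at assets that
  #    have not been uploaded yet.
  aws s3 cp "${build_dir}/index.html" "${bucket}/index.html" \
    --acl public-read \
    --cache-control "max-age=0,no-cache,no-store,must-revalidate"
}
```

In CI this would be called as deploy_site ./build "${S3_BUCKET}". Uploading index.html after everything else is a deliberate ordering choice: a browser that fetches the new index.html mid-deploy will already find the assets it refers to.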

Cloudfront invalidations

You can force Cloudfront invalidations when you deploy new content. The main file to invalidate is index.html. This ensures that Cloudfront updates the CDN nodes worldwide within a few minutes, instead of much more slowly after the next user tries to load the file. So, within minutes after updating, any browser that reloads our page will hit Cloudfront and get the latest version, because Cloudfront forwards the max-age=0,no-cache,no-store,must-revalidate header.

aws cloudfront create-invalidation \
  --distribution-id ${CLOUDFRONT_ID} \
  --paths / /index.html /app/index.html

Of course keeping your index.html small will be helpful since loads from s3 are going to be a bit slower than cache hits from Cloudfront. This is OK since the bulk of our content is in the javascript and other files.
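If your CI job should only report success once the invalidation has actually gone through, the AWS cli has a waiter for that. A minimal sketch, with a made-up function name; the distribution id and paths are whatever you would pass to create-invalidation anyway:

```shell
#!/bin/sh
# Hypothetical helper: create an invalidation and block until it is done.
# Usage: invalidate_and_wait DISTRIBUTION_ID PATH [PATH...]
invalidate_and_wait() {
  distribution_id="$1"
  shift                          # remaining arguments are the paths

  # --query/--output reduce the JSON response to just the invalidation id.
  invalidation_id="$(aws cloudfront create-invalidation \
    --distribution-id "$distribution_id" \
    --paths "$@" \
    --query 'Invalidation.Id' \
    --output text)" || return 1

  # Polls until CloudFront reports the invalidation as completed.
  aws cloudfront wait invalidation-completed \
    --distribution-id "$distribution_id" \
    --id "$invalidation_id"
}
```

For our paths that would be invalidate_and_wait "${CLOUDFRONT_ID}" / /index.html /app/index.html.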

Url rewrites using S3

We had a few url rewrites in our nginx. These caused us some headaches until I figured out how to use S3 Routing Rules. Part of the problem here is that we had some links that we distributed to users that broke without us knowing because things got lost during the redirects. For example, we have a reward program for referring new users. They click on a link that used to go to http://inbot.io/join/XYZ (we’ve since fixed that to at least go to https) where XYZ is a referral code that needs to be passed into the javascript that constructs the signup form on our website so it can be passed to our server. To redirect this, I added a rule to our S3 bucket’s static website hosting properties:

<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>join/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <Protocol>https</Protocol>
      <HostName>www.inbot.io</HostName>
      <ReplaceKeyPrefixWith>app/index.html#/join/</ReplaceKeyPrefixWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>

Note, the protocol and hostname are essential, otherwise S3 will happily redirect you to the bucket url without SSL.
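I entered the rule above in the UI, but the same configuration can be applied from the command line, where routing rules are written as JSON instead of XML. This is a sketch under a few assumptions: the function name is made up, the bucket name doubles as the host name (which it does if you followed the naming advice earlier), and index.html is the only other website setting you have. That last point matters because put-bucket-website replaces the bucket's entire website configuration, so anything you leave out is gone.

```shell
#!/bin/sh
# Hypothetical helper: apply the join/ redirect to a website bucket.
# Usage: apply_join_redirect www.inbot.io
apply_join_redirect() {
  bucket="$1"                    # bucket name, identical to the host name

  # JSON equivalent of the XML routing rule shown above.
  config="$(cat <<EOF
{
  "IndexDocument": {"Suffix": "index.html"},
  "RoutingRules": [
    {
      "Condition": {"KeyPrefixEquals": "join/"},
      "Redirect": {
        "Protocol": "https",
        "HostName": "${bucket}",
        "ReplaceKeyPrefixWith": "app/index.html#/join/"
      }
    }
  ]
}
EOF
)"

  aws s3api put-bucket-website \
    --bucket "$bucket" \
    --website-configuration "$config"
}
```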

You can also use RoutingRules to provide nicer http errors, e.g. for the 403s that S3 throws when it can’t find an object (how is that not a 404?). One of these days, I’ll probably invest some time in using AWS cli or Cloudfront to do this in one command, but I ended up clicking all of this together in the AWS UI.

So this fixed our issues. Note we are redirecting to app/index.html#/join/XYZ. I decided to get rid of our subdomain for the web app and simply host everything under www.inbot.io. Inside our web app, we use # paths (aka anchors) and everything is handled by our javascript.

CORS header for our stellar.toml

Another issue we had was that one file on our site needs to have CORS headers set correctly. The file in question is the stellar.toml file that Stellar uses to figure out metadata about our cryptocurrency, the InToken. The how and why is not important, but setting up CORS is a common requirement and something that is straightforward in nginx. This one had me pondering for a while. S3 does not provide a good solution for this. However, it turns out that in Cloudfront you can configure so-called cache behaviors for specific paths. In our case we added a behavior for /.well-known/stellar.toml, with the custom header. This location is where Stellar clients expect to find the metadata file. And since a lot of Stellar clients are browser applications, they need the CORS headers to be set correctly. If you want to read more about this, refer to the Stellar documentation.
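Whichever way you get the header in there, it is worth checking from the outside that it actually arrives. A small sketch with curl; check_cors is a made-up name and the Origin value is just an example of what a browser wallet might send. Stellar clients expect Access-Control-Allow-Origin: * on this file.

```shell
#!/bin/sh
# Hypothetical helper: print the Access-Control-Allow-Origin header of a
# response, or fail if the header is missing.
# Usage: check_cors URL ORIGIN
check_cors() {
  url="$1"
  origin="$2"
  # -D - dumps the response headers to stdout; the body is discarded.
  headers="$(curl -s -o /dev/null -D - -H "Origin: ${origin}" "$url")"
  # Header names are case-insensitive, hence grep -i.
  echo "$headers" | grep -i '^access-control-allow-origin:'
}

# Example call:
#   check_cors https://www.inbot.io/.well-known/stellar.toml https://example.com
```

Because of the Cloudfront TTLs discussed earlier, a stale cached response without the header can hang around for a while after you change the behavior, so invalidate the path before concluding that the setup is wrong.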

Conclusion

This setup is working pretty OK. We have a simple CI job that does most of this. I did use the AWS UI to click together the buckets and Cloudfront setup. You could probably use e.g. Cloudformation or some aws cli incantations for this but since this is a one time thing, I did not bother to automate this. The flip-side of that is that it is kind of a long process with many steps and quite easy to mess up.

Also, I’m not completely happy about having to do a lot of things in the AWS UI (painfully unusable). However, I can’t bring myself to automate this hopefully one time setup. There are diminishing returns when it comes to automating this stuff.