Using AWS S3 as storage for a Heroku app

I've been working hard on SEO improvements for my website. The obvious thing to do is ensure your sitemap.xml is up-to-date and being read by Google. If you have a nifty SPA-style site that relies heavily on JavaScript, you might be wondering about the best way to maintain the sitemap. Perhaps you're like me and have lots of dynamically generated pages that fetch their information from an API; Google will have a harder time finding them if they aren't present in your sitemap.

That got me thinking I should probably leverage my API and start experimenting with webhooks to keep the sitemap up to date whenever I publish or remove content. Before long I had a working Node backend that could handle hook events coming from my Storyblok CMS, then automatically update my sitemap.xml file with everything published in the CMS. The bad news was that when I tried to use this method in my Heroku production environment, it appeared I couldn't successfully update the sitemap.xml file. With a bit of research, I found this interesting information directly from Heroku:

Heroku has an “ephemeral” hard drive, this means that you can write files to disk, but those files will not persist after the application is restarted.

Once I read that, I basically shut my computer and packed it in for the night, having burned quite a few hours of experimentation for nothing. Or so I thought. In the morning, I started poking around Stack Overflow for ideas and read something that seemed plausible. The basic concept is as follows:

  1. Use Google Search Console to point to a path on your site's domain where it can access the sitemap

  2. Redirect the request to an Amazon S3 bucket which hosts your sitemap.xml (see the sketch after this list)

  3. When a webhook event is triggered, use your node backend to update the file in the S3 bucket
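
To make step 2 concrete, here's a minimal sketch of the redirect, assuming an Express-based Node backend; the bucket URL is a hypothetical placeholder for your own:

const express = require('express')
const app = express()

// Redirect sitemap requests to the copy hosted in the S3 bucket
app.get('/sitemap.xml', (req, res) => {
  res.redirect(301, 'https://your-bucket-name.s3.amazonaws.com/sitemap.xml')
})

app.listen(process.env.PORT || 3000)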

I'm happy to say that after working through the implementation, this approach is sound and I'd like to share what I learned along the way.

In this short tutorial, I'll walk you through the basics of the following:

  1. Create an S3 bucket in AWS and add a bucket policy that allows public read-only access

  2. Create API credentials for the bucket and add them to your Heroku Config Vars

  3. Use the aws-sdk library to send requests to your S3 bucket, using your config variables as credentials

I won't go into the details of generating a sitemap.xml or using webhooks, as those are pretty use-case specific and the implementation depends on a lot of things. But if you follow along, you'll have the information you need to work around the Heroku ephemeral file system problem and apply it to your specific use case.

With that said, let's get to it.

Create an S3 Bucket

If you don't already have one, you'll need to sign up for a free AWS account. Go ahead and do that before continuing.

Once you have confirmed your account, sign in and access the S3 Management Console. From here you're going to Create a New Bucket.

Give the bucket a name; beyond that, you'll likely want to keep the defaults provided. For my use case, I allowed public read-only access because I wanted to expose my sitemap.xml there. Depending on your needs, you may choose to block all public access and only allow access through the API keys.

In my case, I added the following bucket policy to allow public read-only access to my bucket (replace your-bucket-name with the name of your own bucket):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        }
    ]
}
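
If you'd rather work from the command line than the console, the AWS CLI can apply the same policy; this assumes you've saved the JSON above as policy.json and have the CLI configured:

aws s3api put-bucket-policy --bucket your-bucket-name --policy file://policy.json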

Create AWS Access Keys

In order to read/write to the S3 bucket from your Node app, you'll need to create access keys in AWS. You can get there by clicking the account dropdown and then selecting Security Credentials.

You can either create keys on your root user account or create a new user (a dedicated user is the safer choice). Once the user is created, AWS will show you a screen where you can access the keys. Make sure you record the information, because this is the only time AWS will show it to you. If you lose the key info, you'll have to generate new keys.

Add Key Data to Heroku Config Variables

Now log in to your Heroku account and go to the app you plan to access S3 with. Under Settings, scroll down to Config Vars. Click Reveal Config Vars and add AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
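
Alternatively, the Heroku CLI can set the same variables with config:set (run it from your app's directory, or pass -a your-app-name; the values here are placeholders):

heroku config:set AWS_ACCESS_KEY_ID=your-access-key-id AWS_SECRET_ACCESS_KEY=your-secret-access-key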

Now when you deploy your app to Heroku, you'll reference these variables instead of adding them statically to your app code and risking exposing the information to the Web. On Heroku, config vars are injected into process.env automatically; for local development, you can add the same values to a .env file, and either way you reference them with process.env.AWS_ACCESS_KEY_ID.
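
Here's a minimal sketch of the local setup, assuming you use the dotenv package to load the .env file (and keep .env out of version control):

// Load .env into process.env for local development;
// on Heroku the config vars are already present in process.env
require('dotenv').config()

console.log(process.env.AWS_ACCESS_KEY_ID) // quick sanity check, remove after testing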

Access S3 Bucket from Your App

Now here comes the fun part: using the aws-sdk library to read/write to the S3 bucket from your app. To start, you'll need to install the library; we can use npm for that:

npm i aws-sdk

Next, you'll want to import or require the package at the top of your Node script.

const AWS = require('aws-sdk')

Following that, I added a const s3 holding an S3 client configured with the key info:

const s3 = new AWS.S3({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
})

Later on in my script, I use the aws-sdk library to write a file to my S3 bucket:

const params = {
  Bucket: 'chromaloop', // pass your bucket name
  Key: 'sitemap.xml', // file will be saved as chromaloop/sitemap.xml
  Body: sitemap, // the xml-formatted sitemap string
}
s3.upload(params, function (s3Err, data) {
  if (s3Err) {
    throw s3Err
  }
  console.log('sitemap: ', sitemap)
})

The above code uses a const sitemap, an XML-formatted string, as the Body in the s3.upload request. I won't go into the details of creating the XML itself, but there are good tutorials about that if you're interested. Then, if there are no errors, it logs the sitemap data, which I used mainly for testing purposes once I deployed the app to Heroku. You'll probably want to remove the log once you've confirmed it's working.
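
This post focuses on writing, but reading the file back from your app works the same way; here's a minimal sketch using the same client's getObject method, with the bucket and key from the upload above:

s3.getObject({ Bucket: 'chromaloop', Key: 'sitemap.xml' }, function (err, data) {
  if (err) {
    throw err
  }
  // data.Body is a Buffer; convert it to a string to inspect the xml
  console.log(data.Body.toString('utf-8'))
})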

I was pretty pleased with how easy it was to use the aws-sdk library, and glad to get a little experience working with AWS tools. I think using S3 for storage is a pretty good idea in general; if you plan to host larger files on your site, you might want to consider it.

I hope you found this post useful. I'll be working on more code tutorials in the near future.