Build a dynamic sitemap.xml for your Next.js app

One of the more challenging aspects of running a SPA (single-page application) website is making it visible to search engines, because a SPA relies on JavaScript to present content in the browser. If you look at the flowchart below, you'll see why it can take longer for a SPA-style site to get its pages crawled and listed.

If the crawler reaches your page and sees no immediate HTML content, it sends the page to the render queue, meaning it needs to spend additional time and resources rendering your page with JavaScript. Meanwhile, SSR (server-side rendering) sites generate pages that don't require JavaScript to render content, so they get indexed directly after being crawled and have an SEO advantage. The good news is that JS platforms have made advancements to deal with this problem; the most popular examples are Next.js and Gatsby. Both can provide fully rendered content to the browser through hybrid or static rendering. These rendering strategies ensure the content is picked up by the crawlers and make it more likely to be indexed right away.

Despite these amazing new platforms, there is still a major SEO gap, in my opinion. The one surefire way to make Google notice your lonely little pages is to present them in a lovely sitemap.xml file. This is one of the first things Mr. Google Spider looks for when searching your website for content. If there is no sitemap, it relies solely on HTML href links to hop from one page to the next, and this doesn't account for the dynamically generated pages you'd have if you power your website with a CMS or have other API-driven content.

Creating the sitemap isn't that difficult. There are plenty of good examples to follow, and if you have a small number of pages you can probably create it by hand. However, if you use a CMS to help generate content, you might forget to update the sitemap every time you post something new. And if you already have a lot of content, creating the map by hand might not be too practical.

In this tutorial I'm going to help you create a sitemap using Next.js and Storyblok, but you could take the code examples and apply them to whichever coding platform you use.

Initial Setup

If you're brand-new to Next.js, you might want to pause this tutorial and walk through their online tutorial to get started. It steps through some of the fundamental building blocks and core Next.js concepts. If you already have a project and want to get right into building the sitemap.xml, read ahead.

The easiest way to get started with Next.js is to run the following in your console:

npx create-next-app@latest

This will automatically set up a Next.js project in whichever folder you run it from.

Once the packages have finished downloading, you should be able to run your new Next.js app with the following:

npm run dev

If everything is good you should see this in your browser.

Sitemap Route Creation

The idea with our sitemap is that we'll create a new route in our app that serves up the sitemap when Google hits it. In Next.js, this is really easy. We'll create a new file in the pages folder and name it sitemap.xml.js. This is where we'll do the bulk of the work. The core Next.js feature we're going to use is getServerSideProps, which runs on the Node.js server on every request and fetches all the data needed to generate the sitemap before the page is rendered. The reason we're using server-side rendering here is that it can fetch data from your content API at request time, so the sitemap has all the latest content when Google comes a knockin'.

In your page, enter the following:

const SITE_URL = process.env.NEXT_PUBLIC_SITE_URL || "http://localhost:3000";

function generateSiteMap(posts) {
  return `<?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <url>
       <loc>${SITE_URL}</loc>
     </url>
   </urlset>
 `;
}

So what is happening here? The const SITE_URL is using an environment variable to get the full site URL which will eventually reference the production URL. When developing, or if no environment variable is set, it will default to localhost.

The next function is the beginning of our xml file string. We will start with something basic and just insert the static pages we already know about. The first sitemap location we added is just the root of the site <loc>${SITE_URL}</loc>, using the SITE_URL we created above.
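One thing to watch with the SITE_URL fallback: if the environment variable ends with a slash, joining it with a path later produces a double slash. A minimal sketch of one way to guard against that follows. The normalizeSiteUrl helper is hypothetical (it is not part of Next.js or this tutorial's original code), and the localhost default is the same assumption used above.

```javascript
// Hypothetical helper (not part of Next.js): pick the configured URL or a
// fallback, then strip trailing slashes so that `${SITE_URL}/${slug}` never
// ends up with "//" in the middle of a URL.
function normalizeSiteUrl(rawUrl, fallback = "http://localhost:3000") {
  const base = rawUrl && rawUrl.trim() !== "" ? rawUrl.trim() : fallback;
  return base.replace(/\/+$/, "");
}

// Same idea as the SITE_URL constant above, with normalization applied.
const SITE_URL = normalizeSiteUrl(process.env.NEXT_PUBLIC_SITE_URL);

console.log(normalizeSiteUrl("https://example.com/")); // https://example.com
console.log(normalizeSiteUrl(undefined)); // http://localhost:3000
```

If you adopt something like this, only the SITE_URL line changes; generateSiteMap itself stays exactly as written above.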

Below this function, we'll add an empty default component function:

export default function SiteMap() {
  // getServerSideProps will do the heavy lifting
}

Essentially, the component doesn't require any internal logic. It relies solely on the getServerSideProps function we'll create in the next code block.

Hook Up API

In this tutorial we'll use the Storyblok JS client, along with axios, to fetch some content from the Storyblok API. To make it easy to talk to the API, use npm to install both packages:

npm i storyblok-js-client axios

You can use content from any API. Just keep in mind that the shape of your content might be different, so you'll need to adapt some of the final code to make it work.

You need to create a new folder in the root directory called lib. Inside, create a new file called posts.js.

Add the following:

import StoryblokClient from 'storyblok-js-client'

const Storyblok = new StoryblokClient({
  accessToken: process.env.NEXT_PUBLIC_STORYBLOK_TOKEN,
  cache: {
    clear: 'auto',
    type: 'memory',
  },
})

export const getPostLinks = async () => {
  const result = await Storyblok.get('cdn/links', {
    starts_with: 'posts',
  })
  return result
}

This creates a new Storyblok instance, passes it your Storyblok API token from a new environment variable called NEXT_PUBLIC_STORYBLOK_TOKEN, and turns on automatic cache clearing. You can either create a new .env.local file and add the variable to it, or export the variable in your console with:

export NEXT_PUBLIC_STORYBLOK_TOKEN=your-storyblok-api-token-here

The next function, getPostLinks, uses the Storyblok /links endpoint, which returns a minimal amount of data that is enough to generate a sitemap. We're going to be pulling the slug property from the results.

Connect the Dots

Let's finish things off now in your sitemap.xml.js file. Now that we have the means to fetch API data, let's use the data to complete the XML-formatted string we started earlier. Right below the SiteMap default function, add a new function called getServerSideProps:

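// This import goes at the top of pages/sitemap.xml.js
import { getPostLinks } from "../lib/posts";

export async function getServerSideProps({ res }) {
  const request = await getPostLinks();
  const links = request.data.links;
}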

Here we're making an async request to getPostLinks and saving the links portion of the response to our const links.

Next we're going to iterate over the returned data and build the rest of our XML string. The /links endpoint returns an object keyed by each entry's uuid. The data shape will look something like:

{
    "links": {
        "11a3e2ae-b339-4c59-b61e-503546858814": {
            "id": 105955658,
            "slug": "posts/why-i-switched-from-create-react-app-to-next-js",
            "name": "Why I Switched from Create React App to Next.js",
            "is_folder": false,
            "parent_id": 99512295,
            "published": true,
            "position": -70,
            "uuid": "11a3e2ae-b339-4c59-b61e-503546858814",
            "is_startpage": false,
            "real_path": "/posts/why-i-switched-from-create-react-app-to-next-js"
        }
    }
}

So we'll use Object.keys().forEach() to walk the returned data and pull out the information we want to use in our xml.

export async function getServerSideProps({ res }) {
  const request = await getPostLinks();
  const links = request.data.links;
  const paths = [];
  Object.keys(links).forEach((linkKey) => {
    if (links[linkKey].is_folder) {
      return;
    }
    const slug = links[linkKey].slug;
    paths.push({ slug });
  });
  
  const sitemap = generateSiteMap(paths);
}

There is a condition that skips entries whose is_folder property is true. Folders do not map to actual URLs in our example, so we don't want them in our sitemap.
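The same filtering can also be written as a filter-and-map chain, which some readers find easier to scan. This is only an alternative sketch: collectPaths is a hypothetical helper name, and the sample data is an abbreviated version of the response shape shown earlier with a made-up folder entry added.

```javascript
// Sample data shaped like the /links response shown above (fields abbreviated).
// The second entry is a made-up folder, included only to show the filter.
const links = {
  "11a3e2ae-b339-4c59-b61e-503546858814": {
    slug: "posts/why-i-switched-from-create-react-app-to-next-js",
    is_folder: false,
  },
  "hypothetical-folder-uuid": { slug: "posts", is_folder: true },
};

// Hypothetical helper: drop folders, keep only the slug of each real page.
function collectPaths(linkMap) {
  return Object.values(linkMap)
    .filter((link) => !link.is_folder)
    .map(({ slug }) => ({ slug }));
}

const paths = collectPaths(links);
console.log(JSON.stringify(paths));
// [{"slug":"posts/why-i-switched-from-create-react-app-to-next-js"}]
```

Either style produces the same paths array, so pick whichever reads better to you.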

Now we'll need to go back and modify our generateSiteMap function to accept the paths data and complete the XML string.

function generateSiteMap(paths) {
  return `<?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <url>
       <loc>${SITE_URL}</loc>
     </url>
     ${paths
       .map(({ slug }) => {
         return `
       <url>
           <loc>${SITE_URL}/${slug}</loc>
       </url>
     `;
       })
       .join("")}
   </urlset>
 `;
}

We'll use paths.map() with some object destructuring to pull out the slug property we're after. The map function will output another xml location with the main URL path followed by the path from the slug. The final join() function will convert our array to a string.
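One caveat worth knowing: the sitemap protocol expects URLs to be entity-escaped, so a slug containing a character such as & would produce invalid XML if interpolated as-is. The sketch below shows one way to handle that. Both escapeXml and urlEntry are hypothetical helpers written for illustration, not functions from Next.js or Storyblok, and the "posts/q&a" slug is a made-up example.

```javascript
// Hypothetical helper: entity-escape the five characters the sitemap
// protocol requires to be escaped inside XML values. The ampersand is
// replaced first so the entities added afterwards are not double-escaped.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: build one <url> entry from a base URL and a slug.
function urlEntry(siteUrl, slug) {
  return `<url><loc>${escapeXml(`${siteUrl}/${slug}`)}</loc></url>`;
}

console.log(urlEntry("http://localhost:3000", "posts/q&a"));
// <url><loc>http://localhost:3000/posts/q&amp;a</loc></url>
```

In the tutorial's generateSiteMap, the equivalent change would be wrapping the interpolated URL inside the loc element with a call like escapeXml.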

Finally, let's finish off our getServerSideProps function with the following:

export async function getServerSideProps({ res }) {
  const request = await getPostLinks();
  const links = request.data.links;
  const paths = [];
  Object.keys(links).forEach((linkKey) => {
    if (links[linkKey].is_folder) {
      return;
    }
    const slug = links[linkKey].slug;
    paths.push({ slug });
  });

  const sitemap = generateSiteMap(paths);

  res.setHeader("Content-Type", "text/xml");
  res.write(sitemap);
  res.end();

  return {
    props: {},
  };
}

This uses the Node server to write the XML string we created directly to the response. So now, whenever Google scrapes your sitemap.xml, it will get the most up-to-date map automatically with no extra work on your part. If you point your browser to localhost:3000/sitemap.xml you should see something like the following:
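As a rough illustration rather than captured output, the sketch below runs a condensed copy of generateSiteMap against the sample post from the /links response shown earlier. SITE_URL is assumed to be the localhost default, and the whitespace is tidied compared with the template string above. The trailing comments show what it prints:

```javascript
// Condensed copy of the tutorial's generator, for illustration only.
const SITE_URL = "http://localhost:3000"; // assumed local default

function generateSiteMap(paths) {
  // The site root first, then one URL per post slug.
  const urls = [SITE_URL, ...paths.map(({ slug }) => `${SITE_URL}/${slug}`)];
  const entries = urls.map((url) => `  <url><loc>${url}</loc></url>`).join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>",
  ].join("\n");
}

const sample = generateSiteMap([
  { slug: "posts/why-i-switched-from-create-react-app-to-next-js" },
]);
console.log(sample);
// <?xml version="1.0" encoding="UTF-8"?>
// <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
//   <url><loc>http://localhost:3000</loc></url>
//   <url><loc>http://localhost:3000/posts/why-i-switched-from-create-react-app-to-next-js</loc></url>
// </urlset>
```

Your real sitemap will list one url entry for every non-folder link that Storyblok returns.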

That just about wraps up this tutorial. I hope you found it useful. Once you have this up on your production site at a URL Google can reach, you might want to use Google Search Console to verify that it can read your sitemap. With an automated way of generating your sitemap, you'll be well on your way to greater visibility in Google searches. Happy coding.