I made a script to migrate posts from WordPress to Contentful

WordPress to Contentful Script

This blog has recently made the move from WordPress to being a headless site powered by Vue & Nuxt, using Contentful to deliver the content.

I wanted to see if I could automatically migrate my posts from WordPress to Contentful using their APIs. There aren't that many posts on this site, so I could easily have moved them over manually, but where's the fun in that?

As it turns out, it worked very well. So in this post, I'm going to share with you how you can export your posts from WordPress into Contentful using Contentful's Content Management API and the WordPress REST API.

The full script and instructions will be supplied at the bottom, but it will need to be modified slightly to suit your specific Content Model. In this post, I'm simply supplying the script as-is and documenting the process I developed for your learning benefit.

That being said, let's get into it.

What we're doing:

The overview of what this script does is as follows:

  • Get all posts from WordPress
  • Get all images and featured media from WordPress
  • Get tags and category names from WordPress
  • Create new assets from WordPress media on Contentful
  • Convert WordPress posts from Rich Text to Markdown
  • Create new posts on Contentful
  • Match our new Contentful assets to our new Contentful Posts

Get posts, tags, categories and images from WordPress

The first thing we'll do is use the WordPress REST API to gather up our WordPress data. We'll call each endpoint and store the responses in a big object called apiData. I've also made a separate object called wpData: each key in it names an endpoint, which lets us build the request URLs dynamically, and later on we'll store our cherry-picked WordPress data there.

The function below makes four requests to WordPress, one per key, at https://website.com/wp-json/wp/v2/[KEY_NAME] and stores the responses in apiData.

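I've added per_page=90 as an arbitrary number because it's way more than I needed. WordPress caps per_page at 100, so if you have more posts than that you'll need to adjust the script to request additional pages.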

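const axios = require('axios');

// Base URL of the WordPress REST API (swap in your own domain)
const wpEndpoint = 'https://website.com/wp-json/wp/v2/';

// Raw API responses end up here
let apiData = [];

// Each key is an endpoint to call
let wpData = {
  'posts': [],
  'tags': [],
  'categories': [],
  'media': []
};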

function migrateContent() {
  // Build one API endpoint URL per content type
  const urls = Object.keys(wpData).map(key => `${wpEndpoint}${key}?per_page=90`);

  getAllData(urls)
    .then(response => {
      apiData = response;

      mapData();
    })
    .catch(error => {
      console.log(error);
    });
}

function getAllData(URLs){
  return Promise.all(URLs.map(fetchData));
}

function fetchData(URL) {
  return axios
    .get(URL)
    .then(function(response) {
      return {
        success: true,
        endpoint: '',
        data: response.data
      };
    })
    .catch(function(error) {
      return { success: false };
    });
}
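
If a site has more items than a single request can return, the same idea can be extended to walk through every page. The following is only a sketch and not part of the original script: fetchAllPages and its httpGet parameter are hypothetical names, and httpGet is assumed to behave like axios.get (it resolves to an object with data and headers). It relies on the X-WP-TotalPages header that WordPress sends with collection responses.

```javascript
// Hypothetical helper: collects every page of a WordPress REST collection.
// `httpGet` is any function shaped like axios.get: (url) => Promise<{ data, headers }>.
async function fetchAllPages(httpGet, endpointUrl, perPage = 100) {
  const items = [];
  let page = 1;
  let totalPages = 1;

  do {
    const response = await httpGet(`${endpointUrl}?per_page=${perPage}&page=${page}`);
    items.push(...response.data);

    // WordPress reports the page count in the X-WP-TotalPages header
    // (axios exposes header names in lowercase). Fall back to 1 if it's missing.
    totalPages = Number(response.headers['x-wp-totalpages']) || 1;
    page += 1;
  } while (page <= totalPages);

  return items;
}
```

With axios this could be called as fetchAllPages((url) => axios.get(url), `${wpEndpoint}posts`), once per endpoint.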

Prepare Posts for Contentful

All of our relevant WordPress data is now stored in apiData, so we're free to reference it anywhere in the script. We won't need all of the additional bloat that WordPress supplies, though.

At this point you should have created a basic Content Model for your new post structure, because we're now going to cherry-pick the data that we actually want from WordPress and map it to that Content Model in Contentful.

for (let [key, postData] of Object.entries(apiPosts.data)) {
  let fieldData = {
    id: postData.id,
    type: postData.type,
    postTitle: postData.title.rendered,
    slug: postData.slug,
    content: postData.content.rendered,
    publishDate: postData.date_gmt + '+00:00',
    featuredImage: postData.featured_media,
    tags: getPostLabels(postData.tags, 'tags'),
    categories: getPostLabels(postData.categories, 'categories'),
    contentImages: getPostBodyImages(postData)
  }

  wpData.posts.push(fieldData)
}

The above is a snippet from the mapData() function: a loop over our WordPress posts. For each post, we create a new object that strips the post down to only the data we actually want in Contentful. The idea here is that each object key will match a field name in Contentful. For example, the title.rendered value will populate the new postTitle field in Contentful. We'll later match the keys used here to their Contentful fields in the API.

You'll notice additional functions within the loop. Remember how, at the start of the post, we made a list of all the WordPress endpoints to call for additional data? Here, we're using that data. The WordPress /posts/ endpoint only supplies an ID for each tag and category, so these additional functions pluck the relevant info from the /tags/ and /categories/ data we fetched earlier and create nested objects for each item that requires additional data mapping.
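
To illustrate the idea, here is a rough sketch of what such a lookup could look like. It is not the getPostLabels() from the script: lookUpLabels is a hypothetical name, and it assumes the label list is the array returned by /tags or /categories, where each item carries id, name and slug.

```javascript
// Hypothetical lookup: turns the ID list on a post into readable label objects.
// `labels` is assumed to be the array returned by the /tags or /categories endpoint.
function lookUpLabels(ids, labels) {
  return ids
    .map((id) => labels.find((label) => label.id === id))
    .filter(Boolean) // skip IDs that have no matching label
    .map((label) => ({ id: label.id, name: label.name, slug: label.slug }));
}
```

Keeping only id, name and slug is a choice made for this example; your Content Model may want more or less than that.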

I think that I could have used map() at this point to simply reduce the original objects, rather than create new ones, but I wanted to keep the original WordPress data sample intact in case we need it later.

Outside of API credentials, you can most likely copy and paste this entire script, modify just the above function to suit your needs, and things should just work.

Creating assets and entries in Contentful

We must convert our WordPress images to Contentful assets before we add any posts. That starts in the buildContentfulAssets() function: each post is looped over and we grab every featured & inline image and format the details so that Contentful can interpret them. createContentfulAssets() then uses a Promise to create, process and publish each asset:

function createContentfulAssets(environment, promises, assets) {
  return Promise.all(
    promises.map((asset, index) => new Promise((resolve, reject) => {

      // Stagger the requests so we stay under the API rate limit
      setTimeout(() => {
        environment.createAsset({
          fields: asset
        })
        .then((asset) => asset.processForAllLocales())
        .then((asset) => asset.publish())
        .then((asset) => {
          console.log(`Published Asset: ${asset.fields.file['en-GB'].fileName}`);
          // Keep a record of the new asset's ID and file name
          assets.push({
            assetId: asset.sys.id,
            fileName: asset.fields.file['en-GB'].fileName
          })
          resolve(asset)
        })
        // A try/catch can't see a rejected promise chain, so reject here instead
        .catch(reject)
      }, 1000 + (5000 * index));
    }))
  );
}

Asset creation takes 3 steps: creation, processing, and publishing. When testing this, there were instances where an asset would upload but not process. I did a deeper dive into the createAsset documentation and found that processForAllLocales worked where process didn't. Your mileage may vary. If you have multiple locales, I think you'll just need to duplicate the objects created in buildContentfulAssets() for each locale.
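
As a sketch of that multi-locale idea, the snippet below builds the fields object for one asset across several locales. It is an illustration rather than the script's buildContentfulAssets(): buildAssetFields and the shape of its media argument (title, altText, mimeType, url) are assumptions made for this example. The title, description and file keys, and the contentType, fileName and upload properties inside file, follow the format Contentful expects when creating an asset from a remote URL.

```javascript
// Hypothetical builder: repeats the same asset details for every locale.
// `media` is an assumed shape: { title, altText, mimeType, url }.
function buildAssetFields(media, locales) {
  const fields = { title: {}, description: {}, file: {} };

  for (const locale of locales) {
    fields.title[locale] = media.title;
    fields.description[locale] = media.altText;
    fields.file[locale] = {
      contentType: media.mimeType,
      fileName: media.url.split('/').pop(),
      upload: encodeURI(media.url) // Contentful downloads the file from this URL
    };
  }

  return fields;
}
```

The result can be passed as the fields value of environment.createAsset(), in the same way the asset objects are passed in the function above.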

You may notice the awful setTimeout() in each Promise. I found out very quickly that the Content Management API has a rate limit (10 requests per second and 36000 requests per hour when I wrote this), which became a frequent problem. While there are undoubtedly prettier ways to do this, a delay of 1000 + (5000 * index) milliseconds per iterated promise seemed to allow enough time for the API to breathe.
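
One of those prettier ways might be to handle the items strictly one after another with a pause in between, instead of scheduling every timer up front. This is a sketch of an alternative, not what the script does: runSequentially, wait and the worker callback are hypothetical names.

```javascript
// Resolves after the given number of milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `worker` on each item in order, pausing between items.
async function runSequentially(items, worker, delayMs = 1000) {
  const results = [];

  for (const [index, item] of items.entries()) {
    if (index > 0) await wait(delayMs); // no need to wait before the first item
    results.push(await worker(item, index));
  }

  return results;
}
```

Here worker would be a function that creates, processes and publishes a single asset and returns the resulting promise. Because each item only starts after the previous one has finished, the pause can be much shorter than five seconds.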

You might want to go get some coffee at this point. It'll take a while.

Mapping Contentful Assets to WordPress Posts

If all has gone well, our WordPress images will now be populated on Contentful as assets. Great! But we're not done yet - we're now going to prepare our WordPress posts.

First - we're going to swap out any references to WordPress images with references to Contentful images. To do this, we need to understand how WordPress references images. Our wpData object for each post will look similar to this:

{
  "postTitle": "My amazing blog post",
  "content": "<p>My great blog post also has an inline image:</p>\n<img class=\"wp-image-572 size-large\" src=\"../path/to/image.png\" alt=\"My alt text.\"  />\n<p>",
  "featuredImage": 545
}

WordPress posts will reference images via ID, rather than storing all information about the image within the post itself. So whenever WordPress needs an image, it uses the ID to call the media library and bring details over. We can do the same thing here by calling https://website.com/wp-json/wp/v2/media/[MEDIA_ID].
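
Since the /media endpoint has already been downloaded, the same lookup can also be done locally. A minimal sketch, assuming media is the array returned by that endpoint (findMediaById is a hypothetical name; source_url, alt_text, mime_type and title.rendered are fields WordPress includes on each media item):

```javascript
// Hypothetical lookup: resolves a featured_media ID against the downloaded media list.
function findMediaById(mediaId, media) {
  const item = media.find((entry) => entry.id === mediaId);

  // WordPress uses 0 for "no featured image", which will never match an item
  if (!item) return null;

  return {
    url: item.source_url,
    altText: item.alt_text,
    mimeType: item.mime_type,
    title: item.title.rendered
  };
}
```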

As Contentful uses Markdown to render posts, the above snippet will be parsed to something like this:

My great blog post also has an inline image: ![My alt text.](../path/to/image.png)

We're going to be matching WordPress images to their Contentful counterpart by comparing filenames in two separate objects.
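
In its simplest form, that comparison could look like the sketch below. The function names are hypothetical, and the asset list is assumed to hold the { assetId, fileName } objects collected while the assets were being published.

```javascript
// Takes the last path segment of a URL and drops any query string.
const fileNameOf = (url) => url.split('/').pop().split('?')[0];

// Hypothetical matcher: finds the Contentful asset uploaded from a WordPress image URL.
// `contentfulAssets` is assumed to be a list of { assetId, fileName } objects.
function findContentfulAsset(wpImageUrl, contentfulAssets) {
  const wanted = fileNameOf(wpImageUrl);
  return contentfulAssets.find((asset) => asset.fileName === wanted) || null;
}
```

Matching on file names only works while they are unique, so two different uploads that share a name would need an extra check.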

As we've already downloaded our WordPress data, we can just reference wpData.media for any image details, so now we just need our Contentful assets. I couldn't find a way to download all assets with the Content Management JavaScript SDK (its getAssets() method may be worth a look). Luckily, we can just use an axios request to call the API directly and store our assets in a global object.

axios.get(`https://api.contentful.com/spaces/${ctfData.spaceId}/environments/${ctfData.environment}/public/assets`,
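
The request above is only shown in part. As a sketch of how a call like it could be completed, the function below sends the request with a bearer token and keeps just the asset URLs. It is an assumption-laden example rather than the script's code: fetchPublishedAssetUrls, the httpGet parameter (anything shaped like axios.get) and ctfData.accessToken are hypothetical names, while spaceId and environment come from the URL above.

```javascript
// Hypothetical helper: asks the Content Management API for published assets
// and returns their file URLs for one locale.
// `httpGet` is any function shaped like axios.get: (url, config) => Promise<{ data }>.
async function fetchPublishedAssetUrls(httpGet, ctfData, locale = 'en-GB') {
  const url = `https://api.contentful.com/spaces/${ctfData.spaceId}/environments/${ctfData.environment}/public/assets`;

  const response = await httpGet(url, {
    // accessToken is an assumed property holding your management token
    headers: { Authorization: `Bearer ${ctfData.accessToken}` }
  });

  return response.data.items
    .filter((item) => item.fields.file && item.fields.file[locale]) // skip assets without a file
    .map((item) => item.fields.file[locale].url);
}
```

With axios this could be called as fetchPublishedAssetUrls((url, config) => axios.get(url, config), ctfData). The response is paginated, so a large media library may need more than one request.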

This script will convert WordPress Rich Text posts into Markdown. As the WordPress API outputs an HTML string for post content, we can use the absolutely fantastic Turndown to do this easily.

It's worth noting at this point that I originally wanted to use Rich Text for post formatting in Contentful, but you can't feed the Contentful API an HTML string and have it parse it correctly. Instead, you have to break down each node and create an item in the tree for it. I started doing it, but it honestly wasn't worth it.

Contentful Rich Text Structure

I've left sample code in the script if you really want to use Rich Text.
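
To give a sense of why this is more work than Markdown, here is a minimal sketch of the node tree for a few plain paragraphs. It isn't the sample code from the script, and paragraphsToRichText is a hypothetical name; the nodeType, data, content, value and marks keys follow Contentful's Rich Text document format.

```javascript
// Hypothetical converter: wraps plain-text paragraphs in a Rich Text document tree.
function paragraphsToRichText(paragraphs) {
  return {
    nodeType: 'document',
    data: {},
    content: paragraphs.map((text) => ({
      nodeType: 'paragraph',
      data: {},
      // Every run of text is its own node, with marks for bold, italic and so on
      content: [{ nodeType: 'text', value: text, marks: [], data: {} }]
    }))
  };
}
```

Headings, links, lists and embedded images each need their own node type on top of this, which is where the effort adds up.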

Anyway - we're in the middle of converting our WordPress media references to Contentful asset references, and for that we'll be using Turndown's flexible addRule method. Simply put, this allows us to define rules for the Markdown converter to adhere to during conversion. In our case, I've made a rule that checks for any inline images, grabs the image src and finds the relevant Contentful asset. It then outputs the image using Markdown syntax.

turndownService.addRule('replaceWordPressImages', {
  filter: ['img'],
  replacement: function(content, node, options) {
    // Compare file names, as the WordPress and Contentful paths will differ
    let nodeFileName = node.getAttribute('src').split('/').pop()

    // contentfulData.assets is a list of asset URLs; take the first match
    let assetUrl = contentfulData.assets.find(asset => {
      let assetFileName = asset.split('/').pop()

      return assetFileName === nodeFileName
    });

    return `![${node.getAttribute('alt')}](${assetUrl})`
  }
})

Publishing our new Contentful Posts

Our inline images are now replaced with Contentful assets, so we're nearly at the end. It's finally time for us to publish our posts. This part of the script uses the same methodology as the assets, where we create an array of promises, format the data in a way that Contentful can understand, then process and publish the new posts.

The createContentfulPosts() function will run some final mapping, including:

  • Converting HTML Strings to Markdown
  • Checking for a Featured Image, and assigning a Contentful asset
  • Removing any additional data-points the script uses for references.
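
As an illustration of that final mapping, the sketch below turns one of the post objects built earlier into the locale-keyed fields Contentful expects. It is a simplified stand-in for the script's own mapping: toContentfulFields is a hypothetical name, and the Contentful field names (postTitle, slug, publishDate, content, featuredImage) are assumptions that must match your own Content Model. The { sys: { type: 'Link', linkType: 'Asset', id } } shape is how Contentful references an asset from an entry.

```javascript
// Hypothetical mapper: builds the `fields` object for one Contentful entry.
// `markdown` is the post body after conversion; `featuredAssetId` may be undefined.
function toContentfulFields(post, markdown, locale, featuredAssetId) {
  const localise = (value) => ({ [locale]: value });

  const fields = {
    postTitle: localise(post.postTitle),
    slug: localise(post.slug),
    publishDate: localise(post.publishDate),
    content: localise(markdown)
  };

  // Only link a featured image when a matching Contentful asset was found
  if (featuredAssetId) {
    fields.featuredImage = localise({
      sys: { type: 'Link', linkType: 'Asset', id: featuredAssetId }
    });
  }

  return fields;
}
```

Helper data such as the WordPress ID or the list of inline images is simply left out, so only fields that exist in the Content Model are sent.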

Once the data is prepared, we can publish:

function createContentfulEntries(environment, promises) {
  return Promise.all(promises.map((post, index) => new Promise((resolve, reject) => {

    console.log(`Attempting: ${post.slug['en-GB']}`)

    // Stagger the requests so we stay under the API rate limit
    setTimeout(() => {
      environment.createEntry('blogPost', {
        fields: post
      })
      .then((entry) => entry.publish())
      .then((entry) => {
        console.log(`Success: ${entry.fields.slug['en-GB']}`)
        resolve(entry)
      })
      // A try/catch can't see a rejected promise chain, so reject here instead
      .catch(reject)
    }, 1000 + (5000 * index));
  })));
}

These may take some time to publish, so just let it do its thing. The script will tell you once everything is done.

Wrapping everything up

So that's it. When you run the script, things will take some time, but should be quite automatic.

This script can of course be expanded to cover much more than core post data, including Custom Post Types, metafield data and so on, but that wasn't really necessary for my needs.

I think as well that, in theory, the data doesn't really need to come from WordPress. As long as your post data is in a readable JSON format, this script should be able to parse it and process it into Contentful.

This was quite a fun little mini project and a cool dive into the Contentful API. I'm looking forward to utilising Contentful and headless in future projects.

If you've made it this far - thank you! Please let me know if you benefited from this script and star the repo if you'd like.

Link to the repo and full instructions on how to use: https://github.com/jonashcroft/wp-to-contentful