Reactive Lions

How to Clone Your Medium Posts Using a Custom Gatsby Plugin


A detailed guide on exporting Medium posts via Medium backup and republishing them on your Gatsby website.


Motivation

There’s a good ecosystem of plugins available on Gatsby’s website, and several of them address the functionality of getting your Medium posts to your Gatsby website. However, almost none of them allow you to copy the entire list of posts from your Medium account.

We will analyze the existing solutions and introduce a new plugin that copies every post efficiently.

Medium stopped developing their API back in 2019

Outline

  1. Existing approaches
  2. Pros & Cons of existing plugins
  3. Medium API and its state in 2021
  4. The workaround to get backup of your Medium posts
  5. Building new plugin to import Medium backup
  6. Building a basic Gatsby website to import generated markdown
  7. Publishing the plugin
  8. Next Steps

Existing approaches

Normally, you would start with Medium’s Developers portal to get some information. That will lead you to API documentation on GitHub.

Eventually, you will find out that Medium supports RSS feeds.

Example of the author’s feed https://medium.com/@eugenehauptmann/feed

From the above page, you can even find a WebSub (formerly PubSubHubbub) API that Medium supports — http://medium.superfeedr.com/.

There’s also an unofficial Medium API — https://medium.com/@eugenehauptmann?format=json&limit=100

The endpoint returns its output in JSONP format. There is also a Python implementation: https://github.com/enginebai/PyMedium
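Responses from this unofficial endpoint are commonly reported to start with an anti-hijacking guard (`])}while(1);</x>`), so the body has to be cleaned before `JSON.parse` will accept it. Here is a minimal sketch, assuming that prefix is still in use. `parseMediumJson` is a hypothetical helper name, and the sample payload is made up for illustration.

```javascript
// Hypothetical helper: strips the guard prefix that Medium's unofficial JSON
// endpoint is assumed to prepend, then parses the remainder.
const GUARD_PREFIX = '])}while(1);</x>';

function parseMediumJson(body) {
  const text = body.startsWith(GUARD_PREFIX)
    ? body.slice(GUARD_PREFIX.length)
    : body;
  return JSON.parse(text);
}

// Made-up sample payload; the real response shape is undocumented.
const sample = '])}while(1);</x>{"success":true,"payload":{}}';
console.log(parseMediumJson(sample).success); // prints true
```

Because the endpoint is unofficial and sits behind a WAF, treat this as a parsing aid only, not as a dependable way to fetch posts.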

Another approach would be to scrape the posts written by a Medium author using something like Puppeteer.

And the last one, which will be the focus of this article — exporting all your data from Medium.

To recap, here are the existing approaches:

  1. Official Medium API — https://github.com/Medium/medium-api-docs
  2. Official Medium RSS Feed — https://help.medium.com/hc/en-us/articles/214874118-Using-RSS-feeds-of-profiles-publications-and-topics
  3. Official Medium WebSub API — http://medium.superfeedr.com/
  4. Unofficial Medium JSONP API (https://github.com/enginebai/PyMedium)
  5. Scraping pages manually
  6. Download your Medium data — https://help.medium.com/hc/en-us/articles/115004745787-Download-your-information

Pros & Cons of existing plugins

Unfortunately, most of the approaches above are flawed. Here’s why:

  1. The official Medium API doesn’t let you pull posts, only create them.

  2. The RSS feed returns only the most recent posts and gives you only a gist of each post, not a full copy of it.

  3. The WebSub API only notifies you about newly added posts, not previously published ones.

  4. The unofficial JSONP API is much more extensive than the official APIs, but it sits behind a Cloudflare Web Application Firewall (WAF), and scraping a website behind a WAF is a pain.
    Ping us if you need help setting up your cloud infrastructure and WAFs.

  5. Scraping pages manually would hit the same WAF roadblock.

  6. Downloading your Medium backup solves the main problem of getting copies of all your posts, but it is a semi-manual process.

Medium API and its state in 2021

The Medium API team started really well in 2015 by opening up their third-party application process to everyone.

Unfortunately, they stopped allowing new applications in early 2019, without prior announcement. Those who were lucky enough to register beforehand can still use their apps and the official Medium API, which is only good for creating posts, not downloading them.

The workaround to get backup of your Medium posts

Medium gives you the ability to export your personal data and stories as HTML files in a .zip archive.
Structure of Medium’s backup.

Building new plugin to import Medium backup

Gatsby is a modern web framework for blazing-fast websites. — https://github.com/gatsbyjs/gatsby

We have been using GatsbyJS at our company for the past 4 years, and it ended up being the default framework we train our Junior Engineers to work with.

Gatsby has great documentation and a very detailed tutorial on how to create a new plugin from scratch. You can find the source code of the plugin we built on GitHub.

Most of the business logic calls are located in gatsby-node.js with the entry point in [onPreInit](https://www.gatsbyjs.com/docs/reference/config-files/gatsby-node/#onPreInit). Gatsby’s architecture allows you to run all necessary business logic via a variety of plugins during build time.

To make this plugin work we will need some dependencies:

  • Unzipper: a streaming library to access files inside the zip archive.
  • Turndown: a tool to convert HTML to Markdown.
  • Cheerio: a jQuery-like HTML selector and parser.

Plugin’s architecture. You can find the logic of the respective steps in the source code.
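Once a post's HTML has been converted to Markdown, the remaining step is to write a file that gatsby-transformer-remark can read. The sketch below shows one way that step could look. It is not the plugin's actual code: `slugify` and `buildMarkdownFile` are hypothetical helpers, and the `title` and `date` front matter fields are assumptions (only `slug` is referenced elsewhere in this article).

```javascript
// Hypothetical helpers; the published plugin may structure this differently.

// Turn a post title into a URL-friendly slug.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, '');    // trim leading and trailing dashes
}

// Build the contents of one Markdown file: YAML front matter plus the body.
function buildMarkdownFile({ title, date, body }) {
  const frontmatter = [
    '---',
    // A JSON string is also a valid YAML double-quoted scalar.
    `title: ${JSON.stringify(title)}`,
    `date: ${JSON.stringify(date)}`,
    `slug: ${JSON.stringify(slugify(title))}`,
    '---',
  ].join('\n');
  return `${frontmatter}\n\n${body.trim()}\n`;
}

console.log(buildMarkdownFile({
  title: 'Hello World',
  date: '2021-06-08',
  body: '# Hi\n',
}));
```

Quoting every value keeps titles that contain colons or quotes from breaking the YAML.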

Building basic Gatsby website to import generated markdown

Use any Gatsby starter to initialize your website, then configure the plugin in gatsby-config.js.

Make sure to set source to the absolute path of the zip archive you got from Medium, and set destination to the folder where the plugin should write the Markdown posts.

Note: gatsby-source-medium-backup should always be declared before gatsby-source-filesystem and gatsby-transformer-remark. Gatsby executes plugin logic in series, so our plugin has to create the Markdown files before the other plugins can process them.

Inside your starter, open gatsby-node.js and declare a createPages handler, which defines how your pages are generated and which template is used. If you want to change the template, point blogPostTemplate to your own template file.

Here’s a well-written tutorial on how to use markdown pages in your Gatsby starter.

If you want to change the location of your posts, change the path passed to createPage to:

path: `blog/` + node.frontmatter.slug

If everything went well and the plugin has generated the Markdown files, you’ll see a list of the newly created posts on your Gatsby website.

cd example && gatsby develop
List of newly created posts

Publishing your plugin

Gatsby has an extensive tutorial on developing and publishing plugins.

Once you have added the required keywords (gatsby and gatsby-plugin) to your package.json and run npm publish to push your plugin to the npm registry, Gatsby’s plugin library will list it within the next 12–48 hours.

Published gatsby-source-medium-backup plugin

Next Steps

We would love to hear your feedback and learn more about how you will use this plugin in your project.

We are already working on the next version, which will help automate generating the Medium backup and delivering the exported zip files to a GitHub repository.

About the author

Eugene Hauptmann, CEO of Reactive Lions™

Eugene is a faith-centric technologist, a serial entrepreneur, angel investor, advisor, and mentor.

He is the founder and CEO of REACTIVE LIONS INC. where he is implementing his vision of faith-driven entrepreneurship in the tech world. He is currently running a team of over 40 talented engineers across the US.

Eugene is an expert in building tech teams and he is a chief architect of scalable software products. His experience goes beyond B2B and B2C in multiple industries like Cyber Security, Deep Tech, FinTech, Media, AI, ML, Data platforms, Marketplaces, Wellness, Healthcare, Space, M&A, and more.

Contact us to learn how we can help your business build great tech.

REACTIVE LIONS INC. © 2021