How to Clone Your Medium Posts Using a Custom Gatsby Plugin
A detailed guide on exporting Medium posts via Medium backup and republishing them on your Gatsby website.
Motivation
There’s a good ecosystem of plugins available on Gatsby’s website, and several of them address the functionality of getting your Medium posts to your Gatsby website. However, almost none of them allow you to copy the entire list of posts from your Medium account.
We will analyze existing solutions, and we will introduce a new plugin that does it efficiently.
Outline
- Existing approaches
- Pros & Cons of existing plugins
- Medium API and its state in 2021
- The workaround to get a backup of your Medium posts
- Building a new plugin to import the Medium backup
- Building a basic Gatsby website to import generated markdown
- Publishing the plugin
- Next Steps
Existing approaches
Normally, you would start with Medium’s Developers portal to get some information. That will lead you to API documentation on GitHub.
Eventually, you will find out that Medium supports RSS feeds.
From the above page, you can even find a WebSub (formerly PubSubHubbub) API that Medium supports — http://medium.superfeedr.com/.
There’s also an unofficial Medium API — https://medium.com/@eugenehauptmann?format=json&limit=100
It returns its output in JSONP format. There is also a Python implementation of this API: https://github.com/enginebai/PyMedium
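To make the shape of that unofficial endpoint a bit more concrete, here is a minimal sketch of how its response could be parsed. It assumes the body starts with the anti-hijacking prefix `])}while(1);</x>` that Medium has been observed to prepend to JSON responses; the helper name `parseMediumJson` is hypothetical, and nothing here is a documented Medium API.

```javascript
// Prefix that Medium is assumed to prepend to its JSON responses.
const MEDIUM_JSON_PREFIX = '])}while(1);</x>';

// Strip the prefix (if present) and parse the remainder as JSON.
function parseMediumJson(body) {
  const text = body.startsWith(MEDIUM_JSON_PREFIX)
    ? body.slice(MEDIUM_JSON_PREFIX.length)
    : body;
  return JSON.parse(text);
}

// Hypothetical usage (Node 18+ for the global fetch); such requests
// may be blocked when they come from a script rather than a browser:
// const res = await fetch('https://medium.com/@eugenehauptmann?format=json&limit=100');
// const data = parseMediumJson(await res.text());
```

The parsing step is kept separate from the network call so it can be checked against a saved response.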
Another approach would be to scrape a Medium author’s posts using something like Puppeteer.
And the last one, which will be the focus of this article — exporting all your data from Medium.
To recap, here are the existing approaches:
- Official Medium API — https://github.com/Medium/medium-api-docs
- Official Medium RSS Feed — https://help.medium.com/hc/en-us/articles/214874118-Using-RSS-feeds-of-profiles-publications-and-topics
- Official Medium WebSub API — http://medium.superfeedr.com/
- Unofficial Medium JSONP API (https://github.com/enginebai/PyMedium)
- Scraping pages manually
- Download your Medium data — https://help.medium.com/hc/en-us/articles/115004745787-Download-your-information
Pros & Cons of existing plugins
Unfortunately, most of the approaches above are flawed. Here’s why:
- The official Medium API doesn’t let you pull posts, only create them.
- The RSS feed returns only the most recent posts and gives you only a gist of each post, not a full copy of it.
- The WebSub API likewise only gives you updates on newly added posts, not previously published ones.
- The unofficial JSONP API is much more extensive than the official APIs, but it’s placed behind a Cloudflare Web Application Firewall (WAF), and scraping a website behind a WAF is a pain. Ping us if you need help setting up your cloud infrastructure and WAFs.
- Scraping would hit the same WAF roadblock.
- Downloading your Medium backup solves the main problem of getting full copies of all your posts, but it’s semi-manual.
Medium API and its state in 2021
The Medium API team started really well in 2015 by opening up their third-party application process to everyone.
Unfortunately, they stopped allowing new applications in early 2019, without prior announcements. Those who were lucky enough to register beforehand can still continue using their apps and Official Medium API, which is only good for creating Posts, not downloading them.
The workaround to get a backup of your Medium posts
Building a new plugin to import the Medium backup
Gatsby is a modern web framework for blazing-fast websites. — https://github.com/gatsbyjs/gatsby
We have been using GatsbyJS at our company for the past 4 years, and it ended up being the default framework we train our Junior Engineers to work with.
Gatsby has great documentation and a very detailed tutorial on how to create a new plugin from scratch. You can find the source code of the plugin we built on GitHub.
Most of the business logic calls are located in gatsby-node.js, with the entry point in [onPreInit](https://www.gatsbyjs.com/docs/reference/config-files/gatsby-node/#onPreInit). Gatsby’s architecture allows you to run all necessary business logic via a variety of plugins during build time.
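As a rough illustration of that entry point, here is a minimal sketch of a gatsby-node.js that hooks into onPreInit. The option names `source` and `destination` are assumptions made for this example, not the published plugin's actual options, and the boolean return value exists only to make the sketch easy to check (Gatsby ignores it).

```javascript
// gatsby-node.js (sketch, not the published plugin's source)
const fs = require('fs');

// Gatsby calls onPreInit with its API helpers and the plugin's options.
// `source` and `destination` are hypothetical option names.
const onPreInit = ({ reporter }, pluginOptions = {}) => {
  const { source, destination = 'content/posts' } = pluginOptions;

  if (!source || !fs.existsSync(source)) {
    reporter.warn(`Medium backup not found at "${source}", skipping import`);
    return false;
  }

  reporter.info(`Importing Medium backup from ${source} into ${destination}`);
  // The archive would be unzipped and converted to Markdown here.
  return true;
};

exports.onPreInit = onPreInit;
```

Running the import this early means the generated Markdown files already exist on disk by the time source plugins start looking for content.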
To make this plugin work we will need some dependencies:
- Unzipper — a streaming library to access files inside of the zip archive.
- Turndown — a tool to convert HTML to Markdown.
- Cheerio — a jQuery-like HTML selector and parser.
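The three dependencies above could fit together roughly as follows. This is a sketch under several assumptions rather than the plugin's actual code: it assumes the export keeps posts as HTML files in a `posts/` folder, that drafts are prefixed with `draft_`, and that each post uses the `p-name`, `dt-published`, and `e-content` class names. The function names are hypothetical.

```javascript
// Assumption: published posts live under posts/ and drafts start with "draft_".
function isPublishedPost(entryPath) {
  const match = /(^|\/)posts\/([^/]+)\.html$/.exec(entryPath);
  return match !== null && !match[2].startsWith('draft_');
}

// Build a Markdown file with YAML front matter.
// JSON.stringify yields a quoted, escaped string that is also valid YAML.
function toMarkdownFile({ title, date, body }) {
  return `---\ntitle: ${JSON.stringify(title)}\ndate: ${date}\n---\n\n${body}\n`;
}

async function convertArchive(zipPath) {
  // Required lazily so the pure helpers above load without these packages.
  const unzipper = require('unzipper');
  const cheerio = require('cheerio');
  const TurndownService = require('turndown');

  const turndown = new TurndownService({ headingStyle: 'atx' });
  const directory = await unzipper.Open.file(zipPath);
  const posts = [];

  for (const entry of directory.files.filter((f) => isPublishedPost(f.path))) {
    const html = (await entry.buffer()).toString('utf8');
    const $ = cheerio.load(html);
    posts.push({
      path: entry.path.replace(/\.html$/, '.md'),
      content: toMarkdownFile({
        title: $('h1.p-name').first().text().trim(),
        date: $('time.dt-published').attr('datetime') || '',
        body: turndown.turndown($('section.e-content').html() || ''),
      }),
    });
  }
  return posts;
}
```

`convertArchive` returns the converted files instead of writing them, so the caller decides where the Markdown ends up.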
Building a basic Gatsby website to import generated markdown
Use any Gatsby starter to initialize your website. The plugin should then be configured this way: