The Comprehensive Guide To Automating Screaming Frog

James Phoenix

Learning Outcomes

  • To learn how to run Screaming Frog using the command line for Mac & Windows.

Screaming Frog (SF) is a fantastic desktop crawler that’s available for Windows, Mac and Linux.

This tutorial is split across multiple blog posts.

You’ll learn not only how to easily automate SF crawls, but also how to automatically wrangle the .csv data using Python.

Then we’ll create a data pipeline which will push all of the data into BigQuery to view it in Google Data Studio.


Finally, we’ll step up the automation and upload our scripts to a Google Cloud virtual machine:

  • The virtual machine will turn on every day at a specific time.
  • Several python scripts will automatically execute and will perform the following:
    • A list of domains from a .txt file / via environment variables will be sequentially crawled.
    • We’ll wrangle the data and save it to BigQuery.
  • Then the virtual machine will shut down after all of the domains have either completed or failed.
  • The daily data will then be available via Google Data Studio.
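As a rough illustration of the sequential crawl step described above, here is a minimal Python sketch. It assumes the default macOS launcher path from the Mac section, a hypothetical SF_DOMAINS environment variable holding comma-separated domains, and a hypothetical domains.txt fallback file; none of these names come from the finished pipeline. It also assumes that a non-zero exit code means a crawl failed, so each domain ends up in either the completed or the failed list instead of stopping the whole run.

```python
import os
import subprocess
from pathlib import Path

# Assumption: the default macOS install path used in this post; change it for your machine.
SF_LAUNCHER = "/Applications/Screaming Frog SEO Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher"


def load_domains(env_var="SF_DOMAINS", file_path="domains.txt"):
    """Read domains from a comma-separated env var, else one per line from a .txt file.

    SF_DOMAINS and domains.txt are hypothetical names used for illustration.
    """
    raw = os.environ.get(env_var, "")
    if raw.strip():
        candidates = raw.split(",")
    else:
        path = Path(file_path)
        candidates = path.read_text().splitlines() if path.exists() else []
    # Drop surrounding whitespace and empty entries.
    return [d.strip() for d in candidates if d.strip()]


def crawl_all(domains, output_folder, run=subprocess.run):
    """Crawl each domain one after another and return (completed, failed) lists."""
    completed, failed = [], []
    for domain in domains:
        cmd = [SF_LAUNCHER, "--headless", "--output-folder", output_folder,
               "--timestamped-output", "--crawl", domain]
        try:
            result = run(cmd)
        except OSError:
            # e.g. Screaming Frog isn't installed at SF_LAUNCHER
            failed.append(domain)
            continue
        # Assumption: a non-zero exit code means the crawl did not complete.
        (completed if result.returncode == 0 else failed).append(domain)
    return completed, failed


if __name__ == "__main__":
    done, errors = crawl_all(load_domains(), str(Path.home() / "Desktop"))
    print("completed:", done)
    print("failed:", errors)
```

The run parameter defaults to subprocess.run but can be swapped for a stub, which makes the loop easy to test without Screaming Frog installed.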

In this blog post, you’ll learn how to automate Screaming Frog with the command line!


The Command Line

Many daily activities, such as opening and closing programs or requesting a web page, can be completely automated via the command line.

If you’d like a detailed overview of the different types of commands you can use on your computer, I’d recommend viewing these guides:


  • Mac/Linux Udemy Course
  • Mac/Linux Cheatsheet
  • Windows Udemy Course
  • Windows Command Prompt Cheatsheet

Part 1 – Screaming Frog CLI

Mac Terminal + Screaming Frog

This part of the tutorial is only for Mac OS X users, so if you’re using Windows, skip to the Windows section instead.

Opening Terminal

First, you will need to open Terminal, which you can do by following these steps:

  • ⌘ Cmd + Space
  • Type terminal
  • Press enter


Useful Linux Commands:

Several useful commands include:

  • cd ~ (cd allows you to change directory)
  • pwd (pwd prints your current working directory)
  • mkdir folder (mkdir allows you to create folders)
  • clear (clear removes any previous text from your terminal)

How To Open Screaming Frog With The Terminal

Assuming that Screaming Frog is installed in the default location, you can run Screaming Frog with:


/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher

How To Create An Alias In Terminal

Now let’s create a shortcut for the command that we just ran; this kind of shortcut is called an alias.

All of your aliases need to be created inside either:


~/.bash_profile (Older Mac Terminals)
~/.zshrc (Newer Mac Terminals)

NB: You can easily find out which shell your Mac terminal uses with:


echo $SHELL

If it says /bin/zsh, then you will need to update the .zshrc file instead.

You can edit this file with either:

cd ~ && nano .bash_profile (Older Mac Terminals)
cd ~ && nano .zshrc (Newer Mac Terminals)

We’ll create an alias called sf that will automatically run the Screaming Frog Application.

Add the following to either your .bash_profile or .zshrc file:


alias sf="/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher"

Then press:


CTRL + X (to exit nano)
Y (to confirm that you want to save your changes)
Enter (to keep the same file name)

Now close your terminal and reopen it (alternatively, run source ~/.zshrc or source ~/.bash_profile to reload the file in your current window):

  • ⌘ Cmd + Space
  • Type terminal
  • Press enter

Now type:


sf

As you can see, we’ve now successfully created a shortcut for loading Screaming Frog.


How To See All Of The Commands:

You can easily get a list of all of the available commands with:


sf --help

How To Run A Crawl

If you want to open Screaming Frog and crawl a website use this:


sf --crawl {url}

For example if you wanted to crawl https://sempioneer.com:


sf --crawl https://sempioneer.com

You can use any URL or domain name that you’d like and the above commands will:

  • Open your Screaming Frog Application.
  • Crawl the desired domain.

How To Run Screaming Frog Headless (Without A Graphical User Interface)

It’s possible for us to execute Screaming Frog without a graphical user interface by adding --headless:


sf --headless --crawl {url}

Additionally, we can save the crawl by adding --save-crawl:


sf --headless --save-crawl --crawl {url}

NB: You will need to purchase a license for executing Screaming Frog with the --save-crawl functionality.

An example would be:


sf --headless --save-crawl --crawl https://phoenixandpartners.co.uk/

How To Export Data

As well as saving the crawl, we’ll send the output to a specific folder by adding two extra arguments:


--output-folder (This argument lets you specify the folder where you would like to export the crawl data.)
--timestamped-output (This argument saves the output inside a time-stamped folder. Because every saved crawl is named crawl.seospider, the timestamp prevents a new crawl from overwriting an existing file.)

  1. Find your username by typing whoami in Terminal. (Running pwd in a fresh Terminal window also works, as it prints /Users/{username}.) For example, my username is: jamesaphoenix
  2. Go back to either your .bash_profile file or .zshrc file and create a new alias:
cd ~ && nano .bash_profile (Older Mac Terminals)
cd ~ && nano .zshrc (Newer Mac Terminals)

Then add the following alias to the bottom of your file:

alias sf-headless="/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher --headless --save-crawl --output-folder /users/{username}/desktop --timestamped-output --crawl"

Please remember to replace {username} with your true username!
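If you later want to trigger crawls from a script, note that shell aliases such as sf-headless aren’t available to programs like Python. The minimal sketch below, assuming the default install path used above, builds the same command as a list of arguments; because no shell is involved, the spaces in the application path don’t need escaping. build_headless_command and sf_headless.py are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Default macOS install location used throughout this post (assumption: yours may differ).
SF_LAUNCHER = "/Applications/Screaming Frog SEO Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher"


def build_headless_command(url, output_folder=None):
    """Mirror the sf-headless alias: headless crawl, saved into a time-stamped folder."""
    if output_folder is None:
        output_folder = str(Path.home() / "Desktop")
    return [
        SF_LAUNCHER,
        "--headless",
        "--save-crawl",
        "--output-folder", output_folder,
        "--timestamped-output",
        "--crawl", url,
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sf_headless.py example.org
    cmd = build_headless_command(sys.argv[1])
    # Each list item is passed to the program as one argument, so no shell quoting is needed.
    subprocess.run(cmd, check=True)
```

Running python sf_headless.py example.org should then behave like sf-headless example.org, and check=True raises an error if the process exits with a non-zero code.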


Save your file and load up a new Terminal window and enter:


sf-headless example.org

You’ll hopefully have a time-stamped folder on your desktop and inside of that folder, you’ll see a file called crawl.seospider
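Once you’re scripting more than one crawl, it helps to find the folder that the newest crawl was saved into (for example, before wrangling it in Python). This minimal sketch, with a hypothetical helper name, picks the most recently modified sub-folder that contains a crawl.seospider file. It deliberately ignores the folder names, so it assumes nothing about Screaming Frog’s exact timestamp format.

```python
from pathlib import Path


def latest_crawl_folder(output_folder, marker="crawl.seospider"):
    """Return the most recently modified sub-folder that contains `marker`, or None."""
    candidates = [
        p for p in Path(output_folder).iterdir()
        if p.is_dir() and (p / marker).exists()
    ]
    if not candidates:
        return None
    # Pick by modification time rather than parsing the folder name,
    # so nothing depends on the timestamp format Screaming Frog uses.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    desktop = Path.home() / "Desktop"
    if desktop.is_dir():
        print(latest_crawl_folder(desktop))
```

For export-only runs that don’t save a crawl file, you could pass the name of one of the exported .csv files as marker instead.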




How To Export A Single Tab

As well as running a crawl, it’s possible to automatically export the .csv files.

You can export any of the tabs shown in Screaming Frog’s interface.

For example if we wanted to crawl the website and export a .csv file with all of the images without alt text, we would do the following:

sf --crawl {url} --output-folder /users/{username}/desktop/sf --export-tabs "Images:Missing Alt Text" --headless

The syntax for exporting tabs follows a generic structure:


--export-tabs "tab-parent:tab-child"

How To Export Multiple Tabs

You can also export multiple files at once by simply separating them with a comma:


"parent1:child1,parent2:child2,parent3:child3"

To see the parent:child relationships for the tabs, look at how they are nested inside the right-hand panel of Screaming Frog.



Let’s simultaneously extract duplicate title tags, missing title tags and missing meta descriptions:

sf --crawl {url} --timestamped-output --output-folder /users/{username}/Desktop --export-tabs "Page Titles:Duplicate,Page Titles:Missing,Meta Description:Missing" --headless

For example, my desired URL and username are phoenixandpartners.co.uk and jamesaphoenix:

sf --crawl phoenixandpartners.co.uk --timestamped-output --output-folder /users/jamesaphoenix/Desktop --export-tabs "Page Titles:Duplicate,Page Titles:Missing,Meta Description:Missing" --headless
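When the list of tabs grows, it’s easy to mistype the comma-separated string. This minimal Python sketch builds the --export-tabs value from (parent, child) pairs and reproduces the example above. export_tabs_arg is a hypothetical helper, and rejecting names that contain a comma or colon is a precaution of this sketch rather than documented Screaming Frog behaviour.

```python
def export_tabs_arg(tabs):
    """Join (parent, child) pairs into the "parent:child,parent:child" value for --export-tabs."""
    parts = []
    for parent, child in tabs:
        # Commas separate entries and colons separate parent from child,
        # so a name containing either would make the value ambiguous.
        if any(sep in name for sep in ",:" for name in (parent, child)):
            raise ValueError(f"separator found in {parent!r}:{child!r}")
        parts.append(f"{parent}:{child}")
    return ",".join(parts)


tabs = [
    ("Page Titles", "Duplicate"),
    ("Page Titles", "Missing"),
    ("Meta Description", "Missing"),
]
# Each list item becomes a single argument if handed to subprocess.run,
# so the spaces inside the tab names stay intact.
args = ["--export-tabs", export_tabs_arg(tabs), "--headless"]
print(args)
```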


How To Export Reports

You can also export reports.

The syntax is similar and uses the parent:child structure; however, if there is no child then only the parent name is required.

Here’s an example where only the parent level is required:

sf --crawl {url} --timestamped-output --output-folder /users/{username}/desktop --save-report "Redirect & Canonical Chains" --headless

Here’s an example where the parent:child structure is required:

sf --crawl {url} --timestamped-output --output-folder /users/{username}/desktop --save-report "Redirects:All Redirects" --headless


How To Perform Bulk Exports

We can also run bulk exports!

An example where only a parent level is required:


sf --crawl {url} --timestamped-output --output-folder /users/{username}/Desktop --bulk-export "All Images" --headless

An example where the parent:child structure is required:


sf --crawl {url} --timestamped-output --output-folder /users/{username}/Desktop --bulk-export "AMP:All Inlinks" --headless
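Reports and bulk exports mix parent-only names with parent:child pairs, so a helper that accepts both can save some typing. In this Python sketch, menu_arg is a hypothetical name, and joining several entries with commas assumes the same convention as --export-tabs.

```python
def menu_arg(entries):
    """Format entries for --save-report or --bulk-export.

    Each entry is either a parent-only name such as "All Images", or a
    (parent, child) pair such as ("AMP", "All Inlinks").
    Assumption: multiple entries are comma-separated, as with --export-tabs.
    """
    parts = []
    for entry in entries:
        if isinstance(entry, str):
            parts.append(entry)  # parent level only
        else:
            parent, child = entry
            parts.append(f"{parent}:{child}")  # parent:child structure
    return ",".join(parts)


report = ["--save-report", menu_arg(["Redirect & Canonical Chains"])]
bulk = ["--bulk-export", menu_arg(["All Images", ("AMP", "All Inlinks")])]
print(report)
print(bulk)
```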

How To Create A Sitemap

If you’re using a content management system such as WordPress, then I’d recommend using a plugin such as Yoast / TheSEOFramework / RankMath to automatically build your sitemap.xml files.

However if you’re working with a headless CMS or a static website, you can automatically create sitemaps with Screaming Frog:


sf --crawl {url} --create-sitemap --output-folder /users/{username}/desktop --headless

How To Create Configuration Files

Configuration files allow you to tune the crawl speed, choose specific user agents, crawl or not crawl specific pages and many more features!

After changing the configuration inside of Screaming Frog, you can save it as a configuration file.

We can then apply that configuration file to a headless Screaming Frog crawl via the terminal.


Create Your Config File:

First open up Screaming Frog and go to Configuration > Spider > Extraction > Structured Data:

Then tick the following checkboxes:

  • JSON-LD
  • Microdata
  • RDFa

Click OK.

Then you’ll need to save the configuration file by:

File > Configuration > Save As

I will choose to call my file custom_crawl.seospiderconfig

Make sure to save it in a new folder on your desktop called config.


How To Crawl With A Config File

Let’s crawl the example site with our newly created configuration file:

sf --crawl {url} --config /users/{username}/desktop/config/{configname}.seospiderconfig --output-folder /users/{username}/desktop --save-report "Redirect & Canonical Chains" --headless

So in my example it would be:

sf --crawl https://phoenixandpartners.co.uk/ --config /users/jamesaphoenix/desktop/config/custom_crawl.seospiderconfig --output-folder /users/jamesaphoenix/desktop --save-report "Redirect & Canonical Chains" --headless
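It’s frustrating to start a long headless crawl only to find the config path was mistyped, so it can be worth checking the file first. The Python sketch below is one way to do that, assuming the file uses the .seospiderconfig extension as in the example above; config_args is a hypothetical helper. It only validates the path and doesn’t inspect the config’s contents.

```python
from pathlib import Path


def config_args(config_path):
    """Return ["--config", <absolute path>] after basic sanity checks on the file."""
    path = Path(config_path).expanduser()
    # The config saved in this post uses the .seospiderconfig extension.
    if path.suffix != ".seospiderconfig":
        raise ValueError(f"expected a .seospiderconfig file, got {path.name!r}")
    if not path.is_file():
        raise FileNotFoundError(f"config file not found: {path}")
    return ["--config", str(path.resolve())]


if __name__ == "__main__":
    example = Path.home() / "Desktop" / "config" / "custom_crawl.seospiderconfig"
    try:
        print(config_args(example))
    except (ValueError, FileNotFoundError) as err:
        print("Fix the config path before crawling:", err)
```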

How To Crawl Text Files

It’s possible to run Screaming Frog in list mode via the terminal.

Simply create a .txt file with a list of URLs that you’d like to crawl.

These can be from a single website or many websites. Save this .txt file to your desktop.

The extra argument used here is --crawl-list, like so:

sf --export-tabs "Response Codes:Client Error (4xx)" --output-folder /users/{username}/desktop --headless --crawl-list /users/{username}/desktop/{filename}.txt

My example looks like this:

sf --export-tabs "Response Codes:Client Error (4xx)" --output-folder /users/jamesaphoenix/desktop --headless --crawl-list /users/jamesaphoenix/desktop/urlstocrawl.txt
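If your URL list comes from another tool, a short Python sketch can write the .txt file for you and build the list-mode arguments. It assumes one URL per line, as in the file described above, and removes blank lines and duplicates while keeping the original order; write_url_list and crawl_list_args are hypothetical helper names.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write one URL per line for list mode, dropping blanks and duplicates but keeping order."""
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


def crawl_list_args(list_path, output_folder, tab="Response Codes:Client Error (4xx)"):
    """Arguments for a headless list-mode crawl exporting one tab (4xx errors by default)."""
    return ["--export-tabs", tab, "--output-folder", str(output_folder),
            "--headless", "--crawl-list", str(list_path)]
```

To run it, you would prepend the launcher path from the Mac section to the list returned by crawl_list_args and pass the result to subprocess.run.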

We’ve finished the Mac section. I hope this post has given you a good overview of how to get started with automating Screaming Frog.

Automation is powerful, and I encourage you to practice your newfound superpowers!

In the next post, you’ll learn how to automatically wrangle your Screaming Frog data with Python + Pandas.


Windows Command Prompt + Screaming Frog

This section of the post is for Windows users; if you’re using a Mac, see the Mac section instead.


How To Use The Command Prompt

First, type Command Prompt into your Windows search bar and open it.


Once your Command Prompt is running, type start . and hit Enter. This opens your current folder in File Explorer.


Creating Shortcuts In Windows

We’ll create shortcuts that you can run via Command Prompt to automate Screaming Frog!

Let’s store all of these shortcuts in a folder on our desktop.

Additionally we’ll create a shortcut that will navigate to this specific shortcuts folder!


  • Create a new folder on your desktop called screaming-frog-commands
  • Go to your desktop, right-click and then select New > Shortcut.

This will open a new window:

Change the following command so that {username} is replaced with your actual username:


"C:\Windows\System32\cmd.exe" /k cd "C:\Users\{username}\Desktop\screaming-frog-commands"

Then enter the command into the "Type the location of the item" field, click Next and save the shortcut as sf

A shortcut icon will now appear on your desktop.


When you open the icon, the command that you entered above will be executed, which will:

  • Open command prompt.
  • Navigate to the screaming-frog-commands folder on your desktop.


How To Open Screaming Frog

Next we need to figure out which folder Screaming Frog was installed into. Depending on your versions of Windows and Screaming Frog, this will be either C:\Program Files or C:\Program Files (x86).

Try the first location (used on 32-bit Windows) in Command Prompt:

cd "C:\Program Files\Screaming Frog SEO Spider"

If you receive the message “The system cannot find the path specified”, then try the (x86) location, which 64-bit Windows uses for 32-bit programs:


cd "C:\Program Files (x86)\Screaming Frog SEO Spider"

To open Screaming Frog from your current working directory, type ScreamingFrogSEOSpiderCli.exe
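If you’d rather not guess which folder is correct, a small Python sketch can check both locations for you (assuming Python is installed on your Windows machine). It only looks in the two folders named above and assumes the executable is called ScreamingFrogSEOSpiderCli.exe, as in this post; find_sf_cli is a hypothetical helper name.

```python
from pathlib import Path

# The two install locations discussed above; which one exists depends on your setup.
CANDIDATE_DIRS = [
    r"C:\Program Files\Screaming Frog SEO Spider",
    r"C:\Program Files (x86)\Screaming Frog SEO Spider",
]


def find_sf_cli(candidates=CANDIDATE_DIRS, exe="ScreamingFrogSEOSpiderCli.exe"):
    """Return the full path to the CLI executable in the first folder that has it, else None."""
    for folder in candidates:
        path = Path(folder) / exe
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    print(find_sf_cli() or "Screaming Frog CLI not found in the usual folders")
```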

Hopefully you’ll now have just opened Screaming Frog from the command line 🥰!


Close your Command Prompt and open the sf shortcut that we created earlier on. Then open this directory in the file explorer with:


start .

From this folder, let’s create a new command line shortcut to speed up the process:


"C:\Windows\System32\cmd.exe" /k cd "C:\Program Files (x86)\Screaming Frog SEO Spider" & ScreamingFrogSEOSpiderCli.exe

NB: If Screaming Frog is installed under C:\Program Files on your machine, simply change

"C:\Program Files (x86)\Screaming Frog SEO Spider" to "C:\Program Files\Screaming Frog SEO Spider"

Name this shortcut open-sf

Then close the Command Prompt and File Explorer, and navigate to your desktop.


Run the sf shortcut and enter open-sf.lnk

This should open Screaming Frog.

Here’s how this works:

  1. When we open our sf shortcut, we navigate into the screaming-frog-commands folder.
  2. Inside that folder is the open-sf.lnk shortcut, which we run by entering its name: open-sf.lnk

So far, our open-sf shortcut is made up of the following parts:

  • "C:\Windows\System32\cmd.exe" /k (This opens Command Prompt and keeps it open after the command runs.)
  • cd "C:\Program Files (x86)\Screaming Frog SEO Spider" & ScreamingFrogSEOSpiderCli.exe (This navigates to the install folder and executes the Screaming Frog application.)

Notice the & symbol: it runs the first command and then runs the second command afterwards inside the same Command Prompt.


How To Run A Crawl

Now close Screaming Frog and the Command Prompt, then re-run your sf shortcut. In the following sections we’ll be adding more arguments to our shortcuts.

Enter:

open-sf.lnk --crawl {url}

For example if you wanted to crawl https://phoenixandpartners.co.uk/ then it would be:

open-sf.lnk --crawl https://phoenixandpartners.co.uk/

Let’s create another shortcut in the screaming-frog-commands folder and call it crawl:

"C:\Windows\System32\cmd.exe" /k cd "C:\Program Files (x86)\Screaming Frog SEO Spider" & ScreamingFrogSEOSpiderCli.exe --crawl

Also notice above how the last argument is --crawl, which means we will only need to pass a URL into this shortcut for it to successfully execute.

  • Close everything down.
  • Open your sf shortcut.
  • Then enter:
crawl.lnk https://example.org/

This will then crawl from the above URL all via the shortcut!


How To Run Screaming Frog Headless (Without A Graphical User Interface)

We are going to add several extra arguments to our existing crawl shortcut:

It’s possible for us to execute Screaming Frog without a graphical user interface (GUI) by adding the --headless argument:


1. Run the sf shortcut
2. Enter crawl.lnk {url} --headless

Additionally, we can save the crawl by adding --save-crawl:


1. Run the sf shortcut
2. Enter crawl.lnk {url} --headless --save-crawl

NB: You will need to purchase a license for executing Screaming Frog with the --save-crawl functionality.


How To Save A Crawl:

We can also save our crawls to a specific folder with the --output-folder argument. Additionally, we can make sure that the created folder has a unique name by adding the --timestamped-output argument.

Let’s see all of the commands in action without any shortcuts to easily see what’s happening:

"C:\Windows\System32\cmd.exe" /k cd "C:\Program Files (x86)\Screaming Frog SEO Spider" & ScreamingFrogSEOSpiderCli.exe --headless --save-crawl --output-folder "C:\Users\{username}\Desktop" --timestamped-output --crawl

Then save this as a shortcut called save-screaming-frog-crawl
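Typing these shortcut targets by hand makes it easy to drop a backslash or a quote. As an optional helper, the Python sketch below assembles the same cmd.exe /k cd ... & ScreamingFrogSEOSpiderCli.exe pattern used in this section from a list of arguments. It quotes every value that isn’t a --flag, which also stops cmd.exe from treating the & in a name such as "Redirect & Canonical Chains" as a command separator. The function name and default install folder are assumptions; change the folder if Screaming Frog lives under C:\Program Files on your machine.

```python
# Default install folder used in this section (assumption: adjust if yours differs).
DEFAULT_INSTALL_DIR = r"C:\Program Files (x86)\Screaming Frog SEO Spider"


def shortcut_target(args, install_dir=DEFAULT_INSTALL_DIR):
    """Build text for the shortcut's "Type the location of the item" field.

    Follows the pattern used in this post: open cmd.exe and keep it open (/k),
    cd into the install folder, then (&) run the CLI with the given arguments.
    """
    # Quote every value that isn't a --flag, so spaces and & stay inside one argument.
    parts = [a if a.startswith("--") else f'"{a}"' for a in args]
    cli = " ".join(["ScreamingFrogSEOSpiderCli.exe", *parts])
    return rf'"C:\Windows\System32\cmd.exe" /k cd "{install_dir}" & {cli}'


print(shortcut_target([
    "--headless", "--save-crawl",
    "--output-folder", r"C:\Users\jamesaphoenix\Desktop",
    "--timestamped-output", "--crawl",
]))
```

The printed line can be pasted straight into the shortcut wizard.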


You can now easily access this by:


1. Run the sf shortcut
2. Enter save-screaming-frog-crawl.lnk {url}

Now that we’ve covered the basic crawling applications, let’s explore how to export tabs, reports and bulk exports!


How To Export A Single Tab

As well as running a crawl, it’s possible to automatically export the .csv files.

You can export any of the tabs shown in Screaming Frog’s interface.


For example if we wanted to crawl the website and export a .csv file with all of the images without alt text, we would do the following:

1. Open your sf shortcut, then enter:
2. crawl.lnk {url} --output-folder "C:\Users\{username}\Desktop" --timestamped-output --export-tabs "Images:Missing Alt Text" --headless

The syntax for exporting tabs follows a generic structure:


--export-tabs "tab-parent:tab-child"

Exporting Multiple Tabs

You can easily export multiple tabs by separating them with a comma. Let’s simultaneously extract duplicate title tags, missing title tags and missing meta descriptions:

--export-tabs "Page Titles:Duplicate,Page Titles:Missing,Meta Description:Missing"

1. Run your sf shortcut
2. crawl.lnk {url} --output-folder "C:\Users\{username}\Desktop" --export-tabs "Page Titles:Duplicate,Page Titles:Missing,Meta Description:Missing" --headless

To see the parent:child relationships for the tabs, look at how they are nested in the right-hand panel of Screaming Frog.



For example my username is jamesaphoenix:

1. Run the sf shortcut.
2. crawl.lnk {url} --output-folder "C:\Users\jamesaphoenix\Desktop" --export-tabs "Page Titles:Duplicate,Page Titles:Missing,Meta Description:Missing" --headless

How To Export Reports

You can also export reports.

The syntax is similar and uses the parent:child structure; however, if there is no child then only the parent name is required.


Here’s an example where only the parent level is required:


1. Run your sf shortcut
2. crawl.lnk {url} --timestamped-output --output-folder "C:\Users\{username}\Desktop" --save-report "Redirect & Canonical Chains" --headless

Here’s an example where the parent:child structure is required:


1. Run your sf shortcut
2. crawl.lnk {url} --timestamped-output --output-folder "C:\Users\{username}\Desktop" --save-report "Redirects:All Redirects" --headless

How To Perform Bulk Exports

We can also run bulk exports!

An example where only a parent level is required:


1. Run your sf shortcut
2. crawl.lnk {url} --timestamped-output --output-folder "C:\Users\{username}\Desktop" --bulk-export "All Images" --headless

An example where the parent:child structure is required:


1. Run your sf shortcut
2. crawl.lnk {url} --timestamped-output --output-folder "C:\Users\{username}\Desktop" --bulk-export "AMP:All Inlinks" --headless

How To Create A Sitemap

If you’re using a content management system such as WordPress, then I’d recommend using a plugin such as Yoast/TheSEOFramework/RankMath to automatically build your sitemap.xml files.

However if you’re working with a headless CMS or a static website, you can automatically create sitemaps with Screaming Frog:


crawl.lnk {url} --output-folder "C:\Users\{username}\Desktop" --headless --create-sitemap

How To Create Configuration Files

Configuration files allow you to tune the crawl speed, choose specific user agents, crawl or not crawl specific pages and many more features!

After changing the configuration inside of Screaming Frog, you can save it as a configuration file.

We can then apply that configuration file to a headless Screaming Frog crawl via the Command Prompt.


Create Your Config File:

First open up Screaming Frog and go to Configuration > Spider > Extraction > Structured Data:

Then tick the following checkboxes:

  • JSON-LD
  • Microdata
  • RDFa


Click OK.

Then you’ll need to save the configuration file by:

File > Configuration > Save As

I will choose to call my file custom_crawl.seospiderconfig

Make sure to save it in a new folder on your desktop called config.


How To Crawl With A Config File

Let’s crawl the example site with our newly created configuration file:


crawl.lnk {url} --config "C:\Users\{username}\Desktop\config\{configname}.seospiderconfig" --output-folder "C:\Users\{username}\Desktop" --save-report "Redirect & Canonical Chains" --headless

So in my example it would be:

crawl.lnk https://phoenixandpartners.co.uk/ --config "C:\Users\jamesaphoenix\Desktop\config\custom_crawl.seospiderconfig" --output-folder "C:\Users\jamesaphoenix\Desktop" --save-report "Redirect & Canonical Chains" --headless

How To Crawl Text Files

It’s possible to run Screaming Frog in list mode via the Command Prompt.

Simply create a .txt file with a list of URLs that you’d like to crawl. These can be from a single website or many websites.

Save this .txt file to your desktop.

The extra argument used here is --crawl-list. Because crawl.lnk already ends with --crawl, use the open-sf.lnk shortcut for list mode, like so:

open-sf.lnk --export-tabs "Response Codes:Client Error (4xx)" --output-folder "C:\Users\{username}\Desktop" --headless --crawl-list "C:\Users\{username}\Desktop\{filename}.txt"

My example looks like this:

open-sf.lnk --export-tabs "Response Codes:Client Error (4xx)" --output-folder "C:\Users\jamesaphoenix\Desktop" --headless --crawl-list "C:\Users\jamesaphoenix\Desktop\urlstocrawl.txt"

We’ve finished the Windows section. I hope this post has given you a good overview of how to get started with automating Screaming Frog.

Automation is powerful, and I encourage you to practice your newfound superpowers!

In the next post, you’ll learn how to automatically wrangle your Screaming Frog data with Python + Pandas.

Tagged: Python For SEO

