How To Install Screaming Frog In The Cloud – Remote Desktop Version

James Phoenix

Learning Outcomes

  • Learn how to set up a Linux environment with remote desktop on Google Cloud Platform.
  • Learn how to connect to your Linux environment via remote desktop.
  • Learn how to run a Screaming Frog crawl in the cloud.

As websites and web applications grow larger, crawling them to investigate technical SEO issues can become too much for your local computer to handle. James Finlayson came up with a great idea.

“Why don’t we just put screaming frog on the cloud?”


This guide will help you set up Screaming Frog on a virtual machine instance on Google Cloud Platform. We will also set up a remote desktop with a graphical interface so that you can use the Screaming Frog GUI.

Let’s get to it.


Set Up A Google Cloud Project

First, you’ll need to register for a Google Cloud Platform account and then create a Google Cloud project. If you need help with the setup, you can refer to this guide.


Enable A Virtual Machine

Click the hamburger menu in the top left, find Compute Engine and click VM instances. If this is the first time you’ve set up a compute instance in this Google Cloud project, you’ll likely need to wait several minutes while the Compute Engine API is enabled. Then click Create!
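If you prefer the command line, this console step can be sketched with the gcloud CLI (installed on your local machine later in this guide, and already available in Google Cloud Shell). Opening Compute Engine for the first time enables the Compute Engine API for your project; the bash function below does that explicitly. The function name, the DRY_RUN switch and the placeholder project ID are illustrative assumptions, not part of the original walkthrough.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the Compute Engine API for a project.
# Set DRY_RUN=1 to print the gcloud command instead of running it.
enable_compute_api() {
  local project_id="$1"
  local cmd=(gcloud services enable compute.googleapis.com "--project=${project_id}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command as a single space-separated line.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `enable_compute_api your-project-id`, or `DRY_RUN=1 enable_compute_api your-project-id` to only print what would run.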


Setting Up Your Virtual Machine

  • Click the Create button to start setting up your virtual machine.
  • Name your machine: screaming-frog-crawler

Then click the Create button at the bottom of the screen.
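The same VM can also be created from the command line. This is a minimal sketch, assuming gcloud is installed and already pointed at your project; the helper name and the DRY_RUN switch are hypothetical. It only sets the instance name and zone, so gcloud falls back to its own defaults for everything else (such as the boot image and, unless you pass one, the machine type).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the screaming-frog-crawler VM.
# Usage: create_crawler_vm ZONE [MACHINE_TYPE]
# Set DRY_RUN=1 to print the gcloud command instead of running it.
create_crawler_vm() {
  local zone="$1"
  local machine_type="${2:-}"
  local cmd=(gcloud compute instances create screaming-frog-crawler "--zone=${zone}")
  # Only pass a machine type if one was given; otherwise gcloud uses its default.
  if [ -n "$machine_type" ]; then
    cmd+=("--machine-type=${machine_type}")
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `create_crawler_vm us-central1-a`, or `create_crawler_vm us-central1-a e2-standard-4` if you want more memory for large crawls (e2-standard-4 is just one example machine type).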


SSH’ing Into Your Virtual Machine

After your virtual machine has finished setting up, click its SSH button. This opens a browser-based SSH session with the virtual machine. Connecting can take 1 – 2.5 minutes, so please be patient.


Upgrading & Installing On The Virtual Machine (VM)

Now that you are inside the VM, copy the entire code block of commands below and paste it into your browser-based terminal in one go.

These scripts will perform the following actions:

  • Upgrading The System.
  • Installing Google Chrome & Package Dependencies.
  • Installing Screaming Frog.
  • Installing GNOME Desktop Components & The vncserver Package.

☕ Copy And Paste All Of These Scripts In One Go, Then Grab A Coffee! ☕

# The { ... } group makes bash read every line before running any of them,
# so the whole block can be pasted in one go.
# -y answers apt's confirmation prompts automatically.
{
# Upgrading the system
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y wget

# Installing Google Chrome & package dependencies
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt-get install -y ./google-chrome-stable_current_amd64.deb
sudo apt-get install -y cabextract xfonts-utils
wget http://ftp.de.debian.org/debian/pool/contrib/m/msttcorefonts/ttf-mscorefonts-installer_3.6_all.deb
sudo dpkg -i ttf-mscorefonts-installer_3.6_all.deb
sudo apt-get install -y xdg-utils zenity libgconf-2-4 fonts-wqy-zenhei

# Installing Screaming Frog
wget https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_12.6_all.deb
sudo apt-get install -y ./screamingfrogseospider_12.6_all.deb

# Installing GNOME desktop components & the vncserver package
sudo apt-get install -y gnome-panel gnome-settings-daemon metacity nautilus gnome-terminal
sudo apt-get install -y tightvncserver
}

Note: Remember to press Enter after pasting, so that the final tightvncserver installation runs too!
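Once the install finishes, a quick sanity check is to confirm that the expected commands landed on your PATH. This is a sketch: check_installed is a hypothetical helper, and the command names google-chrome, screamingfrogseospider and vncserver are assumptions about what these packages normally install.

```shell
#!/usr/bin/env bash
# Hypothetical check, run on the VM after the install block finishes.
# Prints every given command that is not on PATH; returns non-zero if any are missing.
check_installed() {
  local missing=0 bin
  for bin in "$@"; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$bin"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_installed google-chrome screamingfrogseospider vncserver` prints nothing and returns 0 if all three are found.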


Modify The Start Up Vncserver Script

In your browser SSH session, enter the following command. You will be asked to create a password; you can skip the view-only password here.

  • vncserver && vncserver -kill :1

This starts a VNC server and then immediately stops it. Running it once generates the ~/.vnc/xstartup start-up script, which we’ll customise next.


We’ll customise the start-up script using the nano text editor:

nano ~/.vnc/xstartup

Then add the following lines at the bottom of your ~/.vnc/xstartup script, press CTRL + X, type y and press Enter to save it.

gnome-panel &
gnome-settings-daemon
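If you’d rather not edit the file by hand, the same two lines can be appended from the shell. This is a minimal sketch (add_xstartup_lines is a hypothetical name): it only adds a line if it isn’t already present, so re-running it won’t duplicate anything, and it first adds a trailing newline if the file lacks one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the desktop start-up lines to a VNC xstartup file
# (default ~/.vnc/xstartup), skipping any line that is already there.
add_xstartup_lines() {
  local file="${1:-$HOME/.vnc/xstartup}"
  local line
  # If the file is non-empty and doesn't end with a newline, add one first
  # so our first line isn't glued onto the existing last line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  for line in 'gnome-panel &' 'gnome-settings-daemon'; do
    # -x matches whole lines only; -F treats the text as a fixed string, not a regex.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
      printf '%s\n' "$line" >> "$file"
    fi
  done
}
```

Running `add_xstartup_lines` with no argument edits ~/.vnc/xstartup in place.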

Start The vncserver:

  • To start a VNC session, enter the command: vncserver
  • If you haven’t already set one, you will be asked to create a VNC password.
  • I would recommend skipping the view-only password.
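vncserver reports the display it started on (something like screaming-frog-crawler:1). By VNC convention, display N listens on TCP port 5900 + N, which is why display :1 means port 5901 throughout this guide. The small helper below (a hypothetical name) does that arithmetic, assuming you pass it the display as printed.

```shell
#!/usr/bin/env bash
# Map a VNC display such as ":1" or "hostname:1" to its TCP port (5900 + N).
vnc_port_for_display() {
  local display="${1##*:}"   # keep only what follows the last colon
  case "$display" in
    ''|*[!0-9]*)
      printf 'invalid display: %s\n' "$1" >&2
      return 1
      ;;
  esac
  # 10# forces base 10 so a value like "08" isn't read as octal.
  printf '%d\n' "$((5900 + 10#$display))"
}
```

For example, `vnc_port_for_display :1` prints 5901, and a second server on :2 would use 5902.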

Install Google Cloud Software Development Kit (SDK)


You will need to install the Google Cloud Software Development Kit (SDK) on your local machine. It provides a command line interface for accessing and controlling your Google Cloud Platform resources.

Make sure you have Python 3.x installed on your computer. If you haven’t installed Python, follow this tutorial on how to install Python with Anaconda.


Mac

  1. Go to https://cloud.google.com/sdk/docs/quickstart-macos and download either the 32-bit or 64-bit version.
  2. Extract the archive to any location on your file system, preferably your home directory. On macOS, you can do this by opening the downloaded .tar.gz archive file in the preferred location.
  3. In your terminal, run the installer script: ./google-cloud-sdk/install.sh
  4. Then open your .bash_profile by typing:
cd ~ && nano .bash_profile

Paste the following line at the bottom of the file, then press CTRL + X, type y and press Enter to save it.

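export PATH="$HOME/google-cloud-sdk/bin:$PATH"

(The gcloud command lives in the SDK’s bin folder. This line assumes you extracted the SDK into your home directory; $HOME expands to your home folder, such as /Users/yourusername, so there is no username to replace. If you extracted it elsewhere, use that path instead.)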

Note: Newer Macs (macOS Catalina and later) use zsh as the default shell, so please also add the same export line to your ~/.zshrc file:

cd ~ && nano .zshrc
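To confirm the export took effect, open a new terminal (or run source ~/.zshrc or source ~/.bash_profile) and check that the directory is really on your PATH. A minimal bash sketch; path_contains is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Returns 0 if the given directory is exactly one of the entries in $PATH.
path_contains() {
  # Wrap PATH and the directory in colons so only whole entries match
  # (e.g. /opt/x won't falsely match /opt/x/bin).
  case ":${PATH}:" in
    *":${1}:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Pass the exact directory you used in your export line, e.g. `path_contains /the/directory/you/exported && echo ok`, then run `gcloud --version` to confirm gcloud itself is found.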

Windows

  1. Download the Google Cloud SDK installer
  2. Launch the installer and follow the prompts.
  3. After installation has completed, the installer presents several options. Make sure that the following are selected:
    • Start Google Cloud SDK Shell
    • Run ‘gcloud init’

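Now go to either your terminal or command prompt and type:

gcloud init

gcloud init walks you through logging in and choosing a default project. If you need to log in again later on its own, use:

gcloud auth login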
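Optionally, you can save a default project and zone in your gcloud configuration so later commands need fewer flags (gcloud init can also prompt you for a default zone). This is a sketch with hypothetical helper names; the two gcloud config set commands it wraps are standard gcloud commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, or just print it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: store a default project and Compute Engine zone.
# Usage: set_gcloud_defaults PROJECT_ID ZONE
set_gcloud_defaults() {
  local project_id="$1" zone="$2"
  run_or_print gcloud config set project "$project_id"
  run_or_print gcloud config set compute/zone "$zone"
}
```

For example, `set_gcloud_defaults your-project-id us-central1-a`. Explicit --project and --zone flags still override these defaults.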

Create an SSH Tunnel + Remote Desktop Server On The Same Port

To access the remote desktop, which vncserver serves on port 5901 by default, we need to create an SSH tunnel.

Make sure to gather all of the relevant information:

  • Google Cloud Platform Project ID
  • Zone (the zone you created the virtual machine in)
  • The Name of Your Instance (which we called screaming-frog-crawler)

gcloud compute ssh insertinstancenamehere --project=insertprojectidhere --zone=insertzoneidhere --ssh-flag "-L 5901:localhost:5901"


For example my command looks like this:


gcloud compute ssh screaming-frog-crawler --project=savvy-motif-278215 --zone=us-central1-a --ssh-flag "-L 5901:localhost:5901"
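Because the local and remote ports must match, and VNC display N maps to port 5900 + N, it can help to build the tunnel command from a display number instead of typing the ports by hand. A minimal sketch: open_vnc_tunnel and DRY_RUN are hypothetical, while the gcloud flags are the same ones used in the command above (written here in the --ssh-flag=VALUE form).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the SSH tunnel command.
# Usage: open_vnc_tunnel INSTANCE PROJECT_ID ZONE [DISPLAY]
# DISPLAY defaults to 1 (port 5901); the same port is used locally and on the VM.
open_vnc_tunnel() {
  local instance="$1" project_id="$2" zone="$3" display="${4:-1}"
  case "$display" in
    ''|*[!0-9]*) printf 'display must be a number\n' >&2; return 1 ;;
  esac
  local port=$((5900 + display))
  local cmd=(gcloud compute ssh "$instance" "--project=${project_id}" "--zone=${zone}"
             "--ssh-flag=-L ${port}:localhost:${port}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `open_vnc_tunnel screaming-frog-crawler your-project-id us-central1-a` forwards port 5901; pass 2 as a fourth argument for a server running on display :2.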

Installing Remote Desktop Software On Your Local Computer (VNCViewer)

Now that you have the following two things running:

  • A vncserver on the virtual machine on port 5901
  • An SSH tunnel that listens on port 5901 on your local machine and forwards that traffic to port 5901 on the virtual machine.

We can now set up a remote desktop session.


  1. Download RealVNC’s VNC Viewer.
  2. After the installation has finished, click File –> New connection. Enter 127.0.0.1:5901 as the VNC Server and give the connection an appropriate name.
  3. Open the connection. A window should then appear showing your Linux remote desktop.

Connect To Your Virtual Machine Via Remote Desktop

After connecting to your remote desktop, click Applications in the top left –> Internet, and you’ll see Screaming Frog!

Now you can run your Screaming Frog crawls in the Cloud instead of on your local computer!


Key Things To Remember:

  • Always stop your virtual machines when you’re finished with them as they cost money whilst you run them.
  • Always remember to check that your vncserver and SSH tunnel ports are exactly the same.
  • VNC passwords are limited to 8 characters. If you entered a longer password, only the first 8 characters are used, so make a note of that 8-character version.
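Stopping the VM can also be done from the command line. A stopped instance isn’t billed for vCPU and memory, but its persistent disk is still charged, so delete the VM (gcloud compute instances delete) if you no longer need it at all. This is a sketch with a hypothetical helper name and DRY_RUN switch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop (not delete) the screaming-frog-crawler VM.
# Usage: stop_crawler_vm ZONE   (set DRY_RUN=1 to print the command instead)
stop_crawler_vm() {
  local zone="$1"
  local cmd=(gcloud compute instances stop screaming-frog-crawler "--zone=${zone}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

When you want to crawl again, `gcloud compute instances start screaming-frog-crawler --zone=us-central1-a` (with your own zone) brings it back.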

Common Errors:

channel 3: open failed: connect failed: Connection refused:

- When you connect to port 5901 on your local machine, that connection is tunneled through your SSH link to the SSH server on the virtual machine. From there, the SSH server makes a TCP connection to localhost port 5901 and relays data between the tunneled connection and the target of the tunnel.

- The "connection refused" error comes from the SSH server on the virtual machine when it tries to make the TCP connection to the target of the tunnel. "Connection refused" means the connection attempt was rejected. The simplest explanation is that nothing on the virtual machine is listening on localhost port 5901. In other words, vncserver isn’t running, or it is running but listening on a different port.

connection refused on the VNC remote viewer: Double check that your SSH tunnel and vncserver use the same port (i.e. 5901 and 5901). If you launch multiple vncservers, each new one gets the next port number (i.e. 5901 –> 5902 –> 5903), so make sure you’re only running one vncserver (you can stop an extra one with, for example, vncserver -kill :2).
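To narrow down either of these errors, you can test whether anything is accepting connections on a port. A minimal sketch using bash’s built-in /dev/tcp redirection; port_is_open is a hypothetical name, and it needs bash (zsh doesn’t support /dev/tcp).

```shell
#!/usr/bin/env bash
# Returns 0 if something accepts TCP connections on 127.0.0.1:PORT,
# 1 if not, and 2 if PORT isn't a number.
port_is_open() {
  local port="$1"
  case "$port" in
    ''|*[!0-9]*) printf 'usage: port_is_open PORT\n' >&2; return 2 ;;
  esac
  # Try to open fd 3 to the port inside a subshell; the subshell exits
  # non-zero if the connection is refused, and the fd closes when it exits.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}
```

Run `port_is_open 5901 && echo open` on the VM to confirm vncserver is listening (if it isn’t, that explains the "channel 3: open failed" message), and on your local machine to confirm the SSH tunnel is up.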


