How To Install Google Chrome, Selenium & Chromedriver For AWS EC2 Instances

James Phoenix

If you’re looking to use Selenium and headless browsers on Amazon Web Services (AWS), it’s essential that you install matching versions of Selenium, ChromeDriver and Google Chrome on your EC2 instance. In this guide you’ll learn how to deploy and test a fully functional Selenium Python environment.


Install ChromeDriver

You will need to install ChromeDriver, which gives you programmatic access to Google Chrome via the WebDriver protocol.

cd /tmp/
sudo wget https://chromedriver.storage.googleapis.com/80.0.3987.106/chromedriver_linux64.zip
sudo unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
chromedriver --version

The commands above work by:
  1. Navigating to the /tmp/ folder.
  2. Using wget to download the ChromeDriver zip archive.
  3. Unzipping the archive.
  4. Moving the chromedriver binary to the /usr/bin folder.
  5. Printing the installed version of chromedriver.
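
If you would rather confirm the installation from Python than from the shell, the check in step 5 can be sketched as below. This is a minimal sketch using only the standard library; the helper name binary_version is hypothetical and not part of any tool mentioned in this guide.

```python
import shutil
import subprocess


def binary_version(name):
    """Return the output of `<name> --version`, or None if <name> is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    # check=False: a non-zero exit code should not raise, we only want the text
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Prints None if chromedriver was not moved onto the PATH
    print(binary_version("chromedriver"))
```

A result of None suggests the binary did not end up in a directory on your PATH.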

Install Google Chrome

Then you’ll need to download and install the Google Chrome binary for your EC2 instance.

sudo curl https://intoli.com/install-google-chrome.sh | bash
sudo mv /usr/bin/google-chrome-stable /usr/bin/google-chrome
google-chrome --version && which google-chrome
  1. curl downloads an install script from intoli.com and pipes it to bash, which installs the Google Chrome binary.
  2. The google-chrome-stable binary is then renamed to /usr/bin/google-chrome (by default, ChromeDriver on Linux looks for the Chrome executable at this path).
  3. The google-chrome --version and which google-chrome commands let you check that the installation was successful and see the installed version and location of Chrome. It’s worth double-checking that your ChromeDriver and Google Chrome versions match.
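
The version check in step 3 can also be automated. The following is a rough sketch that compares only the major version number, on the assumption that a matching major version is what matters most for compatibility. The function names and the sample strings in the usage note are illustrative, not real output captured from an instance.

```python
import re


def major_version(version_output):
    """Extract the major version from text such as 'Google Chrome 80.0.3987.106'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """Return True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

For example, versions_match("Google Chrome 80.0.3987.149", "ChromeDriver 80.0.3987.106") returns True, because both strings start with major version 80.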

Installing Selenium

Now install Selenium for Python 3.x with the following command:

pip3 install selenium --user

If you still need to set up pip3 on your EC2 instance, you can visit this post here.
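
To confirm that pip3 installed the package for the interpreter you intend to use, a small check like the one below may help. It is a sketch that assumes Python 3.8 or newer, where importlib.metadata is in the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the Selenium version, or None if the install did not succeed
    print(installed_version("selenium"))
```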


Adding Arguments To Your Selenium ChromeDriver

From my initial testing, I would recommend adding all of the following arguments when initialising selenium in a production environment:

from selenium.webdriver.chrome.options import Options
from selenium import webdriver

options = Options()
options.add_argument("--headless")
options.add_argument("window-size=1400,1500")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("start-maximized")
options.add_argument("enable-automation")
options.add_argument("--disable-infobars")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)
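
If several scripts share this setup, one option is to keep the argument list in a single helper. The sketch below is an assumption about how you might organise it, not a required pattern: chrome_arguments and make_driver are hypothetical names, and only a subset of the flags above is included for brevity.

```python
def chrome_arguments(headless=True, window_size=(1400, 1500)):
    """Build a list of Chrome command-line arguments for a server environment."""
    width, height = window_size
    args = [
        f"--window-size={width},{height}",
        "--disable-gpu",
        "--no-sandbox",
        "--disable-dev-shm-usage",
    ]
    if headless:
        args.insert(0, "--headless")
    return args


def make_driver(**kwargs):
    """Create a Chrome WebDriver configured with chrome_arguments(**kwargs)."""
    # Imported here so chrome_arguments() can be used without selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(**kwargs):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling make_driver(headless=False) would then start Chrome with the same flags minus --headless, which can be handy when debugging on a machine with a display.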

Test Your Connection

I’ve created a Python script that you can use to make sure everything is working correctly. Simply SSH into your EC2 instance, start a python3 session and run the following script:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

url = 'https://github.com/'

options = Options()
options.add_argument("--headless")
options.add_argument("window-size=1400,1500")

driver = webdriver.Chrome(options=options)

# Navigate to github.com
driver.get(url)

# Extract the top heading from github.com
# find_element(By.CLASS_NAME, ...) works in both Selenium 3 and Selenium 4
text = driver.find_element(By.CLASS_NAME, 'h000-mktg').text

print(text)

# Close the browser and stop the chromedriver process
driver.quit()

If the test was successful, the script prints the top headline from github.com, which at the time of writing says: Built for developers
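
Because page markup changes over time, a check that reads the page title may be more durable than one that depends on a CSS class. The following is a sketch under that assumption; smoke_test and headless_chrome are hypothetical helper names. The driver is created by a function passed in as an argument, and the finally block makes sure the browser is closed even if the page fails to load.

```python
def smoke_test(make_driver, url="https://github.com/"):
    """Open url with a driver from make_driver() and return the page title."""
    driver = make_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser so no Chrome processes are left running
        driver.quit()


def headless_chrome():
    """Create a headless Chrome driver like the one used in the script above."""
    # Imported here so smoke_test() can be exercised without selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("window-size=1400,1500")
    return webdriver.Chrome(options=options)
```

On the instance you would call print(smoke_test(headless_chrome)); a non-empty title indicates that Chrome, ChromeDriver and Selenium are working together.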
