If you want to run Selenium with a headless browser on Amazon Web Services (AWS), it’s essential to install compatible versions of Selenium, ChromeDriver and Google Chrome on your EC2 instance. In this guide you’ll learn how to deploy and test a fully functional Selenium Python environment.
Install ChromeDriver
You will need to install ChromeDriver, which gives you programmatic access to Google Chrome via the WebDriver protocol.
cd /tmp/
sudo wget https://chromedriver.storage.googleapis.com/80.0.3987.106/chromedriver_linux64.zip
sudo unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
chromedriver --version
- Navigating to the /tmp/ folder.
- Using wget to download chromedriver.
- Unzipping chromedriver.
- Moving chromedriver to the /usr/bin folder.
- Inspecting the current version of chromedriver.
Install Google Chrome
Next, download and install the Google Chrome binary on your EC2 instance.
curl https://intoli.com/install-google-chrome.sh | bash
sudo mv /usr/bin/google-chrome-stable /usr/bin/google-chrome
google-chrome --version && which google-chrome
- curl downloads the install script and pipes it to bash, which installs the google-chrome-stable binary.
- The google-chrome-stable binary is then renamed to /usr/bin/google-chrome (by default on Linux, ChromeDriver looks for the Google Chrome executable at this path).
- The google-chrome --version && which google-chrome commands show the installed version of Chrome and confirm where the executable lives. It’s worth double-checking that your ChromeDriver and Google Chrome versions match.
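The version check described above can also be scripted. The following is a minimal sketch, assuming both binaries are on your PATH and that their --version output contains a four-part version number such as 80.0.3987.106. The helper names major_version and installed_versions_match are hypothetical and not part of Selenium or ChromeDriver.

```python
import re
import subprocess


def major_version(version_output):
    """Return the major version (e.g. 80) found in a `--version` output string."""
    # Assumes a four-part version such as 80.0.3987.106 appears in the text.
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def installed_versions_match():
    """Compare the major versions reported by chromedriver and google-chrome."""
    driver_out = subprocess.run(
        ["chromedriver", "--version"], capture_output=True, text=True, check=True
    ).stdout
    chrome_out = subprocess.run(
        ["google-chrome", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return major_version(driver_out) == major_version(chrome_out)
```

Calling installed_versions_match() on the EC2 instance returns True when the two major versions agree. It compares only the major version, which is a simplification of "the same version".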
Installing Selenium
Now install Selenium for Python 3 with the following command:
pip3 install selenium --user
If you still need to set up pip3 on your EC2 instance, you can visit this post here.
Adding Arguments To Your Selenium ChromeDriver
From my initial testing, I would recommend adding all of the following arguments when initialising selenium in a production environment:
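from selenium.webdriver.chrome.options import Options
from selenium import webdriver
options = Options()
options.add_argument("--headless")  # run Chrome without a visible window
options.add_argument("--window-size=1400,1500")  # initial window size in pixels
options.add_argument("--disable-gpu")  # turn off GPU hardware acceleration
options.add_argument("--no-sandbox")  # disable Chrome's sandbox
options.add_argument("--start-maximized")  # start with a maximized window
options.add_argument("--enable-automation")  # mark the browser as automation-controlled
options.add_argument("--disable-infobars")  # suppress infobars (ignored by newer Chrome releases)
options.add_argument("--disable-dev-shm-usage")  # avoid /dev/shm, which can be too small
driver = webdriver.Chrome(options=options)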
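If you create drivers in more than one place, it can help to keep the argument list in a single function. The sketch below is one possible arrangement, not a required pattern. The names build_chrome_arguments and make_driver are hypothetical, and the subset of flags chosen is an assumption. Selenium is imported inside make_driver so the argument builder can be checked without launching a browser.

```python
def build_chrome_arguments(headless=True, width=1400, height=1500):
    """Return a list of Chrome command-line switches (hypothetical helper)."""
    args = [
        f"--window-size={width},{height}",
        "--disable-gpu",
        "--no-sandbox",
        "--disable-dev-shm-usage",
    ]
    if headless:
        # Put --headless first so it is easy to spot when debugging
        args.insert(0, "--headless")
    return args


def make_driver(**kwargs):
    """Create a Chrome WebDriver using the arguments built above."""
    # Imported lazily so build_chrome_arguments works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_arguments(**kwargs):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

For example, make_driver(width=1920, height=1080) would start headless Chrome with a larger window, while make_driver(headless=False) is only useful on a machine with a display.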
Test Your Connection
I’ve created a Python script that you can use to make sure everything is working correctly. SSH into your EC2 instance, start a python3 interpreter and run the following script:
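from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
url = 'https://github.com/'
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1400,1500")
driver = webdriver.Chrome(options=options)
try:
    # Navigate to github.com
    driver.get(url)
    # Extract the top heading from github.com
    # (find_element_by_class_name was removed in Selenium 4.3; use By.CLASS_NAME)
    text = driver.find_element(By.CLASS_NAME, 'h000-mktg').text
    print(text)
finally:
    # Always shut down the browser and the chromedriver process
    driver.quit()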
If the test was successful, the script prints the top headline from github.com, which at the time of writing reads: Built for developers