Google My Business is a local SEO directory and a vital marketing channel for local businesses as it helps them to acquire customers within their local search market.

Agencies can capitalise on business owners who still haven’t claimed their Google My Business listing.

These make for fresh, easy prospects for veteran SEOs.

In this guide we will be creating a simple web scraper that will:

  • Find businesses that don’t have a knowledge panel on the Google Search Engine Results Page (SERP).

We’ll be using a mixture of Selenium and pandas for this tutorial.


Loading Python Libraries

#Module Dependencies
import time
from time import sleep
import datetime
import selenium
import urllib
import re
from bs4 import BeautifulSoup
from random import randint
from random import uniform

#Import Selenium Dependencies
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

import pandas as pd

1. Let’s get the dataset that we need to scrape

df = pd.read_csv('website_data.csv')
df.head(15)

2. Let’s study a Google Search Query so that we can understand how to structure our target URL:

It’s important that we understand how the URL is structured so that we can dynamically inject our custom brand queries into Google searches.

single_keyword_url = 'https://www.google.com/search?q=vidioh&oq=vidioh+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'

single_word_query = 'https://www.google.com/search?q={0}&oq={1}+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'.format('vidioh', 'vidioh')


multiple_keyword_url = 'https://www.google.com/search?q=video+brochures&oq=video+brochures&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'

multiple_word_query = 'https://www.google.com/search?q={0}&oq={1}&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'.format('video+brochures', 'video+brochures')
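The two example URLs differ only in their q and oq values, and a space in the query becomes a +. A minimal sketch of building such a URL from any brand name is shown here. `build_search_url` is a hypothetical helper, and it assumes that Google only needs the q parameter to run the search, which is worth checking in your own browser before relying on it.

```python
from urllib.parse import quote_plus


# Hypothetical helper, not part of the original script.
# Assumption: the q parameter alone is enough to run a search; the oq, aqs
# and sourceid values in the example URLs are left out.
def build_search_url(query, base="https://www.google.com/search"):
    """Return a search URL with the query encoded (spaces become '+')."""
    return "{0}?q={1}".format(base, quote_plus(query))


print(build_search_url("vidioh"))
print(build_search_url("video brochures"))
```

Encoding the query with quote_plus also keeps characters such as & in a brand name from being read as the start of a new URL parameter.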

3. Identify a GMB Knowledge Panel HTML div

From my initial testing, this isn’t a 100% foolproof method for determining whether a business has a Google My Business listing, but for a 25-minute Python script, it’s certainly a good start.

Method:

If there is a div with the class knowledge-panel on the results page, then we can assume that the brand query has triggered a Google My Business listing. If that div is missing, the business probably hasn’t claimed a listing yet, which makes it a candidate for our digital marketing prospecting.
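The check itself can be tried offline before involving a browser. The sketch below is only an illustration: it uses the standard library's html.parser rather than Selenium, the sample HTML strings are made up, and `has_knowledge_panel` is a hypothetical name. It also assumes Google still uses the knowledge-panel class, which may change over time.

```python
from html.parser import HTMLParser


class KnowledgePanelFinder(HTMLParser):
    """Sets `found` to True once a <div> with class knowledge-panel is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # A class attribute can hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if "knowledge-panel" in classes:
            self.found = True


def has_knowledge_panel(html):
    """Hypothetical helper: True if the HTML contains a knowledge-panel div."""
    finder = KnowledgePanelFinder()
    finder.feed(html)
    return finder.found


# Made-up snippets standing in for a real results page.
print(has_knowledge_panel('<div class="kp knowledge-panel"><p>Acme</p></div>'))
print(has_knowledge_panel('<div class="results"><p>Acme</p></div>'))
```

In the scraper, the same function could be given driver.page_source once the page has loaded.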


4. Extract specific queries for every brand

We will remove all of the website extensions such as .org, .co.uk or .com by simply looking for the first mention of the character: .

df['Queries'] = df['Site'].apply(lambda x: x[0 : x.find('.')])

Awesome, so now we've got some brand names that can be used as queries inside of a custom Google search via Selenium!
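Cutting at the first dot works when every value looks like ferryads.com, but it would return www for www.ferryads.com and would drop the last character of a value with no dot at all. A slightly more defensive sketch follows, assuming the Site column might contain a scheme or a www. prefix (the sample values are guesses, and `brand_from_site` is a hypothetical helper):

```python
from urllib.parse import urlparse


def brand_from_site(site):
    """Return the first domain label after stripping any scheme and 'www.'."""
    site = site.strip().lower()
    # urlparse only fills in netloc when a scheme or a leading '//' is present.
    host = urlparse(site if "//" in site else "//" + site).netloc
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


for example in ["ferryads.com", "www.ferryads.com", "https://vidioh.co.uk/about"]:
    print(example, "->", brand_from_site(example))
```

It can be dropped into the same apply call: df['Queries'] = df['Site'].apply(brand_from_site).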

5. Scrape Google Search Engine Results Page with A BIG TIMER

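# Selenium 4 takes the chromedriver path through a Service object.
# On Selenium 3, use webdriver.Chrome(executable_path='chromedrivers/chromedriver') instead.
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('chromedrivers/chromedriver'))

# urls = ['ferryads', 'matthewfuneralhome']  <- I built the method to work on two search queries before moving to the entire 300+ list.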

knowledge_panel_results = []

query_string = 'https://www.google.co.uk/search?source=hp&ei=_aegXcHTIsOVsAfNpLf4Bg&q={}&oq={}&gs_l=psy-ab.3..0i131j0l3j0i131j0l3j0i131j0.302585.302825..302898...0.0..0.46.172.4......0....1..gws-wiz.....0.GfS7vSMN0Qs&ved=0ahUKEwiBxtPaypTlAhXDCuwKHU3SDW8Q4dUDCAg&uact=5'

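for url in list(df['Queries']):
    # Inject the brand name into both the q and oq parameters
    query = query_string.format(url, url)
    driver.get(query)

    try:
        # Wait up to 10 seconds for a knowledge panel div to appear
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "knowledge-panel"))
        )
        knowledge_panel_results.append('True')
    except TimeoutException:
        # No panel appeared within the timeout
        knowledge_panel_results.append('False')

    # The big timer: pause 10 to 29 seconds between searches
    sleep(randint(10, 29))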
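The structure of the loop (check one query, record the answer, wait a random interval, repeat) can be separated from Selenium so that it can be tried without opening a browser. This is a sketch under stated assumptions rather than the script above: `check_all`, `has_panel` and `pause` are hypothetical names, and recording None for a lookup that raised an error is a design choice of this sketch, not something the original script does.

```python
import random
import time


def check_all(queries, has_panel, pause=time.sleep, low=10, high=29):
    """Call has_panel(query) for each query, pausing randomly in between.

    Returns a list of True/False/None in the same order as the queries,
    so it can be assigned straight back onto the DataFrame column.
    """
    queries = list(queries)
    results = []
    for i, query in enumerate(queries):
        try:
            results.append(bool(has_panel(query)))
        except Exception:
            # Assumption: an unexpected error is stored as None so the
            # query can be retried later instead of being counted as False.
            results.append(None)
        # No need to wait after the final query.
        if i < len(queries) - 1:
            pause(random.randint(low, high))
    return results


# Dry run with a fake checker and a pause that only records the delay.
delays = []
print(check_all(["ferryads", "vidioh"], lambda q: q == "vidioh", pause=delays.append))
```

With the real browser, has_panel would wrap the driver.get and WebDriverWait calls shown above, and pause would be left at its default of time.sleep.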

Then simply re-combine the list of True/False values with your original dataframe and you’ll have some easy prospects to send digital marketing proposals to 🙂

df['Knowledge_Panel_Results'] = knowledge_panel_results

And voila! We can save the data to a CSV and forward it on to any business development executives / managers to start making prospecting calls.

df.to_csv('results.csv')