How To Prospect For Companies Without Google My Business Using Python

James Phoenix

Google My Business is a local SEO directory and a vital marketing channel for local businesses, helping them acquire customers within their local search market.

Agencies are able to capitalise on business owners who still haven’t claimed their Google My Business listing.

These make for fresh, easy prospects for veteran SEOs.

In this guide we will be creating a simple web scraper that will:

  • Find businesses that don’t have a knowledge panel on the Google Search Engine Results Page (SERP).

We’ll be using a mixture of Selenium and pandas for this tutorial.


Loading Python Libraries

# Module dependencies
import time
from time import sleep
import datetime
import urllib
import re
from random import randint, uniform

from bs4 import BeautifulSoup

# Selenium dependencies
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

import pandas as pd

1. Let’s get the dataset that we need to scrape

df = pd.read_csv('website_data.csv')
df.head(15)
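
If you don’t already have a CSV to hand, note that the only column the rest of the script relies on is Site, which holds each prospect’s domain. Here’s a hypothetical sketch of the shape website_data.csv is expected to take (the domains below are purely illustrative):

# A hypothetical example of the shape website_data.csv should have;
# only the 'Site' column is used by the rest of the script.
import pandas as pd

example_df = pd.DataFrame(
    {'Site': ['vidioh.co.uk', 'ferryads.co.uk', 'matthewfuneralhome.com']}
)
print(example_df)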

2. Let’s study a Google Search Query so that we can understand how to structure our target URL:

It’s important that we understand how the URL is structured so that we can dynamically inject our custom brand queries into Google searches.

single_keyword_url = 'https://www.google.com/search?q=vidioh&oq=vidioh+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'

single_word_query = 'https://www.google.com/search?q={0}&oq={1}+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'.format('vidioh', 'vidioh')


multiple_keyword_url = 'https://www.google.com/search?q=video+brochures&oq=video+brochures&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'

multiple_word_query = 'https://www.google.com/search?q={0}&oq={1}&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'.format('video+brochures', 'video+brochures')
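
As an aside, the long aqs / gs_l parameters that Chrome appends aren’t strictly needed in practice; a minimal, hedged way to build the same kind of URL is to URL-encode the query and pass it to just the q and oq parameters:

from urllib.parse import quote_plus

def build_search_url(query):
    """Build a Google search URL for a brand query, URL-encoding spaces as '+'."""
    encoded = quote_plus(query)  # 'video brochures' -> 'video+brochures'
    return 'https://www.google.com/search?q={0}&oq={0}'.format(encoded)

print(build_search_url('video brochures'))
# https://www.google.com/search?q=video+brochures&oq=video+brochures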

3. Identify a GMB Knowledge Panel HTML Div

From my initial testing, this isn’t a 100% foolproof method for determining whether a business has a Google My Business listing, but for a 25-minute Python script, it’s certainly a good start.

Method:

If there is a div with the class knowledge-panel on the search results page, we can assume that the brand query triggers a Google My Business listing. Conversely, brands whose queries don’t produce this panel have likely not claimed one, so we can use this check for our digital marketing prospecting of local businesses that have yet to invest in a Google My Business page.
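
Since BeautifulSoup is already imported, here’s a minimal sketch of that same check expressed as a standalone function against the rendered page source (bearing in mind that Google’s markup, including the knowledge-panel class name, changes frequently):

from bs4 import BeautifulSoup

def has_knowledge_panel(page_source):
    """Return True if the rendered SERP contains a div with the 'knowledge-panel' class."""
    soup = BeautifulSoup(page_source, 'html.parser')
    return soup.find('div', class_='knowledge-panel') is not None

# Usage (assumes `driver` has already loaded a Google search results page):
# print(has_knowledge_panel(driver.page_source))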


4. Extract specific queries for every brand

We will remove all of the domain extensions such as .org, .co.uk or .com by simply taking everything before the first occurrence of the . character.

df['Queries'] = df['Site'].apply(lambda x: x[0 : x.find('.')])
Awesome, so now we've got some brand names that can be used as queries inside of a custom Google search via Selenium!
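
One caveat: if any entries in the Site column carry a www. prefix, everything before the first . would simply be 'www', so it’s worth stripping that prefix first. A quick sketch:

# Strip an optional 'www.' prefix before taking everything up to the first '.'
df['Queries'] = (
    df['Site']
    .str.replace(r'^www\.', '', regex=True)
    .apply(lambda x: x[0:x.find('.')])
)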

5. Scrape the Google Search Engine Results Pages with a BIG TIMER

# Point the webdriver at your local ChromeDriver binary
# (note: newer Selenium 4 releases expect a Service object instead of executable_path)
driver = webdriver.Chrome(executable_path='chromedrivers/chromedriver')

#urls = ['ferryads' , 'matthewfuneralhome'] <-- I built the method to work on two search queries before moving to the entire 300+ list.

knowledge_panel_results = []

query_string = 'https://www.google.co.uk/search?source=hp&ei=_aegXcHTIsOVsAfNpLf4Bg&q={}&oq={}&gs_l=psy-ab.3..0i131j0l3j0i131j0l3j0i131j0.302585.302825..302898...0.0..0.46.172.4......0....1..gws-wiz.....0.GfS7vSMN0Qs&ved=0ahUKEwiBxtPaypTlAhXDCuwKHU3SDW8Q4dUDCAg&uact=5'

for url in list(df['Queries']):
    query = query_string.format(url, url)
    driver.get(query)
    
    try:
        # Wait up to 10 seconds for a div with the 'knowledge-panel' class to appear
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "knowledge-panel"))
        )
        knowledge_panel_results.append(True)
    except TimeoutException:
        # No knowledge panel appeared within the timeout, so flag this brand as a prospect
        knowledge_panel_results.append(False)

    # Random delay between searches to reduce the chance of being rate-limited
    sleep(randint(10, 29))

Then simply re-combine the list of True/False values with your original DataFrame and you’ll have some easy prospects to send digital marketing proposals to 🙂

df['Knowledge_Panel_Results'] = knowledge_panel_results
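
If you’d rather work with only the likely prospects, a quick filter on that column does the trick (this sketch assumes the loop appended boolean values, as above):

# Keep only the brands where no knowledge panel was detected (these are the prospects)
prospects = df[~df['Knowledge_Panel_Results']]
print('{} prospects found out of {} brands'.format(len(prospects), len(df)))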

And voila! We can save the data to a CSV and forward it on to any business development executives or managers to start making prospecting calls.

df.to_csv('results.csv')


