How To Prospect For Companies Without Google My Business Using Python

James Phoenix

Google My Business is a local SEO directory and a vital marketing channel for local businesses as it helps them to acquire customers within their local search market.

Agencies can capitalise on business owners who still haven’t claimed their Google My Business listing.

These make fresh, easy prospects for veteran SEOs.

In this guide we will be creating a simple web scraper that will:

  • Find businesses that don’t have a knowledge panel on the Google Search Engine Results Page (SERP).

We’ll be using a mixture of Selenium and pandas for this tutorial.


Loading Python Libraries

#Module Dependencies
import time
from time import sleep
import datetime
import selenium
import urllib
import re
from bs4 import BeautifulSoup
from random import randint
from random import uniform

#Import Selenium Dependencies
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

import pandas as pd

1. Let’s get the dataset that we need to scrape

df = pd.read_csv('website_data.csv')
df.head(15)

2. Let’s study a Google Search Query so that we can understand how to structure our target URL:

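It’s important to understand how the URL is structured so that we can dynamically inject our own brand queries into Google searches. The search term appears twice, in the q and oq parameters, and multi-word queries join their words with a + sign.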

single_keyword_url = 'https://www.google.com/search?q=vidioh&oq=vidioh+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'

single_word_query = 'https://www.google.com/search?q={0}&oq={1}+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'.format('vidioh', 'vidioh')


multiple_keyword_url = 'https://www.google.com/search?q=video+brochures&oq=video+brochures&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'

multiple_word_query = 'https://www.google.com/search?q={0}&oq={1}&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'.format('video+brochures', 'video+brochures')
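The same URL pattern can be built without hand-writing the + signs. Below is a minimal sketch, assuming that only the q and oq parameters matter for the search itself and that the long aqs and sourceid values can be dropped; build_search_url is a hypothetical helper name, not something from the original script.

```python
from urllib.parse import urlencode


def build_search_url(brand_query, base="https://www.google.com/search"):
    """Return a Google search URL with brand_query in both q and oq."""
    # Assumption: the aqs/sourceid parameters from the copied URLs are optional.
    params = {"q": brand_query, "oq": brand_query, "ie": "UTF-8"}
    # urlencode escapes each value with quote_plus, so spaces become '+'.
    return f"{base}?{urlencode(params)}"


print(build_search_url("video brochures"))
# https://www.google.com/search?q=video+brochures&oq=video+brochures&ie=UTF-8
```

Encoding the query this way also keeps characters such as & inside a brand name from breaking the URL.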

3. Identify a GMB knowledge Panel HTML Div

From my initial testing, this isn’t a foolproof method for determining whether a business has a Google My Business listing, but for a 25-minute Python script, it’s certainly a good start.

Method:

If the results page contains a div with the class knowledge-panel, we can assume that the brand query triggered a Google My Business listing. Brands whose results page has no such div are therefore likely prospects: local businesses that have yet to invest in a Google My Business page.
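To make the check concrete, here is a small offline sketch of the same test applied to a string of HTML, using only the standard library. The class name knowledge-panel comes from the method above; PanelFinder and has_knowledge_panel are hypothetical names, and the sample markup in the usage lines is made up for illustration, not copied from a real results page.

```python
from html.parser import HTMLParser


class PanelFinder(HTMLParser):
    """Records whether any div carries the target class."""

    def __init__(self, class_name="knowledge-panel"):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # A class attribute may hold several space-separated names, or be missing.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def has_knowledge_panel(html):
    """Return True if the HTML contains a div with class knowledge-panel."""
    finder = PanelFinder()
    finder.feed(html)
    return finder.found


# Made-up markup for illustration only.
print(has_knowledge_panel('<div class="kp knowledge-panel"><h2>Vidioh</h2></div>'))  # True
print(has_knowledge_panel('<div class="results"></div>'))  # False
```

In the scraper itself the same question is answered live in the browser, so this function is mainly useful for testing the logic against saved pages.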


4. Extract specific queries for every brand

We will remove website extensions such as .org, .co.uk or .com by keeping only the text that comes before the first . character in each site name.

df['Queries'] = df['Site'].apply(lambda x: x[0 : x.find('.')])

Awesome, so now we've got some brand names that can be used as queries inside a custom Google search via Selenium!
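The one-line slice assumes every value in the Site column is a bare domain such as vidioh.co.uk. A value starting with www. would yield www, and a value with no dot would lose its last character, because find returns -1. The sketch below is a slightly more defensive alternative under the assumption that the column may contain such values; brand_from_site is a hypothetical helper name.

```python
from urllib.parse import urlparse


def brand_from_site(site):
    """Return the first label of a domain, ignoring scheme, path and 'www.'."""
    site = site.strip().lower()
    # urlparse only fills in netloc when a scheme or '//' is present.
    if "//" not in site:
        site = "//" + site
    host = urlparse(site).netloc
    if host.startswith("www."):
        host = host[4:]
    # split never raises, so a value without a dot is returned unchanged.
    return host.split(".")[0]


print(brand_from_site("vidioh.co.uk"))                      # vidioh
print(brand_from_site("https://www.ferryads.com/contact"))  # ferryads
```

It could be applied with df['Site'].apply(brand_from_site) in place of the lambda.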

5. Scrape Google Search Engine Results Page with A BIG TIMER

# Note: executable_path works with Selenium 3; Selenium 4 expects a Service object instead.
driver = webdriver.Chrome(executable_path='chromedrivers/chromedriver')

#urls = ['ferryads' , 'matthewfuneralhome'] <-- I built the method to work on two search queries before moving to the entire 300+ list.

knowledge_panel_results = []

query_string = 'https://www.google.co.uk/search?source=hp&ei=_aegXcHTIsOVsAfNpLf4Bg&q={}&oq={}&gs_l=psy-ab.3..0i131j0l3j0i131j0l3j0i131j0.302585.302825..302898...0.0..0.46.172.4......0....1..gws-wiz.....0.GfS7vSMN0Qs&ved=0ahUKEwiBxtPaypTlAhXDCuwKHU3SDW8Q4dUDCAg&uact=5'

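for brand in list(df['Queries']):
    # Inject the brand name into both the q and oq parameters
    query = query_string.format(brand, brand)
    driver.get(query)

    try:
        # Wait up to 10 seconds for a knowledge panel div to appear
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "knowledge-panel"))
        )
        knowledge_panel_results.append('True')
    except TimeoutException:
        # No panel showed up in time, so treat the brand as a prospect
        knowledge_panel_results.append('False')

    # Pause 10 to 29 seconds between searches to avoid hammering Google
    sleep(randint(10, 29))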

Then simply re-combine the list of True/False values with your original dataframe and you’ll have some easy prospects to send digital marketing proposals to 🙂

df['Knowledge_Panel_Results'] = knowledge_panel_results

And voila! We can save the data to a CSV and forward it on to any business development executives / managers to start making prospecting calls.

df.to_csv('results.csv')
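Because the results are stored as the strings 'True' and 'False' rather than booleans, whoever picks up the CSV needs to compare against the text 'False' to pull out the prospects. Here is a minimal standard-library sketch of that filter; prospects_without_panel is a hypothetical helper name and the sample rows are made up to stand in for results.csv.

```python
import csv
import io


def prospects_without_panel(csv_file):
    """Return rows whose Knowledge_Panel_Results value is the string 'False'."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["Knowledge_Panel_Results"] == "False"]


# Made-up sample standing in for results.csv.
sample = io.StringIO(
    "Site,Queries,Knowledge_Panel_Results\n"
    "example-one.com,example-one,True\n"
    "example-two.co.uk,example-two,False\n"
)
prospects = prospects_without_panel(sample)
print([row["Site"] for row in prospects])  # ['example-two.co.uk']
```

With the real file, the function can be given an open file handle, for example inside a with open('results.csv', newline='') block.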
