Redefining Traffic Opportunity Analysis With Ahrefs & Python

James Phoenix

Keyword research is a fundamental process that helps search engine marketers to understand where the market opportunity is and what searchers care about.

Tools such as Ahrefs, SEOMoz or SEMrush give you a list of top pages or keywords that highlights the potential market opportunity, with metrics including monthly search volume, traffic value ($) and top keyword.


Marketers often use simple formulas to prioritise the large amount of data these tools produce, then turn the results into a recommended list of landing pages, blog posts or other resources for content creation.


For example, Siegemedia recommends the following KOB (Keyword Opposition to Benefit) score:

Looking at this formula, we would assume that a keyword with a difficulty of 75 is approximately three times harder to rank for than a keyword with a difficulty of 25.

As a side note, I recently found a very comprehensive list of interview questions that you can ask an SEO to ensure that you hire the right person by Diggity Marketing.

However, let’s check how Ahrefs calculates keyword difficulty. Because yes, we love you Ahrefs.


Investigating How Ahrefs Calculates Keyword Difficulty

Ahrefs’ keyword difficulty metric ranges from 0 – 100, and thankfully they’ve provided us with some numerical data showing how the metric is calculated.


As rightly stated by Ahrefs, this data is not linear. This means that assuming 25 is three times smaller than 75 is likely to be a flawed assumption.

However, we can’t quite use Ahrefs’ metric yet because the data is only given at intervals across the range (0 – 10 – 20) rather than for every keyword difficulty value.

So let’s see if we can calculate the values in-between these numbers to rescale the KOB traffic formula.


TLDR:


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import scipy.stats  # makes sp.stats available
import numpy as np
%matplotlib inline
# Load the Ahrefs keyword difficulty data and tidy the column names
df = pd.read_csv('ahrefs_keyword_difficulty.csv')
df.rename(columns={'Keyword Difficulty - Y':'Keyword_Difficulty', 'Referring Domains - X':'Referring_Domains'}, inplace=True)
df
# Plot the raw relationship
sns.scatterplot(data=df, x='Keyword_Difficulty', y='Referring_Domains')
plt.show()

This is the original relationship of Keyword Difficulty vs Referring Domains. When both scales are linear, Referring Domains (Y) appears to grow exponentially with Keyword Difficulty (X).

Now let’s apply a log(1 + y) transformation to the Y-axis in an attempt to make the relationship between X and Y more linear.

df['Log_Referring_Domains'] = np.log1p(df['Referring_Domains'])  # log(1 + x), defined at x = 0
sns.scatterplot(data=df, x='Keyword_Difficulty', y='Log_Referring_Domains')
plt.show()

Okay great! We’ve got approximately a straight line that we can model our data against. It has some wiggles, but it’ll do.
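As a small aside, here is a minimal sketch of why `np.log1p` is a convenient choice for the transformation: it is defined at zero, and `np.expm1` undoes it exactly, which matters later when converting predictions back into referring domains. The counts below are hypothetical and are not taken from the Ahrefs data.

```python
import numpy as np

# Hypothetical referring-domain counts (not Ahrefs data), including zero
referring_domains = np.array([0, 10, 100, 1000], dtype=float)

log_rd = np.log1p(referring_domains)  # log(1 + x); log1p(0) is 0, not -inf
recovered = np.expm1(log_rd)          # exp(y) - 1; the inverse of log1p

# Check that the round trip returns the original values
round_trip_ok = np.allclose(recovered, referring_domains)
print(log_rd)
print(round_trip_ok)
```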


Let’s apply a simple linear regression to the data using the least squares method to minimise the error.

# Regression Function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
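    p = sp.stats.linregress(x, y)     # slope, intercept, r, p-value, stderr
    b1, b0, r, p_val, stderr = p
    y_pred = np.polyval([b1, b0], x)  # SciPy's top-level polyval alias is gone in recent releases; use NumPy's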
    return y_pred, p

# Plotting
x, y = df['Keyword_Difficulty'], df['Log_Referring_Domains']                      # transformed data
y_pred, _ = regress(x, y)

plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Keyword Difficulty")
plt.ylabel("Log Referring Domains")                            # label axis
plt.legend()
plt.show()

Results

ŷ = b0 + b1x, where:

  • b0 is a constant (the intercept).
  • b1 is the slope (regression coefficient).
  • x is the value of the independent input variable.
  • ŷ is the predicted value of the dependent output variable.
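To make the equation concrete, the sketch below computes b0 and b1 with the standard closed-form least squares solution. The helper name `fit_line` and the toy data are hypothetical and only for illustration; on real data, `linregress` gives the same slope and intercept.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form simple least squares: returns (b0, b1) for y_hat = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical toy data that lies exactly on y = 2 + 0.5x
x_toy = np.array([0, 10, 20, 30, 40])
y_toy = 2.0 + 0.5 * x_toy

b0, b1 = fit_line(x_toy, y_toy)
print(b0, b1)
```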

Insight

This means that for every one-point increase in Keyword Difficulty, Log(1 + Referring Domains) increases by ~0.06.

Now that we can approximate the relationship between X and Y, we can fill in the gaps for every keyword difficulty value.
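Because the slope lives on the log scale, it is easier to read as a growth factor. The short sketch below uses the rounded slope of 0.06 and shows that each extra point of Keyword Difficulty multiplies (1 + Referring Domains) by about 1.06, and ten extra points by about 1.8. The variable names are illustrative only.

```python
import math

slope = 0.06  # rounded slope of the fitted line on the log scale

# An additive step on the log scale is a multiplicative step on the raw scale
factor_per_point = math.exp(slope)     # growth in (1 + referring domains) per 1 KD
factor_per_ten = math.exp(10 * slope)  # growth per 10 KD

print(round(factor_per_point, 3))
print(round(factor_per_ten, 2))
```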

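# Fitted parameters from the regression above
Slope = 0.06001338738238661
Intercept = 1.330338619969123

# Predict referring domains for each keyword difficulty from 1 to 100,
# undoing the log1p transform with expm1
y_predictions = []
for i in range(1, 101):
    prediction = np.expm1(Intercept + (Slope * i))
    y_predictions.append(prediction)
    print(prediction)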
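One way these predictions could feed back into prioritisation is sketched below. It assumes the KOB score has the form traffic value divided by difficulty (an assumption, since the formula is not reproduced in this article) and swaps the raw difficulty for the predicted number of referring domains. The function names and the traffic value of 1000 are hypothetical.

```python
import numpy as np

# Fitted parameters reported in this article
SLOPE = 0.06001338738238661
INTERCEPT = 1.330338619969123

def predicted_referring_domains(kd):
    """Estimated referring domains for a keyword difficulty, via the fitted line."""
    return np.expm1(INTERCEPT + SLOPE * kd)

def rescaled_kob(traffic_value, kd):
    """Assumed KOB form (traffic value / difficulty), with difficulty rescaled."""
    return traffic_value / predicted_referring_domains(kd)

# How much harder is KD 75 than KD 25 under the fitted model?
ratio = predicted_referring_domains(75) / predicted_referring_domains(25)
print(round(float(ratio), 1))

# Same hypothetical traffic value, different difficulties
print(rescaled_kob(1000, 25), rescaled_kob(1000, 75))
```

Under this model a difficulty of 75 comes out at roughly 20 times the referring domains of a difficulty of 25, not 3 times, so the rescaled score penalises hard keywords far more than a linear formula would.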

Conclusion

So in this guide:


Enjoy the rest of your week and thanks for reading!
