Redefining Traffic Opportunity Analysis With Ahrefs & Python

James Phoenix
James Phoenix

Keyword research is a fundamental process that helps search engine marketers to understand where the market opportunity is and what searchers care about.

After using tools such as Ahrefs/SEOMoz or SEMrush, you can obtain either a list of top pages or keywords which highlight all of the potential market opportunity with metrics including monthly search volume, traffic value ($) and top keyword.


Often marketers will use simple math formulas for prioritising the massive amount of data produced by these tools, then analyse the data into a recommended list of landing pages/blog posts or resources for content creation.


For example Siegemedia recommends the following KOB score:

Unleash Your Potential with AI-Powered Prompt Engineering!

Dive into our comprehensive Udemy course and learn to craft compelling, AI-optimized prompts. Boost your skills and open up new possibilities with ChatGPT and Prompt Engineering.

Embark on Your AI Journey Now!

From looking at this formula, we would assume that if there was a difficulty level of 75 it would be approximately 3 times larger than a difficulty level of 25.

As a side note, I recently found a very comprehensive list of interview questions that you can ask an SEO to ensure that you hire the right person by Diggity Marketing.

However let’s check how Ahrefs calculates keyword difficulty. Because yes, we love you Ahrefs.


Investigating How Ahrefs Calculates Keyword Difficulty

Ahref’s keyword difficulty metric ranges from 0 – 100 and thankfully they’ve provided us with some numerical data showing how the metric is calculated.


As rightly stated by Ahrefs, this data is not linear. This means that assuming 25 is three times smaller than 75 is likely to be a flawed assumption.

However we can’t quite use Ahref’s metric yet because the data is spread across a range (0 – 10 – 20).

So let’s see if we can calculate the values in-between these numbers to rescale the KOB traffic formula.


TLDR:


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import numpy as np
import selenium
import pickle
%matplotlib inline
df = pd.read_csv('ahrefs_keyword_difficulty.csv')
df.rename(columns={'Keyword Difficulty - Y':'Keyword_Difficulty', 'Referring Domains - X':'Referring_Domains'}, inplace=True) 
df
sns.scatterplot(data = df, x = 'Keyword_Difficulty', y = 'Referring_Domains') plt.show()

This is the original relationship of Keyword Difficulty vs Referring Domains. When both scales are in a linear form, the relationship between X to Y appears to be logarithmic.

Now let’s apply a log+1 transformation to the Y-axis in an attempt to make the relationship between X and Y more linear.

df['Log_Referring_Domains'] = df['Referring_Domains'].apply(lambda x: np.log1p(x))
sns.scatterplot(data = df, x = 'Keyword_Difficulty', y = 'Log_Referring_Domains') plt.show()

Okay great! We’ve got approximately a straight line that we can model our data against. It has some wiggles, but it’ll do.

Let’s apply a simple linear regression to the data using the least squares method to minimise the error.

# Regression Function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = sp.polyval([b1, b0], x)
    return y_pred, p

# Plotting
x, y = df['Keyword_Difficulty'], df['Log_Referring_Domains']                      # transformed data
y_pred, _ = regress(x, y)

plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Keyword Difficulty")
plt.ylabel("Log Referring Domains")                            # label axis
plt.legend()
plt.show()

Results

ŷ = b0 + b1x, where:

  • b0 is a constant (the intercept).
  • b1 is the slope (regression coefficient).
  • x is the value of the independent input variable.
  • ŷ is the predicted value of the dependent output variable.

Insight

This means that for every 1 Keyword Difficulty added, there will be a ~0.06 increase of Log(1 + Referring Domains).

Now that we can approximate the relationship between X and y, we can now fill in the gaps for every keyword difficulty value.

np.expm1(Slope + Intercept)
Slope = 0.06001338738238661 
Intercept = 1.330338619969123

y_predictions = []
for i in range(2, 101): 
    print(np.expm1(Intercept + (Slope * i)))

Conclusion

So in this guide:


Enjoy the rest of your week and thanks for reading!

Taggedtutorial


More Stories

Cover Image for Why I’m Betting on AI Agents as the Future of Work

Why I’m Betting on AI Agents as the Future of Work

I’ve been spending a lot of time with Devin lately, and I’ve got to tell you – we’re thinking about AI agents all wrong. You and I are standing at the edge of a fundamental shift in how we work with AI. These aren’t just tools anymore; they’re becoming more like background workers in our digital lives. Let me share what I’ve…

James Phoenix
James Phoenix
Cover Image for Supercharging Devin + Supabase: Fixing Docker Performance on EC2 with overlay2

Supercharging Devin + Supabase: Fixing Docker Performance on EC2 with overlay2

The Problem While setting up Devin (a coding assistant) with Supabase CLI on an EC2 instance, I encountered significant performance issues. After investigation, I discovered that Docker was using the VFS storage driver, which is known for being significantly slower than other storage drivers like overlay2. The root cause was interesting: the EC2 instance was already using overlayfs for its root filesystem,…

James Phoenix
James Phoenix