Redefining Traffic Opportunity Analysis With Ahrefs & Python

James Phoenix

Keyword research is a fundamental process that helps search engine marketers to understand where the market opportunity is and what searchers care about.

Tools such as Ahrefs, SEOMoz or SEMrush give you a list of top pages or keywords that maps out the potential market opportunity, with metrics including monthly search volume, traffic value ($) and top keyword.


Marketers often use simple formulas to prioritise the large volume of data these tools produce, then distil it into a recommended list of landing pages, blog posts or resources for content creation.


For example, Siege Media recommends the following KOB score:

Looking at this formula, we would assume that a difficulty level of 75 is approximately three times harder than a difficulty level of 25.
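
To make that linear assumption concrete, here is a minimal sketch. It assumes, purely for illustration, a score that divides traffic value by keyword difficulty; the function name `ratio_score` and the traffic value of 1,000 are hypothetical and are not taken from Siege Media's formula.

```python
def ratio_score(traffic_value: float, difficulty: float) -> float:
    """Hypothetical score: traffic value divided by keyword difficulty."""
    if difficulty <= 0:
        raise ValueError("difficulty must be positive")
    return traffic_value / difficulty


# Same (made-up) traffic value, two different difficulty levels.
easy = ratio_score(1000.0, 25)
hard = ratio_score(1000.0, 75)

# A plain ratio treats difficulty as linear, so tripling the
# difficulty divides the score by three.
penalty_ratio = easy / hard
print(easy, hard, penalty_ratio)
```

Any score of this shape penalises a difficulty of 75 exactly three times as much as a difficulty of 25, which is the assumption the rest of this post tests.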

As a side note, I recently found a very comprehensive list of interview questions that you can ask an SEO to ensure that you hire the right person by Diggity Marketing.

However, let’s check how Ahrefs calculates keyword difficulty. Because yes, we love you, Ahrefs.


Investigating How Ahrefs Calculates Keyword Difficulty

Ahrefs’ keyword difficulty metric ranges from 0 to 100, and thankfully they’ve provided us with some numerical data showing how the metric is calculated.


As rightly stated by Ahrefs, this data is not linear. Assuming that a difficulty of 25 is three times easier than a difficulty of 75 is therefore likely to be flawed.

However, we can’t use Ahrefs’ figures directly yet, because they are only given at intervals across the range (0, 10, 20 and so on).

So let’s see if we can calculate the values in-between these numbers to rescale the KOB traffic formula.


TLDR:


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import scipy.stats  # makes sure sp.stats is loaded
import numpy as np

# Jupyter only: render plots inline
%matplotlib inline

# Load the Keyword Difficulty vs Referring Domains data and tidy the column names
df = pd.read_csv('ahrefs_keyword_difficulty.csv')
df = df.rename(columns={'Keyword Difficulty - Y': 'Keyword_Difficulty', 'Referring Domains - X': 'Referring_Domains'})
df

# Plot the raw relationship
sns.scatterplot(data=df, x='Keyword_Difficulty', y='Referring_Domains')
plt.show()

This is the original relationship of Keyword Difficulty vs Referring Domains. When both axes are on a linear scale, Referring Domains appears to grow exponentially with Keyword Difficulty.

Now let’s apply a log(1 + y) transformation to the Y-axis in an attempt to make the relationship between X and Y more linear.

# log1p(y) = log(1 + y), which stays defined when y is 0
df['Log_Referring_Domains'] = np.log1p(df['Referring_Domains'])
sns.scatterplot(data=df, x='Keyword_Difficulty', y='Log_Referring_Domains')
plt.show()

Okay great! We’ve got approximately a straight line that we can model our data against. It has some wiggles, but it’ll do.
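
One reason to use log(1 + y) rather than a plain log is that it stays defined when y is 0, and it can be undone with its inverse, expm1. Here is a small sketch using only the standard library; the sample values are made up for illustration and are not the Ahrefs data.

```python
import math

# Illustrative values only, not the Ahrefs data.
sample_referring_domains = [0, 10, 100, 1000]

# Forward transform: log(1 + y). Note that log1p(0) is 0, whereas log(0) is undefined.
transformed = [math.log1p(value) for value in sample_referring_domains]

# Inverse transform: exp(t) - 1 recovers the original scale.
restored = [math.expm1(t) for t in transformed]

for original, t, back in zip(sample_referring_domains, transformed, restored):
    print(f"{original:>5} -> {t:.4f} -> {back:.4f}")
```

This round trip is what lets a line fitted on the log scale be converted back into referring domains afterwards.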

Let’s apply a simple linear regression to the data using the least squares method to minimise the error.

# Regression function
def regress(x, y):
    """Return a tuple of predicted y values and the fitted linear regression result."""
    result = sp.stats.linregress(x, y)                          # least squares fit
    y_pred = np.polyval([result.slope, result.intercept], x)    # b1 * x + b0 (sp.polyval is no longer available in SciPy)
    return y_pred, result

# Plotting
x, y = df['Keyword_Difficulty'], df['Log_Referring_Domains']                      # transformed data
y_pred, _ = regress(x, y)

plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Keyword Difficulty")
plt.ylabel("Log Referring Domains")                            # label axis
plt.legend()
plt.show()

Results

ŷ = b0 + b1x, where:

  • b0 is a constant (the intercept).
  • b1 is the slope (regression coefficient).
  • x is the value of the independent input variable.
  • ŷ is the predicted value of the dependent output variable.
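
For a single input variable, the least squares estimates have a closed form: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The sketch below applies those formulas to a handful of made-up points (not the Ahrefs data); `least_squares_line` is a hypothetical helper name used only for this illustration.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimising the sum of squared residuals of y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Spread of x around its mean, and co-movement of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # slope
    b0 = y_mean - b1 * x_mean      # intercept
    return b0, b1


# Made-up points that lie exactly on y = 1.0 + 0.05 * x.
xs = [0, 10, 20, 30, 40]
ys = [1.0, 1.5, 2.0, 2.5, 3.0]
b0, b1 = least_squares_line(xs, ys)
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

Library routines such as `scipy.stats.linregress` should agree with these estimates for the same inputs, and they also report extras such as the correlation coefficient.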

Insight

This means that each additional point of Keyword Difficulty increases log(1 + Referring Domains) by ~0.06.

Now that we can approximate the relationship between X and Y, we can fill in the gaps for every keyword difficulty value.
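
Because the model is linear in log(1 + Referring Domains), a step in Keyword Difficulty acts multiplicatively on 1 + Referring Domains. Here is a quick sketch of that arithmetic, using the slope reported in this post; the variable names are illustrative.

```python
import math

slope = 0.06001338738238661  # slope of the fitted line in this post

# One extra point of Keyword Difficulty multiplies (1 + referring domains) by exp(slope).
per_point_factor = math.exp(slope)

# The gap between difficulty 25 and difficulty 75 is 50 points.
factor_25_to_75 = math.exp(slope * 50)

print(f"per point: x{per_point_factor:.3f}")
print(f"KD 25 -> KD 75: x{factor_25_to_75:.1f}")
```

Under this fit, moving from a difficulty of 25 to 75 corresponds to roughly a twentyfold increase in 1 + Referring Domains rather than a threefold one.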

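Slope = 0.06001338738238661
Intercept = 1.330338619969123

# Back-transform the fitted line:
# Referring Domains = expm1(Intercept + Slope * Keyword Difficulty)
y_predictions = []
for i in range(2, 101):
    prediction = np.expm1(Intercept + (Slope * i))
    y_predictions.append(prediction)   # keep the values as well as printing them
    print(prediction)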
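
One way the predictions could feed back into a prioritisation score is sketched below. This is an assumption rather than a formula from this post or from Siege Media: `predicted_referring_domains` and `rescaled_score` are hypothetical helpers, and dividing traffic value by predicted referring domains is only one possible rescaling.

```python
import math

SLOPE = 0.06001338738238661      # slope reported in this post
INTERCEPT = 1.330338619969123    # intercept reported in this post


def predicted_referring_domains(keyword_difficulty: float) -> float:
    """Back-transform the fitted line from the log(1 + y) scale to referring domains."""
    return math.expm1(INTERCEPT + SLOPE * keyword_difficulty)


def rescaled_score(traffic_value: float, keyword_difficulty: float) -> float:
    """Hypothetical score: traffic value per predicted referring domain."""
    return traffic_value / predicted_referring_domains(keyword_difficulty)


# Same made-up traffic value at two difficulty levels.
for kd in (25, 75):
    domains = predicted_referring_domains(kd)
    score = rescaled_score(1000.0, kd)
    print(f"KD {kd}: ~{domains:.0f} referring domains, score {score:.2f}")
```

Compared with a plain division by difficulty, this version penalises high-difficulty keywords far more heavily, in line with the non-linear data.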

Conclusion

So in this guide:


Enjoy the rest of your week and thanks for reading!
