Redefining Traffic Opportunity Analysis With Ahrefs & Python

James Phoenix
James Phoenix

Keyword research is a fundamental process that helps search engine marketers to understand where the market opportunity is and what searchers care about.

After using tools such as Ahrefs/SEOMoz or SEMrush, you can obtain either a list of top pages or keywords which highlight all of the potential market opportunity with metrics including monthly search volume, traffic value ($) and top keyword.


Often marketers will use simple math formulas for prioritising the massive amount of data produced by these tools, then analyse the data into a recommended list of landing pages/blog posts or resources for content creation.


For example Siegemedia recommends the following KOB score:

From looking at this formula, we would assume that if there was a difficulty level of 75 it would be approximately 3 times larger than a difficulty level of 25.

As a side note, I recently found a very comprehensive list of interview questions that you can ask an SEO to ensure that you hire the right person by Diggity Marketing.

However let’s check how Ahrefs calculates keyword difficulty. Because yes, we love you Ahrefs.


Investigating How Ahrefs Calculates Keyword Difficulty

Ahref’s keyword difficulty metric ranges from 0 – 100 and thankfully they’ve provided us with some numerical data showing how the metric is calculated.


As rightly stated by Ahrefs, this data is not linear. This means that assuming 25 is three times smaller than 75 is likely to be a flawed assumption.

However we can’t quite use Ahref’s metric yet because the data is spread across a range (0 – 10 – 20).

So let’s see if we can calculate the values in-between these numbers to rescale the KOB traffic formula.


TLDR:


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import numpy as np
import selenium
import pickle
%matplotlib inline
df = pd.read_csv('ahrefs_keyword_difficulty.csv')
df.rename(columns={'Keyword Difficulty - Y':'Keyword_Difficulty', 'Referring Domains - X':'Referring_Domains'}, inplace=True) 
df
sns.scatterplot(data = df, x = 'Keyword_Difficulty', y = 'Referring_Domains') plt.show()

This is the original relationship of Keyword Difficulty vs Referring Domains. When both scales are in a linear form, the relationship between X to Y appears to be logarithmic.

Now let’s apply a log+1 transformation to the Y-axis in an attempt to make the relationship between X and Y more linear.

df['Log_Referring_Domains'] = df['Referring_Domains'].apply(lambda x: np.log1p(x))
sns.scatterplot(data = df, x = 'Keyword_Difficulty', y = 'Log_Referring_Domains') plt.show()

Okay great! We’ve got approximately a straight line that we can model our data against. It has some wiggles, but it’ll do.

Leanpub Book

Read The Meta-Engineer

A practical book on building autonomous AI systems with Claude Code, context engineering, verification loops, and production harnesses.

Continuously updated
Claude Code + agentic systems
View Book

Let’s apply a simple linear regression to the data using the least squares method to minimise the error.

# Regression Function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = sp.polyval([b1, b0], x)
    return y_pred, p

# Plotting
x, y = df['Keyword_Difficulty'], df['Log_Referring_Domains']                      # transformed data
y_pred, _ = regress(x, y)

plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Keyword Difficulty")
plt.ylabel("Log Referring Domains")                            # label axis
plt.legend()
plt.show()

Results

ŷ = b0 + b1x, where:

  • b0 is a constant (the intercept).
  • b1 is the slope (regression coefficient).
  • x is the value of the independent input variable.
  • ŷ is the predicted value of the dependent output variable.

Insight

This means that for every 1 Keyword Difficulty added, there will be a ~0.06 increase of Log(1 + Referring Domains).

Now that we can approximate the relationship between X and y, we can now fill in the gaps for every keyword difficulty value.

np.expm1(Slope + Intercept)
Slope = 0.06001338738238661 
Intercept = 1.330338619969123

y_predictions = []
for i in range(2, 101): 
    print(np.expm1(Intercept + (Slope * i)))

Conclusion

So in this guide:


Enjoy the rest of your week and thanks for reading!

Topics
Tutorial

Newsletter

Become a better AI engineer

Weekly deep dives on production AI systems, context engineering, and the patterns that compound. No fluff, no tutorials. Just what works.

Join 306K+ developers. No spam. Unsubscribe anytime.


More Insights

Cover Image for How to Easily Translate High Fidelity Prototypes into Functional Apps

How to Easily Translate High Fidelity Prototypes into Functional Apps

Vague specs do not converge. Scalar loss functions do. If you can hand the agent a number that says “you are 0.66 wrong,” it will close the gap on its own.

James Phoenix
James Phoenix
Cover Image for The Four-Layer Wall Around Your Library’s Public API

The Four-Layer Wall Around Your Library’s Public API

When an agent loop writes most of your library, the largest risk is not a bug in a feature. It is the loop helpfully exporting an internal helper, an experimental type, or a half-finished module. Once that ships in a minor release, you own it forever. Four package-level layers stop the loop from doing this without anyone having to remember.

James Phoenix
James Phoenix