Keyword research is a fundamental process that helps search engine marketers to understand where the market opportunity is and what searchers care about.

After using tools such as Ahrefs/SEOMoz or SEMrush, you can obtain either a list of top pages or keywords which highlight all of the potential market opportunity with metrics including monthly search volume, traffic value ($) and top keyword.

Often marketers will use simple math formulas for prioritising the massive amount of data produced by these tools, then analyse the data into a recommended list of landing pages/blog posts or resources for content creation.

For example, Siege Media recommends the following KOB score:

From looking at this formula, we would assume that **a difficulty level of 75 is approximately three times harder than a difficulty level of 25.**

However, let’s check how Ahrefs calculates keyword difficulty. Because yes, **we love you Ahrefs.**

### Investigating How Ahrefs Calculates Keyword Difficulty

Ahrefs’ keyword difficulty metric ranges from 0 to 100, and thankfully they’ve provided us with some numerical data showing how the metric is calculated.

As rightly stated by Ahrefs, this data is **not linear.** This means that **treating a difficulty of 25 as three times easier than 75 is likely to be flawed.**

However, we can’t quite use Ahrefs’ metric yet because the data is only provided at intervals of 10 (0, 10, 20 and so on).

So let’s see if we can **calculate the values in-between these numbers to rescale the KOB traffic formula.**

## TLDR:

```
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import scipy.stats  # make sure sp.stats is loaded
import numpy as np
# Jupyter magic: render plots inline in the notebook
%matplotlib inline
```

```
df = pd.read_csv('ahrefs_keyword_difficulty.csv')
df.rename(columns={'Keyword Difficulty - Y':'Keyword_Difficulty', 'Referring Domains - X':'Referring_Domains'}, inplace=True)
df
```

`sns.scatterplot(data=df, x='Keyword_Difficulty', y='Referring_Domains'); plt.show()`

This is the original relationship of Keyword Difficulty vs Referring Domains. With both axes on a linear scale, Referring Domains appears to grow **exponentially** with Keyword Difficulty.

Now let’s apply a log(1 + y) transformation (`np.log1p`) to the Y-axis in an attempt to make the relationship between **X and Y more linear.**
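As a quick sanity check on the transformation, here is a minimal sketch showing that `np.log1p` and `np.expm1` undo each other, which is what later lets us convert predictions back into referring domains. The sample values are made up for illustration and are not Ahrefs data.

```python
import numpy as np

# Hypothetical referring-domain counts, for illustration only (not Ahrefs data)
referring_domains = np.array([0, 10, 100, 1000])

# log1p(x) computes log(1 + x), so a count of 0 maps to 0 rather than -inf
log_rd = np.log1p(referring_domains)

# expm1(x) computes exp(x) - 1, the inverse of log1p
recovered = np.expm1(log_rd)

print(log_rd)
print(recovered)
```

The `+ 1` matters here because a keyword with zero referring domains would otherwise produce `log(0)`, which is undefined.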

```
# log1p(x) = log(1 + x), which also handles zero referring domains
df['Log_Referring_Domains'] = np.log1p(df['Referring_Domains'])
sns.scatterplot(data=df, x='Keyword_Difficulty', y='Log_Referring_Domains')
plt.show()
```

Okay great! We’ve got approximately a straight line that we can model our data against. It has some wiggles, but it’ll do.

Let’s apply a simple linear regression to the data using the least squares method to minimise the error.

```
# Regression function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p  # slope, intercept, r, p-value, std. error
    y_pred = np.polyval([b1, b0], x)  # avoids the deprecated sp.polyval alias
    return y_pred, p

# Plotting
x, y = df['Keyword_Difficulty'], df['Log_Referring_Domains']  # transformed data
y_pred, _ = regress(x, y)
plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Keyword Difficulty")
plt.ylabel("Log Referring Domains")  # label axis
plt.legend()
plt.show()
```

## Results

ŷ = b0 + b1x, where:

- b0 is a constant (the intercept).
- b1 is the slope (regression coefficient).
- x is the value of the independent input variable.
- ŷ is the predicted value of the dependent output variable.
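To make these terms concrete, here is a hedged sketch of reading b0 and b1 off a `scipy.stats.linregress` result. The data is synthetic: it is generated from rounded versions of the coefficients reported in this post plus a little random noise, purely so the example runs without the Ahrefs CSV.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the Ahrefs data (assumption, for illustration only)
rng = np.random.default_rng(0)
kd = np.arange(0, 101, 10)  # x: keyword difficulty
log_rd = 1.33 + 0.06 * kd + rng.normal(0, 0.05, kd.size)  # y: log(1 + referring domains)

result = stats.linregress(kd, log_rd)
b1, b0 = result.slope, result.intercept

# Predicted values: y_hat = b0 + b1 * x
y_hat = b0 + b1 * kd

print(f"b0 (intercept) = {b0:.3f}")
print(f"b1 (slope)     = {b1:.3f}")
print(f"r squared      = {result.rvalue ** 2:.3f}")
```

On the real data, the same attributes (`slope`, `intercept`, `rvalue`) would give the values used in the rest of this post.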

## Insight

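This means that **for every 1-point increase in Keyword Difficulty, Log(1 + Referring Domains) increases by ~0.06.** Because the model is linear on a log scale, that is equivalent to multiplying (1 + Referring Domains) by roughly e^0.06 ≈ 1.06, i.e. about 6% more for each extra point.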
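To put that slope in more familiar terms, the sketch below converts it into a multiplicative growth factor and compares the modelled referring domains at KD 25 and KD 75. It assumes the fitted slope and intercept reported in this post; the helper function name is my own.

```python
import numpy as np

slope = 0.06001338738238661    # fitted value reported in this post
intercept = 1.330338619969123  # fitted value reported in this post

def estimated_referring_domains(kd):
    """Invert the log1p transform to get back to referring domains (hypothetical helper)."""
    return np.expm1(intercept + slope * kd)

# Each extra KD point multiplies (1 + referring domains) by exp(slope)
per_point_factor = np.exp(slope)

# How much harder is KD 75 than KD 25 under this model?
ratio_75_vs_25 = estimated_referring_domains(75) / estimated_referring_domains(25)

print(f"Growth per KD point: x{per_point_factor:.3f}")
print(f"KD 75 vs KD 25: x{ratio_75_vs_25:.1f}")
```

Under this model the ratio comes out far above the factor of 3 that a linear reading of the metric would suggest.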

Now that we can approximate the relationship between X and Y, we can fill in the gaps for every keyword difficulty value.

```
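Slope = 0.06001338738238661
Intercept = 1.330338619969123

# Estimated referring domains for a keyword difficulty of 1
np.expm1(Intercept + Slope)

# expm1 reverses the log1p transform, giving referring domains for each KD value
y_predictions = []
for i in range(2, 101):
    prediction = np.expm1(Intercept + (Slope * i))
    y_predictions.append(prediction)
    print(prediction)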
```

## Conclusion

So in this guide:

- We’ve obtained the coefficients and have modelled the relationship between KW difficulty and Referring Domains.
- We can now create a slightly more tuned traffic analysis formula: **Monthly Search Volume / Estimated Number of Referring Domains (obtained from the non-linear keyword difficulty metric).**
- Please find attached a look-up table for all of the model’s results. Simply make a copy if you’d like to use this!
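As a rough sketch of how the tuned formula might be applied, the function below divides monthly search volume by the modelled number of referring domains. The function name, the example keywords and their volumes are hypothetical; only the slope and intercept come from the fit in this post.

```python
import numpy as np

SLOPE = 0.06001338738238661      # from the regression in this post
INTERCEPT = 1.330338619969123    # from the regression in this post

def tuned_score(monthly_search_volume, keyword_difficulty):
    """Monthly search volume per estimated referring domain (hypothetical helper)."""
    estimated_rd = np.expm1(INTERCEPT + SLOPE * keyword_difficulty)
    return monthly_search_volume / estimated_rd

# Made-up keywords for illustration only: (monthly search volume, keyword difficulty)
keywords = {
    "easy keyword": (1000, 25),
    "hard keyword": (3000, 75),
}

for name, (volume, kd) in keywords.items():
    print(f"{name}: {tuned_score(volume, kd):.2f}")
```

If keyword difficulty were used directly as the divisor, these two keywords would tie (1000 / 25 = 3000 / 75 = 40), whereas the rescaled version scores the easier keyword clearly higher.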

Enjoy the rest of your week and thanks for reading!