Keyword research is a fundamental process that helps search engine marketers understand where the market opportunity lies and what searchers care about.

After using tools such as Ahrefs, SEOMoz or SEMrush, you can export a list of top pages or keywords that highlights the potential market opportunity, with metrics including monthly search volume, traffic value ($) and top keyword.

Marketers will often use simple formulas to prioritise the massive amount of data produced by these tools, then distil it into a recommended list of landing pages, blog posts or resources for content creation.

For example, Siege Media recommends the following KOB (Keyword Opposition to Benefit) score:

From looking at this formula, we would assume that **a difficulty level of 75 is approximately three times harder than a difficulty level of 25.**
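Under a linear reading of the score, that assumption checks out exactly. A minimal sketch, assuming our reading of the formula (roughly, traffic value divided by keyword difficulty) and a made-up traffic value:

```python
def kob_score(traffic_value, difficulty):
    """Our reading of the KOB idea: traffic value ($) divided by keyword difficulty."""
    return traffic_value / difficulty

# Under this linear reading, difficulty 75 looks exactly 3x "harder" than 25
print(kob_score(3000, 25))  # 120.0
print(kob_score(3000, 75))  # 40.0
```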

However, let’s check how Ahrefs calculates keyword difficulty. Because yes, **we love you, Ahrefs.**

### Investigating How Ahrefs Calculates Keyword Difficulty

Ahrefs’ keyword difficulty metric ranges from 0 to 100, and thankfully they’ve provided us with some numerical data showing how the metric is calculated.

As rightly stated by Ahrefs, this data is **not linear.** This means that **assuming a score of 25 is three times easier than a score of 75 is likely to be a flawed assumption.**

However, we can’t quite use Ahrefs’ metric yet, because the data is only published at intervals (0, 10, 20, and so on).
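If all we wanted were the in-between values, straight-line interpolation between the published buckets would be a quick first pass. A sketch with made-up bucket numbers (the real ones live in the CSV used below):

```python
import numpy as np

# Hypothetical bucket data for illustration only
kd_buckets = [0, 10, 20, 30]          # published keyword difficulty scores
domains_at_bucket = [0, 10, 36, 129]  # referring domains at each score

# np.interp fills the gaps with straight-line segments between buckets
print(np.interp(15, kd_buckets, domains_at_bucket))  # 23.0
```

Since the underlying curve turns out to be exponential, a fitted model generalises better than straight-line segments, which is the approach taken below.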

So let’s see if we can **calculate the values in between these numbers to rescale the KOB traffic formula.**

## TL;DR

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import numpy as np
import selenium
import pickle

%matplotlib inline
```

```python
df = pd.read_csv('ahrefs_keyword_difficulty.csv')
df.rename(columns={'Keyword Difficulty - Y': 'Keyword_Difficulty',
                   'Referring Domains - X': 'Referring_Domains'}, inplace=True)
df
```

```python
sns.scatterplot(data=df, x='Keyword_Difficulty', y='Referring_Domains')
plt.show()
```

This is the original relationship of Keyword Difficulty vs Referring Domains. When both scales are linear, the relationship between X and Y appears to be **logarithmic.**

Now let’s apply a log(1 + x) transformation to the Y-axis in an attempt to make the relationship between **X and Y more linear.**

```python
df['Log_Referring_Domains'] = df['Referring_Domains'].apply(lambda x: np.log1p(x))
sns.scatterplot(data=df, x='Keyword_Difficulty', y='Log_Referring_Domains')
plt.show()
```

Okay great! We’ve got approximately a straight line that we can model our data against. It has some wiggles, but it’ll do.

Let’s apply a simple linear regression to the data using the least squares method to minimise the error.

```python
import scipy.stats  # importing scipy alone does not make sp.stats available

# Regression function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = np.polyval([b1, b0], x)  # sp.polyval was removed from SciPy; use NumPy
    return y_pred, p

# Plotting
x, y = df['Keyword_Difficulty'], df['Log_Referring_Domains']  # transformed data
y_pred, _ = regress(x, y)
plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Keyword Difficulty")
plt.ylabel("Log Referring Domains")  # label axes
plt.legend()
plt.show()
```

## Results

ŷ = b0 + b1x, where:

- b0 is a constant (the intercept).
- b1 is the slope (regression coefficient).
- x is the value of the independent input variable.
- ŷ is the predicted value of the dependent output variable.
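Plugging the fitted coefficients into ŷ = b0 + b1x gives a one-line predictor. A minimal sketch (the coefficient values are taken from the fit reported in the next section; the function name is ours):

```python
b0 = 1.330338619969123   # intercept from the least-squares fit
b1 = 0.06001338738238661  # slope from the least-squares fit

def predict_log_domains(difficulty):
    """Predicted log1p(referring domains) for a keyword difficulty score (0-100)."""
    return b0 + b1 * difficulty

print(round(predict_log_domains(50), 3))  # 4.331
```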

## Insight

This means that for every **1-point increase in Keyword Difficulty, Log(1 + Referring Domains) increases by ~0.06.**
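To see how non-linear that is in raw terms, we can invert the transform with `np.expm1`. A quick sketch using the fitted coefficients (the helper name is ours):

```python
import numpy as np

slope, intercept = 0.06001338738238661, 1.330338619969123

def est_domains(kd):
    """Estimated referring domains for a difficulty score: invert the log1p transform."""
    return np.expm1(intercept + slope * kd)

# Each extra difficulty point multiplies the estimate by roughly e^0.06 (~1.062),
# so a difficulty of 75 implies roughly 20x more referring domains than 25 -- not 3x
ratio = est_domains(75) / est_domains(25)
print(round(ratio, 1))
```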

Now that we can approximate the relationship between X and Y, we can fill in the gaps for every keyword difficulty value.

```python
slope = 0.06001338738238661
intercept = 1.330338619969123

# Invert the log1p transform to recover estimated referring domains
y_predictions = []
for i in range(2, 101):
    y_predictions.append(np.expm1(intercept + slope * i))
    print(y_predictions[-1])
```

## Conclusion

So in this guide:

- We’ve obtained the coefficients and have modelled the relationship between Keyword Difficulty and Referring Domains.
- We can now create a slightly more tuned traffic analysis formula: **Monthly Search Volume / Estimated Number of Referring Domains** (obtained from the non-linear keyword difficulty metric).
- Please find attached a look-up table for all of the model’s results. Simply make a copy if you’d like to use this!
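The pieces above can be wired together into a small helper. A sketch under our assumptions (the column names and the `tuned_score` function are hypothetical; the coefficients come from the fit above):

```python
import numpy as np
import pandas as pd

slope, intercept = 0.06001338738238661, 1.330338619969123

# Rebuild the look-up table: estimated referring domains per difficulty score
kd = np.arange(2, 101)
lookup = pd.DataFrame({
    'Keyword_Difficulty': kd,
    'Est_Referring_Domains': np.expm1(intercept + slope * kd),
})

def tuned_score(monthly_search_volume, keyword_difficulty):
    """Monthly Search Volume / estimated referring domains (hypothetical helper)."""
    est = np.expm1(intercept + slope * keyword_difficulty)
    return monthly_search_volume / est
```

Unlike the linear version, this score penalises high-difficulty keywords much more heavily, reflecting the exponential growth in referring domains.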

Enjoy the rest of your week and thanks for reading!