
Optimizing MOZ Pro True Competitor Keyword Overlap For Quota Limits

I'm using the MOZ Pro True Competitor report to optimize keyword overlap, and have identified 128 competitors. After filtering them to those with at least 2% overlap, I'm left with 59 unique sites to download. To ensure accuracy, I'm adding the original sites and their subdomains back into the list. Now, I'm feeling rested and ready to tackle a mini-project of renaming the downloads without subdomains. Join me as I take on this task!

Ready to Tackle a Mini-Project: Optimizing MOZ Pro True Competitor Keyword Overlap for Quota Limits

By Michael Levin

Tuesday, April 18, 2023

Alright, I’m liking this chisel-strike approach. I just keep powering through the next baby-step, documenting it as much as reasonably possible. I also have access to the Snowflake system now where I’ll be doing this same project in parallel, with a different set of benefits.

Let me first get the next round of data. I want to know the keywords that are producing traffic for each of these sites. I’ll be using up my MOZ Pro query quotas on this, most likely.

If I were to look at all the competitors:

competitors = set(df["Top Competitor URLs"].unique())

I see that there are 128. I only have 130 keyword queries left on MOZ Pro, so I want to get that number down a bit. Filtering out the competitors with too little overlap, using 2% as the cutoff, and sorting by the most overlap brings it down to 75 competitor rows.

from pathlib import Path
import pandas as pd

pd.set_option("display.width", 1000)

DATA_DIR = "../data"

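dfs = []
for f in Path(DATA_DIR).glob("Competitive Research_ True Competitor - Moz Pro*.csv"):
    fname = f.name
    # Site name: last whitespace-separated token, minus its first 4 characters and ".csv"
    site = fname.split()[-1][4:-4]
    df = pd.read_csv(f, encoding="utf-8")
    df["Competitor"] = site
    dfs.append(df)  # collect each report so pd.concat has something to combine

df = pd.concat(dfs)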

# Convert the percentage column to a float
df["Overlap"] = df["Overlap"].str.strip().str.rstrip("%").astype(float) / 100

# Keep only rows with more than 2% keyword overlap
df = df[df["Overlap"] > 0.02]

# Sort so the rows with the most overlap come first
df = df.sort_values("Overlap", ascending=False)

        Top Competitor URLs  Domain Authority  Overlap  Rivalry          Competitor
0             www.webmd.com                94     0.19       62  everydayhealth.com
0           www.semrush.com                88     0.15       78             moz.com
1        www.healthline.com                88     0.15       61  everydayhealth.com
2        www.mayoclinic.org                91     0.14       56  everydayhealth.com
0      www.whattoexpect.com                76     0.14       74      babycenter.com
1                ahrefs.com                84     0.13       69             moz.com
3           www.thebump.com                74     0.12       65      babycenter.com
4           www.pampers.com                66     0.11       61      babycenter.com
5    www.verywellfamily.com                71     0.10       59      babycenter.com
5        www.wordstream.com                83     0.10       64             moz.com
6  www.medicalnewstoday.com                91     0.10       49  everydayhealth.com
2     americanpregnancy.org                81     0.10       68      babycenter.com
1                www.nhs.uk                91     0.09       74      babycenter.com
7           medlineplus.gov                87     0.09       47  everydayhealth.com
2           www.youtube.com               100     0.09       66           pcmag.com

And only 59 of them are unique:

competitors = set(df["Top Competitor URLs"].unique())
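The gap between the row count and the unique count presumably comes from the same competitor showing up in the report for more than one of the original sites. Here's a toy sketch of that effect. The rows are made up for illustration (they are not the real data), and the names `toy`, `total_rows`, `unique_hosts` and `shared` are just placeholders:

```python
import pandas as pd

# Made-up rows for illustration only: one competitor appears under two sites.
toy = pd.DataFrame(
    {
        "Top Competitor URLs": ["www.webmd.com", "www.youtube.com", "www.youtube.com"],
        "Overlap": [0.19, 0.09, 0.05],
        "Competitor": ["everydayhealth.com", "pcmag.com", "moz.com"],
    }
)

total_rows = len(toy)                              # every row, duplicates included
unique_hosts = toy["Top Competitor URLs"].nunique()  # distinct competitor hosts

# Which competitor hosts turn up under more than one original site?
shared = toy.groupby("Top Competitor URLs")["Competitor"].nunique()
shared = shared[shared > 1]

print(total_rows, unique_hosts)
print(shared)
```

Each unique host only needs one download, which is why the unique count is the number that matters for the quota.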

That’s 59 different MOZ Pro data downloads I need to do. This manual work in MOZ Pro is arguably the most difficult (and certainly tedious) part of the project. Let’s write them out to a local file.
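Writing the list out could look something like this. It's a minimal sketch, not necessarily how I'd leave it: the helper name `write_competitors` and the filename `competitors.txt` are made up for the example, and the three hosts stand in for the real `competitors` set.

```python
from pathlib import Path


def write_competitors(competitors, out_path):
    """Write one competitor host per line, sorted so the file is stable between runs."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    lines = sorted(competitors)
    out_path.write_text("".join(f"{host}\n" for host in lines), encoding="utf-8")
    return len(lines)


# Stand-in values; in the notebook this would be the real `competitors` set.
example = {"www.webmd.com", "ahrefs.com", "www.semrush.com"}
count = write_competitors(example, "competitors.txt")  # hypothetical filename
print(f"Wrote {count} hosts")
```

Sorting first means the file doubles as a checklist to work through while doing the downloads by hand.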

It’s also worth pointing out that the sites returned by the MOZ Pro True Competitor report keep their subdomains! This is a very important fact because when we pull keyword reports, there are options for:

And the number of keywords that come back WILL VARY, sometimes quite dramatically. A site with gobs of unknown subdomains may produce vastly more keywords than the part of the site you’re actually interested in. That also means we need to put our original sites back into this list if they’re not already there, using the subdomains we’re interested in.
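Putting the original sites back is just a set union. A rough sketch, with two loud assumptions: the helper name `add_original_sites` is invented for the example, and the `www.` prefixes on the original sites are a guess at which subdomains matter (check each site before trusting them).

```python
def add_original_sites(competitors, original_hosts):
    """Return the competitors plus any original hosts not already present."""
    existing = set(competitors)
    missing = set(original_hosts) - existing
    return existing | missing, missing


# Stand-in for the real competitor set.
competitors = {"www.webmd.com", "www.semrush.com", "ahrefs.com"}

# The four original sites from the Competitor column. The "www." prefixes
# are assumptions for illustration, not verified subdomains.
original_hosts = [
    "www.everydayhealth.com",
    "moz.com",
    "www.babycenter.com",
    "www.pcmag.com",
]

combined, added = add_original_sites(competitors, original_hosts)
print(len(combined), sorted(added))
```

Returning the `missing` set as well makes it easy to see how many extra queries the originals will cost against the quota.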

I didn’t use subdomains when I renamed the downloads from the True Competitor report, so that becomes our next mini-project.