How to Know Which Links to Disavow in Google


In the aftermath of Google’s Disavow Links Tool announcement, SEOs and webmasters across the industry are screaming one question, loud and clear: How do I know which links to disavow? 

What follows is a step-by-step, tactical walkthrough of exactly how to perform a link profile audit, and how to figure out which links should be removed and/or disavowed.

What you’ll need:

  • Scrapebox (A tool every SEO must have in their arsenal)
  • Proxies for Scrapebox (optional but recommended; I suggest the “Bonanza” package from the “Exclusive Proxies” section)
  • Microsoft Excel

Find Your Anchor Text Ratio

To get started, we need to analyze the most important signal that Google’s Penguin algorithm looks for: over-optimization of anchor text.

Step 1: Get a list of your website’s inbound links and put the list in your Excel spreadsheet. You can get this information from the following sources:

For the most complete information, try to combine data from all four sources. However, I recommend just using the data from your Google Webmaster Tools account. It’s free, and usually about as thorough as you’ll get from the other sources. Plus, it’s straight from Google. For this walkthrough, we’ll assume you’re using the list from your Webmaster Tools account.

Note: To get a list of your inbound links from Google Webmaster Tools, follow the steps below:

  1. Log in to Google Webmaster Tools
  2. Click your website
  3. Click “Traffic” on the left navigation
  4. Click “Links to your site”
  5. Click “Who links the most”
  6. Click “Download latest links”

Step 2: Run your list of links through Scrapebox to get the anchor text of each link. For a detailed walkthrough of how to set up Scrapebox, load proxies, etc., please see my post on how to use Scrapebox to find guest blogging opportunities. Depending on how long your list of links is, and how many proxies you’re using, this step could take a long time.

For lists of 1,000 links or fewer, it shouldn’t take more than 10 minutes. But several nights ago, I ran a report on a list of more than 43,000 links, and I had to let Scrapebox run overnight to finish.

Step 3: Export the report to Excel on your desktop. You may need to open and re-save the file after you export it, because for some reason it often corrupts immediately after export. Opening and re-saving the spreadsheet should fix it.

Step 4: Within your spreadsheet, arrange your columns as follows:

  • Column A: Source URL
  • Column B: Destination URL
  • Column C: Anchor Text
  • Column D: Found?

Step 5: Sort column D by alphabetical order and remove all rows in which column D’s value is anything other than “Found.” You’ll likely see lots of “Not Found,” “Error 404” and such from the Scrapebox output, which should be removed.

Step 6: Delete Column D (it’s no longer necessary).

Step 7: Add a new Column D with header “Number of Anchor Occurrences.”

Step 8: In cell D2, enter the following formula: =COUNTIF($C$2:$C$6633,C2).

Note: Change “6633” in the above formula to whatever the number of the last row of your data set is.

Step 9: Apply this formula to all rows in column D by clicking in cell D2 and then clicking the box in the lower-right of the cell, and dragging it down the entire length of Column D. You’ll now have a list of the number of occurrences of each anchor text in the spreadsheet.
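
For readers who would rather script Steps 7 through 9 than drag a formula down thousands of rows, here is a minimal Python sketch of the same tally. It is an illustration only, not part of the Scrapebox or Excel workflow: the `anchors` list is hypothetical sample data standing in for Column C. Because COUNTIF ignores letter case, the sketch lowercases each anchor before counting.

```python
from collections import Counter

# Hypothetical anchor texts, one per inbound link (Column C in the spreadsheet).
anchors = [
    "cheap widgets",
    "MySite",
    "Cheap Widgets",
    "http://mysite.example/",
    "cheap widgets",
    "mysite",
]

# Counter tallies how often each distinct anchor appears. Lowercasing and
# trimming first mirrors COUNTIF, which does not distinguish letter case.
anchor_counts = Counter(a.strip().lower() for a in anchors)

# Print each anchor with its number of occurrences, most frequent first.
for anchor, count in anchor_counts.most_common():
    print(f"{anchor}\t{count}")
```

Unlike the spreadsheet, the result already holds one entry per distinct anchor, so the copy-to-Notepad and de-duplication steps are not needed in this form.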

Step 10: Open a new tab (or worksheet) within your spreadsheet and paste in the data from Columns C and D.

Step 11: The pasted cells will still contain formulas, so we need to convert them to plain values. To do so, copy/paste the data from columns C and D into Notepad. Then, re-copy it and paste it back into your new worksheet. (Excel’s Paste Special > Values option achieves the same result.) The values for “Number of anchor occurrences” will now be static values rather than formulas.

Step 12: Now, it’s time to remove duplicates. Remove duplicates by highlighting your two columns, then going to the “Data” tab in Excel and clicking “Remove Duplicates.” In the ensuing popup box, make sure both columns are checked and then click OK.

Step 13: Add a new column C with header “Percent of Total.”

Step 14: Sort by Column B (“Number of anchor occurrences”) from largest to smallest.

Step 15: Scroll down to the last row containing data, and in column B, in the cell directly below the cell containing the last piece of data, enter the following formula: =SUM(B2:B6633).

This will result in the total number of links.

Note: Change “6633” in the above formula to whatever the number of the last row of your data set is.

Step 16: In Column C (“Percent of Total”), click in cell C2 and type the following formula: =B2/$B$422.

Note: Change “422” in the above formula to the number of the row that contains the total number of links, which you created in step 15.

Step 17: Change the format of the values in Column C to “Percentage” with two decimal points. You can do this by highlighting the column, right-clicking, and selecting “Format Cells” then changing the “Category” setting to “Percentage.”

Step 18: Apply this formula to all rows in column C. You should now have a list of percentages of anchor text as a ratio of the entire link profile.

Step 19: Highlight in red any rows in which the anchor text exceeds 2 percent of the overall link profile, EXCEPT the following anchor types:

  • Brand anchors
  • Naked URLs
  • Images (i.e. no anchor text)

The remaining highlighted anchor text is the anchor text for which your inbound link profile is over-optimized.
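
The percentage and highlighting logic from Steps 13 through 19 can also be sketched in Python. This is a rough sketch under stated assumptions, not a definitive implementation: the function name `over_optimized`, the `brand_terms` parameter, and the sample profile are all hypothetical, and the checks for naked URLs (a prefix test) and brand anchors (a substring test) are simple stand-ins for what is a manual judgment in the spreadsheet.

```python
from collections import Counter

THRESHOLD = 0.02  # the 2 percent cutoff used in Step 19

def over_optimized(anchors, brand_terms):
    """Return {anchor: share} for anchors whose share exceeds THRESHOLD,
    skipping image links, naked URLs, and brand anchors."""
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    flagged = {}
    for anchor, count in counts.items():
        share = count / total
        is_image = anchor == ""  # image links carry no anchor text
        is_naked_url = anchor.startswith(("http://", "https://", "www."))
        is_brand = any(term in anchor for term in brand_terms)
        if share > THRESHOLD and not (is_image or is_naked_url or is_brand):
            flagged[anchor] = share
    return flagged

# Hypothetical profile of 100 inbound links.
anchors = (
    ["cheap widgets"] * 10
    + ["mysite"] * 50
    + ["http://mysite.example/"] * 30
    + [""] * 8
    + ["click here"] * 2
)

flagged = over_optimized(anchors, brand_terms=["mysite"])
for anchor, share in flagged.items():
    print(f"{anchor}: {share:.2%}")
```

In this sample only “cheap widgets” is flagged. “click here” sits at exactly 2 percent, which does not exceed the threshold, and the brand, URL, and image anchors are exempt regardless of share.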

If you’ve made it this far and found no over-optimized anchor text in your inbound link profile, congratulations! You’re probably not a target of Google Penguin. If you did find over-optimized anchor text, read on.

Analyze Your Referring Domains

Next, it’s time to get a list of referring domains, and gather some metrics on each one so we can determine whether we have any domains that need to be completely disavowed.

Step 20: Copy/paste your list of links into a Notepad file.

Step 21: Load that file into Scrapebox using the “Import URL list” button.

Step 22: Click “Trim to Root”

Step 23: Click “Remove/Filter” then click “Remove Duplicate Domains.”

Step 24: Click “Check PageRank” and “Get Domain PageRank” to get the domain PR of each domain.

Step 25: Export the list of domains using the “Import/Export URLs & PR” button.

Step 26: Copy/paste the output from your newly exported file back into your Excel spreadsheet and sort by PR from largest to smallest.
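
As a rough illustration of what Steps 22 and 23 accomplish, the sketch below reduces a list of link URLs to unique hostnames using Python’s standard library. The function name `root_domains` and the sample URLs are hypothetical, and the sketch assumes that stripping a leading “www.” is enough; Scrapebox’s own “Trim to Root” may treat subdomains differently. It does not look up PageRank.

```python
from urllib.parse import urlsplit

def root_domains(urls):
    """Return unique hostnames in first-seen order, without a leading 'www.'."""
    unique = {}  # dict keys preserve insertion order and drop duplicates
    for url in urls:
        host = urlsplit(url.strip()).hostname  # None when no scheme/host is present
        if not host:
            continue
        if host.startswith("www."):
            host = host[len("www."):]
        unique.setdefault(host, None)
    return list(unique)

# Hypothetical inbound link URLs.
urls = [
    "http://www.blog.example/post-1",
    "http://blog.example/post-2",
    "https://forum.example/t/9",
    "not a url",
]

domains = root_domains(urls)
for domain in domains:
    print(domain)
```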

Find Out Which Links and Domains Need to Be Disavowed or Removed

Now, it’s time to figure out which links and domains need to be removed or disavowed.

Step 27: Refer to your list of anchor text percentages. Find the first highlighted anchor (from Step 19) and note what the anchor is.

Step 28: Return to your Scrapebox output with the column that includes anchor text, and sort by anchor text, in alphabetical order.

Step 29: Scroll down the list of anchors until you find the first occurrence of the anchor you noted in step 27.

Step 30: Copy/paste all link URLs containing that anchor into a new worksheet titled “links to disavow.”

Step 31: Repeat steps 27-30 for all anchor texts highlighted in red from Step 19.
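
Steps 27 through 31 amount to selecting every link whose anchor text is one of the highlighted anchors. A minimal Python sketch of that selection follows; the function name `links_to_disavow`, the row layout (source URL, destination URL, anchor text, as in Step 4), and the sample rows are assumptions made for illustration.

```python
def links_to_disavow(rows, flagged_anchors):
    """Return source URLs of links whose anchor text is in flagged_anchors.

    rows: iterable of (source_url, destination_url, anchor_text) tuples.
    Matching ignores letter case and surrounding whitespace.
    """
    wanted = {a.strip().lower() for a in flagged_anchors}
    return [
        source
        for source, _destination, anchor in rows
        if anchor.strip().lower() in wanted
    ]

# Hypothetical rows in the Step 4 column order.
rows = [
    ("http://blog.example/post-1", "http://mysite.example/", "Cheap Widgets"),
    ("http://forum.example/t/9", "http://mysite.example/", "MySite"),
    ("http://directory.example/w", "http://mysite.example/", "cheap widgets"),
]

to_disavow = links_to_disavow(rows, flagged_anchors=["cheap widgets"])
for url in to_disavow:
    print(url)
```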

Step 32: Refer again to your list of anchor text percentages. Go through each anchor and eyeball any that are completely unrelated to your niche or are obviously spammy or malicious (for example, porn, gambling, or viagra-related anchors). Add all links containing these anchors to your “links to disavow” worksheet, and also record them in a new, separate list.

Step 33: Load your list of links from the “links to disavow” worksheet into Scrapebox and get the domain PageRank of each link.

Step 34: Copy/paste the output from your newly exported file back into your Excel spreadsheet and sort by PR from largest to smallest.

Step 35: Highlight all links with a PR of 4 or below, and all links with malicious or completely unrelated anchor text.

Step 36: Add the highlighted links to your “links to disavow” list. Now, it’s time to figure out which domains to completely disavow.

Step 37: Copy/paste your list of links from Step 33 (your “links to disavow” spreadsheet) into a Notepad file.

Step 38: Load that Notepad file into Scrapebox and repeat steps 20-26.

Step 39: Add all domains with PR 2 or below to your disavow list.

Step 40: Eyeball the remaining domains and highlight any that don’t end in the following extensions (unless you’re sure you don’t want to remove them):

  • .com
  • .net
  • .org

Step 41: Add the highlighted domains to your “links to disavow” list.
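
The two domain-level rules from Steps 39 and 40 can be expressed as a short filter. This is a sketch under assumptions: the function name `domains_to_disavow` and the sample domains and PageRank values are hypothetical, and the PageRank numbers are taken as already collected. It applies the rules mechanically, so it does not replace the manual review that Step 40 calls for.

```python
ALLOWED_TLDS = (".com", ".net", ".org")  # extensions kept in Step 40
MAX_PR = 2                               # Step 39 cutoff: PR 2 or below

def domains_to_disavow(domain_pr):
    """Return domains that are PR 2 or below, or that use another extension.

    domain_pr: dict mapping a domain name to its PageRank (an integer).
    """
    picked = []
    for domain, pr in domain_pr.items():
        low_pr = pr <= MAX_PR
        other_tld = not domain.lower().endswith(ALLOWED_TLDS)
        if low_pr or other_tld:
            picked.append(domain)
    return picked

# Hypothetical domains with made-up PageRank values.
domain_pr = {
    "blog.example.com": 5,
    "directory.example.net": 1,
    "links.example.info": 4,
    "forum.example.org": 3,
}

picked = domains_to_disavow(domain_pr)
for domain in picked:
    print(domain)
```

Any domain you are sure you want to keep should be removed from the result by hand before it goes into the disavow list.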

You should now have a list that contains the following:

  • A list of links that contain anchor text for which your inbound link profile is over-optimized, which reside on a domain that’s PR 4 or less
  • A list of links that contain spammy, malicious, or completely unrelated anchor text
  • A list of domains that contain links to your website with over-optimized anchor text and are also PR 2 or less
  • A list of domains with domain extensions that are not .com, .net or .org

To disavow an entire domain, use the following format:

domain:spamdomain1.com
domain:spamdomain2.com
domain:spamdomain3.com

To disavow individual links from a domain, use the following format:

http://spamdomain4.com/contentA.html
http://spamdomain5.com/contentB.html
http://spamdomain6.com/contentC.html

Your disavow list should look like this:

domain:spamdomain1.com
domain:spamdomain2.com
domain:spamdomain3.com
http://spamdomain4.com/contentA.html
http://spamdomain5.com/contentB.html
http://spamdomain6.com/contentC.html
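
If your domains and links are already in two lists, the file above can be assembled with a few lines of Python. This is a minimal sketch: the function name `build_disavow_text` is hypothetical, the entries are the placeholder values from the example above, and saving the result to a text file is left out.

```python
def build_disavow_text(domains, urls):
    """Return disavow-file text: 'domain:' entries first, then individual URLs."""
    lines = [f"domain:{domain}" for domain in domains]
    lines.extend(urls)
    return "\n".join(lines) + "\n"

# Placeholder entries taken from the example list above.
domains = ["spamdomain1.com", "spamdomain2.com", "spamdomain3.com"]
urls = [
    "http://spamdomain4.com/contentA.html",
    "http://spamdomain5.com/contentB.html",
    "http://spamdomain6.com/contentC.html",
]

disavow_text = build_disavow_text(domains, urls)
print(disavow_text, end="")
```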

Step 42: When you’re ready to submit your list of links to disavow, follow Google’s official instructions on how to do so.

Closing Thoughts

  • If you have access to the SEOMoz API, feel free to substitute domain authority (DA) as your metric rather than PageRank. This is a more accurate metric to use, but it’s expensive to use it in bulk. In step 35, substitute PR 4 with DA 40 or below. In Step 39, substitute PR 2 with DA 30 or below.
  • Why did I choose 2 percent as the threshold for over-optimization? I’ve done at least 50 inbound link profile audits, and in my experience, the sweet spot appears to be about 2 percent. The 2 percent figure is based purely on my hands-on experience in the field working with real clients who were penalized by Google Penguin.
  • How did I come up with the specific PR and DA thresholds for disavowal? Again, this is based purely on my experience in the field. There’s no textbook that’ll tell you the “right” number(s) or even metrics to use.

I hope you find this guide handy in figuring out exactly which links to disavow and/or remove. If you want to audit your inbound link profile and get a list of links & domains to disavow but you don’t want to follow these steps, there are link removal & disavowal services that might be what you need.

Did you try it? Did it work? Do the steps make sense? Did I miss anything? Leave a message in the comments and let us know!

Jayson DeMers
Jayson DeMers is the founder & CEO of AudienceBloom, a Seattle-based content marketing & social media agency.