
How to Know Which Links to Disavow in Google

In the aftermath of Google’s Disavow Links Tool announcement, SEOs and webmasters across the industry are screaming one question, loud and clear: How do I know which links to disavow? 

What follows is a step-by-step, tactical walkthrough of exactly how to perform a link profile audit, and how to figure out which links should be removed and/or disavowed.

What you’ll need:

  • Scrapebox (A tool every SEO must have in their arsenal)
  • Proxies for Scrapebox (optional, recommended. I recommend going for the “Bonanza” package from the “Exclusive Proxies” section.)
  • Microsoft Excel

Find Your Anchor Text Ratio

To get started, we need to analyze the most important signal that Google’s Penguin algorithm looks for: over-optimization of anchor text.

Step 1: Get a list of your website’s inbound links and put the list in your Excel spreadsheet. You can get this information from the following sources:

For the most complete information, try to combine data from all four sources. However, I recommend just using the data from your Google Webmaster Tools account. It’s free, and usually about as thorough as you’ll get from the other sources. Plus, it’s straight from Google. For this walkthrough, we’ll assume you’re using the list from your Webmaster Tools account.

Note: To get a list of your inbound links from Google Webmaster Tools, follow the steps below:

  1. Login to Google Webmaster Tools
  2. Click your Website
  3. Click “Traffic” on the left navigation
  4. Click “Links to your site”
  5. Click “Who links the most”
  6. Click “Download latest links”

Step 2: Run your list of links through Scrapebox to get the anchor text of each link. For a detailed walkthrough of how to set up Scrapebox, load proxies, etc., please see my post on how to use Scrapebox to find guest blogging opportunities. Depending on how long your list of links is, and how many proxies you’re using, this step could take a long time.

For lists of 1,000 links or fewer, it shouldn’t take more than 10 minutes. But several nights ago, I ran a report on a list of over 43,000 links, and I had to let Scrapebox run overnight to complete it.

Step 3: Export the report to Excel on your desktop. You may need to open and re-save the file after you export it, because for some reason it often corrupts immediately after export. Opening and re-saving the spreadsheet should fix it.

Step 4: Within your spreadsheet, sort your columns as such:

  • Column A: Source URL
  • Column B: Destination URL
  • Column C: Anchor Text
  • Column D: Found?

Step 5: Sort column D by alphabetical order and remove all rows in which column D’s value is anything other than “Found.” You’ll likely see lots of “Not Found,” “Error 404” and such from the Scrapebox output, which should be removed.

Step 6: Delete Column D (it’s no longer necessary).
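
If you’d rather script this than sort by hand, here is a minimal Python sketch of Steps 5 and 6. The `LinkRow` layout and the `keep_found` helper are assumptions made for illustration; the actual Scrapebox export may order or label its columns differently.

```python
from typing import NamedTuple


class LinkRow(NamedTuple):
    source_url: str       # Column A: page that links to you
    destination_url: str  # Column B: page on your site
    anchor_text: str      # Column C
    status: str           # Column D: "Found", "Not Found", "Error 404", ...


def keep_found(rows: list[LinkRow]) -> list[tuple[str, str, str]]:
    """Keep rows whose status is exactly "Found", then drop the status column."""
    return [
        (r.source_url, r.destination_url, r.anchor_text)
        for r in rows
        if r.status.strip() == "Found"
    ]


# Hypothetical sample data, not real Scrapebox output.
rows = [
    LinkRow("http://blog-a.example/post", "http://mysite.example/", "cheap widgets", "Found"),
    LinkRow("http://gone.example/old", "http://mysite.example/", "", "Error 404"),
    LinkRow("http://blog-b.example/page", "http://mysite.example/", "MySite", "Not Found"),
]
kept = keep_found(rows)
```

Only the first sample row survives, and it no longer carries the “Found?” column.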

Step 7: Add a new Column D with header “Number of Anchor Occurrences.”

Step 8: In cell D2, enter the following formula: =COUNTIF($C$2:$C$6633,C2).

Note: Change “6633” in the above formula to whatever the number of the last row of your data set is.

Step 9: Apply this formula to all rows in column D by clicking in cell D2 and then clicking the box in the lower-right of the cell, and dragging it down the entire length of Column D. You’ll now have a list of the number of occurrences of each anchor text in the spreadsheet.
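
To make it clearer what the COUNTIF column ends up holding, here is a rough Python equivalent of Steps 7 through 9. The function name and sample anchors are hypothetical. Anchors are lower-cased before counting because COUNTIF ignores case when it compares text.

```python
from collections import Counter


def occurrences_per_row(anchors: list[str]) -> list[int]:
    """For each row, return how many rows in the list share its anchor text."""
    counts = Counter(a.lower() for a in anchors)
    return [counts[a.lower()] for a in anchors]


# Hypothetical Column C values.
column_c = ["cheap widgets", "MySite", "Cheap Widgets", "http://mysite.example"]
column_d = occurrences_per_row(column_c)
```

Every row gets the total for its own anchor, so repeated anchors show the same number on each of their rows, just as they do in the spreadsheet.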

Step 10: Open a new tab (or worksheet) within your spreadsheet and paste in the data from Columns C and D.

Step 11: That data will still contain the formulas in the cells, so we need to convert them to plain values. To do so, copy/paste the data from columns C and D into Notepad. Then, re-copy and paste it back into your new worksheet. The values for “Number of anchor occurrences” will now be absolute values rather than formulas.

Step 12: Now, it’s time to remove duplicates. Remove duplicates by highlighting your two columns, then going to the “Data” tab in Excel and clicking “Remove Duplicates.” In the ensuing popup box, make sure both columns are checked and then click OK.

Step 13: Add a new column C with header “Percent of Total.”

Step 14: Sort by Column B (“Number of anchor occurrences”) from largest to smallest.

Step 15: Scroll down to the last row containing data, and in column B, in the cell directly below the cell containing the last piece of data, enter the following formula: =SUM(B2:B6633).

This will result in the total number of links.

Note: Change “6633” in the above formula to whatever the number of the last row of your data set is.

Step 16: In Column C (“Percent of Total”), click in cell C2 and type the following formula: =B2/$B$422.

Note: Change “422” in the above formula to the number of the row that contains the total number of links, which you created in step 15.

Step 17: Change the format of the values in Column C to “Percentage” with two decimal places. You can do this by highlighting the column, right-clicking, and selecting “Format Cells” then changing the “Category” setting to “Percentage.”

Step 18: Apply this formula to all rows in column C. You should now have a list of percentages of anchor text as a ratio of the entire link profile.
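
As a cross-check on the spreadsheet work, Steps 10 through 18 boil down to a few lines of Python. This is only a sketch under the assumption that you have the anchor text of every found link in a plain list; the function name is hypothetical.

```python
from collections import Counter


def anchor_percentages(anchors: list[str]) -> list[tuple[str, int, float]]:
    """Return (anchor, occurrences, percent of total), largest count first."""
    counts = Counter(a.lower() for a in anchors)   # one entry per unique anchor
    total = sum(counts.values())                   # Step 15: total number of links
    rows = [(anchor, n, 100.0 * n / total) for anchor, n in counts.items()]
    rows.sort(key=lambda row: row[1], reverse=True)  # Step 14: largest to smallest
    return rows


# Hypothetical anchors: three links share one anchor, one uses the brand name.
sample = ["cheap widgets", "cheap widgets", "cheap widgets", "MySite"]
table = anchor_percentages(sample)
```

Here “cheap widgets” accounts for 3 of 4 links, or 75 percent, and the brand anchor for the remaining 25 percent.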

Step 19: Highlight in red any rows in which the anchor text exceeds 2 percent of the overall link profile, EXCEPT the following anchor types:

  • Brand anchors
  • Naked URLs
  • Images (i.e. no anchor text)

The remaining highlighted anchor text is the anchor text for which your inbound link profile is over-optimized.
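
The highlighting rule from Step 19 can also be sketched in Python. The brand list, the naked-URL pattern, and both function names are assumptions for illustration. In particular, this sketch only recognizes naked URLs that start with http://, https://, or www., and it only treats exact matches as brand anchors, so you should still review the output by eye.

```python
import re

THRESHOLD_PERCENT = 2.0  # the rule-of-thumb threshold from Step 19


def is_exempt(anchor: str, brand_terms: set[str]) -> bool:
    """True for image links (empty anchor), naked URLs, and brand anchors."""
    text = anchor.strip().lower()
    if not text:                                  # image link: no anchor text
        return True
    if re.match(r"(https?://|www\.)", text):      # naked URL (simplified check)
        return True
    return text in brand_terms                    # exact brand match only


def over_optimized(percentages: dict[str, float], brand_terms: set[str]) -> list[str]:
    """Return anchors above the threshold that are not exempt."""
    return [
        anchor
        for anchor, pct in percentages.items()
        if pct > THRESHOLD_PERCENT and not is_exempt(anchor, brand_terms)
    ]


# Hypothetical percentages keyed by anchor text.
percentages = {
    "cheap widgets": 12.5,
    "mysite": 30.0,
    "http://mysite.example": 20.0,
    "": 10.0,
    "click here": 1.5,
}
flagged = over_optimized(percentages, brand_terms={"mysite"})
```

In this sample only “cheap widgets” is flagged: the brand, naked URL, and image anchors are exempt, and “click here” is under the threshold.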

If you’ve made it this far and found no over-optimized anchor text in your inbound link profile, congratulations! You’re probably not a target of Google Penguin. If you did find over-optimized anchor text, read on.

Analyze Your Referring Domains

Next, it’s time to get a list of referring domains, and gather some metrics on each one so we can determine whether we have any domains that need to be completely disavowed.

Step 20: Copy/paste your list of links into a Notepad file.

Step 21: Load that file into Scrapebox using the “Import URL list” button.

Step 22: Click “Trim to Root”

Step 23: Click “Remove/Filter” then click “Remove Duplicate Domains.”

Step 24: Click “Check PageRank” and “Get Domain PageRank” to get the domain PR of each domain.

Step 25: Export the list of domains using the “Import/Export URLs & PR” button.

Step 26: Copy/paste the output from your newly exported file back into your Excel spreadsheet and sort by PR from largest to smallest.
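
The trimming and de-duplicating in Steps 22 and 23 can be approximated with Python’s standard library. This is a sketch, not a description of what Scrapebox does internally: it treats each hostname (minus a leading “www.”) as a domain and does not collapse other subdomains. The PageRank lookup in Step 24 is not reproduced here.

```python
from urllib.parse import urlsplit


def unique_domains(urls: list[str]) -> list[str]:
    """Reduce URLs to hostnames and drop duplicates, keeping first-seen order."""
    seen: set[str] = set()
    domains: list[str] = []
    for url in urls:
        host = urlsplit(url.strip()).hostname  # lower-cased; None if no scheme/host
        if host is None:
            continue                           # skip lines that are not full URLs
        if host.startswith("www."):
            host = host[4:]
        if host not in seen:
            seen.add(host)
            domains.append(host)
    return domains


# Hypothetical link list.
links = [
    "http://www.spam.example/a.html",
    "http://spam.example/b.html",
    "https://Blog.Other.example/x",
    "not a url",
]
domains = unique_domains(links)
```

The two spam.example links collapse into one entry, and the malformed line is skipped.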

Find Out Which Links and Domains Need to Be Disavowed or Removed

Now, it’s time to figure out which links and domains need to be removed or disavowed.

Step 27: Refer to your list of anchor text percentages. Find the first highlighted anchor (from Step 19) and note what the anchor is.

Step 28: Return to your Scrapebox output with the column that includes anchor text, and sort by anchor text, in alphabetical order.

Step 29: Scroll down the list of anchors until you find the first occurrence of the anchor you noted in step 27.

Step 30: Copy/paste all link URLs containing that anchor into a new worksheet titled “links to disavow.”

Step 31: Repeat steps 27-30 for all anchor texts highlighted in red from Step 19.
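
Steps 27 through 31 are a lookup: for each flagged anchor, pull every link that uses it. A minimal Python sketch follows, assuming each row is a (source URL, destination URL, anchor text) tuple; the function name and sample rows are hypothetical.

```python
def links_for_anchors(
    rows: list[tuple[str, str, str]], flagged_anchors: list[str]
) -> list[str]:
    """Return the source URLs of all rows whose anchor text was flagged."""
    flagged = {a.lower() for a in flagged_anchors}
    return [source for source, _dest, anchor in rows if anchor.lower() in flagged]


# Hypothetical rows: (source URL, destination URL, anchor text).
rows = [
    ("http://blog-a.example/post", "http://mysite.example/", "Cheap Widgets"),
    ("http://news.example/story", "http://mysite.example/", "MySite"),
    ("http://dir.example/list", "http://mysite.example/", "cheap widgets"),
]
links_to_disavow = links_for_anchors(rows, ["cheap widgets"])
```

The result is the starting content for the “links to disavow” worksheet.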

Step 32: Refer again to your list of anchor text percentages. Go through each anchor and eyeball any anchors that are completely unrelated to the niche, or that are malicious or obviously spam (for example, porn, gambling, or viagra-related anchors). Add all links containing these anchors to your “links to disavow” worksheet in addition to a new, separate list.

Step 33: Load your list of links from the “links to disavow” worksheet into Scrapebox and get the domain PageRank of each link.

Step 34: Copy/paste the output from your newly exported file back into your Excel spreadsheet and sort by PR from largest to smallest.

Step 35: Highlight all links with a PR of 4 or below, and all links with malicious or completely unrelated anchor text.

Step 36: Add the highlighted links to your “links to disavow” list. Now, it’s time to figure out which domains to completely disavow.

Step 37: Copy/paste your list of links from Step 33 (your “links to disavow” spreadsheet) into a Notepad file.

Step 38: Load that Notepad file into Scrapebox and repeat steps 20-26.

Step 39: Add all domains with PR 2 or below to your disavow list.

Step 40: Eyeball the remaining domains and highlight any that don’t end in the following extensions (unless you’re sure you don’t want to remove them):

  • .com
  • .net
  • .org

Step 41: Add the highlighted domains to your “links to disavow” list.
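
For a quick first pass at Steps 39 through 41, the two domain rules can be written as a small Python filter. The input format (a domain-to-PageRank mapping) and the function name are assumptions for illustration. The sketch only produces candidates; Step 40 still calls for a manual look before anything outside .com, .net, or .org goes on the list.

```python
ALLOWED_TLDS = (".com", ".net", ".org")  # extensions named in Step 40
MAX_DOMAIN_PR = 2                        # threshold from Step 39


def domains_to_disavow(domain_pr: dict[str, int]) -> list[str]:
    """Return domains with PR at or below the threshold, or an unlisted extension."""
    candidates = []
    for domain, pr in domain_pr.items():
        low_pr = pr <= MAX_DOMAIN_PR
        unlisted_tld = not domain.lower().endswith(ALLOWED_TLDS)
        if low_pr or unlisted_tld:
            candidates.append(domain)
    return candidates


# Hypothetical domains and PageRank values.
domain_pr = {
    "spamdomain1.com": 1,
    "goodsite.org": 6,
    "linkfarm.info": 5,
    "newsblog.net": 2,
}
candidates = domains_to_disavow(domain_pr)
```

If you use domain authority instead of PageRank, the same shape works with a different threshold.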

You should now have a list that contains the following:

  • A list of links that contain anchor text for which your inbound link profile is over-optimized, which reside on a domain that’s PR 4 or less
  • A list of links that contain spammy, malicious, or completely unrelated anchor text
  • A list of domains that contain links to your website with over-optimized anchor text and are also PR 2 or less
  • A list of domains with domain extensions that are not .com, .net or .org

To disavow an entire domain, use the following format:

domain:spamdomain1.com
domain:spamdomain2.com
domain:spamdomain3.com

To disavow individual links from a domain, use the following format:

http://spamdomain4.com/contentA.html
http://spamdomain5.com/contentB.html
http://spamdomain6.com/contentC.html

Your disavow list should look like this:

domain:spamdomain1.com
domain:spamdomain2.com
domain:spamdomain3.com
http://spamdomain4.com/contentA.html
http://spamdomain5.com/contentB.html
http://spamdomain6.com/contentC.html
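
If your final lists live in a script rather than a spreadsheet, the file can be assembled with a few lines of Python. This is a minimal sketch with a hypothetical function name. It writes one entry per line, with the “domain:” prefix for whole domains, matching the format shown above.

```python
def build_disavow_file(domains: list[str], urls: list[str]) -> str:
    """Return disavow file text: domain entries first, then individual URLs."""
    lines = [f"domain:{d}" for d in domains]
    lines.extend(urls)
    return "\n".join(lines) + "\n"


text = build_disavow_file(
    ["spamdomain1.com", "spamdomain2.com", "spamdomain3.com"],
    [
        "http://spamdomain4.com/contentA.html",
        "http://spamdomain5.com/contentB.html",
        "http://spamdomain6.com/contentC.html",
    ],
)
```

Save the resulting text as a plain .txt file before uploading it.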

Step 42: When you’re ready to submit your list of links to disavow, follow Google’s official instructions on how to do so.

Closing Thoughts

  • If you have access to the SEOMoz API, feel free to substitute domain authority (DA) as your metric rather than PageRank. This is a more accurate metric to use, but it’s expensive to use it in bulk. In step 35, substitute PR 4 with DA 40 or below. In Step 39, substitute PR 2 with DA 30 or below.
  • Why did I choose 2 percent as the threshold for over-optimization? I’ve done at least 50 inbound link profile audits, and in my experience, the sweet spot appears to be about 2 percent. That figure is based purely on my hands-on experience working with real clients who were penalized by Google Penguin.
  • How did I come up with the specific PR and DA thresholds for disavowal? Again, this is based purely on my experience in the field. There’s no textbook that’ll tell you the “right” number(s) or even metrics to use.

I hope you find this guide handy in figuring out exactly which links to disavow and/or remove. If you want to audit your inbound link profile and get a list of links & domains to disavow but you don’t want to follow these steps, there are link removal & disavowal services that might be what you need.

Did you try it? Did it work? Do the steps make sense? Did I miss anything? Leave a message in the comments and let us know!

Jayson DeMers is the founder & CEO of AudienceBloom, a Seattle-based content marketing & social media agency. You can contact him on LinkedIn, Google+, or Twitter.

25 thoughts on “How to Know Which Links to Disavow in Google”

  1. Sorry but following this could actually hurt your rankings.

    This is a big, fancy, and totally non-required way to take a long time to delete your less than PR 2 links and “what looks spammy to you” sites.

    Cemper link research tools does most of this for you already, and with a few tweaks (that is way easier than this) can be perfect.

    You need a professional SEO to tell you what links are ACTUALLY hurting your site and ONLY disavow those, or you will effectively negative SEO yourself.

    Anyone who wants to know how to disavow their links just contact me – I would be happy to help.

    1. Doing anything related to your website, in any capacity, could hurt your rankings. This process is designed to help folks whose rankings are already tanked due to a Penguin penalty.

  2. Has anyone given any thought to how Google might use this disavow information to slap the sites you decided you don’t want links from through no fault of theirs?

  3. It’s important that you look at the whole picture, not just the anchor text. Yes, abusing exact match anchor text can get you in hot water with Penguin but you don’t want to remove great links because you used that anchor text too many times. I would suggest being very light handed with the disavow tool and only remove links you are 100% positive aren’t worth it.

  4. Hi,
    It strikes me that if you remove links based on pagerank then you are skewing your overall link profile. That in itself is not such a worry depending on what the link profile is of other sites in your vertical.

    If I delete a lot of low quality sites based purely on pagerank, then I will end up with a less natural overall profile. If that is out of kilter with other similar sites then I am going to be flagged.

    Obviously, if you have been hit by Penguin then it is possible that you will have to do some link trimming, but it should be as a last resort imho. What is trimmed should be based on Google guidelines rather than arbitrary pagerank metrics. I don’t believe there is any substitute for checking links individually (or by site if there is a lot of spun content), even if there are thousands. Low pagerank does not equate to low quality all the time. The site may be new, may not have a lot of inbound links or simply be in a niche vertical that doesn’t have a lot of cross referencing sites.

    Nor should people be rushing to use the disavow links tool. Did you receive a “unnatural links” warning? If not then you may be barking up the wrong tree.

    If you make a mistake it’s going to take months to see your links reinstated using the tool. While link quantity is not the be all and end all, stripping hard earned links that are not doing damage makes no sense.

    I do worry about how the information from the disavow links tool will be used. In theory it could be great for pointing out spamming sites, but if people use pagerank as a metric for disavowing, then it could potentially lead to damage for young sites, or even just sites that never did any SEO.

    That’s my tuppence worth.

    Ian.

  5. Jayson, I thought you did a great job on this article, especially how detailed your Excel steps were. I have a client right now who was spinning low quality content on a bunch of low quality blogs that they owned. They finally got hit with the Penguin update and are now in frantic mode. Thanks for sharing.

  6. I tried using that scrape box thing but to no avail. I imported my links from webmasters tools but I do not see a way to get scrapebox to get me the keywords that are being used for the anchor text.

    Can you enlighten me on this please.

  7. I thought this was very insightful especially if you need to do this analysis and disavow list for many clients.
    Just a few quick excel notes to speed this up:

    1. You can remove formulas by right clicking and doing a ‘paste values’ rather than going through notepad.
    2. Also, in order to trim URLs to their domains, you can use a formula like: =LEFT(B2,FIND("/",B2,9)). All that does is bring back everything up to and including the first "/" found at or after the 9th character, which is the scheme plus the domain (starting at the 9th character rather than the 8th also works for https:// URLs). No need to run the whole list through scrapebox.
    3. Steps 27-31 would be much quicker done using a pivot table, which automatically matches the anchors to the URLs and also removes duplicate anchors within the pivot. Furthermore you can add your URLs underneath each anchor automatically using a pivot table. I’d be happy to share and paste a screenshot. This would speed things up and tidy up your sheet dramatically.
    But all in all, nice post.

  8. @ Barksmart. These days you need to use the Page analyzer. The old backlinks checker used to allow you to then interrogate each page for the outbound (and site internal) links, but this feature no longer works. The workaround is to use the page analyzer when checking backlinks. It’s not quite as simple as the old checker, but it is a lot more powerful. I found a blog post from a couple of weeks ago showing exactly how to use it (with an “over the shoulder” video). It doesn’t create spam and it doesn’t call Google, so it seems completely ethical: it visits a site, checks whether the underlying HTML has your link in it, reports if it does, then leaves. So glad I found this. I don’t know what other method you could use to check in bulk. I’ve checked lists of many thousands of URLs in a few minutes, with no need for proxies or anything fancy like that, as it only visits each site once and does nothing other than “look” when it gets there. Here’s a link to the video page :)
    http://www.demondemon.com/2013/05/14/check-if-urls-contain-backlinks-to-your-website/

  9. You can also just use Link Detective to see anchor text for each inbound link… for those who can’t get Scrapebox to do the job! To do this you need to upload a csv from open site explorer of all links from only external pages to the root domain. Select these options, filter and then export to csv. Then you can start analysing… Simples :)

  10. A couple of things.
    1) In step 11 you don’t need to paste into Notepad first. You can use Paste Special in Excel and choose Values, and it will paste the data without the formulas.

    2) From what I have read elsewhere, if Google is penalizing you for an unnatural link profile, you may want to file a reconsideration request. If you are going to do that, Google supposedly wants to see that you have made an effort to have the links removed manually BEFORE you just disavow individual links or whole domains. We have created a Google Doc spreadsheet with all the links and documented our efforts to contact the linking domains, either through the email source code or a screenshot of our message on their contact page. Once we have tried to contact them twice with no reply, THEN we will compile which links have been removed, sort them and just disavow the domains. I can see no point in disavowing a single link from a domain when you may not know how or why the link to your page is there or if another may reappear; better to just disavow the domains entirely. I’ll update on how our ranking progresses once we make our reconsideration request and disavow.

  11. Are disavowed links only considered internally by Google when evaluating the link profile, or are there any signs that show whether they actually considered our request? I’d like a way to track progress apart from the ultimate sign of penalty removal, which definitely takes time. It’s understandable that the links keep showing in the Webmaster Tools “Links to your site” report, because they still exist.

  12. Why can I not see the comments??? It says there are 24 but I can’t see any. It would be really helpful to see other people’s success or failure with this method of removing links.

  13. Actually, having spent all afternoon on Scrapebox it looks like this is impossible. GWT exports a list of your inbound links, i.e. URLs that contain a link (or more) to your site. There is no way of having Scrapebox crawl those URLs to find the link(s) on the page that point only to your site. Or if there is, the instructions for doing so aren’t anywhere online, or included in the article you linked to above. Or am I missing something?