SEO Audits – Large Scale Link Evaluation


When you perform as many SEO audits as I do (60 to 80 a year), you need to find ways to become consistently more efficient in your work without sacrificing quality.  This is also true when working on very big sites or sites with other forms of big data.

It’s too easy to get bogged down in repetitive analysis, or in dealing with hundreds of categories or millions of pages. And it’s just as easy, if not easier, when evaluating link profiles.

[Image: Google detected unnatural links]

With the advent of Google’s war on unnatural link profiles, the need to examine inbound link profiles has become more prevalent in my audit work.

While every audit I perform has always involved inbound link footprints, a site that Google has notified about bad links (and that has seen its rankings drop as a result) requires a more comprehensive link evaluation effort.


Soft Eyes – Rapidly Identifying Unnatural Link Patterns

[Image: Soft Eyes]

So how do you do link evaluation in this scenario? What’s the best way to ensure you really have identified the bad links? Personally, I just use the same “soft eyes” approach I have always used in all my audit work (and referred to in an article I wrote in 2011 on how the approach helped me discover criminal activity during an audit).

Many years ago, while on a meditation and visualization retreat in the Santa Cruz mountains, attendees were taught the practice of “soft eyes”.  The technique is not unique to meditation – in fact, it’s used by military personnel, high performance athletes, race car drivers…   There are a wide range of uses and benefits of the technique.

The primary concept of this technique lies in the notion that as humans, we’re usually focused on a lot of things, objects, thoughts and feelings at once, and by nature or upbringing, we tend to “miss” or otherwise “drown out” most of it as we go about our lives. Driving on a highway, we fail to absorb the beauty of the surrounding landscape. Sitting in a two hour meeting, we fail to observe the changing weather right outside the conference room window.

In worst case scenarios, this problem can be deadly. That highway driver fails to notice the driver next to them drifting into their lane of traffic, or the soldier scanning the horizon fails to notice the commando low-crawling up on their entrenched position.

Of course, those are extreme examples of how our power of observation usually works. Yet the same thing happens during the SEO audit process. During an inbound link review, it can translate into feeling overwhelmed by all the data, or becoming lost in the vortex of reviewing thousands, tens of thousands or hundreds of thousands of links…

Start With The Raw Data

The first thing I do during a link evaluation audit is to go to Open Site Explorer and perform a data export. For sites with hundreds or thousands of links, I just go directly to the “Inbound Links” tab, then export a CSV file. For sites with tens of thousands, hundreds of thousands or millions of links, the export needs to come from the “Advanced Reports” functionality OSE provides, or from a report you generate in BrightEdge or another source (although most of those rely on OSE data anyhow).

The key is that I want to get access to as many links as possible, across as many domains as possible.  Even this isn’t going to be enough if you’ve got millions of links, but it’s definitely the first big step to take. One of the audits currently on my plate has over 5 million links.  And though the data I was able to get hold of at this point is a limited portion, there are over 750,000 links in the CSV file.  That’s a pretty good start.

Relaxing Into Unnatural Pattern Identification

So how do I apply “soft eyes” techniques to link evaluation?  It’s a matter of taking several steps in a sequential process.

Thinning The Data

Note – when you have 750,000 links, your Excel program may blow up on you.  When that happens, you’ll need a more powerful program.  For this current project, I am using MS Access, because it can actually open the entire file at once, and then even crunch and sort the data where Excel would choke.

If you have exported “all” links, be sure to refine that down to only show external, followed or 301 links for this process (or just limit the export to those in the first place).

After refining down, the first thing I do is delete (or hide) all columns except URL, Title, Anchor Text and Target. While you may want to keep other columns, I find that by completely hiding other columns, I have that many fewer potential distractions visually.
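
If you would rather script the thinning step than do it by hand, here is a minimal sketch in Python. The column names ("Origin", "Link Type") and their values are assumptions for illustration, not the actual headers of an OSE export, so rename them to match your own file. It keeps only external followed or 301 links, and only the four columns I scan.

```python
import csv
import io

# All column names and values below are assumptions for illustration;
# rename them to match the headers in your actual export.
KEEP_COLUMNS = ["URL", "Title", "Anchor Text", "Target"]
WANTED_TYPES = {"followed", "301"}

def thin(rows):
    """Keep external followed/301 links and only the four scan columns."""
    thinned = []
    for row in rows:
        if row.get("Origin", "").lower() != "external":
            continue
        if row.get("Link Type", "").lower() not in WANTED_TYPES:
            continue
        thinned.append({col: row.get(col, "") for col in KEEP_COLUMNS})
    return thinned

# Made-up sample data standing in for a real export file.
SAMPLE_CSV = """URL,Title,Anchor Text,Target,Origin,Link Type,Page Authority
http://blog.example/post,Car Parts Review,car parts,http://shop.example/,external,followed,41
http://shop.example/about,About Us,home,http://shop.example/,internal,followed,50
http://forum.example/t/9,Thread 9,click here,http://shop.example/,external,nofollow,12
http://old.example/,Old Page,car parts,http://shop.example/,external,301,20
"""

rows = thin(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row)
```

For a real export you would pass an open file to csv.DictReader instead of the sample string. The reader processes one row at a time, which helps with files too large for a spreadsheet.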

Sorting Links To Scan For Patterns

Once I have thinned the data and hidden unneeded columns, I sort alphabetically by Anchor Text, with a second sort factor on page Titles, and domains. Alternatively you can switch it up and group by domain, then sort on the other columns, to give you another way to scan for patterns. I usually run through a couple sort variations to be sure I didn’t miss anything obvious.
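
The same two sort variations can be sketched in Python, assuming each link is a dictionary with "URL", "Title" and "Anchor Text" keys (my illustrative names, not a fixed export format). The domain is derived from the URL, since the thinned data has no separate domain column.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    if "//" not in url:
        url = "//" + url  # lets urlparse find the host when the scheme is missing
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def anchor_first(row):
    """Sort key: anchor text, then page title, then domain."""
    return (row["Anchor Text"].lower(), row["Title"].lower(), domain_of(row["URL"]))

def domain_first(row):
    """Sort key: domain, then anchor text, then page title."""
    return (domain_of(row["URL"]), row["Anchor Text"].lower(), row["Title"].lower())

# Made-up sample rows.
links = [
    {"URL": "http://www.b.example/1", "Title": "Best Car Parts", "Anchor Text": "car parts"},
    {"URL": "http://a.example/2", "Title": "Best Car Parts", "Anchor Text": "car parts"},
    {"URL": "http://c.example/3", "Title": "About", "Anchor Text": "Brand Name"},
]

by_anchor = sorted(links, key=anchor_first)
by_domain = sorted(links, key=domain_first)

for row in by_anchor:
    print(row["Anchor Text"], "|", row["Title"], "|", domain_of(row["URL"]))
```

Lowercasing the keys keeps "Car Parts" and "car parts" next to each other, which matters when you are scanning for repeated phrases.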

With my spreadsheet in front of me sorted by anchor text and domains, I can rapidly jump through the list to go right to those phrases associated with important keywords, or phrases you know the site was hammered on in rankings.

Then you begin scanning the domain and page Title columns for patterns.  And here’s an example of what can jump out at you: (NOTE – the site I used for this example is a site that offers a particular product type for sale within the automotive market)…

[Screenshot: Unnatural Link Patterns by Domain and Title]

The above links are a great example of how fast patterns can jump out at you when you’re just scanning in this sort mode.  Using “soft eyes”, you’re just gazing over the contents of the domain and Title columns, however it’s pretty obvious what’s going on with these links.  Note how the Titles are almost identical even though they’re coming from two different domains.
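
A script can surface the same kind of pattern as a first pass before you scan by eye. This is only a rough sketch and not a replacement for the scan: it treats two Titles as "almost identical" when they match after lowercasing and stripping digits and punctuation, which is my own simplification. The field names are assumptions as well.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

def normalize_title(title):
    """Lowercase, replace digits and punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", " ", title.lower())
    return " ".join(cleaned.split())

def shared_titles(rows, min_domains=2):
    """Return {normalized title: sorted domains} for titles seen on several domains."""
    domains_by_title = defaultdict(set)
    for row in rows:
        host = urlparse(row["URL"]).hostname or ""
        domains_by_title[normalize_title(row["Title"])].add(host)
    return {
        title: sorted(domains)
        for title, domains in domains_by_title.items()
        if len(domains) >= min_domains
    }

# Made-up sample rows.
sample_rows = [
    {"URL": "http://site-one.example/a", "Title": "Cheap Car Parts - Page 1"},
    {"URL": "http://site-two.example/b", "Title": "Cheap Car Parts | Page 2"},
    {"URL": "http://news.example/c", "Title": "Local News Roundup"},
]

suspects = shared_titles(sample_rows)
print(suspects)
```

Anything this returns is only a candidate for closer review. Plenty of legitimate pages share generic titles.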

[Screenshot: Unnatural Link Patterns in page Titles]

Note how the domains are clearly odd in how similar their patterns are. At a very quick glance they “may” be legitimate, or at worst low quality links that aren’t necessarily a severe problem. Yet when you look at the Titles for these, the combination pretty much screams “link network”.

[Screenshot: Page Titles Reveal Unnatural Link Patterns]

These are a bit more deceptive, and though the sort is not what it was when I discovered them, (I copied these into a new set so I could make a screen-capture for this article), just by looking at the page Titles, it was obvious they’re bad links given that the phrase is an automotive product.  Sure you might have detected a couple of these just from looking at the domain, however it’s much more obvious in the Titles.

[Screenshot: Domains from irrelevant countries show unnatural link patterns]

The domains in this screen capture have one thing in common: none of them are .com domains. The site these links point to offers products to U.S. customers, and there’s very little justification for it to have a high volume of links from other countries.

Of course, the .edu links have “/blogs” in front of them, instantly making those suspicious as well…

Yet even if there is a possibility of legitimacy, by scanning these and flagging them for closer examination, either by page Title or clicking and viewing them, you can get clarity.
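
The flagging step for domains can be roughed out in code too. This sketch assumes a site where .com is the expected TLD, and it reads “/blogs” as the start of the URL path on .edu hosts. Both are my assumptions for this one example site, so adjust them for the site you are auditing. The sample host names are made up.

```python
from urllib.parse import urlparse

def flag_reasons(url, expected_tlds=("com",)):
    """Return a list of reasons a linking URL deserves a closer look."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not host:
        return ["unparseable URL"]
    tld = host.rsplit(".", 1)[-1]
    reasons = []
    if tld not in expected_tlds:
        reasons.append("unexpected TLD ." + tld)
    if tld == "edu" and parsed.path.lower().startswith("/blogs"):
        reasons.append(".edu link under /blogs")
    return reasons

# Made-up sample URLs.
for url in [
    "http://shop.example.com/x",
    "http://example.de/page",
    "http://example.edu/blogs/user/post",
]:
    print(url, flag_reasons(url))
```

A flag here means "look closer", not "bad link". A site with genuine international customers would need a longer list of expected TLDs.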

In the case of this site, the contents of the pages these links come from were completely obvious.

Linking Site Review

That’s an important factor – if you have any doubt on your initial scan based assessment, all you have to do is visit a site to know if it is legitimate or bogus.

Another benefit of checking the actual pages links come from is that seeing the sites themselves helps you further improve your ability to spot the signs and signals that “this link is trash” from the domain and page Title perspective.

Pattern Identification Limits in SEO and Link Review

While using pattern identification methods is far from perfect, it will take you a long way in your SEO audit work.  And the more you remember to relax into the process, the more efficient your process will become.  It is not, however, any kind of magic bullet solution.  The reality is there are many other considerations, and spammers / myopic SEO implementers don’t all fit the “lazy and blatantly obvious” mold.

It is, however, a great way to evaluate links more efficiently, and it reduces the tedious work of examining links more closely.

Link to Root Ratio

Link to Root Ratio is yet one more technique you can employ to identify potential bad links, and it’s one I often use as an additional step in the process. I do so because even if someone has done a good job of crafting page titles to fit, and the domain looks legit purely by its name, a site that sends too many links to one destination requires further investigation.

The good news here is that if you see 500 domains sending links, and if 20 of those have a high link to root ratio, you don’t need to visit every link from each of those domains. Just one or two is all it usually takes to reveal what’s going on.
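
As a rough sketch, this check amounts to counting links per linking domain and pulling a couple of sample URLs from the heavy hitters. Here I read a “high link to root ratio” as "many links from one host", and the threshold is an arbitrary number for illustration. Using the host name as the root domain is also an approximation, because subdomains are counted separately.

```python
from collections import Counter
from urllib.parse import urlparse

def host_of(url):
    """Lowercased host without a leading 'www.' (a stand-in for the root domain)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def heavy_linkers(urls, threshold=3, samples=2):
    """Return {domain: {"links": count, "samples": [urls]}} for high-volume domains."""
    counts = Counter(host_of(url) for url in urls)
    picked = {}
    for url in urls:
        host = host_of(url)
        if counts[host] >= threshold:
            entry = picked.setdefault(host, {"links": counts[host], "samples": []})
            if len(entry["samples"]) < samples:
                entry["samples"].append(url)
    return picked

# Made-up sample URLs.
link_urls = [
    "http://spammy.example/p1",
    "http://www.spammy.example/p2",
    "http://spammy.example/p3",
    "http://spammy.example/p4",
    "http://blog.example/review",
]

heavy = heavy_linkers(link_urls)
print(heavy)
```

The sample URLs are the "one or two" pages worth visiting for each flagged domain.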

Link Research Tool Limitations

Of course, none of this helps if you have more links than a link reporting tool will allow you to examine.  And for that we need to get companies like SEOmoz to provide a better solution for this ever increasingly important aspect of enterprise SEO.

For example, since OSE has, at the time of the writing of this article, 892 trillion links in its database, even if export limits need to be maintained to address server processing issues, I already know several enterprise clients who would be willing to pay an additional fee to have a feature where you can export multiple data sets – split out the entirety of the report into server-reasonable chunks. That way, a site owner can truly get a comprehensive exported result.

The Mozscape API gives power users access to a much bigger data set, though at the moment that’s $10,000 a month for access. So I reached out to Rand Fishkin about this, and he liked the concept I suggested, where a user could possibly get access to the data for a one-time fee, or find another way to get at the entire set.

There are serious challenges to making this happen given how resource intense it would be, yet the Moz team will be looking into a way to come up with a solution in the coming months, possibly Q2 of 2013.  Given current high priority plans and scheduling, I truly appreciate that they’re even going to be able to work the possibility into the queue…


Share Your Techniques

I’d love for those of you reading this article who have experience using other techniques to share them in the comments – I’m sure there are many other ways to go about this otherwise overwhelming and tedious work…

And on a final note, I wish to thank @makeclevertools  and @Marie_Haynes for poking me on Twitter about the techniques I’ve described in this article…

Cat eyes photo courtesy Chiot’s Run 

Alan Bleiweiss
Alan Bleiweiss is a Forensic SEO audit consultant with audit client sites consisting of upwards of 50 million pages and tens of millions of visitors a month. A noted industry speaker, author and blogger, his posts are quite often as much controversial as they are thought provoking.
  • Broman

    Great article Alan. Many great ideas, thanks!
    Going through large amount of data is hard, no doubt.
    I haven’t tried working with Access yet, but i think i’ll try. Good idea.
    So far i’m “waiting” for Excel 🙂

    • Alan Bleiweiss

      I still live in Excel most of the time, however I was tired of having it freeze up on me when I went to do some “simple” sort on a large volume of data 🙂

  • Adam Humphreys

    I tend to grab reports from Ahrefs for more recent IBL (Inbound Links) since most events are not just algorithmic updates but what happens most recently to a site in the form of links, as I’ve not found OSE to be nearly as good or as aggressive in the discovery areas. Majestic also tends to see quite a bit of link data as well. As you’ve found, analysis of these reports can also be really positive for clients. I did a competitive analysis of a client’s competitor and found wholesaling opportunities for them that lined up major business for them as a result. It’s important to know associations of established businesses especially when moving into a new area to recognize opportunities and that should be part of every business start and ongoing focus.

    • Alan Bleiweiss

      Good points Adam – after I wrote this article last week, in fact, one of the projects I’m working on required merging data from OSE, Ahrefs and a third channel, to get a more complete picture for evaluation…

  • adam

    Interesting approach to scanning large data sets. How many times will you scroll through the data to scope for patterns?

    Also, have you tried filtering out branded terms or general keywords to narrow the data set?

    Doing so may make it easier to spot those completely irrelevant patterns like having hair in the title for a link to an automotive web page.

    • Alan Bleiweiss

      I don’t have a specific consistent number of times, or sort methods – each audit is as much intuitive as anything, for me. And yes, sometimes I will further filter out to make it less taxing on the review cycle…

  • Dirk

    I’ve been itching to try out Google Refine for such a big dataset. Unfortunately haven’t come across a big enough case. Perhaps I’ll write up something on the topic… Great article Alan!

    • Alan Bleiweiss

      Google Refine? Why am I not surprised that I’ve not heard of this before? And how would you see using it on link evaluations?

      • Dirk

        It’s a tool for working with large datasets that helps you weed out or group together duplicates or similar entries. It’s obviously much more complicated than that, but I thought it could perhaps be helpful for large datasets. I still need to find a large dataset to get me to use it though, so can’t confirm yay or nay at this stage. The pattern identification is what triggered the thought for me though, as the introductory video explained this. Just Google it and check that video 🙂

  • Adam

    I’ve had to do link audits on both mine and my client’s site. I’ve also been in touch with some experts on the issue, which has taught me a few more things to look at.

    This includes: IP of incoming links (links from sites on the same c-class or IP are probably spammy or devalued at the very least), links that point to other sites on the same IP or c-class (most link buyers use the same network for buying links to their sites).

    Also check the quality of links (are the sites indexed in Google), what type of sites do they link out to (use in Bing) and what’s their link velocity like? Links from sites with natural link velocity probably have more trust.

    • Alan Bleiweiss


      These are all great tips and tactics to use when looking for patterns – thanks for sharing them here!

  • Adam Humphreys

    One can easily isolate a lot by CCTLD and garbage domains like .info/.us which tend to be the spammers inexpensive domain of choice. The 2600 list of Google’s banned Kw’s to run through the url’s for exact match to find any blatant violations. WMT can also help you identify primary and then often the many related to that are easy to identify as it sits usually in pattern.

    • Alan Bleiweiss


      I’m curious about the Google 2600 as it relates to link evaluations. Have you found many sites that have links coming from sites with keywords from the list? I haven’t ever seen this as a pattern, and would think that since they’re banned from Google’s auto-suggest, spammers and link networks would avoid their presence as a result…

  • Alex Dumpfree

    A website owner must ensure that his or her site is performing at an optimal level and providing the best possible experience for visitors. Make sure that all of the site’s links are active and direct visitors to the right pages.

    • Alan Bleiweiss

      Thanks for adding those tips Alex – ensuring all links to a site are active and point to the correct page can be a very time consuming task, however it’s something site owners can consider doing from time to time to get maximum value…

  • Laura Martinez

    Wow I feel myself like a classical newbie so far – I only began from tracking external links to my website. And came across lots of tools of that kind –, and The latter is relatively new but I am still satisfied with it. Thank you for an article, by the way – really helpful!

  • Monty Elsabbagh

    Excellent post Alan. I haven’t used OSE to pull up backlinks yet since I am familiar with Majestic SEO. Have you used this? Any pros/cons compared to OSE?

    Also, after finding these “spammy” links, what are the steps you take, along with the software used, to reach out and try to get these links removed?

    • Alan Bleiweiss

      Monty, I personally haven’t used Majestic.

      Once all bad links are identified, a proper link clean-up process should take place. This process should include the following steps:
      1. A spreadsheet listing all links should be maintained during the clean-up process.
      2. An attempt should be made to contact all site owners requesting they remove the link(s) that exist(s) to your site from theirs.
      3. The date and contact method used should be recorded in the spreadsheet.
      4. If no response comes within two weeks of contact attempt, this should be noted in the spreadsheet.
      5. Once all links that need to be removed have been processed this way, any remaining links still in existence should be marked as such.
      6. At this point, a list of all domains in the list that have at least one bad link pointing to your site should be submitted to Google through their Disavow tool. Use Google’s Domain wildcard method to flag all domains so that Google knows not to count any links from that domain in the future. This will allow you to not have to ensure you’ve listed every link from each site.
      Note that under some circumstances, you may not want to submit the disavow list. This really depends on your situation. Obviously if you got a manual penalty, it’s part of the process. If you didn’t, and if you didn’t get all the links, it may (anecdotal evidence) trigger a manual review that leads to a penalty. So just be aware of that.
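
Step 6 in the clean-up list above can be sketched in a few lines of Python. Google’s disavow file is a plain text file in which a line starting with "domain:" covers every link from that domain and a line starting with "#" is a comment. The function name, the note text and the sample URLs here are made up for illustration.

```python
from urllib.parse import urlparse

def disavow_lines(bad_urls, note="Owners contacted, links not removed"):
    """Build disavow file lines with one 'domain:' entry per linking host."""
    domains = set()
    for url in bad_urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    lines = ["# " + note]
    lines.extend("domain:" + domain for domain in sorted(domains))
    return lines

# Made-up sample URLs from the clean-up spreadsheet.
remaining_bad_links = [
    "http://www.bad.example/a",
    "http://bad.example/b",
    "http://worse.example/",
]

disavow = disavow_lines(remaining_bad_links)
print("\n".join(disavow))
```

Because each domain is listed once, you do not have to track down every individual link from that site. Save the output as a .txt file before uploading it.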