When you perform as many SEO audits as I do (60 to 80 a year), you need to find ways to become consistently more efficient in your work without sacrificing quality. This is also true when working on very big sites or sites with other forms of big data.
It’s too easy to get bogged down in repetitive analysis, or in dealing with hundreds of categories or millions of pages. And it’s just as easy, if not easier, to get bogged down when evaluating link profiles.
With the advent of Google’s war on unnatural link profiles, the need to examine inbound link profiles has become more prevalent in my audit work.
While every audit I perform has always involved a review of the inbound link footprint, a site that has been notified by Google of bad links (and has seen its rankings drop as a result) requires a more comprehensive link evaluation process.
Soft Eyes – Rapidly Identifying Unnatural Link Patterns
So how do you do link evaluation in this scenario? What’s the best way to ensure you really have identified the bad links? Personally, I just use the same “soft eyes” approach I have always used in all my audit work (and referred to in an article I wrote in 2011 over on SearchEnginePeople.com on how the approach helped me discover criminal activity during an audit).
Many years ago, while on a meditation and visualization retreat in the Santa Cruz mountains, attendees were taught the practice of “soft eyes”. The technique is not unique to meditation – in fact, it’s used by military personnel, high performance athletes, race car drivers… There is a wide range of uses and benefits of the technique.
The primary concept of this technique lies in the notion that, as humans, we’re usually focused on a lot of things, objects, thoughts and feelings at once, and by nature or upbringing we tend to “miss” or otherwise “drown out” most of it as we go about our lives. Driving on a highway, we fail to absorb the beauty of the surrounding landscape. Sitting in a two hour meeting, we fail to observe the changing weather right outside the conference room window.
In worst case scenarios, this problem can be deadly. That highway driver fails to notice the driver next to them drifting into their lane of traffic, or the soldier scanning the horizon fails to notice the commando low-crawling up on their entrenched position.
Of course, those are extreme examples of how our power of observation usually works. Yet the same thing happens during the SEO audit process. During an inbound link review, it can translate into feeling overwhelmed by all the data, or becoming lost in the vortex of reviewing thousands, tens of thousands or hundreds of thousands of links…
Start With The Raw Data
The first thing I do during a link evaluation audit is to go to Open Site Explorer and perform a data export. For sites with hundreds or thousands of links, I just go directly to the “Inbound Links” tab, then export a CSV file. For sites with tens of thousands, hundreds of thousands or millions of links, the export needs to come from the “Advanced Reports” functionality OSE provides, or from a report you generate in BrightEdge or another source (although most rely on OSE data anyhow).
The key is that I want to get access to as many links as possible, across as many domains as possible. Even this isn’t going to be enough if you’ve got millions of links, but it’s definitely the first big step to take. One of the audits currently on my plate has over 5 million links. And though the data I was able to get hold of at this point is a limited portion, there are over 750,000 links in the CSV file. That’s a pretty good start.
Relaxing Into Unnatural Pattern Identification
So how do I apply “soft eyes” techniques to link evaluation? It’s a matter of taking several steps in a sequential process.
Thinning The Data
Note – when you have 750,000 links, your Excel program may blow up on you. When that happens, you’ll need a more powerful program. For this current project, I am using MS Access, because it can actually open the entire file at once, and then even crunch and sort the data where Excel would choke.
If you have exported “all” links, be sure to refine that down to only show external, followed or 301 links for this process. (Or just limit the export to those in the first place.)
After refining down, the first thing I do is delete (or hide) all columns except URL, Title, Anchor Text and Target. While you may want to keep other columns, I find that by completely hiding other columns, I have that many fewer potential distractions visually.
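If you would rather thin the export with a script than by hand, here is a minimal sketch in Python. It assumes the CSV header uses exactly the four column names mentioned above (URL, Title, Anchor Text, Target); your export may label them differently, and the sample row is made up purely for illustration.

```python
import csv
import io

# Column names taken from the text above; an actual export may label them differently.
KEEP = ["URL", "Title", "Anchor Text", "Target"]

def thin_rows(csv_text, keep=KEEP):
    """Parse CSV text and return rows containing only the columns in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: row.get(col, "") for col in keep} for row in reader]

# Made-up sample data, not from a real export.
sample = (
    "URL,Title,Anchor Text,Target,Page Authority\n"
    "http://example-a.test/p1,Cheap Widgets,cheap widgets,http://example.com/,31\n"
)
thinned = thin_rows(sample)
```

For a real file you would read the text from disk instead of a string, and a file with hundreds of thousands of rows may be better processed row by row than loaded all at once.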
Sorting Links To Scan For Patterns
Once I have thinned the data and hidden unneeded columns, I sort alphabetically by Anchor Text, with secondary sorts on page Title and domain. Alternatively, you can switch it up and group by domain, then sort on the other columns, to give you another way to scan for patterns. I usually run through a couple of sort variations to be sure I didn’t miss anything obvious.
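The same two sort orders can be sketched in Python for files too big for a spreadsheet. This is only an illustration: the function names are my own, rows are assumed to be dicts keyed by the column names above, and the "domain" here is simply the hostname of the linking URL.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Hostname of the linking page, lowercased."""
    return urlparse(url).netloc.lower()

def sort_for_scanning(rows):
    """Sort by Anchor Text, then Title, then linking domain."""
    return sorted(
        rows,
        key=lambda r: (r["Anchor Text"].lower(), r["Title"].lower(), domain_of(r["URL"])),
    )

def sort_by_domain(rows):
    """Alternate view: group by linking domain first, then the other columns."""
    return sorted(
        rows,
        key=lambda r: (domain_of(r["URL"]), r["Anchor Text"].lower(), r["Title"].lower()),
    )

# Made-up rows for illustration only.
rows = [
    {"URL": "http://b.test/1", "Title": "Widgets", "Anchor Text": "widgets"},
    {"URL": "http://a.test/1", "Title": "Widgets", "Anchor Text": "widgets"},
    {"URL": "http://c.test/1", "Title": "About", "Anchor Text": "Brand"},
]
by_anchor = sort_for_scanning(rows)
by_domain = sort_by_domain(rows)
```

Lowercasing the sort keys keeps "Brand" and "brand" next to each other, which matters when you are scanning for repeated anchor phrases.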
With my spreadsheet in front of me sorted by anchor text and domains, I can rapidly jump through the list to go right to those phrases associated with important keywords, or phrases you know the site was hammered on in rankings.
Then you begin scanning the domain and page Title columns for patterns. And here’s an example of what can jump out at you: (NOTE – the site I used for this example is a site that offers a particular product type for sale within the automotive market)…
The above links are a great example of how fast patterns can jump out at you when you’re just scanning in this sort mode. Using “soft eyes”, you’re just gazing over the contents of the domain and Title columns, yet it’s pretty obvious what’s going on with these links. Note how the Titles are almost identical even though they’re coming from two different domains.
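A script can pre-flag this kind of pattern so your eyes land on it faster. The sketch below is not the scanning method described in this article; it is a rough stand-in that strips digits and punctuation from each Title and reports any Title that then appears on two or more linking domains. The function names and sample rows are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

def normalize_title(title):
    """Lowercase the title and drop digits and punctuation so near-duplicates collide."""
    return " ".join(re.sub(r"[^a-z]+", " ", title.lower()).split())

def shared_titles(rows, min_domains=2):
    """Map each normalized title to the sorted linking domains that use it."""
    domains_by_title = defaultdict(set)
    for row in rows:
        key = normalize_title(row["Title"])
        domains_by_title[key].add(urlparse(row["URL"]).netloc.lower())
    return {
        title: sorted(domains)
        for title, domains in domains_by_title.items()
        if title and len(domains) >= min_domains
    }

# Made-up rows for illustration only.
rows = [
    {"URL": "http://a.test/x", "Title": "Best Car Parts - Page 1"},
    {"URL": "http://b.test/y", "Title": "Best Car Parts | Page 2"},
    {"URL": "http://c.test/z", "Title": "Our Company History"},
]
suspects = shared_titles(rows)
```

This only catches titles that become identical after normalization. Titles that differ by a word or two would need a fuzzier comparison, and anything flagged still deserves a human look.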
Note how the domains are clearly odd in the similarity of their pattern, though on very quick glance they “may” be legitimate, if only low quality links that aren’t necessarily a severe problem. Yet when you look at the Titles for these, the combination pretty much screams “link network”.
These are a bit more deceptive, and though the sort is not what it was when I discovered them, (I copied these into a new set so I could make a screen-capture for this article), just by looking at the page Titles, it was obvious they’re bad links given that the phrase is an automotive product. Sure you might have detected a couple of these just from looking at the domain, however it’s much more obvious in the Titles.
The domains in the screen capture to the left are all common in that they’re not .com domains. The site these links point to is a site that offers products to U.S. customers, and there’s very little justification to have a high volume of links from other countries for it.
Of course, the .edu links have “/blogs” in front of them, instantly making those suspicious as well…
Yet even if there is a possibility of legitimacy, by scanning these and flagging them for closer examination, either by page Title or clicking and viewing them, you can get clarity.
In the case of this site, the contents of the pages these links come from were completely obvious.
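If you want a quick count of where links are coming from before you start scanning, something like the following sketch can help. It assumes, as with the example site above, that .com is the expected norm; that assumption, the function names and the sample rows are all mine, and the "TLD" is naively taken as the last label of the hostname.

```python
from collections import Counter
from urllib.parse import urlparse

def tld_of(url):
    """Last label of the hostname, e.g. 'com' or 'ru'. Empty string if none."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""

def tld_counts(rows):
    """Count linking pages per top-level domain."""
    return Counter(tld_of(row["URL"]) for row in rows)

def flag_unexpected(rows, expected=("com",)):
    """Rows whose linking domain is outside the expected TLDs, for closer review."""
    return [row for row in rows if tld_of(row["URL"]) not in expected]

# Made-up rows for illustration only.
rows = [
    {"URL": "http://shop.example.com/a"},
    {"URL": "http://blog.example.ru/b"},
    {"URL": "http://example.pl/c"},
]
counts = tld_counts(rows)
to_review = flag_unexpected(rows)
```

A flag here is only a prompt to look closer. Plenty of non-.com links are legitimate, and plenty of .com links are not.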
Linking Site Review
That’s an important factor – if you have any doubt on your initial scan based assessment, all you have to do is visit a site to know if it is legitimate or bogus.
Another benefit of checking the actual pages links come from is that seeing the sites themselves helps you further improve your ability to spot the signs and signals that “this link is trash” from the domain and page Title alone.
Pattern Identification Limits in SEO and Link Review
While using pattern identification methods is far from perfect, it will take you a long way in your SEO audit work. And the more you remember to relax into the process, the more efficient your process will become. It is not, however, any kind of magic bullet solution. The reality is there are many other considerations, and spammers / myopic SEO implementers don’t all fit the “lazy and blatantly obvious” mold.
It is, however, a great way to evaluate links more efficiently, and it cuts down on the tedious process of examining links more closely.
Link to Root Ratio
Link to Root Ratio is yet one more technique you can employ to identify potential bad links, and it’s often one I use as an additional step in the process. I do so because even if someone has done a good job at crafting page titles to fit, and the domain looks legit purely by its name, a site that sends too many links to one destination is something that requires further investigation.
The good news here is that if you see 500 domains sending links, and 20 of those have a high link to root ratio, you don’t need to visit every link from those domains. Just one or two per domain is all it usually takes to reveal what’s going on.
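Here is one way this check could be sketched in Python. I have not given an exact formula above, so the sketch makes an assumption: it simply counts how many links each linking hostname sends and flags those at or above a threshold you choose, returning a couple of sample URLs to visit. The threshold, function names and sample data are illustrative only.

```python
from collections import defaultdict
from urllib.parse import urlparse

def links_per_domain(rows):
    """Group linking URLs by hostname (not a true registrable root domain)."""
    by_domain = defaultdict(list)
    for row in rows:
        by_domain[urlparse(row["URL"]).netloc.lower()].append(row["URL"])
    return by_domain

def heavy_linkers(rows, threshold=50, samples=2):
    """Domains sending at least `threshold` links, with a few URLs to spot-check."""
    flagged = [
        (domain, len(urls), urls[:samples])
        for domain, urls in links_per_domain(rows).items()
        if len(urls) >= threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Made-up rows for illustration only.
rows = [
    {"URL": "http://spam.test/a"},
    {"URL": "http://spam.test/b"},
    {"URL": "http://spam.test/c"},
    {"URL": "http://ok.test/a"},
]
flagged = heavy_linkers(rows, threshold=3)
```

Grouping by hostname treats subdomains as separate sites. Collapsing them to a true root domain would need a public suffix list, which is left out here to keep the sketch short.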
Link Research Tool Limitations
Of course, none of this helps if you have more links than a link reporting tool will allow you to examine. And for that we need to get companies like SEOmoz to provide a better solution for this ever increasingly important aspect of enterprise SEO.
For example, since OSE has, at the time of the writing of this article, 892 trillion links in its database, even if export limits need to be maintained to address server processing issues, I already know several enterprise clients who would be willing to pay an additional fee to have a feature where you can export multiple data sets – split out the entirety of the report into server-reasonable chunks. That way, a site owner can truly get a comprehensive exported result.
The Mozscape API gives power users access to a much bigger data set, though at the moment that’s $10,000 a month for access. So I reached out to Rand Fishkin about this and he liked the concept I suggested, where a user could possibly get access to the data for a one-time fee, or find another way to get at the entire set.
There are serious challenges to making this happen given how resource intense it would be, yet the Moz team will be looking into a way to come up with a solution in the coming months, possibly Q2 of 2013. Given current high priority plans and scheduling, I truly appreciate that they’re even going to be able to work the possibility into the queue…
Share Your Techniques
I’d love for those of you reading this article who have experience using other techniques to share them in the comments – I’m sure there are many other ways to go about this otherwise overwhelming and tedious work…
And on a final note, I wish to thank @makeclevertools and @Marie_Haynes for poking me on Twitter about the techniques I’ve described in this article…
Cat eyes photo courtesy Chiot’s Run