Ninja Link Prospecting – An Afternoon of Epic Geekery

I’m not a programmer. I can hack things together, and I’m great at copy / paste, but creating code from scratch that isn’t massively buggy? Not really one of my talents. As an SEO, however, that isn’t really important. Will Critchlow recently said that “Being able to make your own tools is going to become an increasingly important skill of the modern day SEO”. That’s a sentiment I completely agree with, so to demonstrate how this can help I set myself a challenge:

Find, qualify, and categorise a few thousand link prospects in an afternoon. Without spending a penny.

My chosen area for this is pets, and in particular dogs. As co-owner of Pet365, I need everything to be done as quickly and efficiently (i.e. cheaply) as possible. So I started by searching for “top lists of dog blogs”. Copying and pasting the URL of each of these lists into a text document (without even checking their contents) gave me 20 sites that I thought would have some great link prospects. There were more out there, but time was of the essence.

Next up, Technorati. There’s a huge section dedicated to pet blogs on there which expands to over 100 pages. I treated each of these pages as a list of potential targets and, with some quick Excel wizardry, had a collection of Technorati URLs that I knew would give me some more link targets. The great thing about this is that if a blogger has gone to the effort of getting listed, chances are that they’re reasonably serious about their site, and therefore more likely to be active and producing good content.
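The “quick Excel wizardry” above amounts to generating one URL per directory page. For anyone who’d rather script it, here is a minimal Python equivalent. The URL pattern is a hypothetical placeholder with a `{page}` slot, since the real Technorati page format isn’t given here.

```python
def paginated_urls(pattern: str, first: int, last: int) -> list[str]:
    """Build one URL per directory page, e.g. pages 1..100 of a category listing.

    `pattern` is a hypothetical template containing a {page} placeholder.
    """
    return [pattern.format(page=n) for n in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical pattern: substitute the real category URL format.
    for url in paginated_urls("https://directory.example.com/pets/page-{page}", 1, 3):
        print(url)
```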

Finally, blogrolls. I’ll come clean here – I already had some insider knowledge and knew where I was most likely to find blogs with a list of other potentially relevant sites. This saved me some time but, to be honest, I could easily have skipped this step and it wouldn’t really have affected the outcome. Within the pet blogosphere there are networks like DogTime and BlogPaws – chances are that, whatever your niche, something similar will exist.

Less than an hour in, I had 150 URLs that I thought would be potential link targets themselves and would also link out to other useful sites. But how would I know for sure? This was going to be the tricky bit.

Finding the Potentials

The first step was to create an account at Citation Labs Tools and use Garrett’s ‘pay with a tweet’ option to supersize my account. The cost model for these tools is based on the amount of bandwidth you use, so with my initial 10MB and an extra 250MB for the tweet, I knew I’d have more than enough for this little challenge.

Next, I needed to scrape all of those URLs from earlier for any outbound links. You could do this manually, but we only have an afternoon here, so fire up the Outbound Link Checker, copy and paste your list of URLs into the ultra-user-friendly interface and hit ‘Go’.
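Citation Labs does the crawling for you, and this post doesn’t describe how the tool works internally. The Python sketch below only illustrates the general idea, using the standard library. It pulls the outbound links from one page of HTML that you have already downloaded. Any link to a different hostname counts as outbound, so `www.` and bare-domain versions of the same site are treated as different hosts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_links(page_url: str, html: str) -> list[str]:
    """Return absolute http(s) links in `html` that point to a different host than `page_url`."""
    parser = LinkCollector()
    parser.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like "/about"
        parsed = urlparse(absolute)
        # Skip mailto:, javascript: etc. and links back to the same site.
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() != own_host:
            links.append(absolute)
    return links
```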

Being British, I find it impossible to work for more than 39 minutes without a cup of tea, so while that was running it was time to make a brew.

Upon my return I was greeted with a completed, ready-to-download CSV file. It contained a total of 15,000 rows which, I deduced, meant that I was onto a winner. Going through each of those individually, however, would be pretty much impossible. The first job was to filter out any duplicates. So, again, I fired up Citation Labs, went to the URL conversion tool, copied and pasted in my list, and within a few seconds had a slightly smaller list of domains.
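The URL-to-domain conversion is also easy to reproduce yourself. The minimal Python sketch below is not how Citation Labs implements it. It reduces a list of URLs to unique hostnames and keeps them in the order they first appear. Stripping a leading `www.` is an assumption you may or may not want.

```python
from urllib.parse import urlparse


def unique_domains(urls: list[str]) -> list[str]:
    """Reduce a list of URLs to unique hostnames, preserving first-seen order.

    A leading "www." is stripped so www.example.com and example.com collapse
    into one entry (an assumption; drop that step to keep them separate).
    """
    seen: set[str] = set()
    domains: list[str] = []
    for url in urls:
        # urlparse only fills in the host when a scheme is present, so add one if missing.
        if "://" not in url:
            url = "http://" + url
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            domains.append(host)
    return domains
```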

Using an epic amount of common sense, I knew that not all of these domains would still be active. Some would have expired, some would have been 301 redirected to other sites, whilst others would return errors. Fortunately Garrett had the answer for this, and within a few minutes I’d checked the status of each domain (1000 at a time) and had them all combined into a new CSV file. Finally, I re-ran the duplicate removal tool (just in case any of these sites had been 301’d to another that was already in my list) and saved my final list of prospects.
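Here is a rough Python sketch of what a status check involves. `check_status` sends a HEAD request to `http://<domain>/` without following redirects, and `classify` puts each result into a bucket. Two choices here are assumptions: the check uses plain HTTP only, and a redirect that stays on the same hostname (such as http to https) counts as healthy. A `moved:` result names the new hostname, which is what you would feed back through duplicate removal.

```python
import http.client
from urllib.parse import urlparse


def check_status(domain: str, timeout: float = 10.0) -> tuple[int, str]:
    """HEAD http://<domain>/ without following redirects.

    Returns (status_code, Location header). Network failures are reported as status 0.
    """
    conn = http.client.HTTPConnection(domain, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location", "")
    except (OSError, http.client.HTTPException):
        return 0, ""
    finally:
        conn.close()


def classify(domain: str, status: int, location: str) -> str:
    """Bucket a status result: 'ok', 'moved:<host>', 'error' or 'dead'."""
    if status == 0:
        return "dead"  # DNS failure, timeout, refused connection, etc.
    if status in (301, 302, 307, 308):
        target = (urlparse(location).hostname or "").removeprefix("www.")
        if not target or target == domain.removeprefix("www."):
            return "ok"  # same-site redirect, e.g. http -> https or bare -> www
        return f"moved:{target}"
    if 200 <= status < 300:
        return "ok"
    return "error"
```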

Easy as that. Kind of. This was really just a big list of websites that I knew nothing about. The outbound link checker would have found everything from blog comments to paid links and banner ads. There’d definitely be some pure gold in there – I just needed to find it.

Page Semantics

TextWise is, in short, API heaven. You give it a URL and it returns a list of keywords associated with that page, based on the old-school DMOZ categories. You need a little bit of programming knowledge to get this to work, but not a huge amount.

Essentially, all we need to do is create a MySQL table and then, from inside a PHP script, loop through all of the URLs we’ve found and see what keywords are associated with them. I started by creating a script that would handle one domain at a time, played around with the TextWise API a little to make sure that results were consistent, and then updated my script to accept as many sites as I wanted. In total it took an hour or so to create a very basic (and buggy) version, but you could easily hire someone on oDesk to do the same thing to a better standard very cheaply ($100 – $200).
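The author built this in PHP with MySQL, and that code isn’t shown. Below is a minimal Python sketch of the same loop-and-store idea, using the built-in `sqlite3` module as a stand-in for MySQL so that it runs on its own. The TextWise request and response format isn’t covered here, so `fetch_keywords` is a hypothetical function you supply. It takes a URL and returns `(keyword, weight)` pairs. The table layout is also an assumption.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical: takes a URL, returns (keyword, weight) pairs. Wrap whatever
# TextWise call you actually use so that it returns this shape.
KeywordFetcher = Callable[[str], list[tuple[str, float]]]


def create_table(conn: sqlite3.Connection) -> None:
    """One row per (domain, keyword) with the keyword's weight on the page."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS keywords (
               domain  TEXT NOT NULL,
               keyword TEXT NOT NULL,
               weight  REAL NOT NULL
           )"""
    )


def store_keywords(conn: sqlite3.Connection, domains: Iterable[str],
                   fetch_keywords: KeywordFetcher) -> int:
    """Look up keywords for each domain's homepage and store them; returns rows inserted."""
    rows = 0
    for domain in domains:
        try:
            pairs = fetch_keywords("http://" + domain + "/")
        except Exception as exc:  # one failed lookup shouldn't stop the whole run
            print(f"skipping {domain}: {exc}")
            continue
        conn.executemany(
            "INSERT INTO keywords (domain, keyword, weight) VALUES (?, ?, ?)",
            [(domain, kw, w) for kw, w in pairs],
        )
        rows += len(pairs)
    conn.commit()
    return rows
```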

The result was astonishing. From my 20,000 URLs I ended up with about 500,000 rows of keywords, weighted to show how important they were on the page. Running a few quick queries through phpMyAdmin showed that I had 2000 sites with content related to dogs, cats or pets. My only issue was that I didn’t know how to contact them and, more importantly, whether I should be spending time doing so. Fortunately SEOMoz has a (free) API.
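The post doesn’t list the actual queries. Assume the keyword results are stored in a table shaped like `keywords(domain, keyword, weight)`, which is a hypothetical layout. A query along the lines below would then find every site with a dog-, cat- or pet-related keyword. It is wrapped in Python’s `sqlite3` so that it runs standalone, and the SQL is plain enough to adapt for phpMyAdmin. Note that `LIKE '%dog%'` also matches words such as “hotdog”, so it is worth checking the results.

```python
import sqlite3


def sites_matching(conn: sqlite3.Connection, terms: list[str],
                   min_weight: float = 0.0) -> list[str]:
    """Return distinct domains having a keyword that contains any of `terms`
    with at least `min_weight` weight (SQLite's LIKE is case-insensitive for ASCII)."""
    if not terms:
        return []
    clauses = " OR ".join("keyword LIKE ?" for _ in terms)
    sql = (
        "SELECT DISTINCT domain FROM keywords "
        f"WHERE weight >= ? AND ({clauses}) ORDER BY domain"
    )
    params = [min_weight] + [f"%{t}%" for t in terms]
    return [row[0] for row in conn.execute(sql, params)]
```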


A quick Google search for ‘SEOMoz PHP API’ turned up the relevant library, so I hit the download button. Within a few seconds I’d copied it into my FTP client, uploaded it to the web server, and was ready to start playing around. The great thing here was that the library included some examples (although they’re not great) and, along with the documentation, I was quickly able to get SEOMoz to check an individual URL for its page and domain authority. All that needed to happen now was to loop through each URL in my database, check its stats, and store them.

Again, this is fairly straightforward, but because SEOMoz now imposes a rate limit you’re going to need to do some extra tweaking to make sure that you’re not firing off too many requests and ending up with the dreaded ‘Throttled’ error message. Have a word with your programmer (or create a job on Elance or the earlier-mentioned oDesk) and point them to this post.
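The post leaves the throttling details to your programmer. One common approach is to space requests out and to back off when the API complains. The minimal Python sketch below assumes a hypothetical `fetch_metrics(url)` wrapper around whichever SEOMoz client library call you use. That wrapper must raise `ThrottledError` when the response says ‘Throttled’. The 10-second default interval is an assumption, so check the limit on your own API plan.

```python
import time
from typing import Callable, Iterable, Optional


class ThrottledError(Exception):
    """Hypothetical: your fetch_metrics wrapper raises this when the API says 'Throttled'."""


def fetch_all_metrics(
    urls: Iterable[str],
    fetch_metrics: Callable[[str], dict],
    min_interval: float = 10.0,  # assumed limit: one request per 10 seconds
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Optional[dict]]:
    """Call fetch_metrics once per URL, never more often than once per min_interval.

    On ThrottledError, sleep (doubling the wait each time) and retry up to
    max_retries more times; URLs that never succeed map to None.
    """
    results: dict[str, Optional[dict]] = {}
    last_call: Optional[float] = None
    for url in urls:
        results[url] = None
        backoff = min_interval
        for _ in range(max_retries + 1):
            # Wait until min_interval has passed since the previous request.
            if last_call is not None:
                wait = min_interval - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            try:
                results[url] = fetch_metrics(url)
                break
            except ThrottledError:
                sleep(backoff)
                backoff *= 2
    return results
```

Passing `sleep` and `clock` in as parameters lets you test the timing without actually waiting.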

Finally, it’s a case of finding the contact information for each site. I use a mixture of automated and manual checking for this but the process is basically as follows:

  1. Export sites from my database that match the required keyword;
  2. Filter those with a Domain Authority within my chosen range (typically 30 or higher);
  3. Run Citation Labs’ contact finder on the results and download the CSV;
  4. Do some Excel VLOOKUP wizardry to drag email addresses / contact forms into my list of sites;
  5. Outsource the manual finding of any missing email addresses and, to increase conversion rate, the first and last names of the site owner along with their Twitter / Facebook profiles.
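Steps 1, 2 and 4 are mechanical enough to script. The minimal Python sketch below makes some hypothetical assumptions about the data. Each site is a dict with `domain`, `keywords` and `domain_authority` fields. The contact finder’s CSV has `domain`, `email` and `contact_form` columns. Match these names to the headers in your own export. The dictionary lookup plays the role of the VLOOKUP.

```python
import csv
import io


def shortlist(sites: list[dict], keyword: str, min_da: float = 30) -> list[dict]:
    """Steps 1-2: keep sites tagged with `keyword` whose Domain Authority is >= min_da."""
    return [s for s in sites if keyword in s["keywords"] and s["domain_authority"] >= min_da]


def attach_contacts(sites: list[dict], contacts_csv: str) -> list[dict]:
    """Step 4 (the VLOOKUP equivalent): join contact details onto each site by domain.

    Sites with no match get empty strings; those are the ones to pass on for step 5.
    """
    lookup = {row["domain"]: row for row in csv.DictReader(io.StringIO(contacts_csv))}
    merged = []
    for site in sites:
        contact = lookup.get(site["domain"], {})
        merged.append({
            **site,
            "email": contact.get("email", ""),
            "contact_form": contact.get("contact_form", ""),
        })
    return merged
```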

My afternoon challenge ended after step 4, but by this point I had around 2000 sites that I knew were related to my chosen niche and would potentially be worth getting in touch with. The final stage, step 5, is entirely dependent on what you have to offer. In my case it was some fantastic infographics, but the important thing to remember is that if you’re contacting people, it needs to be for a reason other than just begging for a link. Good luck!

Matt Beswick
Matt Beswick is the co-founder of Aira, a UK-based web agency with a strong background in running SEO and Social Media campaigns for...
  • Gids

    Hi Matt
    Thank you for one of the most interesting and useful articles I’ve read for a while… now, I just need to find the time for half a day of copycat geekery!

    • Matt Beswick

      Thanks Gids – glad you liked the article!

  • Vikram

    One word – amazing. A sensible article that a professional SEO can use, the first in a long time.

  • Ashley Balstad

    You had me at the title!

    This was an incredible article. Thank you so much for sharing!

    • Matt Beswick

      Heheh, thanks Ashley… it actually took me a few months to finish this post, and most of that was trying to come up with a decent title 😉

  • Suzannah Hastings

    If it’s as simple as this to start building links, everything screams that Google’s going to continue to downgrade link importance. Real websites built for real people, please. Manipulating search engines just yells ‘short term gains and long term trouble’ to me.

    • Matt Beswick

      Hi Suzannah,

      Everything that we do as SEOs is search engine manipulation – it’s just semantics as to whether that’s done in what is traditionally described as ‘black’ or ‘white’ hat. Dumping a few thousand links into spun content is one thing. Promoting some of your fantastic content to help it get traction is another. Kidding ourselves that Google will take down the spammers and that one day everything will be all white and fluffy is a complete fallacy – the people who sit around waiting for that are going to carry on getting left behind.

      This post wasn’t just about building links (even if that’s what the title implies) – it was about prospecting for other sites that are relevant to the content that you’ve created and, therefore, likely to be open to forming a relationship. Whether that leads to a link, social sharing, or some other kind of business-led link-up doesn’t really matter.

      In my opinion, links are always going to be an indicator… it’ll be the quality of those links that becomes more and more important. Social metrics will probably become more important too, as is particularly evident with the ‘Google Search Plus Spying on Your World’ launch, but those can be faked as well – just as you can buy links, you can also buy social shares.

      Also, one quick thing to clear up – the ‘added importance’ article on Crowdbait needs a bit of clarification. Contextual links aren’t about exact match anchor text – they’re about having a link back to your site from within a relevant piece of content. If you’re writing about fridge freezers and add a sneaky link back to a site that sells dodgy pharmaceuticals it’s not a relevant link… but if you write a piece about how engaging content is the key to a successful site with a link back to your homepage with your CEO’s name then it is.

      Ramble over 🙂

      – Matt

      • Suzannah Hastings

        Hi Matt,

        I appreciate the long ramble – always good to hear all the different sides!

        Of course, I realise that waiting for everything to be white and fluffy is a ridiculous notion, but it doesn’t stop me wanting it. I would love an internet where the genuinely relevant and interesting websites were more likely to be found than the ones that had simply spent a lot of money on some clever SEO techniques. I do realise that it’s like wishing for world peace or a cure for all the nasties though…

        I do think links will be an indicator, but I think the way they’re used is likely to change – especially, as you mentioned, with the increasing use of social shares and all the data that can bring. Where social holds an edge is that the circles people move in (at least initially before the spammers get smarter) create particular patterns of networks, as will the shares associated with any content. These will look quite different to ‘fake’ social networks and social shares – again, at least initially. I think that by concentrating on such shares, Google and Bing in particular have come up with a rather cunning plan for finally tackling the artificial link creation problem. But, of course, SEO and search engines are locked in a constant arms race, so it’s merely a matter of time until there’s a new way around it.

        Ta for having a nosey over our site too – I’ve been toying with editing that exact part of the article, so you’ve given me that little extra push I needed to get to it!

        – Su 🙂