Three Ways a Test Crawl Could Uncover Hidden SEO Dangers

In November I wrote a post explaining how just one line of code could destroy your SEO.  It underscored the fact that hidden dangers can sometimes kill your SEO efforts.  It also explained how a thorough audit can reveal those issues and get your site back on track.  Well, I’m back with a new post about audits and SEO gremlins.  And as part of this post, I’m going to cover one of my favorite tools, Xenu Link Sleuth, which I’ve used for a long time.

In this post, I’m going to explain the power of a test crawl when auditing a web site.  You can learn a lot by crawling a web site, and I’m going to focus on three hidden dangers that can be uncovered by using a tool like Xenu.  And since the information is actionable, you just might be running to your development team once the crawl is done.

Xenu Link Sleuth – Free, Simple, and Powerful

I’m going to focus on Xenu Link Sleuth for running a test crawl, although Screaming Frog is another solution you can try (but it’s a paid solution).  I’ve been using Xenu for years and it’s a great tool for spidering a website.  There’s a wealth of information Xenu returns, including status codes, server errors, inbound links, external links, page metadata, etc.  As part of the process of auditing a web site, it’s always a good idea to run a test crawl.  You never know what you’re going to find.

Note, this post isn’t meant to be a deep Xenu tutorial that covers all of its functionality.  Instead, my goal is to show you three things you can uncover during a test crawl that could be hurting your SEO efforts.  Let’s take a look at what Xenu can find after you unleash it on your website.

3 SEO Dangers Xenu Can Uncover:

1. Server Errors

There are times I run Xenu on a medium-sized or larger site and the report comes back with hundreds (or thousands) of server errors.  As you can imagine, having Googlebot encounter thousands of server errors isn’t a good thing.  The crazy part is that clients often don’t know this is happening, and that’s especially true with larger sites.

In addition, there are times those server errors are occurring on pages that the client didn’t even know existed.  Confused?  They were too.  In a recent audit, a glitch in the CMS was dynamically generating URLs that nobody had intentionally created.  Those URLs threw server errors.  And Googlebot was finding them too.  Not good.
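
To make this concrete, here is a minimal Python sketch of how you might summarize server errors once you have crawl data in hand.  It is not a Xenu feature.  The input shape (a list of URL and status code pairs), the function name, and the sample URLs are all assumptions for illustration, so you would need to adapt it to however you export your own crawl report.

```python
from collections import Counter


def summarize_server_errors(crawl_results):
    """Return (count per status code, list of URLs) for 5xx responses.

    crawl_results is an iterable of (url, status_code) pairs. That shape is
    an assumption for this sketch, not the format of any specific crawler.
    """
    errors = [(url, status) for url, status in crawl_results
              if 500 <= status <= 599]
    counts = Counter(status for _, status in errors)
    urls = [url for url, _ in errors]
    return counts, urls


# Hypothetical sample data, including URLs a CMS glitch might generate.
sample_crawl = [
    ("https://example.com/", 200),
    ("https://example.com/products?id=", 500),
    ("https://example.com/products?id=&id=", 500),
    ("https://example.com/old-page", 404),
    ("https://example.com/search", 503),
]

error_counts, error_urls = summarize_server_errors(sample_crawl)
for status, count in sorted(error_counts.items()):
    print(f"{status}: {count} URL(s)")
for url in error_urls:
    print(url)
```

Grouping by status code first gives you a quick sense of scale.  The URL list is what you would hand to your developers.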

2. 404s

If you’re not familiar with header response codes, a 404 means “Page Not Found”.  For SEO, you want to hunt down 404s for several important reasons.  For example, are powerful pages on your site returning 404s?  Why is that happening?  And how much search equity are you losing as a result?  A search engine will remove a page that returns a 404 from its index, and you could lose the value of inbound links pointing to that page.  As you can imagine, fixing pages that return a 404 when they shouldn’t is an important task for SEO.

On the flip side, if web pages have been removed correctly (and are returning 404s as intended), the last thing you want to do is drive Googlebot to a “Page Not Found” via your own internal linking structure.  That’s a waste of a link on your site (and it’s bad for usability as well).  Using Xenu, you can easily view 404s in the reporting, export that report, and then work with your developers on fixing navigation issues.
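
If you want to see which of your own pages are sending visitors (and Googlebot) to a 404, the idea can be sketched in a few lines of Python.  This is only an illustration.  It assumes you already have a list of internal links as (source page, target URL) pairs and a lookup of status codes per URL.  Those data shapes, the function name, and the sample URLs are hypothetical, not Xenu’s export format.

```python
def links_to_missing_pages(links, status_by_url):
    """Map each target URL that returns 404 to the pages linking to it.

    links: iterable of (source_url, target_url) pairs (assumed shape).
    status_by_url: dict of URL -> HTTP status code (assumed shape).
    """
    broken = {}
    for source, target in links:
        if status_by_url.get(target) == 404:
            broken.setdefault(target, set()).add(source)
    # Sort the sources so the report is stable from run to run.
    return {target: sorted(sources) for target, sources in broken.items()}


# Hypothetical sample data.
internal_links = [
    ("https://example.com/", "https://example.com/about"),
    ("https://example.com/", "https://example.com/old-promo"),
    ("https://example.com/blog", "https://example.com/old-promo"),
    ("https://example.com/blog", "https://example.com/contact"),
]
statuses = {
    "https://example.com/about": 200,
    "https://example.com/old-promo": 404,
    "https://example.com/contact": 200,
}

broken_link_report = links_to_missing_pages(internal_links, statuses)
for target, sources in broken_link_report.items():
    print(f"{target} is linked from: {', '.join(sources)}")
```

Organizing the report by broken target, rather than by source page, makes it easier to decide whether to restore the page, redirect it, or remove the links.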

3. Weird (and Potentially Dangerous) Outbound Links

Let’s focus on a more sinister issue for a minute.  Unfortunately, hackers are continually trying to infiltrate web sites, CMS packages, etc. to benefit their own web sites (or client web sites).

Here’s a simple question for you.  Do you know all of the pages you are linking to from your website?  Are you 100% sure you know?  Are you hesitating?  🙂

For larger sites in particular, tracking down all outbound links isn’t a simple task.  If your server or CMS package was infiltrated, and outbound links were injected into your site, it can be hard to manually find those links in a short period of time (if you’re even lucky enough to know this is going on).

There are times I run a test crawl and find some crazy outbound links to less-than-desirable web sites.  And as you can guess, they contain rich anchor text that can boost the rankings of those sites.  And worse, some web site owners have no idea this is going on.  Xenu can pick up these external links and provide reporting that you can analyze.
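
For a sense of what checking outbound links involves, here is a minimal sketch using only Python’s standard library.  It parses the HTML of a single page and lists links that point to a different host, along with their anchor text.  The class name, the sample HTML, and the domains are made up for illustration.  It also treats any different hostname (including www versus non-www) as external, which is a simplification you may want to change.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutboundLinkParser(HTMLParser):
    """Collect (absolute_url, anchor_text) for links leaving the page's host."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc.lower()
        self.outbound = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.page_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            parsed = urlparse(self._href)
            is_web_link = parsed.scheme in ("http", "https")
            if is_web_link and parsed.netloc.lower() != self.site_host:
                anchor_text = "".join(self._text).strip()
                self.outbound.append((self._href, anchor_text))
            self._href = None


# Hypothetical page content with one internal and two external links.
sample_html = """
<p><a href="/about">About us</a>
<a href="https://partner.example.org/">Our partner</a>
<a href="http://pills.example.net/buy">cheap pills online</a></p>
"""

parser = OutboundLinkParser("https://www.example.com/blog/post")
parser.feed(sample_html)
outbound_links = parser.outbound
for url, text in outbound_links:
    print(f"{url}  [{text}]")
```

Reviewing the anchor text next to each external URL is what makes injected links stand out, since keyword-rich anchors pointing at unfamiliar domains are hard to miss in a list like this.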

Summary – Crawling To Action

Running a test crawl on a web site is a smart task to complete on a regular basis.  There are several actionable findings a tool like Xenu can uncover, and at a very low cost.  Finding these simple yet destructive problems can help webmasters improve the SEO health of their web sites.  And that’s exactly the goal of an SEO audit.

Again, I just covered a few insights you can glean from a test crawl.  Try it out for yourself today.  As I mentioned earlier, you never know what you’re going to find.

Glenn Gabe
Featured SEO Writer for SEJ
Glenn Gabe is a digital marketing consultant at G-Squared Interactive and focuses heavily on SEO, SEM, Social Advertising, Social Media Marketing, and Web Analytics. Glenn has over 18 years of experience and has held leadership positions both in-house and at a global interactive agency. During his career, Glenn has helped clients across a wide range of industries including consumer packaged goods (CPG), ecommerce, startups, pharmaceutical, healthcare, military, education, non-profits, online auctions, real-estate, and publishing. You can follow Glenn on Google+ here.
  • Margarita

    Great post! I use Xenu too, and I was just wondering what’s your take on getting lots of “timeouts” as the status? Would that mean the site is extremely slow…?

    • Sourav Saha

      It sounds like Xenu is hammering the website with requests, and the site can’t keep up, so it is returning timeouts. Try setting threads to a low value, e.g. 2.

    • Glenn Gabe

      Thanks Margarita. I’m glad you liked this post. Sourav has a good point. It could be that Xenu is hitting the site with too many requests and forcing timeouts. That said, you would want to make sure that’s the case. You can also check specific URLs to ensure they are resolving versus timing out.

  • Sourav Saha

    Thanks Glenn for such a detailed post. Other than Xenu I love using Screaming Frog SEO Spider Tool to check which pages on my site are showing a 301 redirect, and which ones are showing 404 errors or 500 errors.

    • Glenn Gabe

      Thanks Sourav. Yes, Screaming Frog is a great tool as well. I just wanted to explain how to crawl a site using a free tool like Xenu. I use a number of tools for checking header response codes (depending on the size of the site). Both Xenu and Screaming Frog can help identify problematic pages (and then you dig in with other tools and analysis if needed).

  • Mike

    In terms of SEO auditing, is there anything Xenu does better than the IIS crawler?

    • Glenn Gabe

      Great question Mike. The IIS SEO Toolkit is outstanding. I come from a web application development background (primarily developing with, so the IIS Toolkit was right up my alley. I’m glad you brought it up here.

      The only problem is not everyone has a system that can run the toolkit. If you can, it provides a wealth of data. I wouldn’t say that someone should use one over the other. I use both, and that’s probably the strongest way to go.

  • Matthew

    Great article, thanks Glenn. Xenu is a great tool that I haven’t used in quite some time. Your article has inspired me to get back into it 🙂

    • Glenn Gabe

      Hey, thanks Matt. I’m glad you’ll start taking a look at Xenu again. I’ve been using it for a long time. Very helpful. 🙂

  • Marcin

    I’ve heard about Xenu in the past, but I’ve never actually used it. I was missing a lot – thanks for sharing these tips with us!

    I usually use Website Auditor (part of SEO PowerSuite). It can return pretty detailed results, but I really like the speed of Xenu – it beats WA hands down. I’m sure it will become a regular addition to my toolkit.

    • Glenn Gabe

      Excellent, glad you found my post helpful. I think you’ll love the value Xenu provides.

  • Adam Brook

    A rule I try to stick to is 0-3 outbound links. It’s difficult when promoting, but I have found it helps a great deal. I set up two websites (very similar): on one I agreed to add a lot of outbound links (all PR over 5), and on the other I had only 2. The second site is doing much better!!

    • Anthony

      Can someone explain why having a few more outbound links (5 vs 2) in Adam’s example would have such a drastic effect on SEO performance?

      • Darrius

        Also curious about that, do you mean per page or for the whole site?

  • Myron Rosmarin

    Love Xenu. One of the things I do for clients is use Xenu to perform a crawl depth analysis. It demonstrates very clearly how much of a site’s content is “close to the top” (within 4 or 5 clicks from the home page) vs content that’s so deep as to be virtually hidden beyond discovery. It’s another eye-opener – and it’s free and easy to run the report.

    • Glenn Gabe

      Absolutely Myron. That’s a great way to use Xenu. Viewing the depth of each page can definitely help webmasters build a flatter structure (when warranted). And you’re right, it can be eye-opening. 🙂

  • dan johns

    Thanks for sharing the information.
    Personally I use Google Webmaster Tools to check crawl errors on my site, but it’s good to know about some new tools.

  • Jonny Ross

    Hi Glenn,

    You know, this is the FIRST thing I do with any new client: run Xenu, even before I load the site in my browser in most cases!

    You can learn so much, and you can fix so much, from such a simple viewpoint that most “SEO agencies” don’t even look at. Most SEOs focus on offsite and don’t even think about onsite. It annoys me, but at the same time it pleases me that I CAN make the difference!


  • Gids

    Just a mention of Screaming Frog SEO Spider – a good Xenu equivalent that works on Macs.

  • Stéphane Arnoult

    Nice post.
    Sorry for my English (I’m French). Xenu is a very helpful tool and that’s what this post shows. Myron Rosmarin explained that the tool is an easy way to work on the depth of the pages and I totally agree… In an Excel file, you can analyse how the depth is distributed.
    Another metric I work with is internal links: this is a nice way to estimate how internal links are distributed among all the different pages and whether that distribution corresponds to our goals… And, with a “group by nb_links” approach, you can see which parts of your site have very few links and may not be as powerful as they should be.

  • Eyal Azerad

    After my website dropped dramatically in rankings over the course of 30 days, I scrambled for info on SEO and stumbled on your posts. Great article, easy to understand (thank you).

    I have a question, if you don’t mind. We redesigned our website, and the URL went from the www version of the company name .com to just the company name .com (basically dropping the www). Our rankings dropped from positions 1-8 on Google (where we had been for over a year) to position 40+, and in some cases we are no longer listed. As you can imagine, this is quite scary. One SEO “pro” suggested that it was due to dropping the www, stating that all of the existing backlinks were pointing to the www version and not just the company name. I personally don’t see how this could change anything, but I was wondering if this explanation makes sense? It seems strange, as the URL itself remained the same apart from the www “status”.

    I would greatly appreciate your feedback on this

    Thank you,

    • Jonny Ross


      I am guessing it could be lots of factors. Firstly, you said “we redesigned our website and the url then changed”, so we need to take a complete redesign into account. This could be the biggest factor.

      With regard to the www and non-www versions, one version should redirect to the other. If there are lots of links to the www version and the actual version is non-www, then the links to the www version would redirect to the non-www version. More importantly, you need to ensure you don’t have both versions live (it doesn’t sound like you do), because this can create a duplicate site, and Google will see this as duplicate content.

      In short, ensure you redirect the www version to the non-www version. But to answer why you have dropped, I would suggest it is other factors. I guess if you don’t have the www version redirected, you could reassess your position once you have that in place…