
Diagnose Critical Website Architecture Issues for SEO

I was pleased to be invited to write a guest post on Search Engine Journal, but it took me a little while to settle on what exactly I was going to talk about. Today, finally, I made my decision – tips to help diagnose problems with your website architecture. I’ve seen lots of posts covering individual aspects of website architecture problems for SEO, but I can’t think of any recent ones that try to bring all of those ideas together.

So what ideas should you be looking at when you’re trying to figure out if there’s a problem with your website architecture?

How flat is your site architecture?

As Rand points out, flatter site architecture isn’t just for SEO, it’s for the benefit of your users, too. How many clicks are there between your homepage and a product page? Have you made it easy for users (and search engines) to get all the way to the bottom of your site map? Think of this from the point of view of the number of levels between your (often most authoritative) homepage and the very lowest content type on the site (e.g. job pages if you’re in recruitment, product pages if you’re in retail). The fewer of these levels there are, the more effectively you’ll distribute PageRank throughout your site architecture.
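
If you want a rough, do-it-yourself measure of click depth, here’s a minimal sketch that breadth-first crawls a site and reports how many clicks each URL sits from the homepage. It assumes the third-party requests and beautifulsoup4 packages, and the start URL is a hypothetical placeholder:

```python
# Minimal click-depth sketch: breadth-first crawl from the homepage,
# recording the shallowest path found to each on-site URL.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "http://www.yourdomain.com/"  # hypothetical homepage


def crawl_depths(start, max_pages=500):
    """Return {url: clicks_from_homepage} via breadth-first search."""
    depths = {start: 0}
    queue = deque([start])
    host = urlparse(start).netloc
    while queue and len(depths) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            # Stay on-site; BFS guarantees the first path found is shallowest.
            if urlparse(link).netloc == host and link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths


if __name__ == "__main__":
    for url, depth in sorted(crawl_depths(START).items(), key=lambda x: x[1]):
        print(depth, url)
```

Any URL that turns up four or five clicks deep is a candidate for better internal linking.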

One of the issues that can come up when you’re categorising your pages is the maximum number of links you can have on a page. Take a look at this post from Matt Cutts for some good background on how far you can go. According to data collected from Linkscape, the average number of links found on a page is 75! That means there are some pages out there with a lot of links!
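
If you want a quick count for any single page, a few lines will do it. Again this assumes requests and beautifulsoup4, and the URL is a placeholder:

```python
# Count the links on a single page as a quick sanity check.
import requests
from bs4 import BeautifulSoup

url = "http://www.yourdomain.com/some-category/"  # hypothetical page
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
links = [a["href"] for a in soup.find_all("a", href=True)]
print(f"{url} has {len(links)} links")
```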

When you’re designing your site architecture, try to keep it as flat as possible, following the homepage > category > brand > product principle, a little like this diagram (click to enlarge):

[Diagram: sitemap structure showing a flat homepage > category > brand > product hierarchy]

You can also add cross links between related categories (or products). A great tip is to identify your most authoritative pages using the “Top Pages On Domain” tool at SEOmoz and link from those pages to pass some of their authority on to others.

Broken internal and external links

Identifying broken internal links is pretty easy – download Xenu’s Link Sleuth and leave it to crawl your site. Pull up the broken links report and you’ll be able to quickly identify whether your site has any internal broken links leading to 404 error pages. If you’ve got a really big site to crawl, Xenu can get a little overwhelmed. I really like using Web Link Validator for sites with more than 100,000 pages – the software stays stable and can easily export its results without any problems.
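
If you’d rather script the check yourself, here’s a minimal, Xenu-style sketch that records which page links to each URL returning an error. It assumes requests and beautifulsoup4, and the start URL is a hypothetical placeholder:

```python
# Crawl on-site pages and build a (source page, broken link, status) report.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "http://www.yourdomain.com/"  # hypothetical homepage
host = urlparse(START).netloc
to_visit, seen, broken = [START], {START}, []

while to_visit and len(seen) < 1000:  # cap the crawl for big sites
    page = to_visit.pop()
    try:
        resp = requests.get(page, timeout=10)
    except requests.RequestException:
        continue
    for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
        link = urljoin(page, a["href"]).split("#")[0]
        if link in seen:
            continue
        seen.add(link)
        try:
            # HEAD is cheap; note a few servers answer HEAD incorrectly.
            status = requests.head(link, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = 0  # unreachable counts as broken here
        if status >= 400 or status == 0:
            broken.append((page, link, status))
        elif urlparse(link).netloc == host:
            to_visit.append(link)

for source, target, status in broken:
    print(f"{status}  {target}  (linked from {source})")
```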

For checking inbound links that may be leading to a broken URL or a 404 error page, open Google Webmaster Tools and check out the “pages with external links” report. The report will give you a list of pages producing 404s. Work through the list, fixing or 301 redirecting those URLs, and you’ll reclaim those links for free.
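
Once you’ve exported that list (say, one URL per line into urls.txt, a hypothetical filename), re-checking which ones still 404 is a one-minute script, assuming requests is installed:

```python
# Re-check a list of externally linked URLs for 404s.
import requests

with open("urls.txt") as f:
    for url in (line.strip() for line in f if line.strip()):
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = "unreachable"
        if status == 404 or status == "unreachable":
            print(f"Still broken: {url} ({status})")
```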

Server header responses

A “server header response”, like a 200 or a 404, is a status code defined by the HTTP/1.1 protocol. Returning the right status code when a page loads correctly, is broken, or hits a server problem is a vital part of the way your website communicates with search engine crawlers. Check every single response code you get by installing Live HTTP Headers for Firefox and testing a few scenarios, such as a broken URL and an ordinary page load. If anything looks wrong, talk to your developers!
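
If you prefer something scriptable to a browser extension, the Python standard library can print the raw status line and headers for each scenario. The test URLs below are placeholders:

```python
# Print status codes and response headers for a few test scenarios.
import http.client
from urllib.parse import urlparse

scenarios = [
    "http://www.yourdomain.com/",               # ordinary page: expect 200
    "http://www.yourdomain.com/no-such-page/",  # broken URL: expect 404
]

for url in scenarios:
    parts = urlparse(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    print(f"{url} -> {resp.status} {resp.reason}")
    for name, value in resp.getheaders():
        print(f"  {name}: {value}")
    conn.close()
```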

Do you have a duplicate content problem?

Diagnosing duplicate content and fixing any problems you find is, in my mind, a best practice element of SEO. Even though the search engines gave us the canonical tag, you should still be making sure there aren’t too many problems. Use your common sense: take a look at your total indexed pages with a simple “site:yourdomain.com” query in Google and with Yahoo’s Site Explorer. Does it seem like there are too many pages in the index? 75,000 pages when you know there are more like 20,000? When you’re investigating your site in a search engine index, look out for malformed URLs, query strings (like ?=sessionid or ?first_page, etc.) or many repeated results with the same title / description. Don’t forget that Google will only display the first 1,000 URLs with the site: operator, so you need to get creative when investigating your site.
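
One way to put numbers on this: export your crawl to a CSV and group URLs by title; any title shared by several URLs deserves a closer look. Here’s a minimal sketch, assuming a hypothetical crawl.csv with url and title columns, which most crawlers can export:

```python
# Group crawled URLs by <title> and flag titles used more than once.
import csv
from collections import defaultdict

pages_by_title = defaultdict(list)
with open("crawl.csv", newline="") as f:
    for row in csv.DictReader(f):
        pages_by_title[row["title"].strip().lower()].append(row["url"])

for title, urls in pages_by_title.items():
    if len(urls) > 1:
        print(f'"{title}" appears on {len(urls)} URLs:')
        for url in urls:
            print(f"  {url}")
```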

Orphaned pages

Orphaned pages on a big site can be a problem, particularly if you’ve migrated to a new platform or fundamentally changed your site design and URL structure recently. Not linking internally to a page on your site can often be a death knell for the ranking position of the URL in question, particularly if there are few or no external links to the page. Checking for orphaned pages on 100,000+ page sites is enormously difficult too – especially with dynamic sites. Here are a few things you can look out for to make sure you’ve not orphaned any pages:

- A significant change in the total pages in the Google and Yahoo site indexes
- Changes in the numbers in the “pages with internal links” report in Google Webmaster Tools
- A change, on recrawling your site with Xenu or Web Link Validator, in the number of URLs the software was able to crawl. Save your first crawl, export the data and compare it to a recrawl: where are the gaps? (There’s a quick comparison sketch below.)
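
Here’s a minimal sketch of that last comparison, assuming each crawl was exported as one URL per line into the hypothetical files first_crawl.txt and recrawl.txt:

```python
# Compare two crawl exports to find URLs the crawler could reach
# before but can't any more: prime orphan suspects.
def load_urls(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

before = load_urls("first_crawl.txt")
after = load_urls("recrawl.txt")

for url in sorted(before - after):
    print("Possibly orphaned:", url)
```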

Canonical redirects

Check to see if your site is indexed at both http://yourdomain.com and http://www.yourdomain.com, and set up a canonical 301 redirect to sort the problem out. Don’t forget to add a trailing slash at the end of the URLs (or remove it, depending on which you prefer). There are plenty of guides to setting these redirects up in IIS or Apache – here’s a nice guide from SEObook.com.
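
Once the redirect is live, a quick check confirms it behaves. This sketch assumes requests; swap the hostnames if you prefer the non-www version as canonical:

```python
# Request the non-canonical hostname without following redirects.
import requests

resp = requests.get("http://yourdomain.com/", allow_redirects=False, timeout=10)
print(resp.status_code, resp.headers.get("Location"))
# Expect: 301 http://www.yourdomain.com/
# A 200, a 302, or a missing Location header means the canonical
# redirect isn't set up correctly.
```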

Is your development server indexed?

One of the most suicidal moves to make in SEO is to allow your development site to get indexed. There are lots of ways to remove a website from a search engine index, and usually we try to avoid needing them. In the case of an indexed development server, you need to get rid of it as soon as you can. Make sure you use a robots.txt file at the root of your development server and, if at all possible, restrict traffic to the development site from outside your network, perhaps by specifying IP ranges that are allowed in or user agents that are not. It doesn’t take long to find people who have forgotten this very basic rule – take a look at this Google query.
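
To verify a development hostname really is shut out, here’s a minimal check using the standard library’s robotparser; dev.yourdomain.com is a hypothetical hostname:

```python
# Fetch the dev server's robots.txt and confirm crawlers are blocked.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://dev.yourdomain.com/robots.txt")
rp.read()

if rp.can_fetch("Googlebot", "http://dev.yourdomain.com/"):
    print("WARNING: crawlers are allowed on the dev server!")
else:
    print("OK: robots.txt blocks crawlers on the dev server.")
```

Note that a missing robots.txt counts as “allowed”, so this catches the common case where the file was never deployed to the dev box at all.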

Keeping a close eye on your site index, recrawling and performing regular checkups can really pay dividends and give you a fantastic early warning system against future issues with your site architecture. Do you have any more tips you’d like to share? Tell us about your experiences below…

Richard Baxter is an SEO Consultant and chief blogger at SEOgadget.co.uk, a UK SEO Company. Come check out our latest SEO Jobs or, if you’re recruiting, post a job free.


15 thoughts on “Diagnose Critical Website Architecture Issues for SEO”

  1. Excellent article summarizing site architecture and problems with it, and offering solutions (always important!)

    Note: You’ve got active links to http://www.yourdomain.com and http://yourdomain.com, both of which are pushing page rank off site uselessly to non-domains. Might want to put rel=nofollow on those, or not make them links at all.

    Thanks for all the links to tools, too.

    One question: Is there a tool you know of that will show a graph (something like the graphs at touchgraph.com) of the internal links on a site?

  2. Excellent overview. Didn’t know about Web Link Validator. Always used Xenu and indeed witnessed problems when dealing with *big* sites.

    I do have a suggestion:
    to check HTTP headers, I find Firefox’s Live HTTP Headers a pain, and use something else:
    - Sam Spade, which (among other things) checks the HTTP headers of just the HTML document (and not every single image on it as well) -> and you can easily follow redirects and see which headers they have
    - but I prefer good ol’ Lynx from a terminal: just enter ‘lynx -head -dump <url>’ and you have a neat HTTP header. When used on Mac OS X, it looks better in screenshots in management reports too :p

  3. What a great work list for anyone who is serious about making their website perform well. The problem is knowing how best to allocate your time.

    Using Google Analytics to find where your visitors are going on the website is a good way to know what you should be checking first.

  4. @Jere – I really like the idea of a link graph generator for internal links – though I don’t think such a thing exists. Sounds like a great SEO tool idea!

    @Ramon thanks for the tip on the header checker. There were additional plugins suggested in this post on Firefox extensions too. Take a look!

  5. Good compilation Richard!

    The flat architecture with a proper category/sub-category hierarchy also helps to prevent pagination issues to some extent.

  6. This is an excellent list of services and recommendations for good traffic and page rankings. I never knew you could actually find that you have duplicate content out there. I mean I know its out there but with the link you sent really spells out the details. I also really appreciate the section on not allowing your dev site to be searched. Now I know why a few of my sites might have some problems.
