SEO 101: An Expert’s Guide to Auditing a Website’s Onsite SEO Health

A Guide to Auditing a Website’s Onsite SEO Health | SEJ

I just had to audit one of my websites, and while it was a long and painful process (as always), I discovered and fixed many problems I didn’t know existed.

There are a number of reasons why you might want to audit the SEO health of your own site. Maybe you own the site and you haven’t checked in a while, and you’re looking for a DIY solution rather than an expensive external audit.

Maybe you just bought the website and you want to make sure everything is in order before you proceed with your plans. Maybe you’re the third-party auditor and you’ve been contracted to check on the health of a client site.

No matter the reason, there will be a lot of factors to check, and you have to be comprehensive. Missing one factor can be a hit to SEO moving forward, and an old, lingering problem can compound if left alone. Here’s what to check, and how to check it.

Author Note: This is all about SEO on your own site. For off-site SEO or competitive analysis, you’ll have to check other guides.

Check for Broken Links with Screaming Frog

Screaming Frog will provide you with a lot of information that will be useful for a number of other steps on this list as well as this one. For this particular step, you just want to run a crawl of your site and check on the integrity of your links. Any link on your site that points to a page that doesn’t exist needs to be changed or removed. Find an updated version of the previous destination, or remove the link entirely.
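
If you want a quick spot check on a single page before running a full crawl, the idea can be sketched in a few lines of Python. This is only a rough illustration, not how Screaming Frog works internally. The get_status argument is a hypothetical callable you would supply yourself (for example, one that sends a HEAD request and returns the HTTP status code).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def find_broken_links(page_url, html, get_status):
    """Return (link, status) pairs for links answering with 400 or above.

    get_status is a hypothetical helper: it takes a URL and returns
    the HTTP status code as an int.
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    broken = []
    for link in collector.links:
        # Skip mailto:, tel:, javascript: and similar non-web links.
        if not link.startswith(("http://", "https://")):
            continue
        status = get_status(link)
        if status >= 400:
            broken.append((link, status))
    return broken
```

Keeping the status lookup as a separate callable means you can test the link extraction without touching the network, then plug in a real HTTP client later.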

Check Sitemap Integrity

Your sitemap is essentially a list of every page you want the search engines to find on your site. You need to make sure it’s well-formed and lists every one of those pages. The exact process can be a little complex, so you have two options. Your first option is to check your current sitemap. Your second option is to just generate a new sitemap, preferably once the rest of your audit is complete and any changes have been made.
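
For the first option, one way to check a sitemap is to compare it against a list of URLs from a crawl. Here is a minimal sketch, assuming a standard sitemaps.org XML file (not a sitemap index) and a list of crawled URLs you have exported from your crawler. The function names are my own, for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)  # raises ParseError if not well-formed
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def compare_sitemap(sitemap_xml, crawled_urls):
    """Return (missing_from_sitemap, listed_but_not_crawled), both sorted."""
    listed = sitemap_urls(sitemap_xml)
    crawled = set(crawled_urls)
    return sorted(crawled - listed), sorted(listed - crawled)
```

The first list shows pages the crawl found that the sitemap doesn’t mention; the second shows sitemap entries the crawl never reached, which are often stale or orphaned URLs.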

Check for 404s in Google Webmaster Tools

Assuming your site is linked to your Google Webmaster Tools account, you can go to your Crawl -> Crawl Errors report. This report will show you any point where the Googlebot has attempted to crawl a page only to find that the page doesn’t exist. If the page does exist, the error may be old, or you may be blocking the bot. If the page doesn’t exist, it’s a good opportunity to either create a real page for that spot, or to redirect to the page it should be loading.

Check for Duplicate Content with Copyscape

Duplicate content, either on your own site or on other sites, can hurt your site ranking. Copyscape is the premier source for checking for duplications online. You’ll find a few potential issues.

  • Duplicate content on your own site. If the same content is registering in more than one place, you either need to fix it – as in the case of duplicate product descriptions on thin pages – or canonicalize it, such as when a dynamic search makes multiple URLs for the same page.
  • Duplicate content on another site, part 1. This is when the other site was the originator of the content, such as when you copied manufacturer’s descriptions for products, or a content writer blatantly copied content. Remove the offending content and replace it with original content.
  • Duplicate content on another site, part 2. This is when another site has copied your content. Chances are this won’t penalize you, but you may want to report it to Google nonetheless.

Check for Thin Content

The exact definition of thin content is nebulous, but you can guess that any page with under 300 words of content is probably going to be considered thin. Any page with more navigational, header, or footer content than actual body content is going to be a thin page. Thin pages should be merged with similar pages, removed entirely, or expanded to become valuable pages.
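
As a first pass, the 300-word rule of thumb can be automated. The sketch below counts words outside navigation, header, and footer elements and flags pages under a threshold. Treat it as a rough filter only: the tag list and the 300-word default are assumptions taken from the guideline above, and a short page can still be perfectly useful.

```python
from html.parser import HTMLParser

# Elements whose text should not count as body content (an assumption;
# adjust to match your own templates).
SKIPPED_TAGS = {"nav", "header", "footer", "script", "style"}


class BodyTextCounter(HTMLParser):
    """Counts words that appear outside the skipped elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.word_count += len(data.split())


def body_word_count(html):
    counter = BodyTextCounter()
    counter.feed(html)
    counter.close()
    return counter.word_count


def is_probably_thin(html, min_words=300):
    """Flag a page for manual review if it has fewer than min_words."""
    return body_word_count(html) < min_words
```

Anything this flags still needs a human look before you merge, remove, or expand it.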

Check for Content Errors

There are two types of errors that can crop up in content: grammatical and factual. Grammatical errors require some proofreading to find, but they’re simple to fix. Just make sure you update the “last updated” date (the lastmod value) in your sitemap, so Google knows to reindex the page with the error-free version.

Factual errors are a bit harder to deal with. If your page is old and what it says was factual at the time, you don’t need to do anything except maybe add a disclaimer that the advice is out of date. It’s also an opportunity to create new, updated content, if it’s still relevant. The choice is yours.

Check the Number of Indexed Pages

Once again, go into your Google Webmaster Tools. This time, pull the Google Index -> Index Status report. At the same time, go to Google and run a search for site:yourURL. Each will provide you with a number of pages.

Do those numbers match? If so, Google is indexing everything you want indexed. If the index count is smaller, you might have an issue with accessibility or with robots directives. If the index count is larger, you may have duplicate content issues coming from dynamic URLs, as mentioned above. More on fixing both of those later.

Check for Well-Formed Meta Tags

There are three types of meta code you should check for each page on your site.

  • Meta title. Your title should be succinct, and under 70 characters whenever possible. You should append brand information to the end, not the beginning. Your title should also be descriptive, to avoid a bait and switch scenario. Likewise, it should include a keyword, but not an overly optimized keyword. Finally, every page should have a unique title, to avoid duplication errors.
  • Meta description. Your description isn’t directly an SEO factor, but it is important for clicks and attractiveness in previews. Keep it short and relevant, with a keyword that matches the title and content.
  • Meta robots directives. Typically, you can handle most robots directives in the main robots.txt file. Using them at the page level is asking for contradictions and trouble.

You should also make sure paginated pages use the rel=prev/next tags, and that you have proper canonicalization. Also, if any of your pages have meta keywords, remove them. Only spam sites use the keywords tag these days.
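
The title checks in particular are easy to script once you have a crawl export. Here’s a minimal sketch, assuming you already have a mapping of URL to title text (the input format and function name are my own, for illustration). It flags missing titles, titles over 70 characters, and titles shared by more than one page.

```python
from collections import defaultdict


def audit_titles(titles_by_url, max_length=70):
    """Return {url: [issues]} for pages whose title needs attention.

    titles_by_url maps each URL to its <title> text (None or "" if missing).
    """
    issues = defaultdict(list)
    urls_by_title = defaultdict(list)
    for url, title in titles_by_url.items():
        title = (title or "").strip()
        if not title:
            issues[url].append("missing title")
            continue
        if len(title) > max_length:
            issues[url].append(f"title is {len(title)} characters")
        # Compare case-insensitively so "Widgets" and "widgets" collide.
        urls_by_title[title.lower()].append(url)
    for urls in urls_by_title.values():
        if len(urls) > 1:
            for url in urls:
                issues[url].append("duplicate title")
    return dict(issues)
```

The same pattern works for meta descriptions if you swap in a different length limit.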

Check for an Optimized 404 Page

Ideally, no one visiting your site will land on the 404 page, but it will happen no matter how much care you put into things. Optimize your 404 page. Don’t just redirect it to your homepage.

Check for Proper Canonicalization

Canonicalization can be complicated. When you have multiple URLs for the same page, such as when URL parameters are involved, session data is stored in the URL or the page is dynamically generated, you can end up with dozens or hundreds of URLs all pointing to the same page. Google, however, operates by the URL. This means any two URLs are assumed to be different pages. A dozen dynamic URLs for the same page, to Google, look like different pages with duplicate content. Avoid this issue by canonicalizing the pages.
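
To find candidates for canonicalization, you can group crawled URLs that differ only by parameters that don’t change the page. The sketch below is one rough way to do it. The IGNORED_PARAMS list is purely an assumption for illustration; you’d replace it with whatever session or tracking parameters your own site uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of parameters that do not change page content.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_form(url):
    """Normalize a URL: lowercase host, drop ignored params, sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Return {canonical_url: [variants]} for groups with 2+ variants."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}
```

Each group it returns is a set of URLs that should probably share a single rel="canonical" target.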

Check Site Load Times with Pingdom

Users don’t like to wait, so don’t make them. Use Pingdom to check for the load times and performance of your site in general and specific pages in particular. Ideally, none of your pages will take longer than 2-3 seconds to load. The fastest pages should be measured in milliseconds, while the slowest should rarely take longer than five seconds. If anything takes longer than ten seconds to load, you have a dangerous error you need to take care of.

There can be any number of problems leading to slow response times. You may have to make a drastic change in your server architecture or your web hosting, or you may just be able to remove a broken plugin or fix a broken script.

Check Robots.txt for Errors

Your robots.txt file is important for telling well-behaved search engines what to do. Make sure you’re not accidentally blocking the bots from your entire site. If you have any blocked pages, make sure they’re blocked for a reason.
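
Python’s standard library includes a robots.txt parser, so a quick check of which important URLs are blocked might look like the sketch below. Keep in mind this is an approximation: the standard-library parser doesn’t necessarily interpret every rule the way Googlebot does (wildcard patterns in particular), so treat the result as a hint rather than a verdict. The blocked_urls function name is my own.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() accepts the file contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Feed it your homepage and a handful of key landing pages; if any of those come back blocked, you have found a problem worth fixing first.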

Check for Proper Redirects

Moz comes to the rescue for this one. There are several different types of redirects, each serving a particular purpose. If you have any redirects on your site – and you might, if you’ve changed your site architecture or your URL structure for another step – make sure the redirects are properly implemented.
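
One part of “properly implemented” that’s easy to check yourself is whether any redirect points at another redirect, or loops back on itself. Here’s a minimal sketch, assuming you have exported your redirects as a simple source-to-target mapping (the data format and the 10-hop cap are assumptions of mine).

```python
def resolve_redirect(url, redirects, max_hops=10):
    """Follow url through the redirects mapping.

    Returns (final_url, hops, looped).
    """
    seen = {url}
    hops = 0
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
        if url in seen:
            # We came back to a URL we already visited: a redirect loop.
            return url, hops, True
        seen.add(url)
    return url, hops, False


def find_chains(redirects):
    """Return {source: (final_url, hops, looped)} for chains and loops."""
    report = {}
    for source in redirects:
        final_url, hops, looped = resolve_redirect(source, redirects)
        if hops > 1 or looped:
            report[source] = (final_url, hops, looped)
    return report
```

Anything reported with more than one hop can usually be pointed straight at its final destination.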

Check URL Format

A sane, human-readable URL structure is an important aspect of SEO. Called HURLs or semantic URLs, these are very common today. Any time you see a URL made up of readable words that describe the page, you’re looking at a semantic URL. This is opposed to strings of letters and numbers that don’t make any sense.

If you don’t currently use semantic URLs, you have a very important change to make, and it’s a major change. It will likely involve a lot of redirects and a lot of work, so be careful when you’re implementing the changes.

Check Image Alt Tags

Every time an image is used on your site, it needs an alt description, even if it’s just your logo up in the corner. Alt text is important for usability, because any time an image doesn’t load, the text loads in its place. This helps users with accessibility issues and slow connections. Alt text also helps an image rank in Google’s image search, which can be important for bringing in traffic as well. With that in mind, try to craft a keyword-optimized alt description for every image you use.
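
Finding images without alt text is simple to automate. A minimal sketch using only the standard library follows. Following the rule above, it flags images whose alt attribute is missing or blank; if you deliberately use empty alt text on purely decorative images, you’d want to relax that check.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> with a missing or blank alt."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            alt = attributes.get("alt")
            if alt is None or not alt.strip():
                self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```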

Check Proper H Tag Utilization

Every page should have an H1 tag for the primary title. You don’t necessarily need to use any other H tags, though using H2 for subtitles is a good idea. Avoid common mistakes, like using H2 for the first subhead, H3 for the second, and so on. Think of them as nested elements, not as numbered lists.

While you’re at it, make sure that you don’t skip a number. Never have an H3 on a page without an H2 before it. Likewise, never use any H# tags if you don’t have an H1 at the top. It might seem like a minor factor, but every little bit helps, and properly formed code is important.
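
Both rules (an H1 on every page, no skipped levels) can be checked mechanically. The sketch below is one rough way to do it, with function and class names of my own choosing. It reads the heading tags in document order and reports any jump of more than one level.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects heading levels (1-6) in the order they appear."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable heading structure problems."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    problems = []
    if 1 not in collector.levels:
        problems.append("no H1 on the page")
    previous = 0
    for level in collector.levels:
        # Going deeper by more than one level means a level was skipped.
        if level > previous + 1:
            problems.append(f"H{level} appears without an H{level - 1} above it")
        previous = level
    return problems
```

Stepping back up (say, from an H3 to a new H2 section) is fine and isn’t reported.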

Check for Keyword Cannibalism

Keyword cannibalism is a phenomenon that happens more often on large, old sites and less often on small, new sites. The idea is there are only a limited number of valuable keywords in a given niche, so blogs end up repeating themselves. However, if a keyword is targeted multiple times on the same site, the cumulative SEO value of that keyword is split up amongst those pages. This means that each individual page is less potent than one combined page would be.

The result is that you may end up holding the rank 6, 7, and 9 spaces on Google with three cannibalized pages rather than holding the rank 1 spot with a single focused page.

To find keyword cannibalization, you will need to figure out what the targeted keywords are for every relevant piece of content on your site and make sure there are as few duplicates or overlaps as possible.
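
Once you’ve written down the target keyword for each page (a spreadsheet works fine), spotting exact overlaps is a grouping exercise. A minimal sketch, assuming a mapping of URL to target keyword that you’ve built by hand; it only catches exact matches after normalizing case and spacing, not close variants.

```python
from collections import defaultdict


def find_cannibalized(target_keywords):
    """Return {keyword: [urls]} for keywords targeted by 2+ pages.

    target_keywords maps each URL to the keyword that page targets.
    """
    pages_by_keyword = defaultdict(list)
    for url, keyword in target_keywords.items():
        # Normalize case and whitespace so trivial differences still match.
        normalized = " ".join(keyword.lower().split())
        pages_by_keyword[normalized].append(url)
    return {
        keyword: sorted(urls)
        for keyword, urls in pages_by_keyword.items()
        if len(urls) > 1
    }
```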

Check on Proper Site Architecture

If you were to draw a circle on a piece of paper representing your homepage, then add a circle for every other page and a line for every link, how sprawling a spider’s web would your site look? There’s actually a science to it, and if your site violates some of the basic rules, like hiding content too many clicks away from the homepage, you may have a redesign in your future.
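
The “clicks away from the homepage” idea can be measured with a breadth-first search over your internal links. Here’s a minimal sketch, assuming you have a mapping of each page to the pages it links to (most crawlers can export this). The three-click default is just an assumed threshold, not an official rule.

```python
from collections import deque


def click_depths(links, homepage):
    """Return {page: minimum clicks from homepage} via breadth-first search."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_or_orphaned(links, homepage, max_depth=3):
    """Return (too_deep, orphaned) page lists, both sorted."""
    depths = click_depths(links, homepage)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    too_deep = sorted(page for page, depth in depths.items() if depth > max_depth)
    # Pages that exist but cannot be reached from the homepage at all.
    orphaned = sorted(all_pages - set(depths))
    return too_deep, orphaned
```

Pages in the first list are buried; pages in the second can’t be reached by following links from the homepage at all.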

Check for Search Penalties

If your site is old and ill-monitored or freshly purchased, you may have a lingering search penalty.

  • First, check for hints of a penalty, like significantly lower than expected search ranks or missing indexing.
  • Second, check for an actual penalty. If the pages you suspect are penalized are just noindexed, it’s an easy fix. If you’re actually penalized, you have some work to do.
  • Fix the problem. It might be links, it might be content, it might be code; whatever it is, you need to fix it to get your ranking restored.
  • Request reconsideration. For most penalties, Google will detect the changes and lift the penalty automatically, but a reconsideration request can’t hurt.

Check for External Link Quality

Pull a profile of all of the links on your site pointing to other domains. By now, you should have already fixed any broken links. For the rest, you need to go through and determine if you want to keep them. Are they pointing at sites that have since changed, been parked or hacked?

If so, you may want to remove them and replace them. Are they pointing at sites you consider low quality? If so, consider removing them or adding the nofollow attribute. If they are high-quality sites you trust, leave them as they are.
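
To pull that profile for a single page, you can extract every link that points at another host and note whether it already carries nofollow. A minimal sketch follows; it treats any different hostname as external, which is an assumption (you may want to count your own subdomains as internal).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ExternalLinkCollector(HTMLParser):
    """Collects (url, is_nofollow) for links pointing at other hosts."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlsplit(page_url).netloc.lower()
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        absolute = urljoin(self.page_url, href)
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() != self.site_host:
            # rel can hold several space-separated values.
            rel_values = (attributes.get("rel") or "").lower().split()
            self.external.append((absolute, "nofollow" in rel_values))


def external_links(page_url, html):
    collector = ExternalLinkCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.external
```

From there, deciding which destinations you still trust is a manual job.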

In Summary

Your page stats in Google Analytics will be a strong indicator of which pages are affected the most, and after auditing your old content, you may realize your content creation habits have improved and you have some old content that needs to go. The only way to properly audit a website is to leave no stone unturned, and to audit every aspect of your website content. Don’t forget to use tools and software to make your life easier.

James Parsons
James Parsons is an entrepreneur, marketer, web designer, growth hacker and Apple fanboy. When he's not writing at his blog, he's working on his next big project.
  • Dan Mitroi

    Hi James,
    As we know, on-site SEO has been around from the beginning and will be around for a long time. It’s all about making the web faster, cleaner and more relevant. There are many ways to tweak a site’s architecture for SEO purposes.
    Having a solid website, with few broken links, missing tags, etc., is, I believe, the base of any successful online campaign. Awesome guide!
    You may want to try out our tool at Darcy SEO Checker. Let me know how it goes!

    • James Parsons

      Thanks for your comment Dan! I agree, I only recently installed Broken Link Checker on my blog and it helped me quite a bit.

      Don’t be too quick to remove newly discovered broken links though, give it a few days to make sure that they aren’t just having temporary server downtime and then recheck.

  • R.Rogerson

    Broken Links – Screaming Frog
    Sitemap Integrity
    404s – Google Tools
    Duplicate Content – Copyscape
    Thin Content
    Content Errors
    Number of Indexed Pages – SERPs?
    Well formed meta tags
    Optimised 404 pages
    Page Load Times
    Robots txt
    Proper redirects
    URL Format
    Image Alt Tags
    Header tags
    Keyword cannibalism
    Site architecture
    Search Penalty
    Check outbound links

    That’s fairly comprehensive – so I admit to being impressed 😀
    I admit to laughing when you mentioned doing an audit after buying a site – if you’re doing that sort of thing, I’ve got a bunch of lovely moon-rocks with NASA certificates, some prime Mississippi real estate and some quality barely-used cars for sale 😀

    I do have to ask some questions though (else I wouldn’t be me :D).

    * Is there a reason there isn’t any sort of “priority” to the list?
    Nothing major, I just would have thought some things appearing first would have made more sense.

    What has an “optimised” 404 page got to do with rankings?
    Now, I’ll admit – I’ve never actually tested this … but are you suggesting that a user-friendly 404 page influences rankings? (Actually, it would be a sensible thing…).
    Or is this more a UX thing (not an SEO thing)?

    Sitemap Integrity.
    The idea of the Sitemap is to ensure the SE knows of the URLs you want indexed.
    This does not mean it has to include every page on your site.
    Further – if you are smart, you will be submitting multiple sitemaps.
    For starters, it’s a good move to ensure you have sitemaps for different sections of your site … as this will enable you to see roughly what parts of your site are more indexed than others. Further, if you split by page type, you can again see performance (think of landing pages, sales pages, site pages … all serve different functions).

    Indexed pages.
    Has G recently changed the way their SERPs work?
    As far as I remember, the number indicated is seldom accurate (esp. on the first SERP). Further, the difference shown could be due to numerous reasons, including non-crawled pages, pages not indexed due to meta-robots, canonical elements, incorrect status codes, or pages filtered for being thin/cookie-cutter content, internally duplicated, highly similar, or duplicated from external content, etc.
    In fact, I don’t think I’ve ever seen a match between the number of pages on a site and the number in the SERPs if the site has any real volume of pages.
    (It’s also worth noting that the number of pages reported as indexed in the sitemaps section of Googles tools may also be lower due to the same set of reasons)

    Thin Pages.
    First – it is Not nebulous at all.
    Google have defined it quite clearly – and did so some time ago!
    Thin content is content that is not original, lacks any real purpose other than as SE bot fodder for keywords.
    Further, there is nothing wrong with short-pages, so long as the page is useful.
    And that there is the key part.
    A page filled with 150 to 5000 words that serves no purpose, helps no one, informs no one and generally does nothing is useless.
    If the page has 200 words on it, yet is 200 words of highly informative content that people find useful … then it is Not Thin!

    Duplicated content can be problematic.
    CS may not be accurate/up to date. It’s a good general purpose tool – but it’s also worth doing a shingle test yourself (use any 5 word string from your page, put it in quotes and google it) – I know this isn’t optimal for large sites, but for newer content, it’s more reliable (always worth doing a 1/2/4/8 weeks after new content).
    You also need to check for internally duplicated content/highly similar/cookie-cutter content.
    + Duplicate Content Penalty?
    You stated that content being copied from your site likely won’t get you penalised.
    Any copied content won’t get you penalised – there is No Duplicate Content Penalty!
    There is a Filter – G will remove numerous identical copies from the SERPs if it sees too many … (actually – I suppose if a huge amount of your content is found elsewhere, you could be removed from the SERPs, but that would likely be manual, and not a penalty … but I don’t think I’ve ever heard of G doing it?).

    Canonical URLs.
    Why does everyone simplify this?
    You can have canonical issues due to multiple domains, subdomains, test-sites in directories etc., then the homepage being under the domain root and home or index, not to mention things like pagination causing it (page.php and page.php?startnum=1 is the most common), as well as variant URL parameters that result in the same or highly similar pages.
    Another big issue that wasn’t hit was Infinite URLs … these are often caused by things like clickable calendars… and poorly built pagers – which are technically infinite. G can waste a lot of resources crawling those URLs.

    Page Load Times.
    So you link to Pingdom – but not the tools specifically built by Google or Yahoo for examining and reporting … as well as spelling out what you need to fix???
    At least you didn’t say faster sites improve rankings (they don’t! (not unless G changed that over the last 2 years)).
    The ideal is less than three seconds. If you have a dynamic site loading CSS, JS and images, you will likely get 1.4 if you are good.
    There are lots of things you can do to improve load times – and you should … for UX, Retention and Conversion reasons.
    (Make sure you load as few CSS/JS files as possible – merge them if possible. Compress or send compressed. Optimise images for smaller file size. If possible, shard domains (locate CSS/JS/Imgs on different domains or subdomains). Load JS last (after CSS if in the head, though ideally it would be before the /body tag, and using either defer or async).
    Be warned, some of it is tricky, and will take a bit of practice … but Google/Yahoo tools will tell you the priority order, and you only have to do a few things to save seconds)
    Why no mention of Redirect Chains? Or making sure that links point to the correct URL and not relying on redirects?
    (That said, why are you linking to external sources when it’s your guide?)

    HURLs/Semantic URLs.
    Well – those are new to me … but far better than the vastly incorrect SEF-URL.
    It’s worth pointing out that it is seldom worth changing to word-based URLs from system-id URLs … the loss of value due to redirects will usually outweigh the gain of a URL with a keyword in it.

    Keyword Cannibalisation
    I do wish people would be clearer on this.
    There are few pages targeting a specific exact keyword. If people have done things right, there should be a collection of interlinking pages focused around a singular word/phrase – but focusing on variations (different mid/long-tail phrases).
    If you do have multiple pages targeting the exact same phrase, with no variation … then that should be remedied fast (Either re-target the pages for variants, or merge them to a single (pre-existing!) URL with 301’s).
    Further, multiple listings of the same Domain in the SERPs is proven in many cases to increase CTR … so it may not always be a bad thing (though, yes, being 1st would be nice … being 1st, 2nd and 3rd would be ideal :D).

    Major personal bug-bear …
    Server Logs.
    I’m beginning to have serious concerns about the SEO industry in general due to the sheer lack of usage of one of the most useful resources available on most hosts!
    They can give you errors, redirects, traffic flow, inbound link sources, bounces, even a rough indicator of time on a page (so long as they hit 2+ pages).
    Tons of info in there – and yet no one seems to use them???
    (Okay, I admit, on large sites and/or high-traffic sites – they can be beastly … but then again, you should be moving them to DB anyway)

    Well … I think that’s enough for now (my fingers hurt and my keyboard is smoking :D).

    I know – a lot of nit-picking …
    … but all in all, you actually did a damn good job – far better than most.
    I’m just very picky 😀

    • James Parsons

      Wow, thanks for your comments and kind words.

      * Is there a reason there isn’t any sort of “priority” to the list?

      I think this is a simple enough checklist to do in an hour or two. I tried listing the most important ones first (broken links, Google webmaster / sitemap issues, etc); these are the most common issues from what I’ve seen on my clients’ sites.

      * Now, I’ll admit – I’ve never actually tested this … but are you suggesting that a user-friendly 404 page influences rankings? (Actually, it would be a sensible thing…).

      Absolutely. It’s all about user experience.

      * In fact, I don’t think I’ve ever seen a match between the number of pages on a site and the number in the SERPs if the site has any real volume of pages.

      Generally speaking, websites with more quality content indexed will perform better than websites with less quality content indexed. This doesn’t mean a blog that has thousands of tag and category pages indexed will perform better, but in a large enough study, you’ll find sites with more pages generally perform better in search engines (as they have more content and search engine results).

      * Thin Pages. First – it is Not nebulous at all. Google have defined it quite clearly – and did so some time ago!

      It can be. I’ve worked with clients who argued their eCommerce pages weren’t thin because of all of the content and images they had, but they had very little value and were what I considered “thin content”.

      * So you link to Pingdom – but not the tools specifically built by Google or Yahoo for examining and reporting … as well as spelling out what you need to fix???

      Pingdom has been more effective for me at identifying bottlenecks than Google/Yahoo.

      * (That said, why are you linking to external sources when it’s your guide?)

      People generally want you to back up your facts and claims with supporting links. It’s good article writing practice, good for SEO, and required by SEJ.

      * … but all in all, you actually did a damn good job – far better than most.

      Thank you!

      • R.Rogerson

        So, you are stating a user-friendly 404 improves ranking.
        And this has been tested and proven?
        (Again, it would make a bit of sense, and G did make a bit of noise over it – but I don’t think I’ve heard of anyone saying it’s a signal… I may have to test that)

        Site: operator – you kind of side-stepped the point 😀
        The number of pages reported in the SERPs is not reliable – it can/does fluctuate, and tends to change depending on the SERP page you are on. The initial figure is an estimate – and often a wild one. As you move through the SERPs, the figure will often change. When you reach a certain point, you will likely hit the “omitted” page … and clicking that will often only give you a little more.
        I’ve seen sites with 50-1,000 get reported figures of 1000-25,000 – it’s generally that unreliable.
        (That’s not saying your referencing of indexed rate of quantity of quality content is wrong – just has no real bearing on the unreliability of what you suggested :D)

        Thin pages – just because a client doesn’t want to hear or acknowledge something does not make it nebulous.
        Again – it is clearly defined, and has been for many many years.

        External sources – indeed, 100% correct. But the point I was making was that you are writing a guide, yet pointed to other resources rather than including the pertaining information (that would be like picking up a Haynes manual, and being told to get another book when you look up replacing a clutch :D)
        (That said – yes, I appreciate the space it would take, so it is understandable)

  • Gus Quiroga

    Any thoughts on siteimprove?

  • Fahad Zahid

    Thanks James for sharing these tips. There are several other factors that need attention in order to improve the onsite SEO health of any website, for instance quality content, keyword consistency, text-to-HTML ratio, in-page links, mobile loading speed, website usability, HTML validation, implementation, etc.!

    • James Parsons

      Hey Fahad, thanks for sharing!

      I think website usability is the most notable of the additional factors you’ve shared. User experience is extremely important; all of the others kind of fall underneath user experience. If your HTML is making it difficult or inconvenient for people to browse your site, or your website doesn’t load on mobile (for example), that’s bad user experience and your rankings may suffer.

  • Nathan Whitaker

    Screaming Frog is an incredible tool for technical SEO audits. Especially for identifying duplicate issues, broken links and indexing issues. Also including an internal linking audit while checking the site architecture is a good idea.

    • James Parsons

      Thanks for your comment Nathan. Screaming Frog is great, every SEO firm I’ve worked with uses it for their client’s sites.

  • Yasin Rishad

    Hi James,
    So glad to see your guide for on-site SEO. I loved your tool suggestion, Screaming Frog, which lets you quickly crawl, analyze and audit a site from an onsite SEO perspective. Thank you for this.

    I also got valuable information on proper canonicalization, redirects and URL format. Responsive design is also now an onsite SEO factor.

    Thank you for your valuable tips for on site seo.

    Kind Regards
    Yasin Rishad

  • Melissa

    Or check all of this with just one tool:

  • Thomas Brew


    I never knew about Screaming Frog and Xenu Link Sleuth.

    I hope to be able to do a deep audit of my blogs to find the common issues and fix them to boost my ranking in search engines.

    Thanks for the big share. 🙂

  • Billy Ross

    If your website isn’t really working out, or you are looking to improve your website traffic, page rank or SERP rank, a periodic search engine optimization audit of the site is needed. It needs to focus on user experience (mobile-friendliness) and HTML improvements to increase SEO health.

  • Jacob Riff

    Hi James

    Did you check out We check a lot of what you cover in your post – and we do it automatically every 7 days.

    Jacob Riff

  • Brandon Prettyman

    Nothing really new on the list, but still good to have it all listed in one place. Thanks for putting it together.

  • Roman Prokopchuk

    Usually, as soon as you notice one error there are more to follow, which leads you along a path of discovering all the fixes needed to help maximize your site’s performance in search. Solid post, thanks for sharing.

  • Dev

    Some great stuff here. I use most of the tools listed regularly and consider them invaluable. My website unfortunately fell victim to a hack and a bunch of spam pages were created. The pages have since been removed from the site & sitemap; however, GWT keeps trying to crawl them and returns me a 404 for the pages. After removing the pages I’ve essentially reset my sitemap, resubmitted it to GWT and marked all the 404 errors as fixed, yet I keep getting them every week or two… I use Yoast SEO for my sitemap and they refuse to help if you’re not a premium customer. Any suggestions for fixing this 404 nightmare?

  • Markos

    Loved your article! I was looking for a checklist for my new website last month!

  • Anees Muhammed

    Hi James.,
    I was using some online tools to check broken links and there were severe inaccuracies in the result. Just downloaded Screaming Frog, Indeed it is working great. Thanks for sharing it…

  • Nidhi- software developer

    Well, checking for W3C errors is one of the crucial factors of on-page optimization. Thanks for the great post.

  • Kristy Harper

    Thanks for the great article! I had never heard of Screaming Frog before.

  • BDM

    Screaming frog might be one of the best tools, I normally like to run a few of them though in case something is missed or overlooked. Another one I recommend is SEO centro, it’s pretty comprehensive as well.

    • Kelsey Jones

      So many people at conferences recommend Screaming Frog, probably because it’s easy to use and it works!

  • Nirav Sheth

    The website auditing is simply a must-to-generate process. Timely technical checks and SEO features must be kept alarmingly on. Appreciate the tips provided herein. Overall, it’s user experience that needs ever to be enriched!!

  • Ankit

    So, you are stating a user-friendly 404 improves ranking.
    And this has been tested and proven?
    (Again, it would make a bit of sense, and G did make a bit of noise over it – but I don’t think I’ve heard of anyone saying it’s a signal… I may have to test that)