9 Things to Validate While Auditing E-Commerce Sites for SEO


E-commerce businesses live and die by Google rankings, so regular SEO audits are a must. Running a comprehensive SEO audit isn’t for the faint of heart: some problems are quick and easy to fix, while others take weeks or months to iron out. Knowing how to spot and resolve the most common issues and opportunities is a good place to start.

When you run an audit, what should you be looking for? How do you resolve any issues you find?

Here are nine things to get right when auditing your e-commerce site for SEO.

1. Flush out Thin or Duplicate Content

The more crawlable, unique content you have, the happier Google is to point traffic your way. When similar content sits on separate pages, traffic is split between them, weakening each individual page, and too much duplication drags down a site’s overall SEO. Duplicate content also puts you at risk of an algorithmic hit (not a manual penalty, but a devaluation that arrives with updates like Panda), which is one of the hardest SEO trouble spots to fix because you won’t be notified. You’ll just see a sudden, sharp drop in traffic at the same time as an algorithm update, and those updates can be a year or more apart: that’s a long time to wait for bad traffic to recover, especially in e-commerce, where traffic is directly correlated with sales.

Thin content (pages with little or no unique product description) and duplicate content are among the most common problems in e-commerce SEO, so they will play a big part in this post.

How do sites wind up with duplicate content?

It’s easy to end up with duplicate content from one page to another within your site, even if you’ve carefully avoided it when writing product descriptions and other site elements. When a customer reaches a page by a different route, say from a category menu rather than a Google search, the platform often generates a new URL. That URL points to a page that’s otherwise identical to every other copy of the same product page, and Google isn’t yet sophisticated enough to let you off the hook: it will treat you like any other site with duplicate content and devalue those pages.

How can you avoid this?

  • Product descriptions should always be your own. Never use the manufacturer’s lifeless, repetitive descriptions.
  • Keep large, repetitive blocks of text, such as boilerplate shipping policies repeated in every header or footer, out of the crawlable content of each page, for example by serving them from separate include files. Note that robots.txt works at the URL level and can’t block individual sections of a page.
  • Use canonical tags (see below) to prevent accidental duplication.

There aren’t many tools that help you accurately identify and get rid of duplicate content without manual inputs. However, you can use a tool like DeepCrawl to inspect website architecture (especially HTTP status codes and link structure) and see if you can come up with automated quick fixes to URLs or HTML tags within the code.

In terms of content, DeepCrawl identifies duplicate content well and ranks pages by priority – a critically useful feature for e-commerce sites that have extensive product lines and use faceted navigation.

2. Use Canonical Tags

Canonical tags tell Google which version of a page is the “canonical” one: the preferred URL you want indexed when several URLs serve the same content.

Two things happen when multiple routes to a page spawn multiple URLs that all serve the same content. First, Google sees them as different pages and splits traffic between them, and your page authority nosedives. Second, Google indexes each extra URL as a duplicate.

Use canonical tags and Google will consolidate those URLs onto the version you specify, rather than indexing every copy and flagging your site for duplication.

What does it look like when you get canonicalization wrong?

In Apache, URLs may appear like this:
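For illustration (example.com stands in for your domain), the same page might resolve at all of these addresses:

    http://example.com
    http://www.example.com
    http://example.com/index.html
    http://www.example.com/index.html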


In Microsoft IIS, you might see something like:
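Again purely for illustration, an IIS site might serve the same page at:

    http://example.com
    http://www.example.com
    http://example.com/default.aspx
    http://www.example.com/default.aspx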


You might also see different versions with the same text capitalized differently.

If each page is the same, but Google counts them as unique because of their different URLs, traffic will be split four ways. A page that gets 1,000 visitors shows up in Google’s algorithm as receiving 250, while the “other three pages” soak up the rest.

So much for why. What about how?

Using canonical tags is an “opt-in” system, so you’re tagging the pages you want Google to index. It’s fairly simple. The following is a common scenario for large retail sites:
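Faceted navigation, sorting options, and tracking parameters generate many URLs for the same category page; each variant carries a canonical tag in its <head> pointing back to the clean category URL. A minimal sketch, with placeholder URLs:

    <!-- On http://www.example.com/category/widgets/?sort=price_asc&color=blue -->
    <link rel="canonical" href="http://www.example.com/category/widgets/" />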


3. Balance Link Equity and Crawlability

Crawl efficiency and link equity aren’t either/or, but they can wind up treading on each other’s toes if you’re not careful. Here’s how they fit together:

If you have a bunch of pages that you don’t want indexed because they’re slowing down crawling, you can use robots.txt disallow rules to stop them from being crawled, or a meta robots noindex tag to keep them out of the index (noindex belongs in the page’s meta robots tag, not in robots.txt). Blocking crawling saves your crawl budget for pages that you actually care about. However, if those pages contain high-authority links, the links won’t get picked up by Google.

There is a fine line between blocking crawling, blocking indexing, passing link equity, and canonicalization. Each is implemented differently and comes with its own trade-offs, and you need to walk each of these tightropes carefully.
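Here is a minimal sketch of how the main mechanisms differ, using placeholder URLs and a hypothetical /filter/ path:

    # robots.txt: blocks crawling entirely, so links on the blocked pages pass nothing
    User-agent: *
    Disallow: /filter/

    <!-- Meta robots: the page can be crawled and its links followed, but it stays out of the index -->
    <meta name="robots" content="noindex, follow">

    <!-- Canonical tag: duplicate URLs are crawled but consolidated onto the preferred version -->
    <link rel="canonical" href="http://www.example.com/category/widgets/" />

Which one to reach for depends on whether crawl budget, link equity, or index bloat is the bigger problem for that set of pages.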


4. Paginate Categories

Even smaller e-commerce sites face pagination issues thanks to long product listings. Pagination is a double-edged sword. If you’re going to have a site that anyone can find their way around, pagination is a must. At the same time, pagination can confuse Google. You can run into duplicate content issues (again) from paginated and view-all versions of the same page, and backlinks and other ranking signals can be spread out among the paginated pages, diluting their effect. In very large categories, crawl depth can be an issue too.

You can use canonical tags to identify the view-all page as the “real” one for indexing purposes. That should save you from duplication issues, but the approach has its limits: very large categories and search results usually can’t offer a view-all page (it would simply be too big to load), so you can’t canonicalize to one in those cases. Where a view-all page is an option, though, Google assures us it will consolidate backlinks to it as well.

You can also use rel=“next” and rel=“prev” HTML markup to tie paginated pages together; if you go this route, don’t canonicalize everything to the first page. Instead, include page numbers in your URLs and give each page a self-referencing canonical tag to prevent duplication issues. This technique tends to work best when there are a large number of pages and no view-all option, i.e. most e-commerce sites. Google treats these tags as hints rather than strict directives, but if you have thousands of pages, this is often the best option.
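For illustration, page 2 of a hypothetical paginated category would carry markup along these lines (the URLs are placeholders):

    <!-- In the <head> of http://www.example.com/category/widgets/page/2/ -->
    <link rel="canonical" href="http://www.example.com/category/widgets/page/2/" />
    <link rel="prev" href="http://www.example.com/category/widgets/" />
    <link rel="next" href="http://www.example.com/category/widgets/page/3/" />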

When doing an audit on your e-commerce site, look for pagination that is broken, leads nowhere, or points to unlinked pages. Also look for standalone pages that should be paginated but aren’t. Bear in mind that these are separate issues from the pages you’ve canonicalized.

5. Keep Your Sitemaps Fresh

If Google has to crawl thousands of pages to find a few dozen new product pages, crawl efficiency suffers and your new products take longer to show up in search. Sitemaps help Google find new content faster and deliver it to users. Getting indexed first builds authority sooner, which can translate into a competitive advantage: you may rank ahead of competitors selling essentially the same product.

In addition to getting new pages into search results quickly, sitemaps offer you other advantages. Your Search Console features information on indexing problems, giving you insight into site performance.

When you’re building your sitemap, there are a few things to keep in mind. If you do international business and your site is multilingual, you’ll need hreflang annotations to tell Google which language and regional version of each page to serve.
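As a minimal sketch, assuming hypothetical English and German versions of a product page on example.com, hreflang annotations can live right inside the sitemap:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:xhtml="http://www.w3.org/1999/xhtml">
      <url>
        <loc>http://www.example.com/en/product-123/</loc>
        <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/en/product-123/" />
        <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/de/product-123/" />
      </url>
    </urlset>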

Watch out for content duplication within your sitemap: it won’t cost you anything with Google, but it will make the indexing data you get back harder to interpret. If you have separate mobile and desktop versions of your site, use rel=“alternate” tags to make the relationship clear.
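For separate mobile and desktop URLs, the usual pattern (again with placeholder URLs) is an alternate tag on the desktop page paired with a canonical on the mobile page:

    <!-- On the desktop page -->
    <link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.example.com/product-123/" />

    <!-- On the mobile page -->
    <link rel="canonical" href="http://www.example.com/product-123/" />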

One of the quickest and yet most comprehensive ways to generate a sitemap is to use the free Screaming Frog SEO Spider, which allows you to include canonicalized and paginated pages as well as PDFs. It also automatically excludes any URLs that return an error, so you don’t have to worry about redirects and broken links popping up in your sitemap.

6. Use Schema Markup

Schema is:

“HTML tags that webmasters can use to markup their pages in ways recognized by major search providers… improve the display of search results, making it easier for people to find the right web pages.”

Schema doesn’t generate SEO benefits directly – you won’t get a boost from Google for using the tags. You also won’t be punished by Google for leaving them out. Their function is to make search results more relevant and help you serve your users better. You get indirect SEO benefits by having a better UX and a stickier site, thanks to more relevant traffic.

Schema markup means things like customer reviews and pricing show up in SERP, as it allows you to extend and define the data that’s displayed in search results.
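A minimal sketch of Product markup in JSON-LD, with placeholder name, image, price, and rating values:

    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Product",
      "name": "Example Widget",
      "image": "http://www.example.com/images/example-widget.jpg",
      "offers": {
        "@type": "Offer",
        "price": "49.99",
        "priceCurrency": "USD",
        "availability": "http://schema.org/InStock"
      },
      "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.5",
        "reviewCount": "87"
      }
    }
    </script>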

One data type supported is products, which should have every retailer sitting up a little straighter. Google’s Data Highlighter tool will talk you through how to do everything from organizing pages into page sets, to flagging the content you want to be available in SERP. The only catch is, you need to have claimed your site in Google Search Console.

7. Simplify Your Taxonomy for Easy Navigation

Taxonomies affect crawl depth, load speed, and searchability. The best way to score better for all of these is to have a “shallow, broad” taxonomical structure, rather than a “narrow, deep” one.

What’s the difference? On a chart, deep narrow taxonomies have more layers – more questions to answer, or choices to make, before you reach your destination. Wide, shallow taxonomies offer multiple points of entry, more cross-links and less distance to product pages.

This matters to users: the more clicks it takes to reach content, the worse the experience (and Google is increasingly taking UX into account). It also harms SEO directly by making your site harder to crawl, especially if it’s a big site to begin with.

A wider, more open taxonomy is easier to crawl, easier to search and easier to browse, so it’s recommended for e-commerce sites.

Here’s Bloom’s Taxonomy guide, to help with content creation in case you’re still evaluating how to structure your categories and product descriptions:


8. Speed it Up – Even More

Page load time isn’t just a tech issue. It’s a customer service issue, and a bottom-line issue. Right now, good load times will give you a substantial edge over your competition. Why? According to Kissmetrics, the average e-commerce website takes 6.5 seconds to load, and a 1-second delay equates to a 7% drop in conversions. Surely e-commerce sites are getting faster to cope, then? Nope. They’re slowing down, by 23% a year. Treat that as an opportunity to take the lead: Amazon found that it gained 1% in revenue for every 100 milliseconds of load-time improvement.

How should you accelerate load speeds?

There are a lot of simple tweaks you can make to your website in order to speed up loading times and get on the good side of Google. Start with static caching. If your site uses dynamic languages like PHP, they can add to lag times. Turn dynamic pages static and web servers will just serve them up without processing, slashing load times.

CSS and JavaScript might need your attention too. Try tools like CSS Compressor to trim down the style sheet element of pages and make them load faster.

And don’t forget images. While everyone realizes the advantage of smaller file sizes, specifying each image’s height and width is often overlooked. When you visit a URL, the browser downloads the page’s data and starts rendering as it arrives. If image dimensions haven’t been specified, the browser has no idea how much space each image will take up until it has fully downloaded, which forces the browser to “repaint” the layout.
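The fix is a one-line change per image; the filename and dimensions here are placeholders:

    <img src="/images/example-widget.jpg" width="600" height="600" alt="Example widget">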

Finally, if you have a big, content-rich site (you’re in e-commerce, so I’m guessing you do), try serving content from multiple servers so the browser can download several items in parallel.

9. Monitor All Pages for Errors

Checking your web pages for common errors is one of the simplest ways to maintain strong SEO. Common errors include:

  • HTML validation errors: These can cause pages to look different on different devices or in different browsers, or even to fail altogether. The W3C Markup Validator will point you in the right direction.
  • Broken links: These damage user experience and lead to abandonment and poor reviews, harming your reputation.
  • Missing or broken images: If images stopped working deep inside a hundred-thousand-page site, would you know?
  • JavaScript errors: Choppy loading is the least of your problems if you have serious, undiagnosed JavaScript errors. They can actually make access to some parts of your site impossible. They’re potentially a security risk as well.

Over to You

Fixing errors like duplicate content and broken links can deliver substantial benefits. Address the more technical matters like canonicalization and schema, and your e-commerce site should see improved search rankings, more real estate in the SERPs, and the jumps in traffic and revenue that go with them!

What challenges are you facing when auditing and optimizing your e-commerce site? Let’s discuss in the comments!


Image Credits

Featured Image: Image by Rohan Ayyar
In-post Photos: Images by Rohan Ayyar
Screenshot by Rohan Ayyar. Taken March 2016.


Rohan Ayyar
Project Manager at E2M Solutions
Rohan Ayyar keeps the conveyor moving at E2M Solutions, India's premier digital marketing agency. He also doubles at OnlyDesign.org, which helps companies build a remarkable...
  • Justinas Kundrotas

    Thanks for this post, but I have to point to some serious issues with it. These should be addressed in order to stop confusing people with myths and incorrect SEO suggestions. First up, let’s take this one from the post: “Duplicate content also puts you at risk of an algorithm penalty, which are some of the hardest SEO trouble spots to fix because you won’t get notified.” Google never penalizes for duplicate content; it’s a widespread and popular myth, many Google spokesmen have talked about it, and it’s really strange that people writing about SEO audits don’t know such things. Secondly, the author talks about duplicate and thin content as the same thing (actually, “thin content” is mentioned only in the paragraph headline…), but these are absolutely different things. Google penalties targeting thin content on-site may look similar, but they actually have nothing in common with merely having multiple URLs with the same content, which is the essence of duplicate content issues.
    “If you have a bunch of pages that you don’t want indexed, because they’re slowing down crawling, you can use robots.txt instructions like noindex and disallow to stop them from being crawled. ” Well, “noindex” isn’t supported in robots.txt. It may sometimes work as some recent studies show, but it’s really not recommended to use this officially unsupported way.
    “There is a fine line between disallowing indexing, allowing Google bot to crawl through pages, passing link juice, and canonicalization. You need to understand the various ways in which these are implemented and the trade-offs among them, and walk each of these tightropes carefully.” Very interesting paragraph and a graph right after it. But what does it mean?

    • Rohan Ayyar

      Thanks for your comments, Justinas. I stand corrected in the sentence “you can use robots.txt instructions like noindex and disallow to stop them from being crawled.” I meant robots instructions AND noindex. Of course, noindex is a meta robots tag value and not part of robots.txt. I’ll try and get the article updated.

      As for your thoughts about duplicate content, there are no “manual” penalties for duplicate content, but the “algorithm” definitely doesn’t like duplicate content. We know SEO experts and Google spokespeople differ on this, but you can do a small experiment and see for yourself – try copying your own pages in your site and see if you are or aren’t affected by the next Panda or Quality update. 🙂 E-commerce sites tend to accumulate stubs, near-duplicates and CMS-auto-generated pages over time, which Google does a good job of detecting, but we can’t hold them to it.

      Further, I don’t think the article talks about thin and duplicate content as the same things. Agreed, I didn’t define it precisely, but I had a long post ahead of me and I suppose the term “thin content” is self-explanatory, at least for SEJ readers. 🙂 We’ve all seen e-commerce pages with little to no product descriptions, multiple pages for products with varying attributes, and so on. It makes sense to club duplicate and thin content together when attempting to improve the quality, accessibility, crawlability and UX of your site.

    • alanbleiweiss

      I am going to agree with Rohan here.
      I specialize in forensic site audits. I’ve audited more than fifty ecommerce sites over just the past few years. In fact, I’m working on an article right now ONLY about duplicate content.

      While there is no penalty specific to duplicate content, there are many forms of duplicate and perceived duplicate content where, at scale, confusion takes place in algorithms as to which “version” to index higher, and which, if any, to index at all.

      This is a major issue. And when that happens, Google often ends up indexing the wrong “version”, or doesn’t give enough “uniqueness” trust and authority value for that content to rank as high as it might otherwise.

      • Justinas Kundrotas

        Alan, thank you for comments! I know you perfectly 🙂 Looking forward to reading your new article on duplicate content, seems like it will be really interesting.



  • Julio Sanoja

    Great post. I just would like to comment that as Matt Cutts says WordPress does 80% of SEO technical job, so probably the eCommerce WordPress application would work fine.

  • dhlpackers

    excellent article…………..

  • edie lowther

    Great Article , just amazing , loved it

  • Shalini Sharma

    Wow! its very google article which i found yet. For long time my organization started to work on e-commerce portals websites. So, we still facing many problem in auditing and working. Though this article I will try to improve my all issues step by step. thank you.

  • jamesgurd

    Hi Rohan, thanks for sharing a really useful article. In relation to indexation and taxonomy, establishing URL schemas is critical to technical SEO as this can help control what does and doesn’t get indexed. For example, setting which elements of the faceted navigation generate indexable URLs vs. those that generate non-indexable URLs. I’ve worked on a handful of enterprise SEO projects and defining the URL schema is one of the most time consuming and critical tasks. Thanks, James.

    • Rohan Ayyar

      Thanks for validating from a hands-on point of view, James. Indexable vs. Non-indexable URLs is easily one of the most complex tasks, made more difficult by code framework / CMS-specific peculiarities. For enterprise ecommerce sites, one can’t emphasize enough that a well-defined information architecture is the first step towards getting site structure right. Cheers.

  • Crestinfotech

    Hi, this article is excellent and covers all the topics which are very helpful when doing an eCommerce website audit. It did help a lot at the time of our eCommerce website audit. Thanks for sharing the great post with us.

  • steelsprr s

    Very useful information sharing.. i am working at ecommerce sites that make wonderful ideas for after read this article ..I got these steps… great sharing..keep working…

  • hema

    This article is very useful and knowledgeable info and thank for sharing.

  • ujjwal bhattarai

    All fair points (without going into too much detail), however here is my biggest beef with this article: it does NOT address any specific e-commerce needs. All 9 points could equally apply to most websites.

    “9 Things to Validate While Auditing E-Commerce Sites for SEO” -> I clicked on the link.
    It could have been
    “9 Things to Validate While Auditing Your Websites for SEO” -> I would not have clicked on this link.

    The misleading title seemed like click-bait.

    • Rohan Ayyar

      Good point, Ujjwal, but if you look closely again, I’m sure you’ll find these points apply *more* to e-commerce sites than to others when it comes to SEO, especially duplication, pagination, canonicalization, crawlability and indexation, because of the large number of products and pages involved. For the other points too – sitemaps, speed, etc. – I’ve attempted to focus on how they apply to e-commerce. Of course the title implies discussion about SEO from an e-commerce point of view – not e-commerce from an SEO pov.

      What would be a few “specific e-commerce needs” that you’d like addressed? Cheers.

  • Zygis

    I’d like to ask more about this:
    “Use robots.txt to seal off any repetitive areas of a page, like headers and footers, so they don’t get crawled.”

    Header and footer are in-page elements. Could you provide more info on how it is possible to disallow indexing of specific parts within one URL? And if so, what are the best practices? I mean, should they be disallowed from indexing in all URLs or only in some (e.g. excluding the main page)?

    • Rohan Ayyar

      You got me on this one! Some e-commerce sites I worked with had large blocks of text e.g. shipping policies in the footer of every product page. It used to be the case that you could block footer.html by disallowing the /include/ folder on custom-built sites. This was true only for .html pages as header/footer.php isn’t crawled separately but the html output from within PHP files is lumped together with that of the home page. But now it isn’t recommended to block folders like wp-include as they include js files.

      So now if you have a big repetitive chunk of text in your footer, you could try changing the extension of footer.php to footer.incl (with require or require_once in the main body) so it cannot execute on its own and then putting it in a separate /include/ or such folder with a blank index.html in there to hide it from googlebot – there will be no links to these files and you don’t need to rely on robots.txt to block them.

      • Petar Jovetic

        Hey Rohan,

        Great post, although this was a point that stuck out for me also. I feel it’s a tad paranoid around the real implications of duplicate content. Content in the header and footer constitutes more ‘common’ content and therefore unavoidable. It comes part and parcel with having a website, surely? We should spend our time instead ensuring our in-body content is valuable and helpful for the user.

      • Rohan Ayyar

        “We should spend our time instead ensuring our in-body content is valuable and helpful for the user.” Couldn’t agree more with you, Petar! Unfortunately, in the real world, many a time there is a disconnect between the content and SEO teams – even at large e-commerce retailers. My gripe with Google is that while it claims to be able to tell primary content from secondary, the algorithm isn’t always able to do that. And while they say they don’t penalize you for same-domain duplicate content, I suppose they forgot to add the word “directly” – you never know what filters like the “quality update” do to you. With significant text/code duplication, there is a real risk of overall site authority going for a toss.

  • info webtech

    Hi, this article is excellent and covers all the topics which are very helpful when doing an eCommerce website audit.

  • Sławomir Piwowarczyk

    “…seal off any repetitive areas of a page, like headers and footers, so they don’t get crawled” – sorry mate but it seems you have no idea what you are talking about. Following your advice may get people in trouble

  • Dean

    Thanks for your great article, Rohan. I’d like to hear your thoughts on best practices for an e-commerce site that has tens of thousands, or even millions, of SKUs. Generally, those product descriptions come from a manufacturer. Given the magnitude and impracticality of rewriting all the product descriptions to be unique, what advice would you give to minimize the impact of the manufacturer’s lifeless and repetitive descriptions (that many other e-commerce sites are also using)?

    • Rohan Ayyar

      How do you eat an elephant? One bite at a time. :~)

      Start with the parts you like most (read, products that bring you traffic).

  • steelsprr s

    Thanks for sharing wonderful aricle… I really hope this is grew up my knowledge…great points sharing with us..keep working…

  • Raji Seo

    I am really very happy to read this article.. wonderful points sharing..