The Complete Guide to Mastering Duplicate Content Issues

There is little doubt that duplicate content can be one of the biggest battles an SEO has to fight. Too many content management systems are built to manage content well, yet give little consideration to SEO in how that content is implemented throughout the website.

There are two kinds of duplicate content, onsite and offsite. Onsite duplication is content that is duplicated on two or more pages of your own site. Offsite duplication is when the content of one site is displayed on other websites. Onsite duplication is an issue you have control over while offsite duplication may be beyond your control. Both are problematic.

Why is Duplicate Content an Issue?

The best way to explain why duplicate content is bad is to first point out why unique content is good. Unique content sets you apart. It makes you different. It helps you stand out. Why? Because that content is unique to you and you alone.

When you use the same text to describe your products as the next guy, there is nothing that gives you an advantage over the next guy. When you have multiple URLs with the same information, word for word, there is nothing that makes one URL trump the other, and neither performs well.

Duplicate content essentially downgrades your content’s value. Search engines don’t want to send people to several pages that all say the same thing, so they look for content that is unique from everyone else. Unique content helps you compete with your competitors rather than with yourself.

When the search engines begin spidering your site, they pull each page’s content and put it in their index. If they start seeing page after page of duplicate content as they analyze those pages, they decide to use their resources somewhere else. Perhaps on indexing unique pages on your competitors’ sites.

When you have internal site duplication, the self-competition is at its worst on particularly link-worthy content. Each duplicate URL may receive links, giving no single page the full value of the link juice pointing at that valuable content. When that content is located at only one URL, all links pointing to it are consolidated on a single page, enhancing that page’s authoritative value.

Dealing With Offsite Duplicate Content Problems

Offsite duplicate content has two main sources you can blame: it’s either your fault or someone else’s! At its core, it is either content you stole or content someone stole from you. Whether taken legally, with permission or without, offsite duplicate content is likely keeping your site from performing better in the search engines.

Content Scrapers and Thieves

The worst content theft offenders are those that scrape content from across the web and publish it on their own sites. The result is generally a Frankensteinian collection of content pieces that produces less of a coherent finished product than the green blockhead himself. Generally these pages are designed solely to attract visitors and get them to leave as quickly as possible by clicking on the ads scattered throughout the page. There isn’t much you can do about these types of content scrapers, but search engines are actively trying to recognize them for what they are in order to purge them from their indexes.

Not everyone stealing content does it by scraping. Some just flat out take something you have written and pass it off as their own. These sites are generally higher-quality than the scraper sites, but some of their content is, in fact, lifted from other sources without permission. This type of duplication is more harmful than scraping because the sites are, for the most part, seen as quality sites and the content is likely garnering links. Should the stolen copy produce more incoming links than your original, you’re apt to be outranked by your own content!

For the most part, scrapers can be ignored; however, some egregious violators and thieves can be gone after via legal means or filing a DMCA removal request.

Article Distribution

In many cases, content is published to distribution channels in the hope that it will be picked up and republished on other websites. The value of this duplication is usually one or more links pointing back to the author’s website.

Much of the content I write for our E-Marketing Performance blog is duplicated on other blog sites. This is strategic duplication, and I have to weigh carefully the pros and cons.

Each time my articles are posted somewhere else, I get a link back to my site. These links are incredibly valuable. I also get much wider exposure than I do on my own blog, allowing me to expand my reach far beyond my own natural borders. By keeping this duplication to a minimum, rather than distributing en masse, I avoid creating the kind of mass offsite duplication that tends to hurt sites the most.

The downside is that whole duplicate content thing. I am no longer the sole holder of my content, which means I am potentially taking traffic away from my site and driving it to these other blogs. In fact, since many of these sites have more authority than my own, they often come up in the search results above my own site.

But this is a case where the pros outweigh the cons. At least for now. That may not always be the case.

The search engines make noise about finding the “canonical” version of such duplication to ensure the original content receives higher marks than the duplicate versions, but I have yet to see this play out in any kind of meaningful way. Years ago I asked a group of search engine engineers a question about this.

My question was this: if there are two pieces of identical content and the search engines clearly know which one came first, do links pointing to the duplicate version count as links to the original version?

It would be great if this was in fact the case. I’d be happy even if the search engines split the link juice 50/50 between the duplicate site and the original site. Of course, that would also have to include social shares as well as links, but it is certainly something the search engines can do to reward original content over republished duplicate content, regardless of purposeful or nefarious intent.

Generic Product Descriptions

Some of the most common forms of duplicate content come from product descriptions. Thousands of sites on the web sell products, many of them the same or similar. Take, for example, any site selling books, CDs, DVDs or Blu-ray discs. Each site has basically the same product library. Where do you suppose these sites get the product descriptions from? Most likely the movie studio, publisher, manufacturer or producer of the content. And since they all, ultimately, come from the same place, the descriptive content for these items is usually 100% identical.

Now multiply that across millions of different products and hundreds of thousands of websites selling those products. Unless each site takes the time to craft its own product descriptions, there’s enough duplicate content to go around the solar system several times.

So with all these thousands of sites using the same product information, how does a search engine differentiate between one and another when a search is performed? Well, first and foremost, the search engines want to produce unique content, so if you’re selling the same product but you write a unique and compelling product description, you have a greater chance of pushing your way higher in the search results.

But left with no other factors to explore, the search engines have to look to the website as a whole. In these instances, the weight of the site itself and the number and quality of backlinks tend to be the deciding factors. Given similar content to another site’s, a site that is better known, has a larger user audience, a better backlink structure and stronger social reach is likely to trump any other website.

Sites that provide unique product descriptions do have an advantage; however, unique content alone isn’t enough to outperform sites that have a strong historic and authoritative profile. But given a site of similar stature, unique content will almost always outperform duplicate content, providing the opportunity to grow into a stronger and stronger site. It takes time, but original content is the key to overcoming the pit of duplicate content despair.
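
If you’re rewriting stock descriptions and wondering how different is different enough, one rough way to gauge it is the word-shingle overlap between your copy and the manufacturer’s. Below is a minimal sketch in Python; the three-word shingle size, the 0.5 threshold and the sample strings are illustrative assumptions, not anything the search engines publish.

```python
# Rough gauge of how "duplicate" a rewritten product description is,
# using Jaccard similarity over three-word shingles. Thresholds here
# are illustrative assumptions, not published search engine cutoffs.

def shingles(text: str, n: int = 3) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

manufacturer = ("Durable cordless drill with a two-speed gearbox "
                "and built-in LED work light.")
rewritten = ("Our favorite cordless drill pairs a two-speed gearbox "
             "with an LED work light for dark corners.")

score = jaccard(manufacturer, rewritten)
print(f"shingle overlap: {score:.2f}")  # lower means more original
if score > 0.5:
    print("still reads as mostly the manufacturer's copy")
```

The lower the overlap, the more of the description is genuinely yours. There is no magic percentage, but copy that shares most of its shingles with a thousand other sites is doing you no favors.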

Dealing With Onsite Duplicate Content Problems

The most problematic form of duplicate content, and the kind that you are most able to fight, is duplicate content on your own site. It’s one thing to fight a duplicate content battle with other sites that you do not control. It’s quite another to fight against your own internal duplicate content when, theoretically, you have the ability to fix it.

Duplicate onsite content generally stems from bad site architecture or, more precisely, bad website programming! When a site isn’t structured properly, all kinds of duplicate content problems surface, many of which can take some time to uncover and sort out.

Those who argue against good architecture usually cite Google propaganda about how Google can “figure out” these things and therefore can eliminate them from being an issue for your site. The problem with that scenario is it relies on Google figuring things out. Yes, Google can determine that some duplicate content shouldn’t be duplicate, and the algorithms can take this into account when analyzing your site. But there’s no guarantee they will uncover it all, or even apply the “fix” in the best way possible for your own site.

Just because your spouse is smart isn’t license for you to go around acting like an idiot. And just because Google may or may not figure out your problems, and may or may not apply the proper workarounds, is no excuse for not fixing the problems you have. If Google fails, you’re screwed. So the less you make Google work for you, the better Google will work for you.

Here are some common onsite duplicate content issues and how to fix them.

The Problem: Product Categorization Duplication

Many sites use content management systems that allow you to organize products by categories. In doing so, a unique URL is created for each product in each specific category. The problem arises when a single product is found in multiple categories: the CMS generates a unique URL for each category the product appears in.

I’ve seen sites like this create up to ten URLs for every single product page. This type of duplication poses a real problem for the engines. A 5,000 product site suddenly becomes a 50,000 product site. But as the search engines spider and analyze, they realize that they have 45,000 duplicate pages!

If there was ever a reason for the search engine spider to abandon your site while indexing pages, this is it. The duplication creates an unnecessary burden on the engines, causing them to expend their resources in more valuable territory and leaving you out of the search results for a large number of pages.

Below is a screenshot I took several years ago from The Home Depot’s website. I found a particular product by navigating down two different paths. A book like this could easily be tied to several different categories, each one producing a unique URL and, therefore, a duplicate page of content.

Keep in mind that just because the navigation path is different, all the content on the page is 100% identical, save perhaps for the actual breadcrumb trail displayed at the top of the page. If ten people linked to each of these pages while a competitor got the same ten links, but to a single URL, which one do you think would top the search results? You guessed it, the competitor!
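
If you suspect your own CMS is doing this, you can confirm it without waiting on a crawler report: request the URL variants you know about and compare what comes back. Here is a minimal sketch in Python; the URLs are hypothetical placeholders, and a real check should hash only the main content block, since breadcrumbs and navigation will differ from path to path.

```python
# Minimal sketch: confirm that several category paths serve the same
# product page by hashing the response body. The URLs below are
# hypothetical placeholders. In practice, extract and hash only the
# main content area, since breadcrumb trails differ between paths.

import hashlib
from urllib.request import urlopen

urls = [
    "https://example.com/books/home-improvement/product-123",
    "https://example.com/guides/diy/product-123",
]

groups = {}
for url in urls:
    digest = hashlib.sha256(urlopen(url).read()).hexdigest()
    groups.setdefault(digest, []).append(url)

for digest, dupes in groups.items():
    if len(dupes) > 1:
        print("identical content served at:", *dupes, sep="\n  ")
```

Any group with more than one URL is a page competing with itself; those are the URLs to consolidate behind a single canonical address.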

Stoney G deGeyter

Stoney deGeyter is the author of The Best Damn Web Marketing Checklist, Period!, and President of Pole Position Marketing, a leading web presence optimization firm helping businesses grow since 1998. Follow him on Twitter: @StoneyD and @PolePositionMkg.

12 thoughts on “The Complete Guide to Mastering Duplicate Content Issues”

  1. Great article that’s going to be filed for future reference. Especially like the thinking around dealing with the inevitable shopping cart product ID scenario. I don’t know why more CMS developers haven’t thought this through, given the importance of organic search traffic for e-commerce. The old adage lives on: developers keep SEOs in business.

    1. Thanks for the very informative post! eCommerce is where onsite optimization and technical fixes are most challenging for an SEO. The same product page can often be accessed in numerous ways, and few of today’s CMSes clearly address these issues.

  2. Duplicate content does get a bad rap, though. Duplicate content is not always bad (emphasis on always). Duplicate content can and does often rank high. In fact, we sell sites that come complete with content, as do our competitors. With over 6,000 clients, our content is blanketing the vertical market of our clients, and yet in many local searches it shows up on page 1. Our competitors have also been successful with their templated sites showing high in search results.

    First, SEO is on the page level, not the site level. So you can have a site full of duplicate content, but as long as it is relevant content to your market, simply adding some custom content to your homepage is all that many folks need to do to rank well in local markets. In other words, duplicate content only hurts the page. It doesn’t hurt the site.

    Also, it only hurts the page when the keywords are also duplicate. My understanding is that only one page of duplicate content can rank for a given keyword; however, if the content is essentially the same but the keyword target is different, then the page can still rank for its own keywords. In fact, there was a thread started on Reddit about our sites. A simple search of one paragraph from our provided homepage content yielded almost 90,000 results. Yet I can show you many local searches around the country where that content is ranking on the first page of search results, and in some cases the #1 result in a fairly competitive local market.

    I point this out because I increasingly get asked about this by clients concerned with the issue, because they read about it somewhere. The bottom line is there is more to it than just saying duplicate content is bad. If someone is interested in search rankings, I absolutely recommend they have unique custom content. But if they ask me whether duplicate content is going to hurt them, I have to be honest and say no, not in my experience with this issue.

    So I am not disagreeing with the examples you provide, but I am saying no guide on mastering duplicate content issues would be complete without this additional discussion.

  3. Hi Stoney

    Wow! Now THAT is what I call “a resourceful article!” Interestingly, after reading it I realized that I actually have a major duplicate content issue going on with one of my client sites that WASN’T addressed in your article, for no other reason than it is a ‘really rare case’.

    Let me explain…

    Initially, most of the page URLs within the site were built in such a manner that every part of the URL beyond the domain (i.e. after the “/”) contained Upper and Lower Case letters:


    Later on, the webmaster decided to change the URL structure so that everything appeared in lower case:


    Naturally enough, this led to TWO different URLs pointing to the same content page (and when you multiply out the number of content pages involved, it amounted to 27% of the entire site).

    I wasn’t fazed by it as I assumed 301 redirects would completely solve the problem i.e. >>>

    But alas! The particular version of the CMS (Contegro) that this site was built on isn’t capable of creating 301 redirects for the purpose of redirecting uppercase URLs to lowercase ones. In other words, it doesn’t differentiate between upper and lower case URLs. So any time someone tries to 301 redirect the uppercase URLs to the lowercase ones, a 301 loop is created. Not exactly the epitome of a good user experience!

    While later versions of this CMS do force upper case URLs to appear as lower case ones, this doesn’t help my client’s problem of duplicate content (as the damage has already been done). For the record, it is disheartening going into Google Analytics and seeing traffic data applicable to both versions of the URLs. Even more disheartening when I find links pointing to them!

    Bottom line… if 301 redirects cannot be used to resolve this problem (for the reason I’ve just stated), and the Webmaster Tools URL Removal service can’t be used because the offending URLs still point to “live” pages (as opposed to deleted or blocked ones), what can I do? Is this client’s site destined for eternal damnation from Google’s condemnation of duplicate content? NOTE: Ever since the roll-out of Penguin 2.0, the site has slowly been dropping in the rankings. After exhaustive site audits and reviews, the ONLY real issue I can find that is remotely related to the common understanding of what Penguin 2.0 is about is duplicate content (i.e. duplicate page titles, descriptions, body copy etc.) stemming from the issues I’ve outlined above.

    Love to hear your thoughts … and even better still, “LOVE TO KNOW HOW TO FIX IT!”



    1. Bruce, I have a client with a similar issue with the upper/lowercase URLs; however, we were able to get 301 redirects in place for that. I think your best solution here is the canonical tag. Use it to point the search engines to the lowercase version. While I’m a big fan of fixing problems rather than relying on “signals” like the canonical tag for the search engines to interpret, I think this is a case where the search engines will figure out pretty easily what’s going on and make sure you’re not hitting any type of penalty.
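
    For what it’s worth, if you can put any layer in front of the CMS (a proxy or application middleware), a case-normalizing 301 cannot loop, because it only fires when the path contains uppercase letters. Here’s a minimal sketch in Python, assuming a WSGI stack, which Contegro may well not expose; treat it as an illustration of the pattern rather than a drop-in fix.

    ```python
    # Hypothetical sketch: a WSGI middleware that 301-redirects any
    # path containing uppercase letters to its lowercase form. It
    # fires only when the path differs from its lowercase version,
    # so the redirect target can never re-trigger it: no loop.
    # Query strings are omitted for brevity.

    def lowercase_redirect(app):
        def middleware(environ, start_response):
            path = environ.get("PATH_INFO", "")
            if path != path.lower():
                start_response("301 Moved Permanently",
                               [("Location", path.lower()),
                                ("Content-Length", "0")])
                return [b""]
            return app(environ, start_response)
        return middleware
    ```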

  4. Hello Stoney,

    Thank you for sharing this great post. I have learned something new today. Indeed, there are many sites that use content management systems that allow you to organize products by categories.


  5. Hello,
    These days, merely doing well is not enough for your business. You must focus more clearly on your content and security concerns for your growth.

    The Next Level Of Optimization!

  6. The free Microsoft SEO Toolkit is your friend here. It gives you all the information you need to track down the source links of duplicate and canonical content. It’s awesome.

  7. This post is a complete guide for every website. If all the things discussed here are taken care of properly, the performance of a website can be enhanced many times over. There are numerous websites that do not follow these important guidelines when creating URLs.

  8. Great post. I work for an online distributor of medical/dental supplies and equipment. Regarding generic product descriptions: is there a certain percentage by which you should vary your product descriptions versus similar descriptions elsewhere on the web? In other words, to what extent do you have to vary content so that it’s considered original (or at least “less duplicate”)?

  9. That’s a good question. I don’t know that you have to change each product description so much as provide some unique info/flavor for each. Create a voice and use that to make your products stand out with unique content, even while you keep much of the content similar. Of course, you’re better off rewriting them completely (minus specifications), but unique something is better than unique nothing.