There is little doubt that duplicate content on a site can be one of the biggest battles an SEO has to fight. Many content management systems handle content well, but few build SEO considerations into how that content is implemented throughout the website.
There are two kinds of duplicate content, onsite and offsite. Onsite duplication is content that is duplicated on two or more pages of your own site. Offsite duplication is when the content of one site is displayed on other websites. Onsite duplication is an issue you have control over while offsite duplication may be beyond your control. Both are problematic.
Why is Duplicate Content an Issue?
The best way to explain why duplicate content is bad is to first point out why unique content is good. Unique content sets you apart. It makes you different. It helps you stand out. Why? Because that content is unique to you and you alone.
When you use the same text to describe your products as the next guy, nothing gives you an advantage over the next guy. When you have multiple URLs with the same information, word for word, nothing makes one URL trump the other, and neither performs well.
Duplicate content essentially downgrades your content’s value. Search engines don’t want to send people to several pages that all say the same thing, so they look for content that is unique from everyone else. Unique content helps you compete with your competitors rather than with yourself.
When the search engines begin spidering your site, they pull each page’s content and put it in their index. If they start seeing page after page of duplicate content as they analyze those pages, they decide to use their resources somewhere else, perhaps on indexing unique pages on your competitors’ sites.
When you have internal site duplication, the self-competition is at its worst when you have particularly link-worthy content. Each duplicate URL of that content may receive links, giving no single page the full value of the link juice pointing at that valuable content. When the content is located at only one URL, all links pointing to it are consolidated onto a single page, enhancing that page’s authoritative value.
Dealing With Offsite Duplicate Content Problems
Offsite duplicate content has two main sources you can blame: it’s either your fault or someone else’s! At its core, it is either content you stole or content someone stole from you. Whether done legally, with permission or without, offsite duplicate content is likely keeping your site from performing better in the search engine results.
Content Scrapers and Thieves
The worst content theft offenders are those that scrape content from across the web and publish it on their own sites. The result is generally a Frankensteinian collection of content pieces that produce less of a coherent finished product than the green blockhead himself. Generally these pages are designed solely to attract visitors and get them to leave as quickly as possible by clicking on the ads scattered throughout the page. There isn’t much you can do about these types of content scrapers, and search engines are actively trying to recognize them for what they are in order to purge them from their indexes.
Not everyone stealing content does it by scraping. Some just flat out take something you have written and pass it off as their own. These sites are generally higher-quality sites than the scraper sites, but some of the content is, in fact, lifted from other sources without permission. This type of duplication is more harmful than scrapers because the sites are, for the most part, seen as quality sites and the content is likely garnering links. Should the stolen content produce more incoming links than your own content, you’re apt to be outranked by your own content!
For the most part, scrapers can be ignored; however, egregious violators and thieves can be pursued through legal means or by filing a DMCA removal request.
In many cases content is published into distribution channels hoping to be picked up and republished on other websites. The value of this duplication is usually one or more links pointing to the author’s website.
Much of the content I write for our E-Marketing Performance blog is duplicated on other blog sites. This is strategic duplication, and I have to weigh carefully the pros and cons.
Each time my articles are posted somewhere else, I get a link back to my site. These links are incredibly valuable. I also get much wider exposure than I do on my own blog, allowing me to expand my reach far beyond my own natural borders. And by keeping this duplication selective rather than en masse, I avoid creating the kind of mass offsite duplication that tends to hurt sites the most.
The downside is that whole duplicate content thing. I am no longer the sole holder of my content, which means I am potentially taking traffic away from my site and driving it to these other blogs. In fact, since many of these sites have more authority than my own, they often come up first in the search results, above my own.
But this is a case where the pros outweigh the cons. At least for now. That may not always be the case.
The search engines make noise about finding the “canonical” version of such duplication to ensure the original content receives higher marks than the duplicate versions, but I have yet to see this play out in any kind of meaningful way. Years ago I asked a group of search engine engineers a question about this.
My question was this: if there are two pieces of identical content and the search engines clearly know which one came first, do links pointing to the duplicated version count as links to the original version?
It would be great if this was in fact the case. I’d be happy even if the search engines split the link juice 50/50 between the duplicate site and the original site. Of course, that would also have to include social shares as well as links, but it is certainly something the search engines can do to reward original content over republished duplicate content, regardless of purposeful or nefarious intent.
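In practice, a republishing partner can signal which version is the original with a cross-domain rel="canonical" link element in the head of the duplicate page. As a minimal sketch (the domain, path, and helper name here are hypothetical, not from the original text), a small Python helper that generates the tag:

```python
import html


def canonical_link(original_url: str) -> str:
    """Build a cross-domain <link rel="canonical"> element that tells
    search engines which URL holds the original version of the content."""
    return f'<link rel="canonical" href="{html.escape(original_url, quote=True)}" />'


# A republishing site would place this element in the <head> of its copy,
# pointing back at the author's original article:
tag = canonical_link("https://www.example.com/blog/original-post")
print(tag)
```

This is a request, not a directive; the search engines treat the canonical link as a strong hint when deciding which URL to credit.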
Generic Product Descriptions
Some of the most common forms of duplicate content are through product descriptions. Thousands of sites on the web sell products, many of them the same or similar. Take for example any site selling books, CDs, DVDs or Blu-Ray discs. Each site basically has the same product library. Where do you suppose these sites get the product descriptions from? Most likely the movie studio, publisher, manufacturer or producer of the content. And since they all, ultimately, come from the same place, the descriptive content for these items is usually 100% identical.
Now multiply that across millions of different products and hundreds of thousands of websites selling those products. Unless each site takes the time to craft its own product descriptions, there’s enough duplicate content to go around the solar system several times.
So with all these thousands of sites using the same product information, how does a search engine differentiate between one or another when a search is performed? Well, first and foremost, the search engines want to produce unique content, so if you’re selling the same product but you write a unique and compelling product description, you have a greater chance of pushing your way higher in the search results.
But left with no other factors to explore, the search engines have to look to the website as a whole. In these instances, the weight of the site itself and the number and quality of backlinks tend to be the deciding factors. Given similar content on another site, a site that is more well-known, has a larger user audience, a better backlink structure and stronger social reach is likely to trump any other website.
Sites that provide unique product descriptions do have an advantage; however, unique content alone isn’t enough to outperform sites that have a strong historic and authoritative profile. But given a site of similar stature, unique content will almost always outperform duplicate content, providing the opportunity to grow into a stronger and stronger site. It takes time, but original content is the key to overcoming the pit of duplicate content despair.
Dealing with Onsite Duplicate Content Problems
The most problematic form of duplicate content, and the kind that you are most able to fight, is duplicate content on your own site. It’s one thing to fight a duplicate content battle with other sites that you do not control. It’s quite another to fight against your own internal duplicate content when, theoretically, you have the ability to fix it.
Duplicate onsite content generally stems from bad site architecture or, more precisely, bad website programming! When a site isn’t structured properly, all kinds of duplicate content problems surface, many of which can take some time to uncover and sort out.
Those who argue against good architecture usually cite Google propaganda about how Google can “figure out” these things and therefore eliminate them as an issue for your site. The problem with that scenario is it relies on Google figuring things out. Yes, Google can determine that some duplicate content shouldn’t be duplicate, and the algorithms can take this into account when analyzing your site. But there is no guarantee they will uncover it all or even apply the “fix” in the best way possible for your own site.
Just because your spouse is smart isn’t license for you to go around acting like an idiot. And just because Google may or may not figure out your problems and may or may not apply the proper workarounds is no excuse for not fixing the problems you have. If Google fails, you’re screwed. So the less you make Google work for you, the better Google will work for you.
Here are some common in-site duplicate content issues and how to fix them.
The Problem: Product Categorization Duplication
Many sites use content management systems that allow you to organize products by categories. In doing so, a unique URL is created for each product in each specific category. The problem arises when a single product is found in multiple categories: the CMS generates a unique URL for each category the product appears in.
I’ve seen sites like this create up to ten URLs for every single product page. This type of duplication poses a real problem for the engines. A 5,000 product site suddenly becomes a 50,000 product site. But as the search engines spider and analyze, they realize that they have 45,000 duplicate pages!
If there was ever a reason for the search engine spider to abandon your site while indexing pages, this is it. The duplication creates an unnecessary burden on the engines, causing them to expend their resources in more valuable territory and leaving you out of the search results for a large number of pages.
Below is a screenshot I took several years ago from The Home Depot’s website. I found a particular product by navigating down two different paths. A book like this could easily be tied to several different categories, each one producing a unique URL and, therefore, a duplicate page of content.
Keep in mind that even though the navigation path is different, all the content on the page is 100% identical, save perhaps for the breadcrumb trail displayed at the top of the page. If ten people linked to each of these pages while a competitor got the same ten links, but to a single URL, which one do you think would top the search results? You guessed it, the competitor!
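The usual fix is to give every product one category-free canonical URL, and have every category path either 301-redirect to it or declare it in a rel="canonical" tag. As a sketch only (the domain, URL scheme, and function name are hypothetical), a Python helper that collapses category-specific URLs onto a single canonical product URL keyed on the product slug:

```python
from urllib.parse import urlparse


def canonical_product_url(url: str, base: str = "https://www.example.com") -> str:
    """Collapse category-specific product URLs onto one category-free
    canonical URL, keyed on the product slug at the end of the path."""
    slug = urlparse(url).path.rstrip("/").split("/")[-1]
    return f"{base}/products/{slug}"


# Two navigation paths to the same product...
a = canonical_product_url("https://www.example.com/power-tools/drill-9000")
b = canonical_product_url("https://www.example.com/gift-ideas/drill-9000")
# ...both resolve to the same canonical URL, which the CMS can emit in a
# <link rel="canonical"> tag or serve as a 301 redirect target.
assert a == b
```

With this in place, all ten of those inbound links consolidate onto one URL instead of being split across duplicates.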