11 Sources of Duplicate Content You’re Probably Unaware of

There’s so much ‘noise’ in the world of Internet marketing, and especially about SEO, that I sometimes feel the edges are blurring. If I feel that, as a professional, then it’s a sure thing that individual website owners hoping to strike it lucky with the search engines are having a tough time sorting the information they need from that which is purely misleading.

Algorithms, traffic potential, even ‘latent semantic indexing’ and a whole load of other technical jargon tend to put people off learning about SEO, even if they know they need to optimize their website in order to be successful online. Ethical SEOs try to alleviate their fears and put them at ease. But sadly, there are a lot of less-than-ethical SEOs who compound the fear because they hope to capitalize on it.

The alternative to search engine optimization is paying for your traffic, usually through PPC. This is not a bad idea, but of course the more free traffic you get, the more profit you’re making.

Google, being the biggest search engine by far, effectively dictates that there are certain things we can do to get ranked, and a heck of a lot of things we cannot do: if we do them and Google finds out, we will lose our ranking and may even get banned.

SEO—What You Must Know!

There are basics that you need to know if you’re to have any kind of a successful website. Google and all the other search engines basically measure two aspects of your site: content and validation, which is links. Links into your website are regarded as votes of authority. Those are the two main things you have to think about when you create a website.

Original, informative content will take you far, because most websites online do not provide the information that people are looking for. Duplicate content is one thing that will get you penalized for sure though. In fact, a duplicate content penalty could cause your site to drop out of the search results like a stone.

Some ‘experts’ will tell you that you don’t need to worry about duplicate content if you have enough incoming links. True. But you need a mighty big pile of links in order to successfully counteract the effects of duplicate content. It’s sort of like eating loads of sugar and taking insulin to counteract it—sooner or later you’ll pay for your indiscretion. I’ve also seen experts saying that if your duplicate content is on your site, it’s OK, and that it’s only duplicate content from another site that’s a problem. This is absolutely the opposite of the truth—just to give you an example of how some ‘experts’ tilt the playing field on purpose.

Let’s look at the causes of duplicate content and what to do about it. Of course if you went out and stole someone else’s web content you seriously need to rethink your strategy. More to the point, visitors to your site will get irritated if you serve up the same old stuff on every page—it’s no way to run a business. You need, more than anything else, to offer well-written, original and informative content. Every word on your page counts when Google or the human visitor is judging originality.

Accidental Duplicate Content

But you may have duplicate content, not because you stole it, or even because someone else stole yours, but because you have the same words on many of your own website pages. It can happen for a number of reasons.

So what kind of content could constitute duplicate content? Any text that occurs on every page, or at least on multiple pages, of your site:

  1. Any type of disclosure or disclaimer
  2. Words that occur in your web template
  3. Navigation
  4. Footer
  5. Sidebars
  6. Your name and address
  7. Contact form (if it appears on every page)

Think about it. That can add up to a lot of words, and if you only have about 200 words of ‘content’ on every page, you might well find that you’re getting into deep trouble with Google—and Yahoo too for that matter. In my experience Yahoo hates duplicate content even more than Google. There should always be a balance: more content than template words.

Even if you resolve these issues you’re probably still not done if you run a blog.

  1. If you show a snippet of your posts on the main page, this will constitute duplicate content with a lot of blog themes.
  2. If you habitually put your blog posts into multiple categories, that would be duplicate content too. I only submit my blog posts to one category, ever, to avoid this problem.
  3. Archives on your blog can also constitute duplicate content.

You can solve duplicate content issues of this type by adding no-index to the duplicate pages. You can find reliable information on doing just that here.
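As a rough illustration of the no-index approach, here is a minimal Python sketch that picks a robots meta tag for a page's head section. The `noindex, follow` value is the standard way to ask search engines to leave a page out of their index while still following its links. The page type names and the function `robots_meta_tag` are hypothetical, made up for this example; your blog software will have its own way of telling a category or archive page from a post.

```python
# Hypothetical page types that tend to repeat post content on a blog.
DUPLICATE_PAGE_TYPES = {"category", "archive", "tag"}


def robots_meta_tag(page_type: str) -> str:
    """Return the robots meta tag to place in the <head> of a page."""
    if page_type in DUPLICATE_PAGE_TYPES:
        # Ask search engines not to index the page, but still follow its links.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta_tag("archive"))
print(robots_meta_tag("post"))
```

Keeping `follow` in the tag means the links on a no-indexed listing page can still lead crawlers to the individual posts.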

Anyway, to return to the main solution to the problem: if you want to avoid duplicate content issues, you need to make absolutely sure that there is more unique, original content on each page than there is duplicate text that occurs naturally on every page. If you make a rough count of the words that are always there, make a rule to always add content that exceeds that count by a comfortable margin just to make sure.
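The rough count described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each page is plain text, and any line that appears on every page is treated as template text. Search engines certainly do not work this way internally, and the function names and sample pages are invented for the example.

```python
from collections import Counter


def template_lines(pages: list[str]) -> set[str]:
    """Lines that appear on every page: a rough stand-in for template text."""
    counts: Counter = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    return {line for line, n in counts.items() if n == len(pages)}


def word_counts(page: str, shared: set[str]) -> tuple[int, int]:
    """Return (unique_words, template_words) for a single page."""
    unique = template = 0
    for line in page.splitlines():
        line = line.strip()
        if not line:
            continue
        words = len(line.split())
        if line in shared:
            template += words
        else:
            unique += words
    return unique, template


# Made-up sample pages: navigation, one line of content, footer.
pages = [
    "Home About Contact\nOur guide to choosing running shoes for beginners\nCopyright Example Co",
    "Home About Contact\nHow to clean trail shoes after a muddy race\nCopyright Example Co",
]
shared = template_lines(pages)
for page in pages:
    unique, template = word_counts(page, shared)
    verdict = "ok" if unique > template else "add more content"
    print(f"unique={unique} template={template} -> {verdict}")
```

Matching whole lines is a deliberate simplification. On a real site you would first strip the HTML and probably compare larger blocks, but the rule of thumb is the same: the unique count should beat the template count by a comfortable margin.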

Content for Human Eyes Only

What can you do to get rid of duplicate content that has to be there? If there’s a lot of it, try putting it in a form that humans can read but the search engines can’t: this would be one great use for a Flash file, or even an image file. Just don’t forget to keep the file size as small as possible so that you keep the page load time down (here are some tools for on-page image analysis). Page load time is another issue that can hold you back in search, but we’ll keep that for another time.

Patricia Skinner is an SEO consultant, social media coach & reputation management expert. She is also community leader at the nascent SEO Self Regulation Community. She can be reached any time through her SEO website. Why not follow her on Twitter and connect through her LinkedIn profile?

Patricia Skinner is co-founder and Search and Social Director at Mideast SEO, and spends her days doing what she loves best; cooking up winning strategies for business branding, social media marketing and organic search. Her original blog, Wellwrittenwords is also sporadically maintained. Find her on Twitter: ISpeakSEO and LinkedIn

39 thoughts on “11 Sources of Duplicate Content You’re Probably Unaware of”

  1. I’ve always been told that if 60% of your site is unique text then the rest will not count as dupe. Meaning if the body of the page is different on every page then the headers and footer will not even be noticed. Has anyone else thought this?

  2. Hi Thomson,
    That’s why I was telling you to make sure that, for each page you create, there is more content than there are words that appear on every page.

    Raphaelle: It’s not just Google we’re worrying about–in my experience Yahoo is even more sensitive about duplicate content. I once put some up on my website by mistake and I went from #3 on Yahoo to about page 50. I also noticed a steep drop on Google. If you doubt that though, please go ahead and put up as much duplicate content as you like. :)

  3. Yes Patricia, I have seen of late that Yahoo has been tightening the screws on duplicate content. I have noticed that if you can please Google then the others are happy by default :)

  4. I would add that the crackdown on duplicate content will get worse before it gets better. I can imagine the frustration of the CFO at Google, cutting all those AdSense checks to site owners and blog owners posting duplicate content. The cost to Google for duplicate content must be in the millions, since populating content from articles and feeds has been a web strategy/business model since AdSense was introduced.

  5. Patricia I totally agree with you. In the comments section of my last article I described the issue a little bit in regard to headers and footers and sidebars. You’re absolutely correct in those being a factor.

    One of my newest clients maintains a national listing directory for a particular niche market and has over 20,000 pages on the site as a result. The problem is that most of those pages only have a small chunk of unique content related to individual businesses.

    He asked, “Why are 80% of my pages not indexed at Google?”

    And of course the reason was twofold: it’s a brand new site so there are hardly any inbound links. And duplicate content.

    We’re working on a way to turbo-charge the high quality unique content across the site, and that’s a massive undertaking on this scale.

    I’d also offer that, as you described, some of the duplicate content might be eliminated by using Flash, but that requires extreme caution because some of it is vital for other purposes, and even for its own SEO value.

    On a final note, I had been concerned about this on my blog and now after reading this, I know I need to address the duplicate article content issue!

  6. The article is very informative, especially about accidental duplicate content. There are many small things which need to be considered for SEO, and the article has done just that.

  7. Patricia, it is a nice article and I agree 100% with you. I think I have to check the articles on my site again. Keep in touch.

  8. Hi Patricia,
    I know that duplicate content can be a killer, but I find it hard to believe that things like navigation, “normal” footer info (without spamming) and name and address will be considered duplicate content in a significant way. In fact, those are all recommendations from at least Google and Yahoo to have on your pages.
    I see that you say there should be a balance and more content than template words (which I missed in my first reading). But these items (the 3 I mention above) are in large part usability items for visitors and the search engines, so I have a hard time conceiving that there would be a huge penalty, or that I should go in and unnecessarily load up on words just to outweigh the navigation and footer.
    I’m sure it is good to take this into consideration, but I am not sure how much time it would be wise to spend, especially on some of a site’s smaller product/item pages.

  9. Hi Bob, I’m not saying they’re huge issues: what I am saying is that sometimes, small issues like these will decide between you and a competitor (in their favor) if you have a shortage of unique content. If you think this kind of thing is time consuming, try crafting PPC ads (which is the alternative to good SEO, let’s face it) to save money and get results. :) Just sayin’.

  10. Patricia, you always write very good articles. I agree on providing unique content and on the time consumption of PPC ads. I just can’t get there with you on nav, name/address & footer info being deciding factors with regard to dupe content. While these are duplicated across website pages, I just can’t get to the point where I think that SEs take them into consideration as “duplicate content”, one page over another. Rather, I think it is more likely that simply having more unique and descriptive copy gives a page an edge in ranking than that one page is ID’d as dupe content because of nav, name/address & footer. Maybe it’s just a small difference in the way I am interpreting your meaning that I am overreacting to.

  11. I think Patricia’s point regarding the percentage of duplicate content is key to the issue of repeating elements such as headers and footers. Unless you have pages with very small amounts of copy I think these elements will usually only account for a small percentage of the content on any given page.

    I did have one thought regarding humans vs. robots and the use of images/Flash rather than text. Taking this route can cause accessibility problems, so you’ll want to make sure the content is still available via an alt attribute. Unless you have an extreme redundancy problem it may be safer to stick with text.

    In any case it’s an interesting issue to think about and a nice reminder that while we often worry about not having the right key words on our pages it is also easy to overdo it.

  12. Terrific article –

    When I have a post of ‘normal’ length I post it on the main page of the blog. When I have a long article, I put it on a page of its own as a child page under the ‘Articles’ parent, and post a snippet on the main blog page with a [ read more... ] link to the article.

    I thought I was being sensible and I have nice internal links.

    Now I wonder how small a snippet needs to be to get under the Google radar?

  13. I work on multiple blogs with followed category pages (just the excerpt is visible though), and when I go into Webmaster Tools, it says that there is no duplicate content. I think that having your theme’s post titles linked to the original post helps tell Google that the post is more important than the category page. Plus, in your Sitemap.xml, setting the importance of your posts higher than the category pages is also a great indicator that the post is what should rank for targeted keywords.

    But I do agree that having your category pages or archive pages show the full post content sure doesn’t help your site at all.

    Do you agree?

  14. Patricia, I have a question about your statement: “If you habitually put your blog posts into multiple categories, that would be duplicate content too.”

    Isn’t it more accurate to say that if the same blog post can be found “at more than one URL” it will be considered duplicate content?

    WordPress 2.7 seems to allow you to file a post in multiple categories, but the post retains its original URL.

    Are you saying that even this is considered duplicate content?

  15. Both Google and Yahoo have released papers in which they state they are able to identify areas of a page (i.e. navigation, footers, sidebars, disclosures) and prevent those common repeating features/areas from counting as duplicate content.

    Otherwise, we would have to either remove those important pieces from each page or make them significantly different on each page. Both of which make no sense to site owners and users.

  16. Hi Simon, believe me, if you have less unique content on your pages than those template words, you will pay a price. :)

    Hi Craig: Content management software, including WordPress, is famous for causing any number of duplicate content issues because of the way it duplicates posts for a number of reasons, including in multiple categories. This is actually not a new revelation–the problem has been discussed online for years. If you do a search for WordPress duplicate content issues you’ll be astounded at the number of articles tackling the question. :)

  17. Another way to avoid duplicate content is by putting a copyright notice at the bottom of the page. That way spammers will not try to copy it because they are afraid of penalties. If you have multiple domains, you can consider permanent redirection. This is where the articles are submitted in a text format so that the search spiders will view it as natural content. You can also use anti-plagiarism software to check whether your articles have been copied. Copyscape is one good duplicate content checker tool.

  18. I just posted about this in a forum. I’m one of the ones that believe that people are getting a little paranoid about duplicate content. While it’s certainly important to address (and I like the article Patricia), don’t be too worried about this if you’re doing everything else right. I have many different websites and some with duplicate content. We have not noticed any penalties on the sites that we actively maintain.

  19. Duplicate content doesn’t cause any penalties, it is just discounted by the search engines.

    Think of all the legitimate reasons there are for repeating other people’s content, such as quoting it, highlighting an article or repeating stats. There is no duplicate content penalty. Google simply discounts it.

    I was slightly surprised to read a whole article focussed on duplicate content and not one mention of the canonical tag either. That’s a pretty core part of dealing with duplicate content these days.

    Encouraging people to no-index pages seems a little rash: what about any potential link equity these pages may have gained? Having more duplicate than unique content on a page is no reason to no-index the page.

  20. SLight: I believe there are penalties, based on my experience. And why would you discourage ppl from no-indexing pages when you just said yourself that if there are two pages with identical or very similar content, the engines will only index 1 of them?

    Quoting a line from someone else’s content shouldn’t cause any duplicate content issues–if you’re quoting whole pages you’re not quoting, you’re stealing content.

    You’re right about the canonical tag but that deserves a post all by itself. It’s a solution by the way, not a cause of duplicate content. :)

  21. Hi Patricia,
    Thanks for the response. My problem with noindexing pages is that you are wasting any links or potential links to that page. If it is useful to the user then someone may well link to it.

    As far as I am aware there is no penalty for duplicate content. A penalty is where you would have lower rankings for having it. Removing the duplicate content from a page, alone, will not bring your rankings up. However, when you put unique content in instead, it will.

    In terms of stealing content, think about things like syndicated articles and press releases, or a new car being released and loads of sites all putting up the manufacturer’s specs for it. All these things create pages with a lot of legitimate duplicate content.

    I’d be happy to run a test on duplicate content and see, to be fair I’ve never tested it myself and it would certainly resolve this.

  22. I stay original but have two webpages. One time I put pics from one site, plus a shorter version (really a snippet), on the other to send people over to the first site. Do you think this hurts in search engines?
    I can’t remember what it is, but there is a site to check if anyone is stealing your stuff. You type in a line from a post. Do you know what I am talking about?

  23. Duplicate content comes in different kinds. The first thing to remember is that content is made for humans, not for robots. The other point is when your unique content is stolen.

    1. Mostly, Google will choose to index the content that existed first, so don’t worry too much about stolen content. Duplicate content on your own site is far more of a problem.

  24. Hi
    I run an article directory and recently my site was hacked. When I try to build a sitemap, the crawler picks up my whole directory as duplicate content. The search engines condemned it: I went from 417,000 hits in the month of August to 10 a day now. There is no help with this anywhere. I tried removing the wildcard and still nothing. If you have any suggestions, please contact me.

    1. How do you know your site has been hacked? Maybe it’s just a glitch in your content management system? Your hits may have taken a dive because you’ve been penalized by Google. One problem I can see from the outset is that you have too many links from your home page. Elementary SEO stuff. You may need professional help with this.

  25. I have a problem where I submit to multiple categories on my blog. The biggest culprit I think is my “featured” category in my newspaper style theme.

    Is there a way I can block the spider from crawling the featured category so as not to be hit with duplicate content?

  26. Yes, this fact is often overlooked by most website owners. Make sure you avoid unnecessary repetition in your pages. Most of the time, these things don’t need to be there in the first place.

  27. Thanks Patricia for this great article. However, as an internet marketer, I have never encountered a situation where my website got punished by Google or Yahoo due to so-called duplicate content in the navigation, header etc. I have a few WordPress blogs where the whole sidebar contains the same content throughout the website, but they still perform very well on most of the competitive keywords.

  28. How can footer, header, author info etc. all be counted as dup by Google?

    Had this been the case, every site would be counted as dup, because the footer and header are the same everywhere. They can’t change from page to page.

    ATUL