
How to Find Your Website Duplicate Content Issues

We keep hearing about internal duplicate content issues, but many of us are genuinely unaware that our own websites suffer from them. Why?

  • most websites are built with third-party site builders that create duplicate URLs pointing to the same content without our knowing about it;
  • sometimes webmasters simply lack SEO knowledge. For example, they might not realize that URL paths can be case sensitive, so www.yoursite.com/page1 and www.yoursite.com/Page1 may be handled as two different pages with the same content.


What can create duplicate content?

  • canonical issues (www and non-www versions);
  • pagination, when different pages have identical titles and meta descriptions;
  • various versions of the home page (e.g. www.site.com and www.site.com/index.php);
  • incorrect internal navigation that creates several URLs for one and the same page (e.g. www.site.com/page.php?id=567 and www.site.com/category/page.php?id=567), etc.
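
To make the causes above concrete, here is a minimal Python sketch that collapses URL variants into one rough key, so that www/non-www, default-page and letter-case variants land in the same group. The function names and the `DEFAULT_PAGES` list are hypothetical, and lowercasing the path assumes that case variants really do serve the same page on your server. It will not catch the navigation case (two different paths to one page); that needs a comparison of titles or content.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical list of default-document names; adjust for your server.
DEFAULT_PAGES = ("index.php", "index.html", "index.htm", "index.asp")


def normalize(url):
    """Reduce a URL to a rough key so likely duplicates compare equal."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to locate the host
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same host
    path = parts.path.lower()  # assumption: case variants show the same page
    for page in DEFAULT_PAGES:
        if path.endswith("/" + page):
            path = path[: -len(page)]  # /index.php -> /
    path = path.rstrip("/") or "/"
    key = host + path
    if parts.query:
        key += "?" + parts.query
    return key


def group_duplicates(urls):
    """Return only the groups that contain more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


urls = [
    "www.site.com",
    "site.com/index.php",
    "www.yoursite.com/page1",
    "www.yoursite.com/Page1",
    "www.site.com/about",
]
for key, members in group_duplicates(urls).items():
    print(key, members)
```

Any group it prints is only a candidate: check that the URLs really return the same content before redirecting one to the other.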

Why is it important to get rid of duplicate content issues?

Google has mostly figured out how to sort this out: it will drop one version and index and rank the other. Still, internal content duplication may cause a few issues:

  • decreased crawl rate as Googlebot is kept busy crawling unnecessary identical pages;
  • the wrong version of a page being ranked, which results in a bad user experience (e.g. page 2 is ranked instead of page 1);
  • delayed ranking of newly launched sites.

What can help you to find internal duplicate content issues?

Only a few free tools are of much help in identifying your site’s duplicate content:

1. Duplicate content tool checks the following:

  • www and non-www header response;
  • Google cache check;
  • Similarity check;
  • Default page check;
  • 404 header response;
  • PageRank dispersion check (i.e. if www and non-www versions have different PR).
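
How the tool computes its similarity check is not documented here, so the following is only a stand-in: a rough Python sketch that compares the visible words of two pages with the standard library's `difflib`. The helper names are hypothetical and the tag stripping is deliberately crude.

```python
import re
from difflib import SequenceMatcher


def visible_words(html):
    """Crudely strip tags and return the page's words in lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"\w+", text.lower())


def similarity(html_a, html_b):
    """Return a ratio between 0.0 and 1.0 for how alike two pages' word sequences are."""
    matcher = SequenceMatcher(None, visible_words(html_a), visible_words(html_b))
    return matcher.ratio()


# Hypothetical page bodies for the www and non-www versions of a site.
www_version = "<html><body><h1>Blue Widgets</h1><p>Buy blue widgets here.</p></body></html>"
non_www_version = "<html><body><h1>Blue Widgets</h1><p>Buy blue widgets here!</p></body></html>"
print(similarity(www_version, non_www_version))
```

A ratio close to 1.0 suggests the two URLs serve essentially the same content; where to draw the line is a judgment call for your own site.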

2. Xenu scans all of your site’s links and returns a table of all available URLs; all you have to do is sort the list by title and look for pages with identical titles.
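
If the list is too long to scan by eye, the same "sort by title" idea can be sketched in a few lines of Python. This assumes you have already turned the crawl results into (URL, title) pairs by some means; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def duplicate_titles(pages):
    """Group (url, title) pairs by title; keep titles shared by two or more URLs."""
    by_title = defaultdict(list)
    for url, title in pages:
        # Ignore case and surrounding whitespace when comparing titles.
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical crawl results.
pages = [
    ("www.site.com/page.php?id=567", "Blue Widgets"),
    ("www.site.com/category/page.php?id=567", "Blue Widgets"),
    ("www.site.com/about", "About Us"),
]
for title, urls in duplicate_titles(pages).items():
    print(title, urls)
```

Identical titles do not prove identical content, so treat each group as a list of pages to review.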

3. Google Webmaster Tools reports your site’s duplicate titles and meta descriptions.

More information on that: 7 ways to Tame Duplicate Content by Dr. Pete.

Ann Smarty is the blogger and community manager at Internet Marketing Ninjas. Ann's expertise in blogging and tools serves as the basis for her writing, tutorials and her guest blogging project, MyBlogGuest.com.


36 thoughts on “How to Find Your Website Duplicate Content Issues”

  1. Thanks for the useful tools. Could you explain (or have you already explained) more about “canonical issues (www and non-www version)” and how to fix them?

    1. Yannis, the www and non-www issue is easy to fix. You just have to configure your website’s .htaccess file to redirect the unwanted version to the preferred one with a 301 redirect.

  2. I always thought that the Google webmaster console and Xenu were the 2 tools that were invaluable in finding duplicate content. Thx Anna for pointing out the others.

  3. @ Ann, Thank you!

    @ Web Agency Chieti, Copyscape is good for finding duplicate content outside your website. To check within your website, please use any one of the above tools.

  4. Ann – thanks again for pointing out the best online tools (and free to boot!) to help make our jobs a little easier.

    I always look forward to your posts b/c I know there will be another tool to go play with!

  5. Thank you very much, Ann, for introducing duplicate content tools to the world. It is very helpful for me.

    The first duplicate content tool (virante.com) did not work for subdomain sites and blogs. Are there any duplicate content tools for subdomain sites and blogs?

  6. @Web Agency Chieti
    Xenu on Windows really works without any manual! There are some options… but you will easily find them as you go.
    As for installation… basically there is none! Just copy the exe wherever you want and it is ready.

  7. @Bernard

    Maybe I didn’t find a binary for Windows. I honestly don’t remember what the problem was.
    As far as I remember, I read that a Perl installation or something like that was necessary.
    Too cumbersome a setup.
    But if you tell me there’s a Windows version, I must have looked in the wrong place. Could you please point me to the right link?

    Thanks.

  8. Many thanks for the Article Amy. I have been using Webmaster tools for a while and it has always helped, especially with pages that have duplicate titles. I may however also take a look at Xenu. Thanks for the links.

  9. Copyscape is a good tool to check for duplicate content outside your site, but it does not let me check one site multiple times. It requires premium registration. Is there any other free tool or software? Please answer this question and help us.

  10. Great tips. Any more suggestions? What about a very large website with more than 500,000 pages and user-generated content? Is there a way to find duplicate content without having to do everything manually?

  11. What if I hire someone to write articles for me? How can I check whether these articles, which are not yet online, were copied from somewhere else that is online?

  12. If you’re looking to rewrite URLs, you can create a .htaccess file and drop it in the root directory. In the example below, swap in your own domain for http://www.ezsolution.com and your default homepage for index.asp to fix duplicate content issues. This code will redirect non-www to www and index.asp to the root (/).

    RewriteEngine On
    RewriteBase /

    # Redirect the non-www host to the www host (301 = permanent)
    RewriteCond %{HTTP_HOST} ^ezsolution\.com$ [NC]
    RewriteRule ^(.*)$ http://www.ezsolution.com/$1 [L,R=301]

    # Redirect direct requests for /index.asp to the root
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.asp\ HTTP/
    RewriteRule ^index\.asp$ http://www.ezsolution.com/ [R=301,L]

  13. I have a Blogger blog with about 400 posts. I have changed the custom domain on my blog 2 times. As a result I have lost all my visitors. Please tell me how I can get all my visitors back. I think this is a copyright issue. Please help me, website admin…!!