How to Find Your Website Duplicate Content Issues

We keep hearing about internal duplicate content issues, yet many of us are genuinely unaware that our own websites suffer from them too. Why?

  • most websites are built with third-party site builders that create duplicate URLs for the same content without our knowing;
  • sometimes webmasters simply lack SEO knowledge. For example, they may not know that URL paths are case-sensitive, so www.yoursite.com/page1 and www.yoursite.com/Page1 are handled as two different pages with the same content.


Common causes of duplicate content include:

  • canonical issues (www and non-www version);
  • pagination, when different pages have identical titles and meta descriptions;
  • various versions of the home page (e.g. www.site.com and www.site.com/index.php);
  • incorrect internal navigation that creates several URLs for the same page (e.g. www.site.com/page.php?id=567 and www.site.com/category/page.php?id=567), etc.
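
If you have a list of your site's URLs (from a crawl or your server logs), a rough first pass is to collapse the variants above into one comparison key and see which URLs collide. The Python sketch below is one way to do that, not a feature of any tool mentioned in this post; the list of default page names, the choice to ignore the scheme and a leading "www.", and the names `normalize` and `find_variants` are all assumptions. It only catches host, case and index-file variants: a navigation duplicate like the /category/ example above needs a title or content comparison instead.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed list of default index files; adjust it for your own server.
DEFAULT_PAGES = ("index.php", "index.html", "index.asp", "default.aspx")


def normalize(url):
    """Collapse common duplicate-URL variants into one comparison key.

    Assumptions: the scheme is ignored, a leading 'www.' is dropped,
    default index files map to their directory, and the path is
    lower-cased so /page1 and /Page1 collide. Lower-casing is only for
    detection; the live URLs themselves stay case-sensitive.
    """
    parts = urlsplit(url if "//" in url else "http://" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    for page in DEFAULT_PAGES:
        if path.lower().endswith("/" + page):
            path = path[: -len(page)]  # keep the trailing slash
            break
    key = host + path.lower()
    if parts.query:
        key += "?" + parts.query  # query strings are kept as-is
    return key


def find_variants(urls):
    """Return groups of URLs that normalize to the same key."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    sample = ["www.site.com", "www.site.com/index.php",
              "site.com/Page1", "www.site.com/page1", "www.site.com/about"]
    print(find_variants(sample))
```

For the sample list, the home page and its index.php version end up in one group, and /Page1 and /page1 in another; /about stays on its own.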

Why is it important to get rid of duplicate content issues?

Google has mostly figured out how to handle this: it will drop one version and index and rank another. Still, internal content duplication may cause a few issues:

  • a decreased crawl rate, as Googlebot is kept busy crawling unnecessary identical pages;
  • the wrong version of a page being ranked, which results in a bad user experience (e.g. page 2 ranks instead of page 1);
  • delayed ranking of newly launched sites.

What can help you to find internal duplicate content issues?

Only a few free tools are of much help in identifying your site's duplicate content:

1. Duplicate content tool checks the following:

  • www and non-www header response;
  • Google cache check;
  • Similarity check;
  • Default page check;
  • 404 header response;
  • PageRank dispersion check (i.e. whether the www and non-www versions have different PR).

2. Xenu scans all the links on your site and returns a table of every URL it finds; all you have to do is sort the list by title and look for pages with identical titles.

3. Google Webmaster Tools reports your site's duplicate titles and meta descriptions.
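
If you already have the HTML of your pages (saved from a crawl, for example), you can run the same check that Xenu's title sort and the Webmaster Tools report give you. The Python sketch below is an assumption-laden illustration, not how either tool works internally: it uses the standard library's `html.parser` to pull out each page's `<title>` and `<meta name="description">`, then lists the values shared by more than one URL. The class and function names are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collects the <title> text and the meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lower-cased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_fields(html):
    """Return (title, meta description) for one page; empty strings if missing."""
    parser = HeadParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


def _repeated(groups):
    """Keep only the values shared by more than one URL."""
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


def report_duplicates(pages):
    """pages maps URL -> HTML; returns (duplicate titles, duplicate descriptions)."""
    titles, descriptions = defaultdict(list), defaultdict(list)
    for url, html in pages.items():
        title, description = head_fields(html)
        if title:
            titles[title].append(url)
        if description:
            descriptions[description].append(url)
    return _repeated(titles), _repeated(descriptions)
```

A typical hit is pagination: /shoes and /shoes?page=2 sharing the title "Shoes" and the same description would both show up in the report.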

For more information, see 7 Ways to Tame Duplicate Content by Dr. Pete.
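
This post doesn't say how the Duplicate content tool scores its similarity check. One common technique is Jaccard similarity over word shingles (overlapping runs of k words): the more shingles two pages share, the closer the score gets to 1.0. The Python sketch below compares two pages' text that way; the shingle size of 3, and whatever threshold you choose for calling two pages duplicates, are assumptions to tune, not values taken from the tool.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word sequences)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single (possibly shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty pages count as identical
    return len(sa & sb) / len(sa | sb)
```

Because case and punctuation are stripped before shingling, "The quick brown fox" and "the quick brown fox!" score 1.0, while changing one word in a short text drops the score sharply.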

Ann Smarty
Ann Smarty is the blogger and community manager at Internet Marketing Ninjas. Ann's expertise in blogging and tools serves as a base for her writing, tutorials and her guest blogging project, MyBlogGuest.com.
  • Thanks for the useful tools. Could you explain more about “canonical issues (www and non-www version)”, or have you already explained how to fix this?

    • Yannis, it is easy to fix the www and non-www versions. You just have to configure your website’s .htaccess file to redirect the unwanted version to the preferred one with a 301 redirect.

  • I always thought that the Google webmaster console and Xenu were the 2 tools that were invaluable in finding duplicate content. Thx Anna for pointing out the others.

  • Great article Amy. Thanks for the helpful tools in finding dup content!

  • Regarding similarity checks and copyright, there is a good search engine for duplicate content at http://www.copyscape.com/

  • Those are good free SEO tools. I will try them out.

  • @ Ann, Thank you!

    @ Web Agency Chieti, Copyscape is good to find duplicate content outside your website. To check within your website, please use any one of the above tools.

  • @Software Testing

    You are right. Along those lines, does anyone have an installation guide for Xenu on Windows, or possibly a Mac?

  • Ann – thanks again for pointing out the best online tools (and free to boot!) to help make our jobs a little easier.

    I always look forward to your posts b/c I know there will be another tool to go play with!

  • Ann
    Your knack of delivering great information at the right time is uncanny.
    Thank you.

  • Thank you very much Ann for introducing duplicate content tools to the world. It is very helpful to me.

    The first duplicate content tool (virante.com) did not work for subdomain sites and blogs. Are there any duplicate content tools for subdomain sites and blogs?

  • Very useful tool, thanks for the link.

  • Dilipprasad

    Thank you very much Ann. It is a very useful article.

  • @Web Agency Chieti
    Xenu on Windows really works without any manual! There are some options, but you will easily find them as you go.
    As for installation, there basically is none! Just copy the exe wherever you want and it is ready.

  • @Bernard

    Maybe I didn’t find a binary for Windows; I honestly don’t remember what the problem was.
    As far as I remember, I read that a Perl installation or something like that was necessary.
    Too expensive a setup.
    But if you tell me there’s a Windows version, I certainly looked in the wrong place. Could you please point me to the right link?

    Thanks.

  • @web agency chieti

    Xenu is at http://home.snafu.de/tilman/xenulink.html

    Download is for Windows ONLY! see
    http://home.snafu.de/tilman/xenulink.html#Download
    No Perl. No linux. No Apache.
    Cost: 2 minutes (install time)

    The install manual is a two-liner. There is also a video that shows how to use it.

  • @Bernard

    Geez… I must have been really blind to never see that link!!!! :S

  • ryan

    Where is my website? I can't find my own website. I made it on Google.

  • Many thanks for the Article Amy. I have been using Webmaster tools for a while and it has always helped, especially with pages that have duplicate titles. I may however also take a look at Xenu. Thanks for the links.

  • This information is new to me. These tools are really good. I think they will be very helpful for me. Thanks

  • Your article is really great. Thanks for such a nice, informative post.

  • Copyscape is a good tool to check for duplicate content outside your site but it is not allowing me to check one site a number of times. It requires premium registration. Is there any other free tool or software? Please answer this question and help us.

  • Nice tutorial. I had a big problem with my site and this helped me a lot.

  • Great info, thank you Ann Smarty.

  • Great tips. Any more suggestions on that? What if it is a very large website with more than 500,000 pages and the content is user-generated? Is there a way to find duplicate content without having to do everything manually?
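
For a site of that size, one way to avoid manual work is to fingerprint each page's main text and group pages whose fingerprints match; this takes a single pass and stores one short hash per page. The Python sketch below assumes you can already extract each page's text (from your database, for user-generated content) as (url, text) pairs; the function names are hypothetical. It only finds exact duplicates after normalization: near duplicates (a changed word or an added signature) need a similarity technique such as shingling, which is a separate, heavier step.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """SHA-1 of the text after lower-casing and keeping only word characters."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def exact_duplicates(docs):
    """docs: iterable of (url, text) pairs; returns lists of URLs sharing content."""
    groups = defaultdict(list)
    for url, text in docs:
        groups[fingerprint(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]
```

Because `docs` can be any iterable, you can feed it a generator that reads pages from the database one at a time instead of loading all 500,000 into memory.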

  • I get an error from Google Webmaster Tools that my meta description is too short. Is that a problem for SEO?

  • Tom L

    How about if I hire someone to write articles for me: how can I check whether these articles, which are not online yet, were copied from somewhere else that is online?

    • Tom, you would be better off using Copyscape for checking these articles. It’s free to do just a few checks.

  • Great tools, Amy. I always find the answer I’m looking for on your blog. Now to solve my numerous duplicate content issues!

  • If you’re looking to rewrite URLs, you can create a .htaccess file and drop it in the root directory. In the example below, swap out http://www.ezsolution.com for your own domain and index.asp for your default home page to fix duplicate content issues. This code will redirect non-www to www and index.asp to the root (/).

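    # Turn on mod_rewrite for this directory
    RewriteEngine On
    RewriteBase /

    # Redirect the bare domain to the www version (301 = permanent)
    RewriteCond %{HTTP_HOST} ^ezsolution\.com$ [NC]
    RewriteRule ^(.*)$ http://www.ezsolution.com/$1 [R=301,L]

    # Redirect direct requests for /index.asp to the root URL
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.asp\ HTTP/
    RewriteRule ^index\.asp$ http://www.ezsolution.com/ [R=301,L]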
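
To sanity-check what redirect rules like these do before uploading them, you can model them in a few lines. The Python sketch below is a simplified emulation of the two rules in the comment above, not mod_rewrite itself: it assumes the request arrives on the host named in the URL, treats only the bare domain ezsolution.com as the non-www host, and ignores everything else .htaccess can do. One thing it makes visible is that http://ezsolution.com/index.asp takes two hops: first to the www host, then to the root.

```python
from urllib.parse import urlsplit

# Domain taken from the .htaccess example; swap in your own.
CANONICAL = "http://www.ezsolution.com"


def next_hop(url):
    """Return the 301 target the two example rules would send, or None."""
    parts = urlsplit(url)
    query = "?" + parts.query if parts.query else ""
    # Rule 1: the bare domain (case-insensitive) goes to www, same path;
    # mod_rewrite passes the original query string through by default.
    if parts.netloc.lower() == "ezsolution.com":
        return CANONICAL + (parts.path or "/") + query
    # Rule 2: a request for exactly /index.asp, with no query string, goes to the root.
    if parts.path == "/index.asp" and not parts.query:
        return CANONICAL + "/"
    return None


def redirect_chain(url, max_hops=5):
    """Follow next_hop() until no rule fires (or max_hops is reached)."""
    chain = [url]
    for _ in range(max_hops):
        target = next_hop(chain[-1])
        if target is None:
            break
        chain.append(target)
    return chain
```

If the two-hop chain bothers you, the usual fix is an extra rule that sends the bare-domain index.asp straight to the root; the sketch lets you check such a change before it goes live.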

  • Thanks for these tools. I am reading all your articles about Google penalties to fix my site now.

  • WOW, that's an awesome tool… I was looking for a way to check duplicate titles and descriptions… Xenu is great. Thanks Ann

  • I have a Blogger blog with about 400 posts. I have changed the custom domain on my blog twice, and as a result I have lost all my visitors. Please tell me how I can get all my visitors back. I think this is a copyright issue. Please help me, website admin!!

  • Thanks for the hints and tips. I only just found your page and will have a good look at our sites now. Many thanks.

  • Very right. I would like to know about a “duplicate anchor text” testing service… can you help me in this regard?