How to Find Your Website’s Duplicate Content Issues

We keep hearing about internal duplicate content issues, but many of us are genuinely unaware that our own websites suffer from them too. Why?

  • most websites are built with third-party site creators that generate duplicate URLs for the same content without our knowing about it;
  • sometimes webmasters just lack SEO knowledge. For example, they might not know that URL paths can be case sensitive, so www.yoursite.com/page1 and www.yoursite.com/Page1 are handled as two different pages with the same content.


What can create duplicate content:

  • canonical issues (www and non-www version);
  • pagination, when different pages have identical titles and meta descriptions;
  • various versions of the home page (e.g. www.site.com and www.site.com/index.php);
  • incorrect internal navigation creating several URLs for one and the same page (e.g. www.site.com/page.php?id=567 and www.site.com/category/page.php?id=567); etc.
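
Several of the causes above (www vs. non-www, letter case, default home page files) can be spotted mechanically if you already have a list of your site's URLs. The following is a minimal Python sketch, not a definitive implementation. The names `canonical_key`, `find_duplicate_groups` and `DEFAULT_PAGES` are hypothetical, and it assumes your server ignores letter case in paths, which is not true of every server. It will not catch two different paths leading to the same page (the category example); that requires comparing page content.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical list of default documents; adjust it to your server's setup.
DEFAULT_PAGES = {"index.php", "index.html", "index.htm", "index.asp"}


def canonical_key(url: str) -> str:
    """Reduce a URL to a rough key so that likely duplicates collide."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same site
    path = parts.path.lower()  # assumption: the server ignores case in paths
    segments = path.split("/")
    if segments[-1] in DEFAULT_PAGES:
        segments[-1] = ""  # /index.php is the same page as /
    path = "/".join(segments) or "/"
    key = host + path
    if parts.query:
        key += "?" + parts.query
    return key


def find_duplicate_groups(urls):
    """Return lists of URLs that share the same canonical key."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]
```

Calling `find_duplicate_groups` on a crawl export returns each set of URLs that probably point at the same page, which gives you a short list to review by hand.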

Why is it important to get rid of duplicate content issues?

Google has mostly figured out how to sort this out: it will drop one version and index and rank the other. Still, internal content duplication may cause a few issues:

  • decreased crawl rate as Googlebot is kept busy crawling unnecessary identical pages;
  • the wrong version of a page being ranked, which results in a bad user experience (e.g. page 2 is ranked instead of page 1);
  • delayed ranking of newly launched sites.

What can help you to find internal duplicate content issues?

There are only a few free tools that are of much help in identifying your site’s duplicate content:

1. Duplicate content tool checks the following:

  • www and non-www header response;
  • Google cache check;
  • Similarity check;
  • Default page check;
  • 404 header response;
  • PageRank dispersion check (i.e. if www and non-www versions have different PR).
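
How the tool performs its similarity check is not described here. As an illustration only, this is a minimal Python sketch of one way to compare the text of two pages using the standard library's `difflib`. The function names and the 0.9 threshold are assumptions, not the tool's method.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a ratio between 0.0 and 1.0; 1.0 means the word sequences match."""
    # Compare word sequences rather than characters so spacing and case don't matter.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def looks_duplicated(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Flag two pages as likely duplicates; the default threshold is an arbitrary assumption."""
    return similarity(text_a, text_b) >= threshold
```

You would feed it the visible text of two pages (with the HTML already stripped) and review any pair it flags.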

2. Xenu scans all your site links and returns a table of all available URLs – all you have to do is to sort the list by title and find pages with identical titles.
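
If the list of URLs and titles is large, the "sort by title and look for repeats" step can be automated. The sketch below assumes you have already loaded the crawl results into (url, title) pairs; how you export them from your crawler is up to you, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def pages_with_shared_titles(pages):
    """Map each title used by more than one URL to the sorted list of those URLs.

    `pages` is an iterable of (url, title) pairs.
    """
    by_title = defaultdict(list)
    for url, title in pages:
        # Normalize whitespace and case so near-identical titles group together.
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {title: sorted(urls) for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical crawl results, for illustration only.
crawl = [
    ("http://www.site.com/", "Home"),
    ("http://www.site.com/index.php", "Home"),
    ("http://www.site.com/about", "About Us"),
]
for title, urls in pages_with_shared_titles(crawl).items():
    print(title, urls)
```

Only titles shared by two or more URLs are reported, so unique pages drop out of the result.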

3. Google Webmaster Tools reports your site’s duplicate titles and meta descriptions.

More information on that: 7 ways to Tame Duplicate Content by Dr. Pete.

Ann Smarty
Ann Smarty is the blogger and community manager at Internet Marketing Ninjas. Ann's expertise in blogging and tools serves as a base for her writing, tutorials and her guest blogging project, MyBlogGuest.com.
  • http://www.10000listings.com Yannis

    Thanks for the useful tools. Could you explain (or have you already explained) more about “canonical issues (www and non-www version)” and how to fix them?

    • http://artlogdigi.com Avirat

      Yannis, the www and non-www issue is easy to fix. You just have to configure your website’s .htaccess file to redirect the unwanted version to the preferred one with a 301 redirect.

  • http://www.kensavage.com Ken Savage

    I always thought that the Google webmaster console and Xenu were the 2 tools that were invaluable in finding duplicate content. Thx Anna for pointing out the others.

  • http://digitalovercast.com Ken

    Great article Amy. Thanks for the helpful tools in finding dup content!

  • http://www.web-ma.com Web Agency Chieti

    About similarity check and copyright there is this good search engine for duplicate content at http://www.copyscape.com/

  • http://www.realtor.com.ru Realtor

    Those are good free tools for SEO. I will try them out.

  • http://www.aztecsoft.com/our_services/independent_testing/ Software Testing

    @ Ann, Thank you!

    @ Web Agency Chieti, Copyscape is good to find duplicate content outside your website. To check within your website, please use any one of the above tools.

  • http://www.web-ma.com Web Agency Chieti

    @Software Testing

    You are right. On that note, does anyone have an installation guide for easily installing Xenu on Windows or possibly a Mac?

  • http://www.LunaMetrics.com Jim

    Ann – thanks again for pointing out the best online tools (and free to boot!) to help make our jobs a little easier.

    I always look forward to your posts b/c I know there will be another tool to go play with!

  • http://www.forwardslashmarketing.co.uk/ Gidseo

    Ann
    Your knack of delivering great information at the right time is uncanny.
    Thank you.

  • http://maduraimeenakshitempleforu.blogspot.com/ vignarajan

    Thank you very much Ann for introducing duplicate content tools to the world. It is very supportive for me.

    The first duplicate content tool (virante.com) did not work for subdomain sites and blogs. Are there any duplicate content tools for subdomain sites and blogs?

  • http://www.condosphilippines.com Phil Condo

    Very useful tool, thanks for the link.

  • Dilipprasad

    Thank you very much Ann. It is a very useful article.

  • http://www.multi-sources.fr Bernard Savonet

    @Web Agency Chieti
    Xenu on Windows really works without any manual! There are some options… but you will easily find them as you go.
    As for installation… basically there is none! Just copy the exe wherever you want and it is ready.

  • http://www.web-ma.com Web Agency Chieti

    @Bernard

    Maybe I couldn’t find a binary for Windows. I honestly don’t remember what the problem was.
    From what I remember, I read that a Perl installation or something like that was necessary.
    Too expensive a setup.
    But if you tell me there’s a Windows version, I certainly looked in the wrong place. Could you please point me to the right link?

    Thanks.

  • http://www.multi-sources.fr Bernard Savonet

    @web agency chieti

    Xenu is at http://home.snafu.de/tilman/xenulink.html

    Download is for Windows ONLY! see
    http://home.snafu.de/tilman/xenulink.html#Download
    No Perl. No linux. No Apache.
    Cost: 2 minutes (install time)

    The install manual is a two-liner. There is also a video that shows how to use it.

  • http://www.web-ma.com Web Agency Chieti

    @Bernard

    Geez … I must have been really blind to never see that link!!!! :S

  • ryan

    Where is my website? I can’t find my own website. I made it on Google.

  • http://www.safari-guide.co.uk Safari Holiday Guide

    Many thanks for the Article Amy. I have been using Webmaster tools for a while and it has always helped, especially with pages that have duplicate titles. I may however also take a look at Xenu. Thanks for the links.

  • http://www.sulumitsretsambewno.com sulumits retsambew

    This information is new to me. Those tools are really good. I think they will be very helpful for me. Thanks

  • http://www.image-recovery-software.com/ image recovery

    your article is really great. Thanks for such a nice informational post.

  • http://latestexams.com/ Some Questions

    Copyscape is a good tool to check for duplicate content outside your site but it is not allowing me to check one site a number of times. It requires premium registration. Is there any other free tool or software? Please answer this question and help us.

  • http://www.askgetanswer.com handa

    Nice tut. I had a big problem with my site and this helped me a lot.

  • http://pro4all.net Fahad

    Great Info thank you Ann Smarty .

  • http://steveanastasiadis.com/ Stefanos

    Great tips. Any more suggestions? What if it is a very large website with more than 500,000 pages and the content is user generated? Is there a way to find duplicate content without having to do everything manually?

  • http://www.flood-pictures.com/ flood heri pictures

    I get an error from Google Webmaster Tools saying my meta description is too short. Is that a problem for SEO?

  • Tom L

    How about if I hire someone to write articles for me? How can I check whether these articles, which are not online yet, were copied from somewhere else that is online?

    • http://www.insomnia-connection.com Wendy

      Tom, you would be better off using Copyscape for checking these articles. It’s free if you only do a few checks.

  • http://www.seobrighton.com brighton seo

    great tools amy – always find the answer I’m looking for on your blog. Now – to solve my numerous duplicate content issues!

  • http://www.ezsolution.com Matt Boaman

    If you’re looking to rewrite URLs, you can create a .htaccess file and drop it in the root directory. In the example below, swap http://www.ezsolution.com for your own domain and index.asp for your default home page to fix duplicate content issues. This code redirects non-www to www and index.asp to the root (/):

    RewriteEngine On
    RewriteBase /

    # Redirect the non-www host to the www version
    RewriteCond %{HTTP_HOST} ^ezsolution\.com$ [NC]
    RewriteRule ^(.*)$ http://www.ezsolution.com/$1 [L,R=301]

    # Redirect direct requests for /index.asp to the root
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.asp\ HTTP/
    RewriteRule ^index\.asp$ http://www.ezsolution.com/ [R=301,L]

  • http://www.thai-classified.com/ ลงประกาศฟรี

    Thanks for these tools. I read all your articles about google penalty to solve my site now.

  • http://websitemaintain.com/ Royal

    WOW, that’s an awesome tool… I was looking for a way to check duplicate titles and descriptions… Xenu is great. Thanks ANN

  • http://www.propakistani.info English Poetry

    I have a Blogger blog with about 400 posts. I have changed the custom domain on my blog 2 times. As a result I have lost all my visitors. Please tell me how I can get all my visitors back. I think this is a copyright issue. Please help me, website admin…!!

  • http://www.jordansfireworks.co.uk david jordan

    thanks for the hints and tips only just found your page, will have a good look at our sites now. many thanks

  • http://www.technotrait.com/ Rafaqat

    Very right. I would like to know about the “duplicate anchor text” testing service… Can you help me in this regard?