We keep hearing about internal duplicate content issues, but many of us are genuinely unaware that our own websites suffer from them too. Why?
- most websites are built with third-party site builders that create several URLs for the same content without our knowing it;
- sometimes webmasters simply lack SEO knowledge. For example, they might not know that URL paths are case-sensitive, so www.yoursite.com/page1 and www.yoursite.com/Page1 are treated as two different pages with the same content.
Here is what can create duplicate content:
- canonical issues (www and non-www versions of the site);
- pagination, when different pages have identical titles and meta descriptions;
- various versions of the home page (e.g. www.site.com and www.site.com/index.php);
- incorrect internal navigation that creates several URLs for one and the same page (e.g. www.site.com/page.php?id=567 and www.site.com/category/page.php?id=567), etc.
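Several of the causes above (www vs. non-www, letter case, default pages) can be spotted by reducing each URL to a comparison key and grouping URLs that share it. The snippet below is only a rough sketch of that idea: the function names and the list of default page filenames are assumptions made for illustration, and the key is a heuristic for finding candidates to review, not a statement that the pages are truly identical. It will not catch the navigation case, where the paths really differ and the page content itself has to be compared.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed list of default page filenames; adjust it to your own server setup.
DEFAULT_PAGES = ("index.php", "index.html", "index.htm")


def duplicate_key(url):
    """Reduce a URL to a key that ignores www, letter case and default pages."""
    if "://" not in url:
        # The examples in this article omit the scheme, so add one for parsing.
        url = "http://" + url
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Lowercasing the path is a heuristic: it flags /Page1 and /page1 as
    # candidates, even though a server may treat them as different pages.
    segments = parts.path.lower().split("/")
    if segments[-1] in DEFAULT_PAGES:
        segments[-1] = ""
    path = "/".join(segments).rstrip("/")

    return host, path, parts.query


def group_candidates(urls):
    """Return lists of URLs that share the same comparison key."""
    groups = defaultdict(list)
    for url in urls:
        groups[duplicate_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]
```

Calling `group_candidates` on a list of crawled URLs returns only the groups with more than one member, which are the ones worth checking by hand.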
Why is it important to get rid of duplicate content issues?
Google has mostly figured out how to handle this: it drops one version and indexes and ranks the other. Still, internal content duplication may cause a few issues:
- decreased crawl rate as Googlebot is kept busy crawling unnecessary identical pages;
- the wrong version of a page gets ranked, which results in a bad user experience (e.g. page 2 is ranked instead of page 1);
- delayed ranking of newly launched sites.
What can help you find internal duplicate content issues?
1. A duplicate content tool checks the following:
- www and non-www header response;
- Google cache check;
- similarity check;
- default page check;
- 404 header response;
- PageRank dispersion check (i.e. whether the www and non-www versions have different PR).
2. Xenu scans all the links on your site and returns a table of all available URLs; all you have to do is sort the list by title and look for pages with identical titles.
3. Google Webmaster Tools reports duplicate titles and meta descriptions on your site.
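The "sort by title and look for identical titles" step can also be done with a few lines of code once you have a list of URLs and their titles. The following is a minimal sketch that assumes the crawl results are already available as (URL, title) pairs; the function name and the input format are made up for illustration and do not correspond to any particular tool's export format.

```python
from collections import defaultdict


def find_duplicate_titles(pages):
    """Group URLs by page title and keep only titles used more than once.

    `pages` is an iterable of (url, title) pairs (an assumed input format).
    """
    by_title = defaultdict(list)
    for url, title in pages:
        # Collapse whitespace and ignore case so trivial differences
        # do not hide a duplicate.
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


pages = [
    ("www.site.com/page.php?id=567", "Blue Widgets"),
    ("www.site.com/category/page.php?id=567", "Blue  widgets"),
    ("www.site.com/about", "About Us"),
]
duplicates = find_duplicate_titles(pages)
```

Each key in the result is a normalized title, and its value is the list of URLs that share it, so every entry is a candidate for a closer look.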