
How to Protect Your Site from Canonical Triggers

What happens when search engine spiders get lost? They cannot figure out which way to go on your site.

Sometimes they get confused on their own, and sometimes other sites send them down a different path to reach the same page on your site:

  • domain.com
  • www.domain.com
  • domain.com/index.html
  • www.domain.com/index.html
  • https://domain.com/index.html
  • https://domain.com

Some servers use mod_dir, which causes additional issues by redirecting the domain without a trailing slash to the domain with a trailing slash, so domain.com redirects to domain.com/.

It is very rare that this causes an issue, but it is why, when link building, you should always use the trailing slash in the links you add. It is the proper way to link to a site. Ever notice how the Open Directory and many other directories require their editors to add the trailing slash?

Canonical means the “Authoritative Path”.

This is how you tell the search engines which pages belong to your site. Since you are essentially talking to robots, you need to take extra precautions, because robots “do not think”. If a robot is caught in a loop, or sees pages that are actually the same but have three to six different paths leading to them, it will treat them as additional pages.

So if the spider gets confused, it can duplicate your pages and place importance on an unintended form of the page. It may, for example, put priority on index.html instead of the domain itself, which is why a proper home page link is either “http://www.domain.com/” or “/” but NEVER /index.xxx.
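As a sketch of what a canonical fix for this can look like (assuming an Apache server with mod_rewrite enabled, and assuming “www.domain.com” is your preferred version — substitute your own domain), an .htaccess rule set along these lines sends both the index file and the non-www host to the one authoritative URL:

```apache
# Turn on the rewrite engine (requires mod_rewrite)
RewriteEngine On

# 301-redirect any external request for /index.html to the root "/".
# THE_REQUEST is checked so internal rewrites do not loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.domain.com/ [R=301,L]

# 301-redirect the non-www host to the www host
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
```

With rules like these, all six variations listed above collapse into the single form “http://www.domain.com/”.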

Canonical issues can take many forms. Problems with them are becoming rarer thanks to sitemap programs and increasing awareness of the factors involved, but they do still exist, and they can be caused by a webmaster with no knowledge of the SEO factors involved in developing proper website architecture.

Duplicate Content Issues

Duplicate pages are also caused by using the same contact form with different dynamic variables. So a form may be contact.asp?id=california, and the same form may also be contact.asp?id=new-york. This means Google sees the exact same page with different ways to reach it and may treat it as spam.

The simple fix for this is a rel=”nofollow” attribute on those links, or banning the contact.asp wildcard in the robots.txt file. This is becoming a common task on many dynamic sites. I have added it here because we can consider it a potential canonical trigger, as the path becomes duplicated.
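For the robots.txt route, a minimal sketch (using the hypothetical contact.asp form from above; note that prefix matching covers every query-string variant, and wildcard support varies by engine) would be:

```text
User-agent: *
# Block every parameterized variant of the contact form
# (matches /contact.asp?id=california, /contact.asp?id=new-york, etc.)
Disallow: /contact.asp?
```

This keeps all the duplicate form URLs out of the index while leaving the rest of the site crawlable.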

Adding an SSL (secure) certificate to a site and building a page with the exact same navigation as the rest of the site is a mistake: it means you kept the relative URL links on the https page. Oops. Almost any designer could do this, but not any SEO, because an SEO understands the canonical factors. By creating this page with relative links, you have now given the search engine spiders access to the entire site under a new domain.

The same way Google treats the non-www and the www as two sites, you have just added two more: the https versions of each.

Oh boy, now you’ve created a potential trigger that can remove a website from Google’s good graces by valuing the wrong version. Search engines are more likely to consider this spam, or to face a decision their automated robots must now make: is the site more important in the non-www form, the www form, the https form, or even the https://www. form? It is almost a nightmare waiting to happen if the decision goes the wrong way.
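One common repair for the https duplication is sketched below, again assuming Apache with mod_rewrite; the /checkout/ path is a hypothetical stand-in for whatever section of your site genuinely needs SSL:

```apache
RewriteEngine On
# If the request arrived over SSL...
RewriteCond %{HTTPS} on
# ...and it is not a page that actually needs SSL (hypothetical path)
RewriteCond %{REQUEST_URI} !^/checkout/
# ...301 it back to the plain-HTTP canonical host
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
```

This way the spiders that slipped in through a relative link on the secure page are sent straight back to the one authoritative version.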

Relax! There are many simple fixes:

  1. Always program the site to be spider-friendly by using Absolute Links when developing navigation and adding links to internal pages of your site. Absolute links can also help prevent automated content stealing, where sites try to take ownership of your content and rank with it; “theoretically,” it is entirely possible for a third-party site to take your content and rank for it while you get hit as a duplicate page and no longer rank at all.
  2. Use rel=”nofollow” in the href tags of links that go to a secure server and/or to dynamic forms. This tells the spiders right away not to count those pages as links, in effect helping them understand the priority of each page. It can also help internal page quality by removing the potential trigger for “Mad Lib” spam.
  3. Use a canonical URL redirect fix. I have listed many in my 301 and Canonical Redirect Tutorial. (I am still looking for a Mac WebSTAR canonical version; contributions would be appreciated.)
  4. Exclude files and wildcards in robots.txt. Not all search engines support wildcards; Yahoo!, MSN, and Google do, so wildcard rules may be best applied by identifying the robot and the path.
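For item 4, a hedged sketch of a per-robot robots.txt (the /*?id= pattern is a hypothetical example matching the dynamic form URLs discussed earlier): since not every crawler understands wildcards, the pattern rules are scoped to the robots known to support them rather than placed under User-agent: *.

```text
# Wildcard rules only for engines that support them
User-agent: Googlebot
Disallow: /*?id=

# Slurp is Yahoo!'s crawler
User-agent: Slurp
Disallow: /*?id=

# Everyone else gets plain prefix rules only
User-agent: *
Disallow: /contact.asp
```

Crawlers that do not support wildcards simply ignore the sections addressed to other robots, so nothing breaks for them.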

In trying to keep this short, I may have overlooked other methods of fixing canonical issues. Please feel free to add them in the comments below.

It’s always great to learn all avenues that can help secure the proper architecture of websites.

Alan Rabinowitz is the CEO of SEO Image, a New York based SEO and Internet Marketing company which focuses on corporate branding and positioning in search engines.



7 thoughts on “How to Protect Your Site from Canonical Triggers”

  1. I never use absolute links for my sites for internal linking. I think I will have to go back and change all of them to absolute URLs now. That’s going to be a pain..

    “301 and Canonical Redirect Tutorial.” – you might want to correct that link. There is a ‘.’ at the end of ‘html’.

  2. “You’d think spiders would be somewhat smart enough to filter out index.html”

    Not all sites use index. or default. as their home pages.

  3. Hey guys, a bad canonical configuration can kill your site with a duplicate penalty.

I recently found a bug on my server that was redirecting a mail subdomain to my www

    I do not even have a mail subdomain in my account settings. It must be an internal subdomain configured by the hosting people.

    Well I used my htaccess to fix it.
    http://mail.travelconnecxion.com

    Please, I cannot stress the importance of having all your canonical domains go to one authoritative domain.

It is also important if you have an SSL certificate on your server: 301 redirect HTTPS to HTTP, while at the same time excluding any HTTPS pages you actually need.

    There is a problem redirecting index.xxx to a folder because of the infinite loop effect.

    A nice trick is to check the server request for index.xxx HTTP/ using an Apache condition statement.

  4. I would also add: make sure to point all of your internal links and your homepage link to your main domain version, using the absolute link. The average webmaster without knowledge of SEO will just copy and paste http://domain.com/index.php into the link field of their Dreamweaver when the correct URL should have been http://www.domain.com/. I wouldn’t leave stray links to chance, giving bots the chance to find improperly set up canonical pages.