Google’s John Mueller answered a question on LinkedIn about how Google chooses canonicals, offering advice about what SEOs and publishers can do to encourage Google to pick the right URL.
What Is A Canonical URL?
When multiple URLs (the addresses for multiple web pages) serve the same content, Google chooses one URL to represent all of the pages. The chosen URL is referred to as the canonical URL.
Google Search Central has published documentation that explains how SEOs and publishers can communicate their preference of which URL to use. None of these methods forces Google to choose the preferred URL; they serve as hints, some stronger than others.
There are three ways to indicate the canonical URL:
- Redirect duplicate pages to the preferred URL (a strong signal)
- Use the rel=canonical link attribute to specify the preferred URL (a strong signal)
- List the preferred URL in the sitemap (a weak signal)
Some of Google’s canonicalization documentation incorrectly refers to rel=canonical as a link element. The link tag, <link>, is the element; rel is an attribute of that element, and canonical is the value assigned to it. Google also calls rel=canonical an annotation, which might be an internal way Google refers to it, but it’s not the proper HTML term.
There are two important things you need to know about HTML elements and attributes:
- HTML elements are the building blocks for creating a web page.
- An HTML attribute is something that adds more information about that building block (the HTML element).
The Mozilla Developer Network HTML documentation (an authoritative source for HTML specifications) notes that “link” is an HTML element and that “rel=” is an attribute of the link element.
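To make the element versus attribute distinction concrete, here is a minimal Python sketch that reads a page's markup and pulls out the preferred URL. It uses only the standard library's html.parser module. The class name CanonicalFinder, the helper find_canonicals, and the example.com markup are illustrative assumptions, not anything taken from Google's documentation.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link> element whose rel includes "canonical"."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # "link" is the element; "rel" and "href" are attributes of it.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return every canonical href declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page markup, used only for illustration.
sample_html = """
<html><head>
<title>Blue widgets</title>
<link rel="canonical" href="https://example.com/widgets/blue">
</head><body></body></html>
"""

print(find_canonicals(sample_html))
```

A page that returns more than one value here is declaring conflicting preferences, which is worth fixing before looking at any other signal.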
Person Read The Manual But Still Has Questions
The person who read Google’s documentation, which lists the three ways above to specify a canonical, still had questions, so he asked them on LinkedIn.
He referred to the documentation as “doc” in his question:
“The mentioned doc suggests several ways to specify a canonical URL.
1. Adding tag in <head> section of the page, and another, 2. Through sitemap, etc.
So, if we consider only point 2 of the above.
Which means the sitemap—Technically it contains all the canonical links of a website.
Then why in some cases, a couple of the URLs in the sitemap throws: “Duplicate without user-selected canonical.” ?”
As I pointed out above, Google’s documentation says that the sitemap is a weak signal.
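One way to see why a sitemap alone may not settle the matter is to compare what the sitemap lists against what each page declares in its own markup. The sketch below is a rough illustration under stated assumptions: the function names, the example.com URLs, and the declared_canonicals dictionary (which stands in for data you would collect from a crawl) are all hypothetical, and the check does not reproduce how Google evaluates these signals.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the <loc> values listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def conflicting_entries(sitemap_xml, declared_canonicals):
    """Map each sitemap URL to its declared canonical when the two disagree.

    A value of None means the page declares no canonical at all.
    """
    conflicts = {}
    for url in sitemap_urls(sitemap_xml):
        declared = declared_canonicals.get(url)
        if declared != url:
            conflicts[url] = declared
    return conflicts


# Hypothetical sitemap and crawl results, used only for illustration.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/widgets/blue</loc></url>
  <url><loc>https://example.com/widgets/blue?ref=nav</loc></url>
  <url><loc>https://example.com/widgets/red</loc></url>
</urlset>"""

sample_declared = {
    "https://example.com/widgets/blue": "https://example.com/widgets/blue",
    "https://example.com/widgets/blue?ref=nav": "https://example.com/widgets/blue",
    # /widgets/red is absent: the page has no rel=canonical in this example.
}

for url, declared in conflicting_entries(sample_sitemap, sample_declared).items():
    print(url, "->", declared)
```

Any URL this flags is sending mixed or incomplete signals: it is listed in the sitemap, but the page itself either points elsewhere or says nothing.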
Google Uses More Signals For Canonicalization
John Mueller’s answer reveals that Google uses more factors or signals than what is officially documented.
He explained:
“If Google’s systems can tell that pages are similar enough that one of them could be focused on, then we use the factors listed in that document (and more) to try to determine which one to focus on.”
Internal Linking Is A Canonical Factor
Mueller next explained that internal links can be used to give Google a strong signal of which URL is the preferred one.
This is how Mueller answered:
“If you have a strong preference, it’s best to make that preference very obvious, by making sure everything on your site expresses that preference – including the link-rel-canonical in the head, sitemaps, internal links, etc.”
He then followed up with:
“When it comes to search, which one of the pages Google’s systems focus on doesn’t matter so much, they’d all be shown similarly in search. The exact URL shown is mostly just a matter for the user (who might see it) and for the site-owner (who might want to monitor & track that URL).”
Takeaways
In my experience it’s not uncommon for a large website to contain old internal links that point to the wrong URL. Sometimes the cause isn’t old internal links but 301 redirects from an old page to a URL that is not the preferred canonical. That can also lead to Google choosing a URL that is not preferred by the publisher.
If Google is choosing the wrong URL, then it may be useful to crawl the entire site (with a tool like Screaming Frog) and look at the internal linking patterns as well as the redirects. It may very well be that forgotten internal links hidden deep within the website, or chained redirects to the wrong URL, are causing Google to choose the wrong URL.
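Once a crawl has been exported, the check itself can be fairly mechanical. The following Python sketch assumes the crawl data has already been reduced to three simple structures: a list of (source page, link target) pairs, a dictionary of redirects, and a dictionary mapping each known duplicate URL to the canonical you prefer. Those structures, the function names, and the sample paths are assumptions made for illustration; they are not the export format of Screaming Frog or any other crawler.

```python
def resolve_redirects(url, redirects, max_hops=10):
    """Follow a redirect map from url and return (final_url, hops)."""
    hops = 0
    seen = {url}
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
        if url in seen:
            break  # redirect loop: stop instead of spinning
        seen.add(url)
    return url, hops


def audit_internal_links(internal_links, redirects, preferred):
    """Return internal links that do not point straight at the preferred URL.

    Each result is (source, target, final_url, hops). Links whose target is
    not a known duplicate (not a key in `preferred`) are ignored.
    """
    problems = []
    for source, target in internal_links:
        wanted = preferred.get(target)
        if wanted is None or target == wanted:
            continue
        final_url, hops = resolve_redirects(target, redirects)
        problems.append((source, target, final_url, hops))
    return problems


# Hypothetical crawl data, used only for illustration.
preferred = {
    "/widgets/blue": "/widgets/blue",
    "/widgets/blue?ref=nav": "/widgets/blue",
    "/old-widgets": "/widgets/blue",
}
redirects = {"/old-widgets": "/widgets/blue?ref=nav"}  # 301 to the wrong variant
internal_links = [
    ("/", "/widgets/blue"),
    ("/about", "/old-widgets"),
    ("/blog/post", "/widgets/blue?ref=nav"),
]

for source, target, final_url, hops in audit_internal_links(
    internal_links, redirects, preferred
):
    status = "ok after redirect" if final_url == preferred[target] else "wrong URL"
    print(f"{source} links to {target}; ends at {final_url} ({hops} hops): {status}")
```

In this sample, the link from /about is the kind of case described above: an old URL whose redirect lands on a variant rather than on the preferred canonical.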
Google’s documentation also notes that external links to the wrong page could influence which page Google chooses as the canonical, so that’s one more thing that needs to be checked for debugging why the wrong URL is being ranked.
The important takeaway here is that if the standard ways of specifying the canonical are not working, then it’s possible that external links, unintentional internal links, or a forgotten redirect are causing Google to choose the wrong URL. Or, as John Mueller suggested, increasing the number of internal links to the preferred URL may help Google choose the preferred URL.
Read the LinkedIn discussion here:
My question – The mentioned doc suggests several ways to specify a canonical URL
Featured Image by Shutterstock/Cast Of Thousands