A hypothesis about a Google “content cannibalization penalty” is making the rounds of Facebook groups and blogs.
The case studies are very real and the solution of trimming similar content often works.
But does this mean the hypothesis is real?
Does knowing the real reason why a page doesn’t rank matter?
This article examines this so-called penalty and uses it as an example to show how to tell the difference between an idea about Google’s algorithm that’s possible and far-fetched “Pop SEO” ideas.
Is Content Cannibalization a Google Penalty?
When you start talking about penalties, what you are actually discussing are search engine algorithms.
And when you discuss search engine algorithms, you can usually trace them back to a research paper or patent application.
There is an idea that search engines operate in mystery. But the reality is that Google operates quite openly.
Here is what Google says:
“Google publishes hundreds of research papers each year. Publishing is important to us; it enables us to collaborate and share ideas with, as well as learn from, the broader scientific community. Submissions are often made stronger by the fact that ideas have been tested through real product implementation by the time of publication.”
Pop SEO & Common Sense
Those who deal in Pop SEO Hypotheses tend to traffic in ideas that are common sense and reasonable.
They almost never cite or link to scientific research papers, patents, or statements by Googlers.
Their “reasonable” ideas are founded on nothing more than gut feelings and common sense.
But common sense and reasonable ideas are what gave us the notion that the sun revolves around the earth, which you can plainly deduce with your own eyes.
I know it sounds counterintuitive, but it’s beneficial to resist common sense ideas when it comes to science. In the science of information retrieval, a hypothesis can be tested by searching for an algorithmic reason or a statement by a Googler that shows whether it has a basis in reality. Once that evidence is found, the hypothesis becomes a theory.
There is a difference between a hypothesis and a theory:
- A hypothesis is like a reasoned guess, without any supporting evidence.
- A theory is based on evidence.
An SEO Theory is Preferable to an SEO Hypothesis
A defining characteristic of Pop SEO is that its ideas are hypotheses. You will never see a citation to any actual evidence that indicates the hypothesis has any basis in fact. They’re usually just reasonable guesses.
Science writer Kyle Hill has this to say about common sense:
“Science systematically and conscientiously pursues ‘real’ relationships backed by theory and evidence. Common sense does not. Common sense leads us to believe that giving children sugar causes them to be more hyper. Science shows us that this is not the case. We see possible correlations everywhere, but that does not mean much if we can’t prove it. ‘It seems right’ is not enough.”
Algorithms Can be Researched
When we talk about a “penalty”, one way to identify the penalty is by matching it to a known algorithm. But the problem with the content cannibalization hypothesis is that there is no spam catching or information retrieval algorithm whose purpose is to de-rank similar content from a single site. It simply does not exist.
So is it safe to say there is no content cannibalization penalty?
Yes and no.
There is something to it but the description is so general it’s like diagnosing what’s wrong with your car by saying it has engine trouble. Yes, it has engine trouble, but that doesn’t get you any closer to making the car run again.
It is safe to say that there is no Google penalty for shooting yourself in the foot by diluting the power of a strong piece of content.
If it’s Not a Penalty, What is it?
Pop SEO tends to confuse the inability to rank with a penalty.
For example, when someone obtains a truckload of poor quality links, causing their site to rank for a while until it subsequently drops, they call it a penalty. But it’s not a penalty. It’s called ranking where it’s supposed to rank.
The SEO community calls it Churn and Burn. It can also be called Not Ranking. Take your pick, but it’s not a penalty.
Google tends to not penalize sites anymore. Sites simply do not rank.
Manual actions are actual penalties imposed by Google in order to take down a spam method that their algorithm can’t catch or to send a message that Google is aware of the spam network.
So What Is This Content Cannibalization Phenomenon?
Content cannibalization has been promoted as a Google penalty that only “advanced SEOs” know about.
This is untrue.
The idea of content cannibalization can be said to have been popularized by an article published by Rand Fishkin in 2007. An idea that has been popularized multiple times at Moz since 2007 is hardly a secret or advanced.
There are many different reasons why adding multiple pages of content on the same topic could cause the primary page to drop rankings.
Bad Information Architecture
Information architecture, as it relates to SEO, is the practice of creating meaningful ways for a site visitor to navigate through a website. It focuses on organizing webpages.
Poor site architecture can negatively affect how a site is crawled and how the pages are ranked. The effects range from the host load (how much crawling your web host will support) to how much PageRank is allocated, which will affect how deep a site is crawled. (Matt Cutts on site architecture)
The Pages are Thin Content Keyword-Based Doorway Pages
Creating a set of documents around keywords is a spam strategy called doorway pages. Google Webmaster Help has a page about this.
“…some webmasters attempt to improve their pages’ ranking and attract visitors by creating pages with many words but little or no authentic content. Google will take action against domains that try to rank more highly by just showing scraped or other cookie-cutter pages that don’t add substantial value to users.”
Because all the pages use similar words and phrases, they compress at a higher ratio. How much a site’s content compresses (as with gzip compression) when a search engine stores it can signal that these sets of pages are spam.
See section 4.6 in Detecting Spam Web Pages through Content Analysis (PDF)
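To make the compression idea concrete, here is a minimal sketch in Python. It assumes that "compression ratio" means uncompressed bytes divided by compressed bytes, and it uses the standard library's zlib module (the same DEFLATE method gzip uses). The function name and both sample pages are invented for illustration. Nothing here is a Google threshold or Google's actual implementation.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Uncompressed size divided by compressed size, measured in bytes."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0  # nothing to measure on an empty page
    return len(raw) / len(zlib.compress(raw))


# Hypothetical page copy, invented for this example only.
# One page repeats the same keyword phrase; the other rarely repeats a word.
repetitive_page = "cheap blue widgets buy cheap blue widgets online " * 200
varied_page = " ".join(f"token{i * 7919 % 10007}" for i in range(1500))

print("repetitive:", round(compression_ratio(repetitive_page), 1))
print("varied:", round(compression_ratio(varied_page), 1))
```

The repetitive page produces a much higher ratio than the varied one. That gap is the kind of signal the compression analysis looks for, though a high ratio on its own does not prove a page is spam.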
It’s Important to Understand Why Rankings Dropped
When a site’s rankings begin to fail because the publisher is adding similar content, that’s not always a penalty. Most often it’s the site ranking where it should be.
It’s not that Google is confused and doesn’t know which page to rank. It could be any one of the reasons I listed above.
Knowing why a webpage no longer ranks is important. Having a good enough explanation is never good enough.
Would you spend $1,000 on a car repair because your auto mechanic has a gut feeling that it’s an “engine problem”? “Engine problem” is so general a diagnosis that it’s essentially useless for helping you work out why something is broken.
This is the problem with Pop SEO ideas like content cannibalization: they are so broad that the problems they seek to explain can be any number of issues. And like auto repair, you want to focus on repairing the problem, not guessing at what the problem is.
Having a good idea of how search algorithms work will help you diagnose and solve ranking problems far better than applying “gut instinct” Pop SEO solutions. Pop SEO solutions are often just guesses and they have a tendency to be inadequate for diagnosing site ranking problems.