Recently SEMrush posted a study on Google’s top ranking factors. The study was not unlike many other studies published each year. It used a statistically significant data set to draw parallels between common metrics and high position (or ranking) in Google.
However, the conclusions they came to and reported to the industry were not entirely correct.
Defining Ranking Factors & Best Practices
First, let’s define “ranking factors.”
Ranking factors are those elements which, when adjusted in connection with a website, will result in a change in position in a search engine (in this case, Google).
“Best practices” are different. Best practices are tactics which, when implemented, have shown a high correlation to better performance in search results.
XML sitemaps are an excellent example. Creating and uploading an XML sitemap is a best practice. The existence of the sitemap does not lead directly to better rankings. However, providing the sitemap to Google allows them to crawl and understand your site more efficiently.
When Google understands your site better, it can lead to better rankings. But an XML sitemap is not a ranking factor.
The only ranking factors that we know about for sure are the ones that Google specifically mentions. These tend to be vague, like “high authority” or “good content” or even “awesomeness”.
Google generally doesn’t provide specific ranking factors because any time that they do, webmasters go overboard. Remember the link wars of 2011-2014? They have learned their lesson.
For more on ranking factors and correlation vs. causation, check out this definition by Searchmetrics.
Understanding Correlation vs. Causation
The discussion of correlation vs. causation is not new. Dave Davies wrote a great post on this back in 2013 which still rings true.
Here’s another way to think of correlation vs. causation:
A large percentage of high-ranking websites probably have XML sitemaps. This is a correlation. The XML sitemap did not cause the site to obtain a high ranking.
This would be like saying if you eat sour cream, you will get into a motorcycle accident based on the correlation shown below.
Click here for more examples of strange correlations.
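The sitemap example can be made concrete with a small simulation. This is only a sketch under invented assumptions: the hidden "well-run site" trait, the probabilities, and the score formula are all hypothetical and say nothing about how Google actually ranks pages. The ranking score here depends only on the hidden trait, never on the sitemap.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_sites(n=20000, seed=7):
    """Return (well_run, has_sitemap, score) tuples for n made-up sites."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n):
        # Hidden confounder: is the site well run? (hypothetical trait)
        well_run = rng.random() < 0.5
        # Well-run sites are more likely to bother with a sitemap.
        has_sitemap = rng.random() < (0.9 if well_run else 0.3)
        # The score depends ONLY on well_run, never on has_sitemap.
        score = (1.0 if well_run else 0.0) + rng.gauss(0, 0.5)
        sites.append((well_run, has_sitemap, score))
    return sites


def sitemap_score_correlation(sites):
    return pearson([float(s[1]) for s in sites], [s[2] for s in sites])


sites = simulate_sites()
overall = sitemap_score_correlation(sites)
within_well_run = sitemap_score_correlation([s for s in sites if s[0]])
within_others = sitemap_score_correlation([s for s in sites if not s[0]])

print(f"all sites:           {overall:.2f}")
print(f"well-run sites only: {within_well_run:.2f}")
print(f"other sites only:    {within_others:.2f}")
```

Across all sites the sitemap correlates clearly with the score, yet within each group the correlation is close to zero. A study that only saw the first number could easily label the sitemap a "ranking factor."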
Let’s take another example.
High amounts of direct traffic were shown to have strong correlation with better ranking in the SEMrush study. This was a controversial statement, because it was presented as “direct traffic is the number one ranking factor.”
While the data is likely accurate, what does it really mean?
Let’s start by defining Direct Traffic. This is traffic that came to a website URL with no referrer header (i.e. the visitor didn’t come to the site via email, search, or links from another site). Thus, it includes any traffic for which Google Analytics (or the platform in question) cannot determine a referrer.
Direct Traffic is basically the bucket for “we don’t know where it came from.” Sessions are misattributed to Direct Traffic all the time, and some studies have shown that as much as 60 percent of direct traffic could actually be organic traffic.
In other words, it’s not a reliable metric.
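To see why the bucket is so leaky, here is a minimal sketch of referrer-based channel classification. It is not how Google Analytics or any other platform actually works; the function name, the `own_host` parameter, and the short list of search hosts are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately tiny list of search engine hosts.
SEARCH_ENGINE_HOSTS = {"google.com", "bing.com", "duckduckgo.com"}


def classify_session(referrer, own_host="example.com"):
    """Assign a session to a channel using only its referrer string."""
    if not referrer:
        # No referrer at all: the "we don't know where it came from" bucket.
        return "direct"
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if not host:
        # Unparseable referrer: nothing to attribute it to either.
        return "direct"
    if host == own_host:
        return "internal"
    if host in SEARCH_ENGINE_HOSTS:
        return "organic"
    return "referral"


# The same search visit, with and without its referrer intact.
print(classify_session("https://www.google.com/search?q=seo"))
print(classify_session(None))
```

The second call stands in for a search visit whose referrer was lost somewhere along the way, for example on a jump from an HTTPS page to an HTTP one or a tap from inside an app. A classifier like this has no choice but to file it under "direct."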
Let’s assume for a moment Direct Traffic is a reliable metric. If a site has high direct traffic, it is also likely to have a strong brand, high authority, and loyal users. All of these things can help SEO ranking. But the connection is indirect.
Moving on from Direct Traffic, Searchmetrics falls victim to this correlation/causation problem as well in their latest Travel Ranking Factors study, where they assert word count and number of images are both ranking factors for the travel industry.
Google has directly debunked the word count assertion here and the number of images claim is so silly I had to ask John Mueller about it directly for this article:
If you read between the lines, you can tell Mueller says the use of a certain number of images as a ranking factor is foolish, and it can vary widely.
It is much more likely that a fuller treatment of the keyword in question is the ranking factor, rather than strictly “word count.” Likewise, good quality travel sites are simply more likely to have lots of images.
If you want even more proof word count is a silly metric for any industry, just check out the top result for “is it Christmas?” (h/t Casey Markee)
This site has been in the #1 spot since at least 2008, and it literally has one word on the entire site. But that one word fully answers the intention of the query.
While Searchmetrics does a nice job of defining ranking factors, their use of that term in relationship to this graph is irresponsible. These should be labeled “correlations” or similar, not “ranking factors.”
This is the crux of the matter. Studies using statistical significance, correlation, or even machine learning like the Random Forest model (what SEMrush used) can be accurate. I have no doubt that the results of all of the studies mentioned were accurate as long as the data that was fed into them was accurate. However, the problem came not in the data itself, but in the interpretation and reporting of that data, namely when they listed these metrics as “ranking factors”.
Evaluate the Metrics Used
This raises the need to use common sense to evaluate things that you read. For example, a study may claim that time on site is a ranking factor.
First, you have to question where that data came from, since it’s a site-specific metric that few would know or be able to guess at without website or analytics access. Most of the time, this sort of data comes from third-party plugins or toolbars that record users’ behavior on sites. The problem with this is that the data set will never be as complete as site-specific analytics data.
Second, you have to consider the metric itself. Here’s the problem with metrics like time on site and bounce rate. They’re relative.
After all, some industries (like maps or yellow pages) thrive on a high bounce rate. It means the user got what they needed and went on their way having had a good experience and being likely to return.
For a time on site example, let’s say you want to consult with a divorce attorney. If you’re smart, you use incognito mode (where most/all plugins are disabled) to do this search and the subsequent site visits. Otherwise, your partner might see your browsing history or be shown targeted ads.
Imagine your partner seeing this in the Facebook news feed when he or she thinks your marriage is solid:
So for an industry like divorce attorneys, time on site data is likely to be either heavily skewed or not readily available.
But Google Owns an Analytics Platform!
Some of you will say that Google has access to this data through Google Analytics, and that’s absolutely true. However, no positive correlation has ever been shown between having an active Google Analytics account and ranking better on Google. Here’s a great article on the SEMPost that goes into more detail on this.
Google Analytics is installed on only 83.3 percent of “websites we know about”, according to W3techs. That’s a lot, but it isn’t every website, even if we do assume this is a representative sample. Google simply could not feed something into their algorithm that is not available in roughly 17 percent of cases.
Finally, some will make the argument that Chrome can collect direct traffic data. This has the same problem as Google Analytics, though, because at last check, Chrome commanded an impressive 54 percent market share (according to StatCounter). That’s sizeable, but data covering only slightly more than half of all browser traffic is not a reliable enough source on which to base a ranking factor.
Doubling Down on Bad Information
Many of you have read this thinking that yes, we know all that. After all, we’re search professionals. We do this every day. We know that a graph that says direct traffic or bounce rate is a ranking factor has to be taken with a grain of salt.
The danger is when this information gets shared outside of our industry. We all have a responsibility to use our powers for good; we need to educate the world around us about SEO, not perpetuate stereotypes and myths.
I’m going to pick on Larry Kim for a minute here, who I think is a great guy and a very smart marketer. He recently posted the SEMrush ranking factor graph on Inc.com along with a well-reasoned article about why he thinks the study has value.
I had the opportunity to catch up with Larry by phone prior to finishing this article, and he impressed upon me that his intention with his post was to investigate the claim of direct traffic as a ranking factor. He felt that if a study showed that direct traffic had a high correlation with good search ranking, there had to be something more there.
I told him that while I don’t agree with everything in his article, I understand his train of thought. What I would like to see more of from everyone in the industry is understanding that outside our microcosm of keywords and SERP click-through rates, SEO is still a “black box” in many people’s minds.
Because SEO is complicated and confusing, and there’s a lot of bad information out there, we need to do everything that we can to clarify charts and studies and statements. The specific problem I have with Larry’s article is that lots of people outside of SEO read Inc. This includes many high-level decision makers who don’t necessarily know the finer points of SEO.
In my opinion, Larry sharing the graph as “ranking factors” and not debunking the obviously false information contained in the graph was not responsible. For example, any CEO looking at that graph could reasonably assume that his/her meta keywords hold some importance to ranking (not a lot based on the position on the graph, but some).
However, no major search engine has used meta keywords for regular SERP rankings (Google News is different) since at least 2009. The graph’s suggestion that meta keywords matter is objectively false information.
We have a responsibility as SEO professionals to stop the spread of bad or incomplete information. SEMrush published a study that was objectively valid, but the subjective interpretation of it created problems. Larry Kim republished the subjective interpretation without effectively qualifying it.
‘Always’ & ‘Never’ Don’t Exist in SEO
Last week, I met with a new client. They had been struggling to include five supplemental links in all of their content because at some point, an SEO told them they should ALWAYS link out to at least five sources on every article. Another client had been told they should NEVER link out from their website to anything.
Anyone who knows about SEO knows that either one of these statements is bad advice and patently false information.
We as SEO professionals can help stem the tide of these mythical “revelations” by emphasizing to our clients, our readers, and our colleagues that ALWAYS and NEVER do not exist in SEO. There are simply too many variables to say definitively that anything is or is not a ranking factor unless a search engine has specifically stated that it is.
It Happens Every Day
Literally every day something is taken out of context, misattributed, or presented as causation when it is only a correlation. Just recently, Google’s Webmaster Trends Analyst John Mueller said this in response to a tweet from Bill Hartzer:
TTFB, for those non-SEOs reading, is “Time to First Byte”. This refers to how long your server takes to start responding to a request for your page.
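For readers who want to see the number for themselves, here is a rough sketch of timing it with Python’s standard library. The function name and parameters are hypothetical, and the result is an approximation: the clock stops once the status line and headers have been read, and connection setup is deliberately left out, although some tools include it in their TTFB figure.

```python
import http.client
import time


def measure_ttfb(host, path="/", port=None, use_tls=True, timeout=10.0):
    """Approximate time to first byte, in seconds, for one GET request."""
    if use_tls:
        conn = http.client.HTTPSConnection(host, port=port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(host, port=port, timeout=timeout)
    try:
        # Connect first so DNS, TCP and TLS setup are excluded (an assumption
        # of this sketch; some tools count them as part of TTFB).
        conn.connect()
        start = time.perf_counter()
        conn.request("GET", path)
        # getresponse() returns once the status line and headers are read.
        response = conn.getresponse()
        elapsed = time.perf_counter() - start
        response.read()  # drain the body before closing the connection
        return elapsed
    finally:
        conn.close()
```

Calling something like `measure_ttfb("example.com")` a handful of times and comparing the results is more informative than a single reading, since one request can easily be an outlier.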
Google has said on multiple occasions that speed is a ranking factor. What they have not said is exactly how it is measured. So Mueller says TTFB is not a ranking factor. Let’s assume he’s telling the truth and this is fact.
This does not mean you don’t have to worry about speed, or that you don’t have to be concerned with how quickly your server responds. In fact, he qualifies it in his tweet – it’s a “good proxy” and don’t “blindly focus” on it. There are myriad other ranking factors that could be negatively impacted by your TTFB. Your user experience may be poor if your TTFB is slow. Your site may not earn high mobile usability scores if your TTFB is slow.
Be very careful how you interpret information. Never take it at face value.
Mueller said TTFB is not a ranking factor. Now I know that is fact and I can point to his tweet when necessary. But I will not stop including TTFB in my audits; I will not stop encouraging clients to get this as low as possible. This statement changes nothing about how SEO professionals will do their jobs, and only serves to confuse the larger marketing community.
It is our responsibility to separate SEO fact from fiction; to interpret statements from Google as carefully as possible, and to generally dispel the myth that there is anything you ALWAYS or NEVER do in SEO.
Google uses over 200 ranking factors, or so they say. Chasing these mystical metrics is hard to resist; after all, as SEOs we are data-driven, sometimes to a fault.
When you interpret ranking factor studies, use a critical eye. How was the data collected, processed and correlated? If the third party is making a claim that something is a ranking factor, does it make sense that Google would use it?
And finally, does learning that x or y is or is not a ranking factor change anything about the recommendations you will make to your client or boss? The answer to that last one is almost always “no.” Too much depends on other factors, and knowing something is or is not a ranking factor is generally not actionable.
There’s no ALWAYS or NEVER in SEO and if we want SEO to continue to grow as a discipline, we need to get serious about explaining that. It’s time to take the responsibility we have to the outside world more seriously.
Searchmetrics and SEMrush were asked for comment, but did not respond prior to press time.
This post was originally published on JLH Marketing.
Screenshots taken by author, December 2017