Google’s search results have been hit over the past few days by a spam attack that can only be described as completely out of control. Many domains are ranking for hundreds of thousands of keywords each, an indication that the scale of this attack could easily reach into the millions of keyword phrases.
The spam was initially discovered by Lily Ray:
If you currently Google "craigslist used auto parts," every single result in the top 20 is spam, minus the first two results from Craigslist.
— Lily Ray 😏 (@lilyraynyc) December 20, 2023
How Google’s Algorithms Might Be Gamed
The spam sites appear to be taking advantage of at least three windows of opportunity that are part of how Google ranks websites. These opportunities are not new, and spammers have been taking advantage of them for many years, but not to the extent seen lately.
Perhaps the most important reason for the success of the spam is that the search queries the spam sites are ranking for are low competition, which makes it easier to rank.
There are two kinds of low volume search queries where the spam sites are finding opportunities, plus a third opportunity that comes from the age of the domains.
Opportunity 1. Local search algorithm: Local Search is a type of search that is triggered when people search for things nearby, like a restaurant or movie times. It’s a more permissive algorithm that allows a local restaurant with no links to rank.
Opportunity 2. Longtail Keywords: Longtail keywords are low volume phrases, one-off queries that happen once a month or once a year. Consequently, these queries are low competition, which makes it easier to rank.
Opportunity 3. Many of the spam sites are brand new. The domains were registered within the 24 to 48 hours prior to ranking.
Google gives brand new sites the benefit of the doubt during a short honeymoon period, in which the site is able to rank for search queries while Google’s algorithm figures the site out.
Because many of the domains were registered only within the past 24-48 hours, they could be taking advantage of this small window of opportunity to sneak in, rank for millions of search queries, then disappear.
A Googler described why new sites can rank:
“In particular, with completely new websites, one of the difficulties that we have is we might not have a lot of signals for those websites so we have to make estimates.
And depending on how we make estimates, it can sometimes mean that in the beginning we show this website a little bit more visibly than like it turns out that the signals tell us in the end.”
Links Help Google Find The Spam Sites
This recently came to my attention through a series of posts by Bill Hartzer (LinkedIn profile), in which he published a link graph generated by the Majestic backlinks tool that exposed the link networks of several of the spam sites.
Screenshot Of Tightly Interlinked Network
Bill and I talked about the spam sites over Facebook Messenger and we both agreed that, although the spammers put a lot of work into creating a backlink network, the links weren’t actually responsible for the high rankings.
The links are likely there to help Google find the brand new spam sites and get them crawled and eventually ranking.
“This, in my opinion, is partly the fault of Google, who appears to be putting more emphasis on content rather than links.”
I agree 100% that Google is putting more emphasis on content than links. But my view is that the spam links are there so that Googlebot can discover the spam pages and index them, even if just for one or two days.
Once indexed the spam pages are likely exploiting what I consider two loopholes in Google’s algorithms, which I talk about next.
Out of Control Spam in Google SERPs
Multiple sites are ranking for longtail phrases that are somewhat easy to rank for, as well as phrases with a local search component, which are also easy to rank for.
Longtail is a concept that’s been around for almost twenty years and was popularized by a 2006 book called The Long Tail: Why the Future of Business is Selling Less of More.
Spammers are able to rank for these rarely searched phrases because there is little competition for those phrases, which makes it easy to rank.
So if a spammer creates millions of pages targeting longtail phrases, those pages can then rank for hundreds of thousands of keywords in a short period of time.
Companies like Amazon use the principle of the longtail to sell hundreds of thousands of individual products a day, which is different from selling one product hundreds of thousands of times per day.
That’s what the spammers are exploiting, the ease of ranking for longtail phrases.
The second thing that the spammers are exploiting is the loophole that’s inherent in Local Search.
The local search algorithm is not the same as the algorithm for ranking non-local keywords.
The examples that have come to light are variations of Craigslist and related keywords.
Examples are phrases like Craigslist auto parts, Craigslist rooms to rent, Craigslist for sale by owner and thousands of other keywords, most of which don’t use the word Craigslist.
The scale of the spam is huge and it goes far beyond keywords with the word “Craigslist” in them.
What The Spam Page Looks Like
It is impossible to see what the spam page looks like by visiting it with a browser.
I tried to see the source code of the sites that rank in Google but all of the spam sites automatically redirect to another domain.
I next entered the spam URL into the W3C link checker to visit the website but the W3C bot couldn’t see the site either.
So I changed my browser user agent to identify itself as Googlebot but the spam site still redirected me.
That indicated that the site was not checking if the user agent was Googlebot.
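The user agent test described above can be approximated with a short script. This is a minimal sketch rather than the exact method used here: the helper names `build_request` and `first_response` are hypothetical, the URL you pass in is up to you, and the script only catches HTTP-level redirects, so a JavaScript or meta refresh redirect would not show up.

```python
import urllib.error
import urllib.request

# The user agent string Googlebot commonly identifies itself with.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so we can inspect them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Create a request that claims to be Googlebot (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def first_response(url, user_agent=GOOGLEBOT_UA):
    """Return (status code, Location header) of the first response only."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, user_agent), timeout=10) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # 3xx responses land here because redirects are not followed.
        return err.code, err.headers.get("Location")
```

If `first_response` returns a 3xx status and a `Location` on another domain even with the Googlebot user agent, the site is probably not deciding what to serve based on the user agent alone.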
The spam site was checking for Googlebot IP addresses. If the visitor’s IP address matched as belonging to Google then the spam page displayed content to Googlebot.
All other visitors got a redirect to other domains that displayed sketchy content.
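The behavior observed here is consistent with IP-based cloaking. The sketch below illustrates that logic only; it is not the spammers’ actual code, which I have not seen. The function names are hypothetical, and the single CIDR range is an illustrative sample assumed to belong to Google’s crawlers, not a current or complete list.

```python
import ipaddress

# Illustrative sample only (assumption): a real check would load the
# current crawler IP ranges rather than hard-coding one block.
SAMPLE_CRAWLER_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),
]


def is_crawler_ip(ip: str) -> bool:
    """Return True if the visitor IP falls inside a sample crawler range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in SAMPLE_CRAWLER_RANGES)


def choose_response(ip: str, user_agent: str) -> str:
    """Decide what to serve. The user agent is deliberately ignored,
    matching the behavior seen when spoofing Googlebot's user agent."""
    if is_crawler_ip(ip):
        return "serve_content"
    return "redirect"
```

With this logic, a browser that merely claims to be Googlebot still gets redirected, while a request from inside the sample range gets the content regardless of its user agent.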
In order to see the HTML of the website I had to visit with a Google IP address. So I used Google’s Rich Results tester to visit the spam site and record the HTML of the page.
I showed Bill Hartzer how to extract the HTML by using the Rich Results tester and he immediately went off to tweet about it, lol. Dang!
The Rich Results Tester has an option to show the HTML of a webpage. So I copied the HTML, pasted it into a text file, then saved it as an HTML file.
Screenshot Of HTML Provided By Rich Results Tool
I was now able to see what the webpage looks like to Google:
Screenshot Of Spam Webpage
One Domain Ranks For 300,000+ Keywords
Bill sent me a spreadsheet containing a list of keyword phrases that just one of the spam sites ranked for. One spam site, just one of them, ranked for over 300,000 keyword phrases.
Screenshot Showing Keywords For One Domain
There were a lot of Craigslist keyword phrases but there were also other longtail phrases, many of which contained a local search element. As I mentioned, it’s easy to rank for longtail phrases and easy to rank for local search phrases. Combine the two kinds of phrases and it’s really easy to rank.
Why Does This Spam Technique Work?
As previously mentioned, local search uses a different algorithm than non-local search. For example, a local site doesn’t need a lot of links to rank for a search query. The pages just need the right kinds of keywords to trigger the local search algorithm and subsequently rank.
The algorithm for local search is different and more permissive so that local type sites can rank. Local search algorithms are so permissive that a site written virtually entirely in Latin could rank for a phrase like Rhinoplasty Plano Texas.
Google has known about this spam problem since at least December 19th, as acknowledged in a tweet by Danny Sullivan.
Yes, I already passed that one on to the search team. Here’s a peek. And it’s being looked at. pic.twitter.com/vJH3EisnXD
— Google SearchLiaison (@searchliaison) December 19, 2023
There are a lot of ways Google could address this, for example by being stricter and not allowing sites on certain domains to rank. It will be interesting to see if Google, after all this time, finally figures out a way to combat this kind of spam.
Featured Image by Shutterstock/Kateryna Onyshchuk