In a Webmaster Hangout, Google’s John Mueller answered whether plagiarized content could hurt a site’s rankings. Mueller’s response gave a peek into how Google handles sites that steal content and the effect it has on your site.
Scraper Sites and Effect on Ranking
There are many bad actors who steal content and use it on their own sites. It is done with automated software. The process is called content scraping and the sites that publish stolen content are known as content scrapers.
Many publishers associate stolen content with a loss of rankings in Google. It’s not unusual to search for a snippet of your own content and see another site ranking with it.
The concern about the effect on rankings is a legitimate one.
Here is the question:
“A few websites have started scraping my content and have been publishing them. We tried to contact their hosts for a DMCA takedown without luck. Does having my content scraped and republished hurt my site? Should I disavow these URLs?”
What is DMCA?
The question referred to a DMCA takedown. DMCA stands for the Digital Millennium Copyright Act, an American law.
The law protects hosts, domain name registrars and other businesses from liability for copyright violations as long as they provide a way for content creators to request that stolen content be removed. It also includes due process provisions that allow the takedown to be contested, which can then result in costly litigation for the content creator.
It’s somewhat surprising that the publisher tried using the DMCA and failed. This can happen when the web host and/or domain name registrar is in a country outside of the USA. Each country has its own remedy.
Does Copied Content Affect Rankings?
Google’s John Mueller gave an overview of how stolen content affects rankings:
“So from our point of view, other sites copying your content wouldn’t be something that would negatively affect your website. So that’s a very common situation, that sites copy content.
…if you’re not seeing those copies showing up in search for the queries that you care about then it might not be the highest priority to focus on.”
What John Mueller said makes sense given that scraper sites do not generally rank for actual search queries. Is it possible for scrapers to rank for long tail or non-competitive queries? Almost anything is possible with those kinds of queries.
Why Scrapers Rank for Snippets of Content
It’s not unusual for a scraper site to rank for a snippet of content stolen from another site, but there’s a good reason for that.
Snippets of content are generally regarded as gibberish. If another site ranks for a snippet, it’s not because their thievery has made your site less relevant. It’s because the search algo ranks pages differently for nonsense phrases.
Google’s algorithm is trying to make sense of all search queries. That’s virtually impossible to do when there is no “sense” in the search query.
And when the snippet does make sense, Google may very well rank other sites for that query ahead of your site, but that’s the algo kicking in, ranking pages for “topics.”
Google does not rank pages by matching keywords, so even if the search is your snippet, that does not guarantee that your site will rank number one.
What’s important is that content thieves generally do not rank for the search queries that matter. So don’t let it trouble you if you see scraper sites outranking you for snippets. That’s not a sign that your site lost ranking strength due to stolen content.
How to Protect Against Scrapers?
WordPress Anti-bot Plugins
There are many WordPress plugins that provide a defense against malicious scrapers.
WordFence is a popular plugin that can be customized to block scrapers for however many hours you want to block them. It emails you to let you know when you’re under attack, which can help you tighten the settings so that WordFence shuts them out more swiftly.
WordFence appears to work by monitoring visitor behavior, particularly the number of pages or the kinds of pages that a visitor is trying to download. It’s the behavior that triggers a wall that blocks the bots.
I use WordFence to stop scrapers and hacker bots and am happy with how it works.
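The behavior-based blocking described above can be illustrated with a small sketch. This is not WordFence's code. It is a minimal Python illustration of the general idea, assuming a simple rule: a visitor that requests more than a set number of pages within a time window is blocked for a set number of seconds. The class name RateLimiter and its parameters (max_requests, window_seconds, block_seconds) are hypothetical names chosen for this example.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Hypothetical sketch: block a visitor that requests too many pages too fast."""

    def __init__(self, max_requests=60, window_seconds=60, block_seconds=3600):
        self.max_requests = max_requests      # pages allowed per window
        self.window_seconds = window_seconds  # length of the sliding window
        self.block_seconds = block_seconds    # how long a block lasts
        self._hits = defaultdict(deque)       # ip -> timestamps of recent requests
        self._blocked_until = {}              # ip -> time when the block expires

    def allow(self, ip, now=None):
        """Return True if the request should be served, False if it is blocked."""
        now = time.time() if now is None else now

        # Refuse immediately while a block is still active.
        if self._blocked_until.get(ip, 0) > now:
            return False

        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that have fallen out of the sliding window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()

        # Too many requests inside the window: start a block.
        if len(hits) > self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True
```

In this sketch a normal reader never reaches the threshold, while a scraper that downloads page after page trips it within seconds and stays blocked until the block expires. A real plugin would likely also consider which kinds of pages are being requested.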
Blackhole Anti-bot WordPress Plugin
Another popular WordPress plugin is one called Blackhole. (It also comes with a feature-rich and reasonably priced Pro version.)
Blackhole works on the principle of the honeypot. Good bots will avoid crawling a prohibited link. Bad bots will rush right in. Blackhole sets a trap for bad bots by including a link to the honeypot. Once the bad bot follows the prohibited link the trap is triggered and the bot is excluded from crawling.
All search engines are whitelisted. This means that no legitimate search engine will ever be blocked, even if Google follows the link.
There is also a standalone PHP bot blocker called Blackhole. It can be installed on any server that uses PHP, so it is compatible with a forum site using software such as XenForo or phpBB.
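The honeypot principle can be sketched in a few lines. The following is not Blackhole's code (Blackhole is written in PHP). It is a minimal Python illustration that assumes the trap link is prohibited through robots.txt and that search engines are recognized by their user-agent string. The class name HoneypotTrap, the /blackhole/ path and the list of allowed agents are all hypothetical choices for this example.

```python
class HoneypotTrap:
    """Hypothetical sketch of a honeypot that bans bots ignoring robots.txt."""

    def __init__(self, trap_path="/blackhole/", allowed_agents=("googlebot", "bingbot")):
        self.trap_path = trap_path
        self.allowed_agents = tuple(name.lower() for name in allowed_agents)
        self.banned_ips = set()

    def robots_txt(self):
        """Rule that tells well-behaved bots to stay away from the trap."""
        return f"User-agent: *\nDisallow: {self.trap_path}\n"

    def is_whitelisted(self, user_agent):
        """Whitelisted search engines are never banned (assumed: matched by user agent)."""
        ua = user_agent.lower()
        return any(name in ua for name in self.allowed_agents)

    def handle(self, ip, path, user_agent):
        """Return an HTTP status code: 200 to serve the page, 403 to refuse."""
        if self.is_whitelisted(user_agent):
            return 200
        if ip in self.banned_ips:
            return 403
        # Following the prohibited link springs the trap.
        if path.startswith(self.trap_path):
            self.banned_ips.add(ip)
            return 403
        return 200
```

A bot that ignores the robots.txt rule and follows the hidden link is refused from that point on, while a whitelisted crawler is served even if it visits the trap. User-agent strings can be forged, so a production tool would likely need a stronger check than the simple substring match assumed here.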
reCAPTCHA Enterprise Beta
Google recently announced a free beta trial of a service called reCAPTCHA Enterprise. It is a cloud service that is designed to block automated scrapers, hackers and other malicious bots.
That Google itself is offering a solution to bad bots may be a sign of how important it is to block automated bot software, including scrapers.
Should You Protect Against Scrapers?
I believe it’s a good idea to protect your site from automated bots. Bots tend to crawl at night, at the same time that Google and other legitimate bots are crawling. This can become problematic when too many malicious bots are probing your site and slowing down your server. An overloaded server may begin serving error response codes to Google, which will then be unable to crawl and index your site.
So although John Mueller is correct to say that stolen content does not affect your rankings, you should still try to protect against scrapers so that Google can properly crawl and index your site.
What’s important is that Google confirmed that scraped content does not affect your rankings.