Editor’s note: This post is part of an ongoing series looking back at the history of Google algorithm updates. Enjoy!
Panda is the official name of an algorithm update developed by Google to reduce the prevalence of low-quality, thin content in the search results, and to reward unique, compelling content.
At the time Panda launched, user complaints about the increasing influence of “content farms” were growing rampant.
Google’s Panda algorithm assigns pages a quality classification, used internally and modeled after human quality ratings, that is incorporated as a ranking factor.
Websites that recover from the impact of Panda do so by revamping pages with low-quality content, adding new high-quality content, eliminating filler words and above the fold ads, and in general improving the user experience as it relates to content.
Why Google Created Panda
In 2010, the falling quality of Google’s search results and the rise of the “content farm” business model became a recurring topic of discussion.
As Google’s Amit Singhal later told Wired at TED, the “Caffeine” update of late 2009, which dramatically sped up Google’s ability to index content, also introduced “some not so good” content into their index. Google’s Matt Cutts told Wired this new content issue wasn’t really a spam issue, but one of “What’s the bare minimum that I can do that’s not spam?”
ReadWriteWeb pointed out:
“By the end of , two of these content farms – Demand Media [of eHow infamy] and Answers.com – were firmly established inside the top 20 Web properties in the U.S. as measured by comScore. Demand Media is the epitome of a content farm and by far the largest example of one, pumping out 7,000 pieces of content per day…The company operates based on a simple formula: create a ton of niche, mostly uninspired content targeted to search engines, then make it viral through social software and make lots of money through ads.”
In January 2011, Business Insider published a headline that says it all: “Google’s Search Algorithm Has Been Ruined, Time To Move Back To Curation.”
In another article, they pointed out:
“Demand [Media] is turning the cleverest trick by running a giant arbitrage of the Google ecosystem. Demand contracts with thousands of freelancers to produce hundreds of thousands of pieces of low-quality content, the topics for which are chosen according to their search value, most of which are driven by Google. Because Google’s algorithm weights prolific and constant content over quality content, Google’s algorithm places Demand content high on their search engine result pages.”
Undoubtedly, headlines like these were a major influence on Google, which responded by developing the Panda algorithm.
Google Panda Update Launches
Panda was first introduced on February 23, 2011.
On February 24, Google published a blog post about the update, and indicated that they “launched a pretty big algorithmic improvement to our ranking—a change that noticeably impacts 11.8% of our queries.” The expressed purpose of the update was as follows:
“This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.”
Search Engine Land founder Danny Sullivan originally referred to it as the “Farmer” update, although Google later revealed that internally it had been referred to as “Panda,” the name of the engineer who came up with the primary algorithm breakthrough.
Analyses by SearchMetrics and SISTRIX (among others) of the “winners and losers” found that sites that were hit the hardest were pretty familiar to anybody who was in the SEO industry at the time. These sites included wisegeek.com, ezinearticles.com, suite101.com, hubpages.com, buzzle.com, articlebase.com, and so on.
Notably, “content farms” eHow and wikiHow did better after the update. Later updates would hurt these more “acceptable” content farms as well, with Demand Media losing $6.4 million in the fourth quarter of 2012.
The most readily apparent change in the SEO industry was how heavily it hit “article marketing,” in which SEO practitioners used to publish low-quality articles on sites like ezinearticles.com as a form of link building.
It was also clear that the most heavily hit sites had less attractive designs, more intrusive ads, inflated word counts, low editorial standards, repetitive phrasing, poor research, and in general didn’t come across as helpful or trustworthy.
What We Know About the Panda Algorithm
When Google discussed the development of the algorithm with Wired, Singhal said that they started by sending test documents to human quality raters who were asked questions like “Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?”
Cutts said the engineer had developed “a rigorous set of questions, everything from ‘Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?’”
According to the interview, they then developed the algorithm by comparing various ranking signals against the human quality ratings. Singhal described it as finding a plane in hyperspace that separates the good sites from the bad.
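To make the “plane in hyperspace” idea concrete, here is a minimal sketch of a linear classifier (a toy perceptron) that learns a separating plane from human-style ratings. This is not Google’s method; the two signals (share of original text, ad density), the sample values, and the function names are all assumptions made up for illustration.

```python
# Illustrative only: a toy perceptron, not Google's actual algorithm.
# Each site is a vector of made-up signal values; each label is a
# hypothetical human rating (+1 = good, -1 = bad).

def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Learn weights w and bias b so that sign(w.x + b) matches the labels."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or on) the plane: nudge it
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every rated site is on the correct side
            break
    return w, b

def predict(w, b, x):
    """Return +1 if x falls on the 'good' side of the plane, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical signals: (share of original text, ad density), both in [0, 1].
sites = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
ratings = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(sites, ratings)
predictions = [predict(w, b, s) for s in sites]
```

Once trained, the same plane can score sites that no human has rated, which is the appeal of the approach: a small set of human judgments generalizes to the whole index.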
Singhal later released the following 23 questions as guiding questions the algorithm was based on:
- Would you trust the information presented in this article?
- Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
- Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
- Would you be comfortable giving your credit card information to this site?
- Does this article have spelling, stylistic, or factual errors?
- Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
- Does the article provide original content or information, original reporting, original research, or original analysis?
- Does the page provide substantial value when compared to other pages in search results?
- How much quality control is done on content?
- Does the article describe both sides of a story?
- Is the site a recognized authority on its topic?
- Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
- Was the article edited well, or does it appear sloppy or hastily produced?
- For a health related query, would you trust information from this site?
- Would you recognize this site as an authoritative source when mentioned by name?
- Does this article provide a complete or comprehensive description of the topic?
- Does this article contain insightful analysis or interesting information that is beyond obvious?
- Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
- Does this article have an excessive amount of ads that distract from or interfere with the main content?
- Would you expect to see this article in a printed magazine, encyclopedia or book?
- Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
- Are the pages produced with great care and attention to detail vs. less attention to detail?
- Would users complain when they see pages from this site?
It’s also a good idea to consider what Google’s human quality raters were asked to consider. This quote about low-quality content is especially important:
Consider this example: Most students have to write papers for high school or college. Many students take shortcuts to save time and effort by doing one or more of the following:
- Buying papers online or getting someone else to write for them
- Making things up
- Writing quickly, with no drafts or editing
- Filling the report with large pictures or other distracting content
- Copying the entire report from an encyclopedia or paraphrasing content by changing words or sentence structure here and there
- Using commonly known facts, for example, “Argentina is a country. People live in Argentina. Argentina has borders.”
- Using a lot of words to communicate only basic ideas or facts, for example, “Pandas eat bamboo. Pandas eat a lot of bamboo. Bamboo is the best food for a Panda bear.”
In March of 2011, SEO By The Sea identified Biswanath Panda as the engineer the algorithm was likely named after. One paper Biswanath helped author detailed how machine learning algorithms could be used to make accurate classifications about user behavior on landing pages.
While the paper is not about the Panda algorithm, the involvement of the algorithm’s namesake and the paper’s subject matter both suggest that Panda is also a machine-learning algorithm.
Most in the SEO industry have by now concluded that Panda works by using machine learning to make accurate predictions about how humans would rate the quality of content. What is less clear is what signals would have been incorporated into the machine learning algorithm in order to determine which sites were low in quality, and which weren’t.
Google Panda Recovery
The path to recovery from Panda is both straightforward and difficult.
Since Panda boosts the performance of sites with content that it categorizes as high quality, the solution is to increase the quality and uniqueness of your content.
While that’s easier said than done, it’s been proven time and time again that this is exactly what is needed to recover.
Felix Tarcomnicu recovered a site by removing low-quality, thin content that had never performed well (based on bounce rates, exit rates, time on site), cleaning up the grammar, and adding high-quality content.
Alan Bleiweiss helped a site recover by helping them rewrite content across 100 pages.
WiredSEO helped a site recover from Panda by changing their user-generated content guidelines to encourage more specific, unique bios, rather than ones copied from other sites. Users of the site had previously used bios from their other sites, but WiredSEO encouraged them to change the bio to ask specific questions, resulting in unique bios that weren’t duplicates.
SEOMaverick helped a site recover by deindexing cookie-cutter pages, combining multiple pages on the same topic into single pages, and updating all of the remaining pages with better copy and structure.
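As a starting point for this kind of cleanup, the engagement metrics mentioned above (bounce rate, exit rate, time on site) can be used to build a list of pages that deserve a manual quality review. The sketch below is only a triage aid under stated assumptions: the `PageStats` type, the cutoff values, and the sample data are hypothetical, and nothing here reflects signals Panda is known to use.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    url: str
    bounce_rate: float          # fraction of visits, 0.0 to 1.0
    exit_rate: float            # fraction of pageviews, 0.0 to 1.0
    avg_seconds_on_page: float

# Hypothetical cutoffs for illustration; tune them against your own baseline.
MAX_BOUNCE_RATE = 0.85
MAX_EXIT_RATE = 0.80
MIN_SECONDS_ON_PAGE = 15.0

def needs_review(page):
    """Return the reasons a page should get a manual quality review."""
    reasons = []
    if page.bounce_rate > MAX_BOUNCE_RATE:
        reasons.append("high bounce rate")
    if page.exit_rate > MAX_EXIT_RATE:
        reasons.append("high exit rate")
    if page.avg_seconds_on_page < MIN_SECONDS_ON_PAGE:
        reasons.append("low time on page")
    return reasons

# Made-up analytics export.
pages = [
    PageStats("/guide", 0.40, 0.35, 120.0),
    PageStats("/tag/misc", 0.92, 0.90, 8.0),
    PageStats("/faq", 0.88, 0.50, 40.0),
]

# Map each flagged URL to the reasons it was flagged.
review_queue = {p.url: needs_review(p) for p in pages if needs_review(p)}
```

A flagged page is a candidate for rewriting, consolidating, or noindexing, not an automatic deletion; the decision still depends on a human reading the content.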
Google Panda Myths
Panda Isn’t About Duplicate Content
The most pervasive myth about Panda is that it is about duplicate content. John Mueller has clarified that duplicate content is independent of Panda. Google employees have stressed many times that Panda encourages unique content, but this goes deeper than avoiding duplication. What Panda is looking for is genuinely unique information that provides unique value to users.
Mueller likewise told one blogger that removing technical duplicates was actually a very low priority, and that they should instead “think about what makes your website different compared to the absolute top site of your niche.”
The source of this confusion is likely from Singhal’s questionnaire, with the question “Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?”
This isn’t referring to technical duplication, but to the redundancy of content, where novel content is rewarded more than derivative content.
Should You Delete Content to Resolve Panda Issues?
This one’s a bit tricky because Google has often been a source of contradictory information.
You see, back in 2011, Google’s Michael Wyszomierski actually told webmasters to remove thin content if they were hit by Panda:
“Our recent update is designed to reduce rankings for low-quality sites, so the key thing for webmasters to do is make sure their sites are the highest quality possible. We looked at a variety of signals to detect low quality sites. Bear in mind that people searching on Google typically don’t want to see shallow or poorly written content, content that’s copied from other websites, or information that are just not that useful. In addition, it’s important for webmasters to know that low quality content on part of a site can impact a site’s ranking as a whole. For this reason, if you believe you’ve been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content.”
Now, in 2017, we’re hearing something a bit different.
Google’s Gary Illyes said on Twitter: “We don’t recommend removing content in general for Panda, rather add more highQ stuff”.
John Mueller said likewise on YouTube:
“Overall the quality of the site should be significantly improved so we can trust the content. Sometimes what we see with a site like that will have a lot of thin content, maybe there’s content you are aggregating from other sources, maybe there’s user-generated content where people are submitting articles that are kind of low quality, and those are all the things you might want to look at and say what can I do; on the one hand, if I want to keep these articles, maybe prevent these from appearing in search. Maybe use a noindex tag for these things.”
Google’s response has always been to either noindex or improve content, never to cut it completely, unless doing so is a move for branding as well.
A Google spokesperson directly told TheSEMPost: “Instead of deleting those pages, your goal should be to create pages that don’t fall in that category: pages that provide unique value for your users who would trust your site in the future when they see it in the results.”
So, in general, deleting content should be a consideration in terms of the overall branding of your site, rather than a move that will remove a Panda penalty.
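If you go the noindex route Mueller describes, it helps to verify that the pages you meant to hide actually carry the directive. The following is a rough sketch using Python’s standard `html.parser` module to look for a `<meta name="robots" content="noindex">` tag; the class name, function name, and sample pages are illustrative assumptions, and it does not check the `X-Robots-Tag` HTTP header, which can also carry the directive.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def has_noindex(html):
    """Return True if the page's robots meta tag includes 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# Made-up sample pages.
thin_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Aggregated listing</body></html>"
)
good_page = "<html><head><title>Guide</title></head><body>Original guide</body></html>"

thin_is_hidden = has_noindex(thin_page)
good_is_hidden = has_noindex(good_page)
```

Running a check like this over the pages you chose to keep but hide catches the common mistake of a template change silently dropping the tag.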
Panda & User-Generated Content
Panda doesn’t target user-generated content specifically. Although Panda can affect user-generated content, it tends to impact sites that produce low-quality content – such as spammy guest posts or forums filled with spam.
Do not remove your user-generated content, whether it is forums, blog comments, or article contributions, simply because you heard it is “bad” or marketed as a “Panda proof” solution. Look at it from a quality perspective instead.
Many high ranking sites rely on user-generated content, so many of them would lose significant traffic and rankings if they removed that type of content. Even comments made on a blog post can cause it to rank and even get a featured snippet.
Word Count Isn’t A Factor
Word count is another aspect of Panda that is often misunderstood by SEO professionals. Many sites make the mistake of refusing to publish any content unless it is above a certain word count, with 250 words and 350 words often cited. Instead, Google recommends you think about how many words the content needs to be successful for the user.
For example, there are many pages out there with very little main content, yet Google thinks the page is quality enough that it has earned the featured snippet for the query. In one case, the main content was a mere 63 words, and many would have been hard pressed to write about the topic in a non-spammy way that was 350+ words in length. So you only need enough words to answer the query.
While word count can be a convenient way to identify pages that might be thin for some sites, it isn’t a factor that is specifically used by Panda, according to Mueller.
Affiliate Links & Ads Aren’t Directly Targeted
Affiliate sites and “made for AdSense” sites are hit by Panda more often than other sites, but this isn’t because it specifically targets them. A Google spokesperson told TheSEMPost:
“An extreme example is when a site’s primary function is to funnel users to other sites via ads or affiliate links, the content is widely available on the internet or it’s hastily produced, and is explicitly constructed to attract visitors from search engines.”
Mueller said, similarly:
“But at the same time we see a lot of affiliates who are basically just lazy people who copy and paste the feeds that they get and publish them on their websites. And this kind of lower quality content, thin content, is something that’s really hard for us to show in search.”
In other words, these sites are being hit for the same reasons: they fail to provide compelling, unique, engaging content.
Technical SEO Doesn’t Play Any Role in Panda
Panda looks just at the content, not things like whether you’re using H1 tags or how quickly your page loads for users.
That said, technical SEO can be an important part of SEO and ranking in general, so it shouldn’t be ignored.
But technical SEO doesn’t have any direct impact on Panda specifically.
Panda almost certainly has the most extensive public record of dates for its associated updates. Part of the reason for this is that Panda was run separately from Google’s core algorithm, and content scores were, as a result, only affected on or near the date of new Panda updates.
This continued until June 11, 2013, when Cutts said at SMX Advanced that, while Panda was not incorporated directly into Google’s core algorithm, its data was updated monthly and rolled out slowly over the course of the month, ending the abrupt industry-wide impacts associated with Panda updates.
The numbering convention is somewhat confusing.
One would expect core updates to Panda’s algorithm to correspond to 1.0, 2.0, 3.0, and 4.0, but no update is referred to as 3.0, and 3.1 was not, in retrospect, a core update to Panda.
Data refreshes, which updated the search results but not the Panda algorithm itself, were typically numbered as you would expect for software updates (3.2, 3.4, 3.5 and so on). However, there were so many data refreshes for version 3 of the algorithm that, for a time, this naming convention was abandoned and the industry referred to them simply by the total count of Panda updates (both refreshes and core updates).
Even after getting a handle on this naming convention, it still isn’t entirely clear whether all of the minor Panda updates were just data refreshes, or if some of them incorporated new signals as well.
Regardless, the timeline of Panda updates is, at least, well known and is as follows:
- 1.0: February 23, 2011. The first iteration of a then unnamed algorithm update was introduced (12 percent of queries were impacted), shocking the search engine optimization industry and many big players, as well as effectively ending the “content farm” business model as it existed at the time.
- 2.0 (#2): April 11, 2011. The first update to the core Panda algorithm. This update incorporated additional signals, such as sites that Google users had blocked.
- 2.1 (#3): May 9, 2011. The industry first called this Panda 3.0, but Google clarified that it was just a data refresh, as would be true of the 2.x updates to come.
- 2.2 (#4): June 21, 2011
- 2.3 (#5): July 23, 2011
- 2.4 (#6) International: August 12, 2011. Panda was rolled out internationally for all English-speaking countries, and for non-English speaking countries except for Japan, China, and Korea.
- 2.5 (#7): September 28, 2011
- (#8) “Panda-Related Flux”: October 5, 2011. Cutts announced to “expect some Panda-related flux in the next few weeks.” Dates for these minor updates were October 3, October 13, and November 18. Retroactively, it may make sense to refer to this announcement as the beginning of 3.0, since it marked the beginning of a period of more frequent, smaller data refreshes, many of which remain untracked.
- 3.1 (#9): November 18, 2011. In retrospect, it is clear that Cutts’ announcement on October 5 was the beginning of a period where data refreshes became more frequent and were not always announced or tracked by the industry. A more prominent update occurred on November 18 and many in the industry referred to it as 3.1, the first to be designated by the industry as part of the line of 3.0 updates.
- 3.2 (#10): January 18, 2012. Google confirmed an update occurred on this day but suggested that the algorithm hadn’t changed. Evidently, this was merely the date of a more heavy-hitting data refresh.
- 3.3 (#11): February 27, 2012
- 3.4 (#12): March 23, 2012
- 3.5 (#13): April 19, 2012
- 3.6 (#14): April 27, 2012
- 3.7 (#15): June 8, 2012. A data refresh that ranking tools suggest was more heavy-hitting than other recent updates.
- 3.8 (#16): June 25, 2012
- 3.9 (#17): July 24, 2012
- 3.9.1 (#18): August 20, 2012. A relatively minor update that marked the beginning of a new naming convention assigned by the industry.
- 3.9.2 (#19): September 18, 2012
- #20: September 27, 2012. A relatively large Panda update that also marked the beginning of yet another naming convention, after the industry recognized the awkwardness of the 3.9.x naming convention, and recognized that updates to what they called Panda 3.0 could continue to occur for a very long time.
- #21: November 5, 2012
- #22: November 21, 2012
- #23: December 21, 2012. A slightly more impactful data refresh.
- #24: January 22, 2013
- #25: March 14, 2013. This update was pre-announced, and tools suggest it occurred on roughly this day. Cutts seemed to suggest that this would be the final update before Panda would be incorporated directly into the Google algorithm, although it later became clear that this wasn’t quite what was happening.
- “Dance”: June 11, 2013. This is not the date of an update, but the day Cutts clarified Panda wasn’t going to be incorporated directly into the algorithm, but rather that it would update monthly with much slower rollouts, rather than the abrupt data refreshes of the past.
- “Recovery”: July 18, 2013. This update appears to have been a tweak to correct some overly harsh Panda activity.
- 4.0 (#26): May 19, 2014. A major Panda update (impacting 7.5 percent of queries) occurred on this date, and most in the industry believe that this was an update to the Panda algorithm, not just a data refresh, especially in light of Cutts’ statements about slow rollouts.
- 4.1 (#27): September 23, 2014. Another major update (impacting 3 to 5 percent of queries) that included some changes to the Panda algorithm. Due to the slow rollouts, the exact date is unclear, but the announcement was made on September 25.
- 4.2 (#28): July 17, 2015. Google announced a Panda refresh that would take months to roll out. Due to the slow nature of the rollout, it’s unclear how substantial the impact was or precisely when it occurred. It was the final confirmed Panda update.
- Core Algorithm Incorporation: January 11, 2016. Google confirmed that Panda had been incorporated into the core Google algorithm, evidently as part of the slow July 17, 2015 rollout. In other words, Panda is no longer a filter applied to the Google algorithm after it does its work, but is incorporated as another of its core ranking signals. It has been clarified, however, that this doesn’t mean the Panda classifier acts in real time.