Google highly values the “freshness” of content. But how does Google define it against the “staleness” of content? Google’s Historical Data Patent, reviewed at WebmasterWorld and SEroundtable, sheds some light on what Google considers fresh content and what can be defined as a stale document.
Bill Slawski did a great job of summarizing the patent with the help of two examples:
The Constitution of the United States is an old document, but it’s not stale. A news article about the “World Series” from 1918 may not be what a baseball fan wants to see when searching for “World Series” this October.
According to Google itself:
Stale content refers to documents that have not been updated for a period of time and, thus, contain stale data (documents that are “no longer updated, diminished in importance, superceded by another document”).
The staleness of a document may be based on:
- document creation date,
- anchor growth,
- traffic,
- content change,
- forward/back link growth, etc.
The Google patent explains how the search engine can spot stale content using four groups of criteria:
- Query-based criteria;
- Link-based criteria;
- Traffic-based criteria;
- User-behavior-based criteria.
1. Query-based criteria analyze which pages users select in the SERPs.
In addition, the search engine tracks which queries a single document ranks for: a “discordant set of queries” might mean the page is spammy.
2. Link-based criteria analyze a page’s backlinks, monitoring the dates on which new links to a document appear (i.e. “indexed by Google or the date the linking page was created”) and the dates on which existing links disappear. By looking at the rate at which links appear or disappear over time, and at how many links appear or disappear during a given time period, the search engine can conclude whether the trend is toward the appearance of new links or the disappearance of existing ones:
- downward trend => stale document (more links disappear than appear);
- decrease in links => stale content (either sudden or significant link disappearance).
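The link-trend idea above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the patent's actual method: the function names, the single observation window, and the `loss_share` threshold are all assumptions made for the example.

```python
from datetime import date


def count_in_window(dates, start, end):
    """Count how many dates fall inside the inclusive window [start, end]."""
    return sum(1 for d in dates if start <= d <= end)


def link_staleness(appeared, disappeared, start, end, loss_share=0.5):
    """Return (net_change, is_stale) for one observation window.

    appeared / disappeared: dates on which backlinks were first seen / lost.
    loss_share: hypothetical threshold for a "significant" loss of links.
    """
    gained = count_in_window(appeared, start, end)
    lost = count_in_window(disappeared, start, end)
    net_change = gained - lost

    # Links that were still live just before the window opened.
    live_before = sum(1 for d in appeared if d < start) - sum(
        1 for d in disappeared if d < start
    )

    # Downward trend: more links disappear than appear in the window.
    downward_trend = net_change < 0
    # Significant decrease: a large share of the existing links is lost.
    significant_loss = live_before > 0 and lost / live_before >= loss_share
    return net_change, downward_trend or significant_loss


appeared = [date(2005, 1, 10), date(2005, 2, 1), date(2005, 3, 5), date(2005, 6, 2)]
disappeared = [date(2005, 6, 10), date(2005, 6, 20)]
print(link_staleness(appeared, disappeared, date(2005, 6, 1), date(2005, 6, 30)))
# prints (-1, True)
```

In the sample data, one link appears and two disappear during June, so the net change is negative and the document is flagged. A real system would presumably compare several windows rather than one.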
3. Traffic-based criteria: a large reduction in traffic may indicate that a document is stale.
4. User-behavior-based criteria: if people spend too little time on the page (compared with similar or closely related pages), that might mean the document is stale.
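The traffic and user-behavior checks can be illustrated the same way. The sketch below is a guess at how such signals might be computed, assuming per-period visit counts and time-on-page samples are available. The function names and both thresholds (`max_drop`, `min_dwell_ratio`) are hypothetical and do not come from the patent.

```python
from statistics import mean


def traffic_drop(previous_visits, recent_visits):
    """Fractional drop in average visits between two periods (0.0 if none)."""
    before = mean(previous_visits)
    after = mean(recent_visits)
    if before <= 0:
        return 0.0
    return max(0.0, (before - after) / before)


def dwell_ratio(page_seconds, peer_seconds):
    """Average time on this page relative to similar pages."""
    peer_average = mean(peer_seconds)
    if peer_average <= 0:
        return 1.0
    return mean(page_seconds) / peer_average


def stale_signals(previous_visits, recent_visits, page_seconds, peer_seconds,
                  max_drop=0.5, min_dwell_ratio=0.5):
    """Flag each signal separately; both thresholds are illustrative."""
    return {
        "traffic": traffic_drop(previous_visits, recent_visits) >= max_drop,
        "dwell": dwell_ratio(page_seconds, peer_seconds) < min_dwell_ratio,
    }


print(stale_signals([100, 100], [40, 40], [10, 20], [60, 60]))
# prints {'traffic': True, 'dwell': True}
```

Keeping the two signals separate, rather than merging them into one score, reflects the wording above: each criterion only "may indicate" staleness, so neither is conclusive alone.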