The Google Cache, Caching Google in Protest
In my referrals the other day I clicked over to a blog which linked to Search Engine Journal and a story on Google China. The blog entry was about Dorks protesting Google. I’m not sure how dorky covering search engine news is, nor have I really considered myself that much of a dork. However, when it does come to dorky protests, TheGoogleCache takes the cake (tag, now you’re the dork!).
TheGoogleCache is a site protesting the recent court ruling which said that Google’s cache of copyrighted materials is ‘fair use’:
“After the recent ruling that stated Google’s cache of copyrighted materials was “fair use”, I decided to put this to the test myself. This is The Google Cache. You search Google, your results get cached.”
In essence, TheGoogleCache is caching Google.
“The Google cache is absolutely ridiculous. As an individual who has had quite a bit of experience on both sides of the white hat / black hat search engine industry, the cache is NOT a webmaster’s friend.
1. The cache takes content control away from the author. For example, a site like EzineArticles.com prevents scraping by using an IP blocking method based on the speed at which pages are spidered by that IP. It is absurdly easy to circumvent this by simply spidering the Google cache of that article instead of spidering the site. Google’s IP blocking is far less restrictive, and combined with the powerful search tool, it allows for easy, anonymous contextual scraping of sites whose Terms of Service explicitly refuse it.
2. The cache extends access to removed content, often for months if not years at a time. Google rarely replaces 404 pages (perhaps it is because of their wish to have the largest number of indexed pages). I have clients who have nearly 48,000 nonexistent pages still cached in Google that have not been present in over 14 months. Despite using 404s, 301s, etc., these pages have not yet been removed. Furthermore, Google’s frequent mishandling of robots.txt, nocache, and nofollow leaves webmasters who depend upon search traffic hesitant to force removal of these pages using the supposedly standardized methods of removal.
3. The cache allows Google to serve site content anonymously. Don’t want the owner of a site to know you are looking at their goods (think of companies grepping for competitor IPs), just watch the cache instead.
The list goes on and on. But I think the point is this…
Why should a web author have to be technologically savvy to keep his or her content from being reproduced by a multi-billion dollar US company? Content control used to be as simple as “you write it, it’s yours”. It got a little more complicated with time, to the point at which it might be useful to use, perhaps, a Terms of Service. Even a novice could write “No duplication allowed without express consent”. Now, a web author must know how to manipulate HTML meta tags and/or a robots.txt file.”
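For anyone wondering what "manipulating HTML meta tags and/or a robots.txt file" looks like in practice, here is a rough sketch in Python of the two checks a well-behaved crawler would make. A couple of assumptions: the comment above says "nocache", but the robots meta directive that asks search engines not to keep a cached copy is `noarchive`, so that is what the sketch looks for. The function names (`opts_out_of_cache`, `crawl_allowed`) and the example.com URLs are made up for illustration and are not anything Google or TheGoogleCache actually uses.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        # "robots" addresses every crawler; "googlebot" addresses Google only.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def opts_out_of_cache(html):
    """Hypothetical helper: True if the page carries a noarchive directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


def crawl_allowed(robots_txt, user_agent, url):
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head></html>'
    robots = "User-agent: *\nDisallow: /private/\n"
    print(opts_out_of_cache(page))
    print(crawl_allowed(robots, "Googlebot", "http://example.com/private/a.html"))
```

Note that both mechanisms are opt-out: a page with no meta tag and no robots.txt entry passes both checks, which is exactly the default the comment above objects to.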
Comments
1 response so far ↓
Russ Jones on Feb 6, 2006 at 12:23 am
It is dorky. I am a dork. And, yes, covering SEO news is dorky. But that doesn’t change the fact that it is a valid issue. There is a legitimate concern that Google’s cache removes the necessity of an individual visiting the content-owner’s site to gain access. In many cases, especially for bloggers, it is not possible to modify the code necessary to prevent caching. And, even if they could, it seems rather capricious that copyright owners must employ technology to protect their copyrights, and that the law will no longer defend them. I believe wholeheartedly in the right of Google to present snippets, the SE equivalent of quotes. But the assumption that a document lacking the nocache attribute implies consent for Google to reproduce it is utterly ridiculous. Who needs a slippery slope when we are already at the bottom? No content unprotected by the proper meta tags is exempt from reproduction by search engines, news organizations, or even competitors if they can substantiate a claim that it is for archival purposes.