Who would have thought the day would come when even documents digitized by scanning could be searched by Google's powerful crawlers? Until now, Google could only search PDF documents that had been converted from text files. Googlebot passed over any PDF created by scanning, whether the scanned page held images or text. But not anymore. The Official Google Blog has just announced that Google's search bot can now scour PDF documents produced by scanning.
How does Google do this? Through Optical Character Recognition (OCR). Anyone who has used Adobe Acrobat Professional will already know the process. When a printed document is converted to PDF with a scanner, the only way to make that PDF searchable is to run it through OCR, one of the features of Adobe Acrobat Professional (and of other software with similar functions).
OCR converts images of text into actual words that can be searched and indexed. It's a pretty nifty way to find chapters of scanned books and printed documents when you're reading their PDF files. You no longer have to scan through page after page just to find what you are looking for.
Quite frankly, this is a simple yet useful feature of Google's search engine, and once again Google has one-upped its competition. With so many documents being published on the web as PDFs, it's about time search engines made them searchable and findable not only through their metadata but, more importantly, through their contents.