Getting Into Google Book Search – Google Print
Google Print, Google Print, Argh Argh Argh! Since I got back from Web Search University I have been buried in sites and resources to review. I despair of catching up. And they keep piling on, as evidenced by the launch of Google Print. But if all the resources I have to review raise in me the tide of ambivalence that Google Print does, I may find myself with a pile of neatly-reviewed resources and all my hair torn out.
Google Print information is available at http://print.google.com/. Google describes the project thus: “Google’s mission is to organize the world’s information and make it universally accessible and useful. Since a lot of the world’s information isn’t yet online, we’re helping to get it there.”
Which leads me to two immediate questions. The first: what’s the procedure for universities and other institutions which already have books indexed online? How can they add their books to the Google Print index? A Google rep tells me there’s no program for those institutions at this time. There is a program for publishers to add content ( https://print.google.com/publisher/faq ) but nothing else at the moment. (Considering that, according to the FAQ, Google is going to share ad revenue with the publishers, this is really disappointing.) Interestingly, the program will only accept hardcopy books at the moment; h’m. I guess it’s better to build one intake infrastructure at a time.
The second question is “How is Project Ocean doing?” You may remember a reference in the February New York Times to Project Ocean. LISNews has a good quote and rundown: “And Google has embarked on an ambitious secret effort known as Project Ocean, according to a person involved with the operation. With the cooperation of Stanford University, the company now plans to digitize the entire collection of the vast Stanford Library published before 1923, which is no longer limited by copyright restrictions.”
Can you imagine what such a collection would do to something like Google Print? When I asked the Google rep about this, though, they pointed out that the article was speculation, and they don’t comment on speculation. Okay.
Back to what is known and confirmed: Google Print. The fact that they’ve got a free program to index books in their search engine is very exciting, and I can imagine it’ll be a lot more useful to me in my searching than a similar effort, Google Catalogs (where you could send them hardcopy catalogs and they’d index ’em). However, with the launch of the program one big thing has changed.
Before the program was officially announced, the content of Google Print was integrated into the Google index. This meant that with some judicious use of special syntax you could search just the print materials. (See my example at ResearchBuzz.) You could also set up Web alerts to find out when new material was added to the Google index (see the above article).
However, the Google rep informs me that “Unlike the first phase of Google Print, these results are not integrated into the main Google search index.”
What does this mean? a) no ability to isolate the Google Print content, and possibly no ability to run extensive general searches for content; b) no ability to set up search alerts to get information on new content as it’s added to the index; and c) no ability to access Google Print content through Google API calls.
However, there may be a ray of hope. The rep went on to say, “Also, keep in mind that this is just a test of the integration of book content into Google search results and things are changing as we learn more.”
So what do you get now? Try searching for “books about x,” substituting your preferred keywords for x. You’ll get a little book icon and links at the top of the page. If it doesn’t work, try refreshing a few times; I’m told some datacenters may still be in the process of updating.
On one hand I can’t help but be supportive of an initiative like this to get more-credible information into search engines in large chunks. If Google’s behind it there’s going to be a lot of muscle going into it, and I’m sure a lot of publishers are going to jump on board. On the other hand, the changes that have been made to the integration of this content (which make it harder to isolate in a search) and the fact that they’re apparently concentrating on publishers first and not on existing collections of digitized content (of which there’s a lot, including unique stuff that might be really hard to find in a regular search) are disappointing.