Google Planning To Index Entire Libraries
For a long time there have been rumors going around about Project Ocean, a project by Google to index all the pre-1923 content in the Stanford Library. (I’d call it a little stronger than a rumor since it was in the New York Times, but Google wouldn’t comment on it.)
Now it’s come out that Google is planning to do a LOT of library indexing — specifically they’re working with Harvard, Stanford, the University of Michigan, the University of Oxford, and the New York Public Library to scan books from their collections and make them available via Google search. (You can see Google’s announcement at http://www.google.com/press/pressrel/print_library.html .)
This, as you might imagine, is an expansion of the Google Print program, and you can see additional information and examples at http://print.google.com/googleprint/library.html . Search results from Google Print will be integrated into Google search results, though as I understand it they will not be integrated into the Google API results, which is a real pity.
Here are announcements by each institution:
Harvard — http://hul.harvard.edu/publications/041213news.html — “In the coming months, Google will collaborate with Harvard’s libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University’s extensive library system. Google will provide online access to the full text of those works that are in the public domain.”
Stanford — http://www.stanford.edu/dept/news/pr/2004/pr-google-011205.html — “Stanford University today announced an ambitious plan to cooperate with Google Inc. in digitizing hundreds of thousands, perhaps millions, of books from the shelves of Stanford libraries and making them available to readers worldwide and without charge.”
University of Michigan — http://www.umich.edu/news/index.html?Releases/2004/Dec04/library/index — “Google will digitally scan and make searchable virtually the entire collection of the U-M library. A person looking for information will gain the extraordinary capability to use Google to locate and read the full text of printed works that are out of copyright.”
U of M also has a nice Q & A document at http://www.umich.edu/news/index.html?Releases/2004/Dec04/library/q&a .
Oxford — http://www.admin.ox.ac.uk/po/news/2004-05/dec/14b.shtml — Reg Carr, Director of Oxford University Library Services, said “Making the wealth of knowledge accumulated in the Bodleian Library’s historic collections accessible to as many people as possible is at the heart of Oxford University’s commitment to lifelong learning. Oxford is therefore proud to be part of this effort to make information available to everyone who might benefit from it…. The Bodleian Library’s 19th century holdings include works by Charles Darwin, Edgar Allan Poe and Christina Rossetti. In addition there are numerous books and journals on subjects such as art, history, theology, politics and travel, with some of the more unusual titles including: An account of the pirates executed at St Christopher’s, in the West Indies, in 1828 by Enoch Wood (1830), and On the Economy of Machinery and Manufactures by Charles Babbage (1832).”
New York Public Library — http://www.nypl.org/press/google.cfm — “In this pilot program, NYPL is working with Google to offer a collection of its public domain books, which will be scanned in their entirety and made available for free to the public online. Users will be able to search and browse the full text of these works.”
I am really of two minds about this entire process. The first mind says that any digitization of public domain works is a good idea, and it’s nice to see a private enterprise tackling such a hugely ambitious project.
The second mind says sod that. The second mind says what about the huge numbers of digitized books that already exist? The second mind asks who is going to organize those and make them available? The second mind says if this is going to take six years (mentioned at the U of M site) then why not do something with the materials that are already digitized?
Columnist Tara Calishain is writer and editor at ResearchBuzz and author of the new book Web Search Garage.