Google Planning To Index Entire Libraries
For a long time there have been rumors going around about Project Ocean, a project by Google to index all the pre-1923 content in the Stanford Library. (I’d call it a little stronger than a rumor since it was in the New York Times, but Google wouldn’t comment on it.)
Now it’s come out that Google is planning to do a LOT of library indexing — specifically they’re working with Harvard, Stanford, the University of Michigan, the University of Oxford, and the New York Public Library to scan books from their collections and make them available via Google search. (You can see Google’s announcement at http://www.google.com/press/pressrel/print_library.html .)
This, as you might imagine, is an expansion of the Google Print program, and you can see additional information and examples at http://print.google.com/googleprint/library.html . Search results from Google Print will be integrated into Google search results, though as I understand it they will not be integrated into the Google API results, which is a real pity.
Here are announcements by each institution:
Harvard — http://hul.harvard.edu/publications/041213news.html — “In the coming months, Google will collaborate with Harvard’s libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University’s extensive library system. Google will provide online access to the full text of those works that are in the public domain.”
Stanford — http://www.stanford.edu/dept/news/pr/2004/pr-google-011205.html — “Stanford University today announced an ambitious plan to cooperate with Google Inc. in digitizing hundreds of thousands, perhaps millions, of books from the shelves of Stanford libraries and making them available to readers worldwide and without charge.”
University of Michigan — http://www.umich.edu/news/index.html?Releases/2004/Dec04/library/index — “Google will digitally scan and make searchable virtually the entire collection of the U-M library. A person looking for information will gain the extraordinary capability to use Google to locate and read the full text of printed works that are out of copyright.”
U of M also has a nice Q & A document at http://www.umich.edu/news/index.html?Releases/2004/Dec04/library/q&a .
Oxford — http://www.admin.ox.ac.uk/po/news/2004-05/dec/14b.shtml — Reg Carr, Director of Oxford University Library Services, said “Making the wealth of knowledge accumulated in the Bodleian Library’s historic collections accessible to as many people as possible is at the heart of Oxford University’s commitment to lifelong learning. Oxford is therefore proud to be part of this effort to make information available to everyone who might benefit from it….The Bodleian Library’s 19th century holdings include works by Charles Darwin, Edgar Allan Poe and Christina Rossetti. In addition there are numerous books and journals on subjects such as art, history, theology, politics and travel, with some of the more unusual titles including: An account of the pirates executed at St Christopher’s, in the West Indies, in 1828 by Enoch Wood (1830), and On the Economy of Machinery and Manufactures by Charles Babbage (1832).”
New York Public Library — http://www.nypl.org/press/google.cfm — “In this pilot program, NYPL is working with Google to offer a collection of its public domain books, which will be scanned in their entirety and made available for free to the public online. Users will be able to search and browse the full text of these works.”
I am really of two minds about this entire process. The first mind says that any digitization of public domain works is a good idea, and it’s nice to see a private enterprise tackling such a hugely ambitious project.
The second mind says sod that. The second mind says what about the already huge numbers of digitized books that exist? The second mind asks who is going to organize those and make them available? The second mind says if this is going to take six years (mentioned at the U of M site) then why not do something with existing already digitized materials?
–
Columnist Tara Calishain is writer and editor at ResearchBuzz and author of the new book Web Search Garage







I am mainly interested in books published after 1923, which will not
be available because they are copyrighted.
Surprisingly enough, there have been interesting developments since
1923: World War II, the development of quantum mechanics, nuclear
power (and war!), electronic computers, a host of revolutionary discoveries in astronomy. All came after 1923 and are not included in the
database!
I am mainly interested in books printed after 1923. These are still
copyrighted and not included in the database. There have been many
interesting developments since 1923 that will be missed: World War II, the development of quantum mechanics, a host of revolutionary developments in astronomy, electronic computers, space travel and
many, many new discoveries, all left out of the database!
I think this is a fantastic idea. Although so many things happened after ’23, there was exceptional literature before that date. This is a huge step forward. The simple fact that anyone could find in seconds the pages he is interested in, the simple fact that anyone could read books which will never be translated into his mother tongue... Oh, this is very exciting! I, for one, am looking forward to reading Holderlin, for instance, an author who is not translated in my country and never will be. More free information, more intelligent people.
Google is soon to be the eighth wonder of the modern world. Royal Library of Alexandria all over again…best to keep the Romans away from it perhaps.