Google Suggest, Google & Libraries

This is a few days old and many folks have already pointed to it, but Google Suggest is a very cool feature that bears mentioning.

Even cooler, though, is Google’s announcement of their ambitious plan to digitize the collections of Stanford, Harvard, Oxford, the University of Michigan and the New York Public Library. Together these libraries hold as many as 50 million books, though it isn’t clear how much duplication there is among them. Assuming an average of 200 pages per book, which is probably low, you could wind up with 10 billion pages in the index once the task is complete (not that the task can ever truly be completed, given the number of new books published every year). Compared to the 8 billion web pages they have indexed today, that’s pretty impressive, though over the many years the digitization will take, the web index will no doubt grow well beyond its current girth. Still, it is quite possible that in time this library index will rival the size of the web index.
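For anyone who wants to play with the numbers, here is a rough back-of-the-envelope sketch of the estimate above. The book count, pages per book, and web index size are the figures quoted in this post; the `duplication` parameter is purely hypothetical, since the real overlap among the five collections isn't known.

```python
# Back-of-the-envelope estimate using the figures quoted in the post.
BOOKS = 50_000_000                 # "as many as 50 million books"
PAGES_PER_BOOK = 200               # assumed average, probably low
WEB_PAGES_INDEXED = 8_000_000_000  # web pages in Google's index today


def estimated_library_pages(books=BOOKS, pages_per_book=PAGES_PER_BOOK,
                            duplication=0.0):
    """Estimate the number of scanned pages.

    `duplication` is a hypothetical fraction (0.0 to 1.0) of books that are
    duplicates across the libraries; the true value is unknown.
    """
    unique_books = books * (1.0 - duplication)
    return round(unique_books * pages_per_book)


if __name__ == "__main__":
    pages = estimated_library_pages()
    print("Estimated library pages:", pages)
    print("Ratio to current web index:", pages / WEB_PAGES_INDEXED)
```

Even if half the books turned out to be duplicates (a made-up figure), `estimated_library_pages(duplication=0.5)` would still give 5 billion pages, which is the same order of magnitude as the web index.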

One interesting feature of an indexed corpus of print media is that it lacks the hyperlinks among pages that enable Google to deliver their highly relevant search results. The quality of Google’s search technology stems from the basic insight of the PageRank algorithm: that the link structure of the web itself is a useful determinant of the quality and relevance of the pages that match particular keywords. Of course, counting the “votes” from links among pages is not the only technique they employ, but it is a big part of their secret sauce.
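To make the "votes" idea concrete, here is a minimal sketch of the simplified, textbook form of PageRank, computed by repeated iteration. This is not Google's production system: the damping factor of 0.85, the iteration count, the handling of pages with no outgoing links, and the tiny example graph (including the page named `book`) are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: "book" stands in for a scanned page with no links
    # pointing in or out of it.
    example = {"a": ["c"], "b": ["c"], "c": ["a"], "book": []}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 3))
```

In this toy graph, `c` collects the most votes, while `book` sits at the bottom because nothing links to it, which is exactly the situation a freshly digitized page would start out in.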

It will be fascinating to see how the results from these library queries change over time, since presumably once these books are made available digitally, web pages will increasingly link into the books that are hosted on Google’s servers. That is why Google’s decision to do this is doubly brilliant: not only will they see increased search traffic from these collections becoming freely available, but they will also see a huge number of page views from all the links that accumulate on the web pointing into the library collections.

This is a great example of enlightened capitalism that follows naturally from Google’s simple yet audacious mission statement, which is “to organize the world’s information and make it universally accessible and useful.” Kudos to them.

December 16th, 2004     Categories: Web/Tech