8.24.2005
Get over it.
Quentin Hardy, 09.05.05
Type a phrase into Google and, in an instant, it pores over an astounding 8 billion Web pages. Peter Norvig is haunted by the prospect of what it misses. As Google's director of search quality and research (a doozy of a job description), Norvig spurs on 140 scientists and engineers racing to add more depth, speed and relevance to the world's best search engine.
E-mails, out-of-print books, blogs, research papers in Arabic--any of them might contain something useful to someone. Yet a search engine accesses only 25% of all online data; the rest is out of reach. So Norvig's group designs tools to scan the contents of public libraries, crafts translators that convert foreign-language documents and creates ways to store and index e-mails cheaply.
His Google geeks also work on improving mapping technology and the ability to recognize image content. Norvig also pays a small team of contractors to run random searches all day, testing the Google engine in the hope of teaching it to learn on its own.
What makes improving search quality so complex, Norvig says, is "the uncertainty about a right answer. There is a lot of human intuition in the loop." His hope is to inject a lot more machine intelligence into that loop.
Norvig arrived at Google in 2001, bringing serious artificial intelligence chops to a company still run in seat-of-the-pants fashion. He had spent three years at NASA's Ames Research Center, where he did the early work on the artificial intelligence that steered the Mars Rover. His 1996 book on AI is considered the standard in the field.
Google could access 2 billion pages when Norvig arrived, small enough to let a handful of engineers fine-tune it. He set about broadening the ideas that make Google work. Its benchmark for what makes a page relevant as a search result is defined by the number and quality of other sites that link to it.
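The link-based notion of relevance the article describes can be sketched as an iterative scoring loop in the spirit of PageRank. This is a toy illustration on a made-up three-page link graph, not Google's actual algorithm or data; the damping factor and iteration count are conventional assumptions.

```python
# Toy sketch of link-based relevance: a page's score grows with the
# number and score of pages linking to it. Hypothetical graph only.
def rank(links, iterations=50, damping=0.85):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page passes its score evenly to the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its score over all pages.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = rank(graph)
# "c" is linked to by both "a" and "b", so it ends up ranked highest.
```

The key property this shows is the one the article names: relevance is defined by incoming links, weighted by the quality (score) of the linking pages.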
Now Google's statisticians develop algorithms that look at how closely one query links to another and how groups of queries interact. Studying word "clusters" helps determine whether a search term like "Blondie" means the comic strip or the punk-pop band from the 1980s. Norvig's crew also aims to accelerate results by learning which irrelevant words (like "like") to discard when indexing a Web page.
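Discarding low-signal words during indexing can be illustrated with a minimal inverted index. This is a hypothetical sketch; the stopword list and page contents are illustrative, and real search engines learn which words to drop rather than hard-coding them.

```python
# Minimal sketch of stopword filtering while building an inverted index.
# The stopword set is an illustrative assumption, not Google's list.
STOPWORDS = {"like", "the", "a", "of", "and", "to", "in"}

def index_page(url, text, index):
    """Map each non-stopword term of a page to the set of URLs containing it."""
    for word in text.lower().split():
        term = word.strip(".,!?\"'")
        if term and term not in STOPWORDS:
            index.setdefault(term, set()).add(url)

index = {}
index_page("example.com/blondie", "Blondie, like the comic strip", index)
# "like" and "the" are discarded; "blondie", "comic", "strip" are indexed.
```

Skipping such words shrinks the index and speeds lookups, since the discarded terms appear on nearly every page and carry almost no ranking signal.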
Norvig's group is pursuing video search and personalized search, as well as a program to index data from library books and photos. It designed optical scanning software that can tell when a book page is creased and correct it on the fly. "All of humanity is working for us," he says. "We just have to decipher it."
The internet does not represent the aggregate knowledge of mankind. The internet represents the total knowledge that we have allowed our machines to have access to. It is important that we don't forget this.