Bridging the Structured-Unstructured Gap: Searching the Annotated Web


Soumen Chakrabarti, Indian Institute of Technology, Department of Computer Science and Engineering, Powai M


Wednesday, 24 March 2010 (All day)


Over 99% of queries to Web search engines contain a noun, often referring to an entity. Catalogs like WordNet and Wikipedia list millions of well-known entities. Bootstrapping techniques may help us expand that coverage to hundreds of millions of people and millions of locations, books, songs, and other artifacts.

The second decade of Web search will represent, index, query, and rank in a fine-grained, graph-structured setting, where dozens to hundreds of tokens on each Web page may be linked to entities, which in turn have attributes, types, and subclass and other relational links.
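
To make the graph-structured setting concrete, here is a minimal Python sketch of one way such annotations could be represented. It is an illustration under stated assumptions, not the system discussed in the talk: the names `Entity`, `Annotation`, `SUBCLASS_OF`, `is_a`, and `find_typed_mentions`, the tiny type hierarchy, and the sample sentence are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Entity:
    """A catalog entity with types and attributes (hypothetical schema)."""
    name: str
    types: Set[str]
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Annotation:
    """Links one token position on a page to an entity."""
    token_index: int
    entity: Entity


# Hypothetical subclass links between types: child -> parent.
SUBCLASS_OF: Dict[str, str] = {
    "physicist": "scientist",
    "scientist": "person",
    "city": "location",
}


def is_a(type_name: Optional[str], target: str) -> bool:
    """Walk up the subclass chain to test whether type_name is a kind of target."""
    while type_name is not None:
        if type_name == target:
            return True
        type_name = SUBCLASS_OF.get(type_name)
    return False


def find_typed_mentions(
    tokens: List[str], annotations: List[Annotation], target_type: str
) -> List[Tuple[int, str, str]]:
    """Return (position, token, entity name) for mentions of the target type."""
    return [
        (a.token_index, tokens[a.token_index], a.entity.name)
        for a in annotations
        if any(is_a(t, target_type) for t in a.entity.types)
    ]


# Illustrative page fragment with two annotated tokens.
tokens = ["Einstein", "was", "born", "in", "Ulm"]
einstein = Entity("Albert Einstein", {"physicist"}, {"born": "1879"})
ulm = Entity("Ulm", {"city"})
annotations = [Annotation(0, einstein), Annotation(4, ulm)]

people = find_typed_mentions(tokens, annotations, "person")
places = find_typed_mentions(tokens, annotations, "location")
```

In this sketch, a query for the type `person` matches the token "Einstein" only because the subclass chain links `physicist` to `person`. A real engine would need indexes over annotations and types rather than a linear scan.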

We will discuss recent advances in databases, indexing, proximity models in information retrieval, information extraction, graph search and mining, and machine learning for ranking that are coming together to make possible this new generation of search engines.