Indian Institute of Technology
Department of Computer Science
Web search has come a long way from matching query words with document words. It is now mediated by knowledge graphs (KGs) such as Freebase, which hold hundreds of millions of entities belonging to tens of thousands of types, connected by billions of relations. Equally essential is annotating token spans in the Web corpus with canonical types (e.g. `scientist') and entities (e.g. `m.0jcx', Freebase's unique ID for Albert Einstein). Armed with suitable indexes and ranking functions, we can now search for ``scientists who played the violin'', but only if the search engine understands that `scientists' is the target type, `violin' is a grounded entity, and `played' is the connecting relation. We will review recent dramatic improvements in the techniques search engines use to infer these diverse roles of query words by jointly exploiting knowledge graphs and corpus annotations. The best techniques use neural networks to compare the KG and corpus neighborhoods of candidate entities against possibly overlapping query segments.
Time permitting, I will conclude with emerging work on more complex queries such as ``countries having more rivers than India'' or ``how old was Indira Gandhi when Rahul Gandhi was born''. Answering such queries involves breaking them down into subtasks and performing nontrivial reasoning on the subtask responses.
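As a rough illustration of what breaking a query into subtasks can mean, the sketch below handles the second example by looking up two birth dates and then reasoning over the two answers. It is only a toy under stated assumptions: the in-memory `BIRTH_DATES` table stands in for a knowledge graph lookup, and the function names are hypothetical, not part of any system discussed in the talk.

```python
from datetime import date

# Hypothetical toy fact store standing in for a knowledge graph lookup.
BIRTH_DATES = {
    "Indira Gandhi": date(1917, 11, 19),
    "Rahul Gandhi": date(1970, 6, 19),
}


def birth_date(entity: str) -> date:
    """Subtask: retrieve the birth date of a grounded entity."""
    return BIRTH_DATES[entity]


def age_on(born: date, when: date) -> int:
    """Reasoning step: completed years between two dates."""
    years = when.year - born.year
    # Subtract one year if the birthday had not yet occurred in that year.
    if (when.month, when.day) < (born.month, born.day):
        years -= 1
    return years


def age_when_born(older: str, younger: str) -> int:
    """Combine the two subtask answers into the final answer."""
    return age_on(birth_date(older), birth_date(younger))


print(age_when_born("Indira Gandhi", "Rahul Gandhi"))
```

In a real system each lookup would itself be an uncertain retrieval step over the KG and the annotated corpus, which is where much of the difficulty lies.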
Bio: Soumen Chakrabarti is a Professor of Computer Science at IIT Bombay. He received his PhD from the University of California, Berkeley, and worked on Clever Web search and Focused Crawling at IBM Almaden Research Center. He has also worked at Carnegie Mellon University and Google. He works on linking unstructured text to knowledge bases and exploiting these links for better search and ranking. Other interests include link formation and influence propagation in social networks, and personalized proximity search in graphs. He has published extensively in WWW, SIGKDD, EMNLP, VLDB, SIGIR, ICDE and other conferences. His work on keyword search in databases received the 10-year influential paper award at ICDE 2012. He is also the author of one of the earliest books on Web search and mining.