Latent Dirichlet Allocation for Text Segmentation

Hemant Misra, Xerox Research Center Europe, France
Tuesday, 28 Dec 2010 (all day)
A-212 (STCS Seminar Room)
In this presentation, we first review latent Dirichlet allocation (LDA), an unsupervised topic model, and propose its application to the task of text segmentation. The proposed method achieves state-of-the-art performance on a benchmark dataset, can perform segmentation in an online manner, and assigns a meaningful topic distribution to each segment. The last point is particularly interesting for information retrieval at the segment level. We will also discuss how the computational cost of the dynamic programming (DP) algorithm typically used for the search can be reduced by more than 95%, and how this result applies to text segmentation in general.
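
For readers unfamiliar with the DP search mentioned in the abstract, the following is a minimal sketch of how a dynamic program can choose segment boundaries by minimizing a sum of per-segment costs. It is an illustration under stated assumptions, not the speaker's implementation: the names `segment` and `make_cost` are hypothetical, the toy cost (label mismatches plus a per-segment penalty) stands in for an LDA-based segment score, and the `max_len` cap is just one simple way to shrink the search, not necessarily the reduction discussed in the talk.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple


def segment(
    n: int,
    cost: Callable[[int, int], float],
    max_len: Optional[int] = None,
) -> Tuple[float, List[int]]:
    """Split items 0..n-1 into contiguous segments [i, j) minimizing total cost.

    Returns (total cost, list of segment end indices).
    max_len (hypothetical pruning knob) caps the segment length, so each
    end position j considers at most max_len start positions instead of j.
    """
    if max_len is None:
        max_len = n
    best = [float("inf")] * (n + 1)  # best[j]: min cost of segmenting items 0..j-1
    back = [0] * (n + 1)             # back[j]: start index of the last segment
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                back[j] = i
    # Follow back-pointers from the end to recover the boundaries.
    bounds: List[int] = []
    j = n
    while j > 0:
        bounds.append(j)
        j = back[j]
    bounds.reverse()
    return best[n], bounds


def make_cost(labels: Sequence[int], penalty: float) -> Callable[[int, int], float]:
    """Toy cost (an assumption): items disagreeing with the segment's
    majority label, plus a fixed penalty per segment."""
    def cost(i: int, j: int) -> float:
        counts = Counter(labels[i:j])
        return (j - i) - max(counts.values()) + penalty
    return cost


# Toy input: one dominant "topic" label per sentence.
labels = [0, 0, 0, 1, 1, 2, 2, 2]
total, bounds = segment(len(labels), make_cost(labels, penalty=0.5))
print(total, bounds)
```

Without a length cap, the search evaluates on the order of n^2 candidate segments; with `max_len=m` it evaluates about n*m, which suggests why restricting the set of candidate segments can cut the cost substantially when m is much smaller than n.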