$D^2$ Sampling and the $k$-means Problem

Prahladh Harsha
Tuesday, 29 Mar 2016, 16:00 to 17:00
A-201 (STCS Seminar Room)
Abstract: Given a set of points $P$ in a $d$-dimensional Euclidean space, the $k$-means clustering problem seeks to find a set $C$ of $k$ centers such that the sum over all points in $P$ of the square of the distance to the closest center in $C$ is minimized. A natural random sampling based heuristic for finding the $k$ centers is as follows -- pick the first center uniformly at random from the set of points, the next one with probability proportional to the square of the distance from the first center and so on. In this talk, I will survey algorithms for the $k$-means problem based on this random sampling technique. I will show that this idea can be used to obtain fast PTAS for this problem. We extend this result to constrained $k$-means clustering problems, where there may be additional constraints on  valid clusterings of the input points (this is joint work with Anup Bhattacharya and Ragesh Jaiswal).