$D^2$ Sampling and the $k$-means Problem


Amit Kumar


Indian Institute of Technology
Department of Computer Science
and Engineering
Hauz Khas
New Delhi 110016


Tuesday, 29 March 2016, 16:00 to 17:00



Abstract: Given a set of points $P$ in a $d$-dimensional Euclidean space, the $k$-means clustering problem seeks to find a set $C$ of $k$ centers such that the sum over all points in $P$ of the square of the distance to the closest center in $C$ is minimized. A natural random sampling based heuristic for finding the $k$ centers is as follows -- pick the first center uniformly at random from the set of points, the next one with probability proportional to the square of the distance from the first center and so on. In this talk, I will survey algorithms for the $k$-means problem based on this random sampling technique. I will show that this idea can be used to obtain fast PTAS for this problem. We extend this result to constrained $k$-means clustering problems, where there may be additional constraints on  valid clusterings of the input points (this is joint work with Anup Bhattacharya and Ragesh Jaiswal).