$D^2$ Sampling and the $k$-means Problem

Speaker:

Amit Kumar

Organiser:

Prahladh Harsha

Date:

Tuesday, 29 Mar 2016, 16:00 to 17:00

Venue:

A-201 (STCS Seminar Room)

Category:

STCS Colloquium

event.ics Print

Abstract

Abstract: Given a set of points $P$ in a $d$-dimensional Euclidean space, the $k$-means clustering problem seeks to find a set $C$ of $k$ centers such that the sum over all points in $P$ of the square of the distance to the closest center in $C$ is minimized. A natural random sampling based heuristic for finding the $k$ centers is as follows -- pick the first center uniformly at random from the set of points, the next one with probability proportional to the square of the distance from the first center and so on. In this talk, I will survey algorithms for the $k$-means problem based on this random sampling technique. I will show that this idea can be used to obtain fast PTAS for this problem. We extend this result to constrained $k$-means clustering problems, where there may be additional constraints on valid clusterings of the input points (this is joint work with Anup Bhattacharya and Ragesh Jaiswal).