Abstract: In many applications we are confronted with very high-dimensional data sets, and methods for dealing with high-dimensional data have become correspondingly prominent. One geometrically motivated approach to analyzing such data is manifold learning. The underlying hypothesis of this subfield of machine learning is that high-dimensional data tend to lie near a low-dimensional manifold. However, the basic question of when data actually lie near a manifold is poorly understood. I will describe joint work with Charles Fefferman and Sanjoy Mitter on a provably correct algorithm for testing this hypothesis using i.i.d. samples from an arbitrary distribution supported in the unit ball of a Hilbert space.
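
The sketch below is not the algorithm from the joint work; it is only a minimal numerical illustration of the manifold hypothesis itself. It samples points near a one-dimensional manifold (a circle) isometrically embedded in a 100-dimensional ambient space, perturbed by small noise, and uses PCA to check that nearly all of the variance concentrates in two directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample n points on a circle (a 1-dimensional manifold) in R^2.
n, D = 500, 100
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)  # shape (n, 2)

# Embed the circle into R^D via a random isometry (orthonormal columns),
# then add small ambient noise, so the data lie *near* the manifold.
Q, _ = np.linalg.qr(rng.standard_normal((D, 2)))
X = circle @ Q.T + 0.01 * rng.standard_normal((n, D))

# PCA via SVD of the centered data: the spectrum reveals that almost all
# variance is explained by the first two principal directions.
_, s, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
explained = (s ** 2) / (s ** 2).sum()
print(f"variance in top 2 components: {explained[:2].sum():.3f}")
```

In this contrived setting the top two components capture nearly all the variance; the point of the abstract's algorithm is precisely that real data come from an arbitrary distribution, where no such construction is known in advance.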