Detecting outliers in high-dimensional data is a central problem in data analysis. Principal Component Analysis (PCA) is commonly used both to define and to detect outliers: points that lie far from a low-dimensional principal subspace are considered outliers. Despite a large body of work on speeding up PCA, the computation can still be costly when the points come from a very high-dimensional space, especially in the streaming setting.
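As a minimal sketch of the PCA-based notion of outlier described above: score each point by its distance to the top-k principal subspace. The choice of k, the synthetic data, and the planted outlier are all illustrative assumptions, not part of the talk.

```python
import numpy as np

def pca_outlier_scores(X, k):
    """Distance of each row of X to its projection onto the span of
    the top-k right singular vectors (the principal subspace)."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                  # top-k directions, shape (d, k)
    proj = Xc @ V @ V.T                           # projection onto the subspace
    return np.linalg.norm(Xc - proj, axis=1)      # residual distances

# Illustrative data: inliers near a 5-dim subspace, one planted outlier.
rng = np.random.default_rng(0)
W = np.linalg.qr(rng.normal(size=(50, 5)))[0].T   # orthonormal basis, 5 x 50
X = rng.normal(size=(200, 5)) @ W + 0.1 * rng.normal(size=(200, 50))
v = rng.normal(size=50)
X[0] += 8 * v / np.linalg.norm(v)                 # push one point off the subspace

scores = pca_outlier_scores(X, k=5)               # X[0] gets the largest score
```

Points with large residual distance are flagged as outliers; this is the quantity whose behavior under dimensionality reduction the talk examines.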
The Johnson-Lindenstrauss (JL) projection naturally suggests itself as a tool for reducing the dimensionality. How well does the JL transform preserve the property of being an outlier? In this talk, we will present recent work with Vatsal Sharan (Stanford) and Udi Wieder (VMware) on this question.
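The JL projection mentioned above can be sketched as multiplication by a scaled Gaussian random matrix, which preserves pairwise distances up to a small relative error with high probability. The target dimension m and the data here are illustrative assumptions.

```python
import numpy as np

def jl_project(X, m, seed=0):
    """Map rows of X from d dimensions down to m dimensions using a
    random Gaussian matrix scaled by 1/sqrt(m) (a standard JL transform)."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(d, m)) / np.sqrt(m)
    return X @ R

# Illustrative check: pairwise distances are approximately preserved.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10_000))      # high-dimensional points
Y = jl_project(X, m=200)                # much lower-dimensional sketch

orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig                      # close to 1 with high probability
```

Whether such a projection also preserves which points are outliers with respect to the principal subspace is exactly the question the talk addresses.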