Abstract: High-dimensional problems are ubiquitous in machine learning and in many other fields, such as biology, finance, environmental science, and medicine. For example, a 5-megapixel image can be thought of as a vector in a 5-million-dimensional vector space. Similarly, a document can be considered a vector, indexed by words, in a high-dimensional vector space. Dimension reduction is a well-studied problem in machine learning. There are three main reasons for studying it: i) to avoid the curse of dimensionality, ii) to develop efficient algorithms, and iii) to develop methods for visualizing high-dimensional data. In this talk we will discuss three problems in dimension reduction. The first focuses on classification, and the other two focus on visualizing high-dimensional data.
1. In the first part we will discuss how mutual information can be used to reduce the output dimensions.
2. We study a generalization of the well-known data visualization technique t-distributed stochastic neighbor embedding (t-SNE).
3. We present a new algorithm for embedding high-dimensional data into two or three dimensions for visualization.