Understanding Global Properties of Data Sets from Local Observations


Sofya Raskhodnikova


The Pennsylvania State University & Boston University


Tuesday, 25 February 2014, 10:00 to 11:00



Suppose we are given a list of numbers and we wish to determine whether it is sorted in increasing order. That problem obviously requires reading the entire list. However, it turns out that if we know in advance that our list is either sorted or far from sorted, we can perform the test by examining only a small portion of the list. This is an example of a global property of a data set that we can understand by making a few local observations. As data of all types gets easier to obtain and cheaper to store, data sets are becoming increasingly large. Consequently, there is a need to perform computational tasks on massive data sets. What useful computations can be performed on a data set when reading all of it is prohibitively expensive? This question, fundamental to several fields, is at the heart of a research area, called Sublinear Algorithms, that has provided important insights into fast approximate computation. In this talk, we will give a few examples of specific problems that can be solved while making only local observations, starting with the sorting example and moving on to simple analysis of images, comparing and compressing documents, and understanding properties of functions. We will also present an application of these ideas to an area (namely, data privacy), where extreme efficiency is not a requirement per se, but helps to guarantee other properties. Speaker Bio: Sofya Raskhodnikova is an associate professor of Computer Science and Engineering at Penn State. She received her Ph.D. from MIT. Prior to joining Penn State in 2007, she was a postdoctoral fellow at the Hebrew University of Jerusalem and the Weizmann Institute of Science, and a visiting scholar at the Institute for Pure and Applied Mathematics at UCLA. Currently, she is on sabbatical at Boston University. Dr. Raskhodnikova works in the areas of randomized and approximation algorithms. Her main interest is the design and analysis of sublinear-time algorithms for combinatorial problems. Sublinear algorithms produce approximate answers after examining only a tiny portion of the data. Such algorithms are important for dealing with massive data sets.