Universal Lossless Compression of Graph-indexed Data


Venkat Anantharam


University of California at Berkeley
Department of Electrical Engineering
and Computer Science
271 Cory Hall
Berkeley, California 94720-1770
United States of America


Thursday, 30 March 2017, 16:00 to 17:00



Many modern data sources arising from social networks, biological data, etc. are best viewed as indexed by combinatorial structures such as graphs, rather than as time series. The local weak limit theory for sparse graphs, also known as the objective method, provides a framework in which to make sense of what one might mean by a stationary stochastic process indexed by graphs. The theory of time series is the engine driving an enormous range of applications in areas such as control theory, communications, information theory and signal processing, to name just a few elds. It is to be expected that a theory of stationary stochastic processes indexed by combinatorial structures, in particular graphs, would eventually have a similarly wide ranging impact.

Consider a graph whose vertices and edges carry marks, viewed as data samples associated to the vertex or the edge respectively. For example the vertices might be individuals, the vertex mark might describe the musical taste of the individual and the edge mark might describe the degree of anity between the corresponding pair of individuals. We pose the problem of representing the information content of such a marked graph in the most ecient way possible. Further, we would like to do this in a universal sense, i.e., in eect, without making any statistical assumptions about the data sample being compressed. It turns out that one can make precise sense of this question in the language of the local weak limit theory, the only assumption being that the graph is sparse, i.e. that the number of edges is on the same scale as the number of vertices. What is more, the compression can be done in the most ecient way possible. Namely one can make sense of a notion of entropy associated to the local weak limit, which is on a linear scale in the number of nodes, and one can compress down to this entropy in a universal sense. The entropy notion we work with is a generalized version of the one introduced by Bordenave and Caputo (2014).

All the necessary background from the theory of local weak limits that is needed will be developed during the talk (joint work with Payam Delgosha).