Tata Institute of Fundamental Research

Distinct Elements in Streams: An algorithm for the (text) book

STCS Seminar
Speaker: Kuldeep S. Meel (University of Toronto)
Organiser: Umang Bhaskar
Date: Tuesday, 28 Nov 2023, 10:00 to 11:30
Venue: via Zoom in A201

(Scan to add to calendar)
Abstract: 

Given a data stream of m elements, the Distinct Elements problem is to estimate the number of distinct elements in the stream. Distinct Elements has been a subject of theoretical and empirical investigations over the past four decades resulting in space-optimal algorithms for it. However, all the current state-of-the-art algorithms are often difficult to analyze or impractical.

I will present a simple, intuitive, sampling-based space-efficient algorithm whose description and the proof are accessible to undergraduates with a knowledge of basic probability theory.

In addition to the simplicity, the approach has significant theoretical and practical implications: our approach allowed us to resolve the open problem of (Discrete) Klee's Measure Problem in the streaming setting and build a state-of-the-art DNF counter in practice.