Distinct Elements in Streams: An algorithm for the (text) book

Speaker:
Organiser:
Umang Bhaskar
Date:
Tuesday, 28 Nov 2023, 10:00 to 11:30
Venue:
via Zoom in A201
Category:
Abstract

Given a data stream of m elements, the Distinct Elements problem is to estimate the number of distinct elements in the stream. Distinct Elements has been a subject of theoretical and empirical investigations over the past four decades resulting in space-optimal algorithms for it. However, all the current state-of-the-art algorithms are often difficult to analyze or impractical.

I will present a simple, intuitive, sampling-based space-efficient algorithm whose description and the proof are accessible to undergraduates with a knowledge of basic probability theory.

In addition to the simplicity, the approach has significant theoretical and practical implications: our approach allowed us to resolve the open problem of (Discrete) Klee's Measure Problem in the streaming setting and build a state-of-the-art DNF counter in practice.