Streaming Algorithms for Embedding and Computing Edit Distance in the Low Distance Regime


Michal Koucky


Charles University
Department of Computer Science
Malosranske nam. 25, 118 00 Praha 1
Czech Republic


Wednesday, 6 April 2016, 16:00 to 17:00



Abstract: The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings $x,y$ lying in the Boolean hypercube. The edit distance between $x$ and $y$ is defined as the minimum number of character insertion, deletion, and bit flips needed for converting $x$ into $y$. Whereas, the Hamming distance between $x$ and $y$ is the number of bit flips needed for converting $x$ to $y$.

In this talk I will present a randomized injective embedding of the editdistance into the Hamming distance with a small (qudratic) distortion. Namely, for any satisfying that their edit distance equals , the Hamming distance between the embedding of and is with high probability. This improves over the distortion ratio of $O(\log n \log^* n)$ obtained by Jowhari (2012) for small values of . Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input.

I will mention several applications for this embedding. Among our results we provide a one-pass (streaming) algorithm for edit distance running in space and computing edit distance exactly up-to distance. This algorithm is based on kernelization for edit distance that is of independent (joint work with Diptarka Chakraborty and Elazar Goldenberg).