Faster and Space Efficient Indexing for Locality Sensitive Hashing

Speaker:
Organiser:
Raghuvansh Saxena
Date:
Tuesday, 14 Oct 2025, 16:00 to 17:00
Venue:
A-201 (STCS Seminar Room)
Category:
Abstract

This work suggests faster and space-efficient index construction algorithms for LSH for Euclidean distance (a.k.a. E2LSH) and cosine similarity (a.k.a. SRP). The index construction step of these algorithms relies on grouping data points into several bins of hash tables based on their hashcode. To generate an m-dimensional hashcode of the d-dimensional data point, these LSHs first project the data point onto a d-dimensional random Gaussian vector and then discretise the resulting inner product. The time and space complexity of both E2LSH and SRP for computing an m-sized hashcode of a d-dimensional vector is O(md), which becomes impractical for large values of m and d. To overcome this problem, we propose  LSH algorithms for both Euclidean distance and cosine similarity that  reduce the hashcode computation time from O(md) to O(d), and simultaneously reduce the space complexity from O(md) to O(d). Our proposals are backed by mathematical guarantees, and we validate their performance through numerical simulations on various real-world datasets.

Short Bio:

Dr. Rameshwar Pratap is a faculty member at Department of CSE, IIT Hyderabad. He earned his Ph.D. in Theoretical Computer Science from Chennai Mathematical Institute (CMI). His research work lies in the intersection of theory and practice. Broadly his research work falls in the areas of "Algorithms for Massive Datasets", and focuses on developing  dimensionality reduction algorithms and sampling algorithms to handle the large dimensionality and volume of the datasets, respectively.