Approximate Nearest Neighbor (ANN) search is a fundamental algorithmic problem with numerous applications across computer science. In this work, we propose indexable distance estimating codes (iDEC), a new solution framework for ANN that extends and improves the locality sensitive hashing (LSH) framework in a fundamental and systematic way. Empirically, an iDEC-based solution has a low index space complexity of O(n) and can achieve a low average query time complexity of approximately O(log n). We show that our iDEC-based solutions for ANN in Hamming and edit distances outperform the respective state-of-the-art LSH-based solutions for both in-memory and external-memory processing. We also show that our iDEC-based in-memory solution for ANN in Hamming distance (ANN-H) is more scalable than all existing solutions. Finally, we uncover deep connections between Error-Estimating Codes (EEC), LSH, and iDEC.
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Computer Science (all)