Abstract:Approximate nearest neighbor (ANN) search in high-dimensional metric spaces is a fundamental problem with many applications. Over the past decade, proximity graph (PG)-based indexes have demonstrated superior empirical performance over alternatives. However, these methods often lack theoretical guarantees regarding the quality of query results, especially in the worst-case scenarios. In this paper, we introduce the {\alpha}-convergent graph ({\alpha}-CG), a new PG structure that employs a carefully designed edge pruning rule. This rule eliminates candidate neighbors for each data point p by applying the shifted-scaled triangle inequalities among p, its existin…
Abstract:Approximate nearest neighbor (ANN) search in high-dimensional metric spaces is a fundamental problem with many applications. Over the past decade, proximity graph (PG)-based indexes have demonstrated superior empirical performance over alternatives. However, these methods often lack theoretical guarantees regarding the quality of query results, especially in the worst-case scenarios. In this paper, we introduce the {\alpha}-convergent graph ({\alpha}-CG), a new PG structure that employs a carefully designed edge pruning rule. This rule eliminates candidate neighbors for each data point p by applying the shifted-scaled triangle inequalities among p, its existing out-neighbors, and new candidates. If the distance between the query point q and its exact nearest neighbor v* is at most {\tau} for some constant {\tau} > 0, our {\alpha}-CG finds the exact nearest neighbor in poly-logarithmic time, assuming bounded intrinsic dimensionality for the dataset; otherwise, it can find an ANN in the same time. To enhance scalability, we develop the {\alpha}-convergent neighborhood graph ({\alpha}-CNG), a practical variant that applies the pruning rule locally within each point’s neighbors. We also introduce optimizations to reduce the index construction time. Experimental results show that our {\alpha}-CNG outperforms existing PGs on real-world datasets. For most datasets, {\alpha}-CNG can reduce the number of distance computations and search steps by over 15% and 45%, respectively, when compared with the best-performing baseline.
Comments: | Accepted to ACM SIGMOD 2025. This is the preprint version before camera-ready submission |
Subjects: | Data Structures and Algorithms (cs.DS) |
Cite as: | arXiv:2510.05975 [cs.DS] |
(or arXiv:2510.05975v1 [cs.DS] for this version) | |
https://doi.org/10.48550/arXiv.2510.05975 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Binhong Li [view email] [v1] Tue, 7 Oct 2025 14:29:54 UTC (328 KB)