Prism: Reducing Arithmetic for High-Recall Approximate Nearest Neighbor Search on Processing-in-Memory
Weihan Kong, Shengan Zheng*, Yingxue Zhou, Yifan Hua, Yuheng Wen, Cong Zhou, Guifeng Wang, Linpeng Huang*
Published in Design Automation Conference (DAC), 2026
Abstract: Approximate Nearest Neighbor Search (ANNS) at scale is constrained by the memory wall. Processing-in-Memory (PIM) moves compute into memory, offering ample internal bandwidth but only limited on-die compute, which makes reducing arithmetic crucial. We propose Prism, a PIM-based ANNS system that co-optimizes vector pruning, distance evaluation, and host-PIM orchestration. A proximity-aware vector pruner exploits the high intra-PIM bandwidth and dual-cluster affiliations to filter out distant vectors. Prism then performs sensitivity-ordered distance computation: it evaluates high-impact dimension segments first and terminates a candidate early once the exclusion criteria are met. A stall-free host-PIM pipeline overlaps query preparation, PIM execution, and global ranking to sustain high throughput. Experiments show that Prism achieves 2.7-19.8× higher throughput than state-of-the-art systems.
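The abstract does not say how segment sensitivity is measured or what the exclusion criteria are, so the following is only a generic host-side sketch of the idea behind sensitivity-ordered distance computation with early termination, built on our own assumptions. It uses squared-L2 distance and fixed-size dimension segments. Segments are ranked by per-segment data variance, a hypothetical stand-in for Prism's sensitivity metric. The exclusion test is the standard early-abandoning check against the current k-th best distance. Because partial sums of squares never decrease, this test is exact: the result matches brute-force top-k. The function names `sensitivity_order` and `topk_early_exit` are hypothetical, and nothing PIM-specific is modeled.

```python
import heapq

import numpy as np


def sensitivity_order(data: np.ndarray, seg: int) -> list[tuple[int, int]]:
    """Split dimensions into segments of size `seg`, highest data variance first.

    Hypothetical heuristic: high-variance segments tend to contribute more to
    squared-L2 differences, so evaluating them first lets a candidate's partial
    distance cross the pruning threshold sooner.
    """
    d = data.shape[1]
    segments = [(s, min(s + seg, d)) for s in range(0, d, seg)]
    var = data.var(axis=0)
    return sorted(segments, key=lambda se: -float(var[se[0]:se[1]].sum()))


def topk_early_exit(query: np.ndarray, data: np.ndarray, k: int,
                    segments: list[tuple[int, int]]) -> tuple[list[int], int]:
    """Exact top-k by squared L2 with per-segment early termination.

    Returns (candidate ids sorted by distance, number of segment evaluations).
    """
    heap: list[tuple[float, int]] = []  # max-heap via negated distances
    work = 0
    for i, x in enumerate(data):
        # Current k-th best distance; infinite until k candidates are kept.
        threshold = -heap[0][0] if len(heap) == k else np.inf
        partial = 0.0
        pruned = False
        for lo, hi in segments:
            diff = x[lo:hi] - query[lo:hi]
            partial += float(diff @ diff)
            work += 1
            # The partial sum is a lower bound on the full distance, so once it
            # reaches the threshold this candidate cannot enter the top-k.
            if partial >= threshold:
                pruned = True
                break
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-partial, i))
        else:
            # Evict the current worst of the k kept candidates.
            heapq.heapreplace(heap, (-partial, i))
    ranked = sorted((-neg_dist, i) for neg_dist, i in heap)
    return [i for _, i in ranked], work


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((2000, 64))
    query = rng.standard_normal(64)
    segs = sensitivity_order(data, seg=8)
    ids, work = topk_early_exit(query, data, k=10, segments=segs)
    print("top-10 ids:", ids)
    print("segment work done:", work / (len(data) * len(segs)))
```

The reported work fraction shows how much per-segment arithmetic early termination avoids compared with evaluating every segment of every candidate. On PIM, that saved arithmetic is the scarce resource the abstract targets.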
