Retrieval at Scale | Drop for 2025-09-29

TL;DR

Four fresh items since mid‑September 2025:
- Milvus 2.6.2 adds JSON Shredding, NGRAM indexing, and a Boost Ranker to speed up hybrid filtering and re‑ranking at scale.
- FusedANN proposes a principled way to “fuse” structured attributes with vectors for filtered ANN, reporting better recall/latency.
- NVQ introduces individualized non‑uniform vector quantization for higher‑fidelity compression.
- Amazon OpenSearch Service rolled out OpenSearch 3.1, bringing Lucene‑10 era performance plus vector quantization and memory‑optimized search to the managed service.

Milvus 2.6.2: JSON Shredding, NGRAM indexing, and Boost Ranker for hybrid retrieval

Key facts and current state of the topic
- Hybrid (vector + metadata) filtering and LIKE queries are operational hot spots; Milvus 2.6.2 adds JSON Shredding for dynamic‑field filtering, NGRAM indexes for faster LIKE, partial upsert, and a Boost Ranker for controllable score boosts. Release date: September 19, 2025.
Important context and background information
- These features target lower tail‑latency for attribute‑heavy search while enabling flexible schema evolution and simpler re‑ranking hooks inside Milvus.
Recent developments or changes
- If you run large, filter‑heavy retrieval, evaluate JSON Shredding + NGRAM on your high‑DF fields and test Boost Ranker as a lightweight second‑stage signal before heavy learned re‑rankers.
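
To see why an n‑gram index helps LIKE‑style substring filters, here is a minimal sketch in plain Python. It is not Milvus's implementation or API; the class `NgramIndex` and its methods are hypothetical names used only for illustration. The idea shown is the general one: index every character n‑gram, intersect the posting lists for the n‑grams of the search string, then verify the few surviving candidates instead of scanning every row.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams contained in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    """Toy inverted index from character n-grams to document ids (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of docs containing it
        self.docs = {}                    # id -> original text, kept for verification

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def contains(self, needle):
        """Emulate LIKE '%needle%' and return matching ids in sorted order."""
        grams = ngrams(needle, self.n)
        if not grams:
            # Needle is shorter than n, so there is nothing to probe: fall back to a scan.
            candidates = set(self.docs)
        else:
            # A matching document must contain every n-gram of the needle.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Verify candidates: sharing all n-grams does not guarantee a contiguous match.
        return sorted(d for d in candidates if needle in self.docs[d])


index = NgramIndex(n=3)
index.add(1, "vector database")
index.add(2, "graph database")
index.add(3, "vector search")
print(index.contains("database"))
```

The verification step is what keeps the result exact; the index only narrows the candidate set, which is where the speed‑up over a full scan would come from on large collections.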

FusedANN: convex fusion of attributes and vectors for filtered ANN

Key facts and current state of the topic
- Filtered ANN typically bolts metadata filters onto HNSW/IVF indexes, and recall or latency can degrade when the filters are highly selective; FusedANN instead relaxes hard filters into continuous penalties, embedding attributes and vectors in a single fused space with guarantees on top‑k semantics. Preprint date: September 24, 2025.
Important context and background information
- The approach aims to replace multi‑stage filter + ANN pipelines with a single approximate search while preserving approximation guarantees, claiming up to 3× throughput gains on hybrid benchmarks.
Recent developments or changes
- Worth piloting on workloads with heavy structured filters to compare against ACORN‑style filtered HNSW or pre‑filter/post‑filter strategies.
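
The contrast between a hard filter and a continuous penalty can be sketched in a few lines of Python. This is not the FusedANN transformation and does not reproduce the paper's guarantees: it assumes attributes are encoded as small numeric tuples (one‑hot here), uses a brute‑force scan instead of an ANN index, and the function names and the `penalty` parameter are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_filter_search(items, query_vec, query_attr, k):
    """Baseline: drop items whose attribute differs, then rank by vector distance."""
    kept = [it for it in items if it["attr"] == query_attr]
    return [it["id"] for it in sorted(kept, key=lambda it: l2(it["vec"], query_vec))[:k]]


def fused_search(items, query_vec, query_attr, k, penalty=10.0):
    """Soft variant: one ranking over all items, with attribute mismatch added as a penalty."""
    def fused_distance(it):
        return l2(it["vec"], query_vec) + penalty * l2(it["attr"], query_attr)
    return [it["id"] for it in sorted(items, key=fused_distance)[:k]]


# Toy data (an assumption for illustration): 2-D vectors, one-hot category attribute.
items = [
    {"id": "A", "vec": (0.0, 0.0), "attr": (1, 0)},
    {"id": "B", "vec": (1.0, 0.0), "attr": (1, 0)},
    {"id": "C", "vec": (0.1, 0.0), "attr": (0, 1)},
    {"id": "D", "vec": (3.0, 0.0), "attr": (1, 0)},
]
print(hard_filter_search(items, (0.0, 0.0), (1, 0), k=2))
print(fused_search(items, (0.0, 0.0), (1, 0), k=2, penalty=10.0))
```

With a large enough penalty the fused ranking agrees with the hard filter on this toy data, while a penalty of zero reduces to plain vector search; a pilot would mainly be about where between those extremes a real workload behaves well.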

NVQ: individualized non‑uniform vector quantization for high‑fidelity search

Key facts and current state of the topic
- Vector quantization underpins large‑scale retrieval; NVQ learns a non‑uniform quantizer per indexed vector to improve accuracy in the high‑fidelity regime. Preprint date: September 22, 2025.
Important context and background information
- Compared with recent quantizers (e.g., SAQ; RaBitQ/BBQ derivatives), NVQ targets better accuracy at similar or lower compute by tailoring codebooks individually.
Recent developments or changes
- If recall at fixed latency is bottlenecked by compression error, A/B‑test NVQ against PQ/RaBitQ/SAQ on your own embeddings and ANN stack.
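
To make "a non‑uniform quantizer per vector" concrete, the sketch below compares evenly spaced quantization levels with levels fitted to a single vector's own values. It does not reproduce NVQ's parameterization or training: as a stand‑in it uses a 1‑D Lloyd iteration, and all function names are hypothetical. It assumes `bits >= 1` and a non‑empty vector.

```python
def uniform_levels(vec, bits):
    """Evenly spaced levels between the vector's own min and max."""
    lo, hi = min(vec), max(vec)
    count = 2 ** bits
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def nearest(levels, x):
    """Index of the level closest to x."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - x))


def lloyd_levels(vec, bits, iters=20):
    """Non-uniform levels fitted to this one vector via 1-D Lloyd iterations.

    Starts from the uniform levels, so its error cannot end up worse than uniform.
    """
    levels = uniform_levels(vec, bits)
    for _ in range(iters):
        buckets = [[] for _ in levels]
        for x in vec:
            buckets[nearest(levels, x)].append(x)
        # Move each level to the mean of its bucket; keep empty buckets where they are.
        levels = [sum(b) / len(b) if b else lv for b, lv in zip(buckets, levels)]
    return levels


def mse(vec, levels):
    """Mean squared reconstruction error when each value snaps to its nearest level."""
    return sum((x - levels[nearest(levels, x)]) ** 2 for x in vec) / len(vec)


# Toy vector (an assumption): most mass near zero plus two outliers.
vec = [0.01, 0.02, 0.03, 0.06, 0.04, 0.9, -0.8, 0.0]
print(mse(vec, uniform_levels(vec, bits=2)))
print(mse(vec, lloyd_levels(vec, bits=2)))
```

When a vector's values cluster tightly with a few outliers, uniform levels waste codes on empty ranges, which is the kind of compression error a per‑vector non‑uniform quantizer is meant to reduce.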

Amazon OpenSearch Service adds OpenSearch 3.1 (Sep 15–17): Lucene‑10 era gains, vector quantization, memory‑optimized search

Key facts and current state of the topic
- OpenSearch 3.1 is now available in Amazon OpenSearch Service (announced September 15, 2025; China regions September 17), bringing Lucene‑10 upgrades plus vector quantization and memory‑mapped “memory‑optimized” Faiss search for efficiency.
Important context and background information
- For managed clusters, this enables lower memory footprints, faster indexing, improved hybrid scoring (e.g., Z‑score normalization), and reduced latency without self‑hosting.
Recent developments or changes
- Plan controlled rollouts: validate recall/latency under quantization, and test memory‑optimized search for cost/QPS gains on production filters and traffic patterns.
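
For intuition about Z‑score normalization in hybrid scoring, here is a minimal Python sketch of the general technique. It is not OpenSearch's implementation or configuration syntax; the function names, the `weight` parameter, and the choice to treat a document missing from one result list as scoring at that list's mean (z = 0) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(scores):
    """Map each document's raw score to (score - mean) / std over this result list."""
    if not scores:
        return {}
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        # All scores identical: no document stands out, so every z-score is 0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}


def hybrid_rank(lexical, vector, weight=0.5):
    """Rank documents by a weighted sum of z-normalized lexical and vector scores."""
    zl, zv = zscores(lexical), zscores(vector)
    docs = set(zl) | set(zv)
    combined = {
        # Assumption: a document absent from one list contributes z = 0 for that list.
        d: weight * zl.get(d, 0.0) + (1 - weight) * zv.get(d, 0.0)
        for d in docs
    }
    # Highest combined score first; ties broken by document id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Toy scores (assumed): BM25-like values and cosine-like similarities on different scales.
lexical = {"a": 12.0, "b": 8.0, "c": 4.0}
vector = {"a": 0.70, "b": 0.90, "c": 0.80}
print(hybrid_rank(lexical, vector, weight=0.5))
```

Normalizing each list by its own mean and spread keeps the larger‑scale lexical scores from dominating the combination, which is why it is worth re‑checking ranking quality on real traffic when switching normalization schemes.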