Retrieval at Scale | Drop for 2025-11-24

TL;DR

Since Nov 6, 2025: Qdrant 1.16 adds ACORN-style filtered ANN and a disk‑efficient “inline storage” HNSW mode; Milvus 2.6.6 ships a Boost Ranker, major scalar‑filtering improvements, and new datatypes; Faiss tagged v1.13.0 and then landed PANORAMA-style verification acceleration and scalar‑quantizer optimizations on main; new late‑interaction research proposes token‑importance weighting for multi‑vector scoring; and a new retriever, AMER, generates multiple query vectors to capture multimodal relevance.

Qdrant 1.16: ACORN filtered‑ANN + disk‑friendly “inline storage” for HNSW

  • Key facts and current state of the topic
    • Qdrant 1.16 (Nov 19) adds ACORN-style traversal for filtered ANN and introduces “inline storage,” which embeds quantized vectors directly in HNSW nodes to reduce random I/O during on‑disk search. The release also ships tiered multitenancy and full‑text search upgrades. (qdrant.tech)
  • Important context and background information
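    • ACORN approaches address the latency/recall cliff under selective filters: when a node's direct neighbors fail the filter, traversal also explores their neighbors (2‑hop), so the search does not stall in filtered‑out regions of the graph. Inline storage targets predictable performance when indexes don’t fit in RAM. (qdrant.tech)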
  • Recent developments or changes
    • Qdrant's benchmarks show large accuracy gains for multi‑filter search with ACORN, and 10×+ higher QPS than non‑inline storage in low‑RAM settings for on‑disk HNSW with quantization. Evaluate ACORN per query on selective‑filter workloads, and enable inline storage when serving quantized vectors from SSD. (qdrant.tech)
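To make the 2‑hop idea concrete, here is a minimal, self‑contained sketch of an ACORN‑1‑style filtered best‑first search over a toy graph. It is not Qdrant's implementation: the graph, the `passes` predicate, and the helper names (`filtered_neighbors`, `acorn_style_search`) are hypothetical, and real engines add multi‑layer HNSW, denser construction, and selectivity heuristics.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Graph = Dict[int, List[int]]  # hypothetical adjacency list: node -> neighbors


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_neighbors(node: int, graph: Graph, passes: Callable[[int], bool]) -> List[int]:
    """Neighbors of `node` that pass the filter. A neighbor that fails the
    filter is skipped, but its own neighbors are checked instead (2-hop)."""
    out: List[int] = []
    seen = set()
    for nb in graph[node]:
        hops = [nb] if passes(nb) else graph[nb]
        for cand in hops:
            if cand != node and cand not in seen and passes(cand):
                seen.add(cand)
                out.append(cand)
    return out


def acorn_style_search(query: Vector, vectors: Dict[int, Vector], graph: Graph,
                       entry: int, passes: Callable[[int], bool],
                       k: int = 3, ef: int = 8) -> List[int]:
    """Best-first graph search that only keeps filter-passing nodes."""
    visited = {entry}
    d0 = l2(query, vectors[entry])
    candidates: List[Tuple[float, int]] = [(d0, entry)]  # min-heap: closest first
    results: List[Tuple[float, int]] = []  # max-heap via negated distance
    if passes(entry):
        results.append((-d0, entry))
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the closest unexplored node is worse than the worst kept result.
        if len(results) >= ef and dist > -results[0][0]:
            break
        for nb in filtered_neighbors(node, graph, passes):
            if nb in visited:
                continue
            visited.add(nb)
            d = l2(query, vectors[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    ranked = sorted((-neg_d, n) for neg_d, n in results)  # ascending distance
    return [n for _, n in ranked][:k]
```

On a chain graph 0‑1‑2‑3‑4‑5 with an even‑only filter, a plain filtered traversal from node 0 stalls because its only neighbor (1) fails the filter; the 2‑hop expansion reaches 2 and then 4, so a query at position 5 returns [4, 2].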

Milvus 2.6.6 (Nov 21): Boost Ranker, geospatial/timestamptz, and faster scalar filtering

  • Key facts and current state of the topic
    • 2.6.6 introduces a Boost Ranker to re‑score ANN candidates using optional predicate matches, plus new Geometry (RTREE) and TIMESTAMPTZ types with filter support. (milvus.io)
  • Important context and background information
    • Hybrid (vector + metadata) stacks benefit from lightweight in‑engine re‑ranking before heavier learned rankers run; richer scalar types keep more filtering inside the engine, and faster term expressions help stabilize tail latency. (milvus.io)
  • Recent developments or changes
    • Release notes highlight scalar‑filter optimizations, prefetch for sealed non‑indexed segments, and other performance fixes. If you use Milvus for filtered candidate generation, test Boost Ranker as a cheap second stage. (milvus.io)
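The “boost on optional predicate match” idea can be sketched in a few lines. This is a toy model of the concept, not the pymilvus API or Milvus's exact scoring: the `Candidate` type, the `(predicate, weight)` rule shape, and the `multiply`/`sum` modes are assumptions, and scores are assumed to be similarities where higher is better.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, List, Tuple

Meta = Dict[str, Any]
# Hypothetical boost rule: (predicate over candidate metadata, weight).
BoostRule = Tuple[Callable[[Meta], bool], float]


@dataclass
class Candidate:
    id: int
    score: float  # assumed similarity score: higher is better (e.g. IP/COSINE)
    meta: Meta


def boost_rerank(cands: List[Candidate], rules: List[BoostRule],
                 mode: str = "multiply") -> List[Candidate]:
    """Re-score ANN candidates: each matching rule multiplies (or adds to)
    the score; candidates matching no rule keep their original score."""
    if mode not in ("multiply", "sum"):
        raise ValueError(f"unknown mode: {mode}")
    rescored = []
    for c in cands:
        s = c.score
        for pred, weight in rules:
            if pred(c.meta):
                s = s * weight if mode == "multiply" else s + weight
        rescored.append(replace(c, score=s))  # copy; inputs stay untouched
    return sorted(rescored, key=lambda c: c.score, reverse=True)
```

Because the predicates are optional boosts rather than hard filters, non‑matching candidates are demoted instead of dropped, which keeps recall intact while nudging business rules into the ranking.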

Faiss latest: v1.13.0 tag; PANORAMA and SQ optimizations landed on main

  • Key facts and current state of the topic
    • Faiss v1.13.0 was tagged Nov 12. Shortly after, upstream main integrated PANORAMA into an HNSW “Panorama” variant and optimized the scalar quantizer. These changes target the verification stage and encoding efficiency. (github.com)
  • Important context and background information
    • PANORAMA accelerates the final distance‑computation/refinement stage across common ANN indexes; bringing it into Faiss reduces end‑to‑end latency without recall loss (per the paper; validate on your embeddings). (github.com)
  • Recent developments or changes
    • If you need the new kernels immediately, build from main; otherwise watch for a point release including these commits. Benchmark vs. your current IVFPQ/HNSW settings at production recall targets. (github.com)
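The verification‑stage trick behind PANORAMA‑style refinement can be illustrated with a pure‑Python sketch: compute the dot product block by block, and after each block use the Cauchy‑Schwarz bound on the remaining dimensions to get a lower bound on squared L2 distance; once that bound exceeds the current k‑th best, the candidate is pruned without finishing the computation. This is an assumed simplification, not Faiss code: it omits the energy‑compacting orthogonal transform that makes the bounds tight early, and the names `tail_norms` and `progressive_refine` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def tail_norms(v: Vector) -> List[float]:
    """tail[i] = ||v[i:]||_2 (for the database this would be precomputed at index time)."""
    acc = [0.0] * (len(v) + 1)
    for i in range(len(v) - 1, -1, -1):
        acc[i] = acc[i + 1] + v[i] * v[i]
    return [math.sqrt(a) for a in acc]


def progressive_refine(query: Vector, candidates: List[Tuple[int, Vector]],
                       k: int, block: int = 4) -> Tuple[List[Tuple[float, int]], int]:
    """Exact top-k by squared L2 with early pruning via a lower bound:
    ||q - x||^2 >= ||q||^2 + ||x||^2 - 2 * (partial_dot + ||q_rest|| * ||x_rest||)."""
    qt = tail_norms(query)
    qn2 = sum(x * x for x in query)
    dim = len(query)
    heap: List[Tuple[float, int]] = []  # max-heap of (-dist2, id)
    pruned = 0
    for cid, vec in candidates:
        xt = tail_norms(vec)
        xn2 = sum(x * x for x in vec)
        dot = 0.0
        alive = True
        for start in range(0, dim, block):
            end = min(start + block, dim)
            dot += sum(query[i] * vec[i] for i in range(start, end))
            lower = qn2 + xn2 - 2.0 * (dot + qt[end] * xt[end])
            if len(heap) == k and lower >= -heap[0][0]:
                pruned += 1  # cannot beat the current k-th best; stop early
                alive = False
                break
        if alive:
            d2 = qn2 + xn2 - 2.0 * dot  # exact, since tail norms at `dim` are 0
            if len(heap) < k:
                heapq.heappush(heap, (-d2, cid))
            else:
                heapq.heapreplace(heap, (-d2, cid))
    return sorted((-nd, cid) for nd, cid in heap), pruned
```

The result is identical to brute‑force exact re‑ranking; only the amount of arithmetic changes, which matches the “no recall loss” framing above. How much gets pruned depends on how concentrated the vectors' energy is in the leading dimensions.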

Late interaction: token‑importance weighting for multi‑vector scoring

  • Key facts and current state of the topic
    • New arXiv work (Nov 20) augments ColBERT‑style scoring by learning/assigning per‑query‑token weights, improving expressiveness while keeping the multi‑vector representation fixed. (arxiv.org)
  • Important context and background information
    • Classic ColBERT sums per‑query‑token MaxSim scores with equal weight; up‑weighting high‑value tokens can better align scores with relevance while preserving late‑interaction efficiency. (arxiv.org)
  • Recent developments or changes
    • Reported +1.28% R@10 zero‑shot on BEIR with IDF weights and +3.66% with few‑shot fine‑tuning. Low‑risk to prototype as a drop‑in scoring adjustment atop existing ColBERT/XTR indexes. (arxiv.org)
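Weighted late interaction is a one‑line change to MaxSim: score(q, d) = Σ_i w_i · max_j ⟨q_i, d_j⟩, where uniform w_i = 1 recovers classic ColBERT. The sketch below assumes a smoothed BM25‑style IDF for the weights; the paper's exact IDF variant and its few‑shot learned weights are not reproduced, and `weighted_maxsim`/`idf_weights` are illustrative names.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_maxsim(q_vecs: List[Vector], d_vecs: List[Vector],
                    weights: Optional[List[float]] = None) -> float:
    """Late-interaction score: sum over query tokens of w_i * max_j <q_i, d_j>.
    With weights=None every token gets weight 1.0 (classic ColBERT MaxSim)."""
    if weights is None:
        weights = [1.0] * len(q_vecs)
    if len(weights) != len(q_vecs):
        raise ValueError("need exactly one weight per query token")
    return sum(w * max(dot(q, d) for d in d_vecs) for w, q in zip(weights, q_vecs))


def idf_weights(tokens: List[str], doc_freq: Dict[str, int], n_docs: int) -> List[float]:
    """Smoothed BM25-style IDF per query token (an assumed formula, not the paper's)."""
    out = []
    for t in tokens:
        df = doc_freq.get(t, 0)
        out.append(math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5)))
    return out
```

For example, with query tokens ["the", "colbert"], a document that matches only "the" and one that matches only "colbert" tie under uniform MaxSim, while IDF weights rank the "colbert" match first. The document embeddings are untouched, which is why this works as a scoring‑only change on an existing index.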

Beyond single vectors: AMER generates multiple query embeddings per request

  • Key facts and current state of the topic
    • AMER (Nov 4) autoregressively generates multiple query vectors to capture multimodal relevance modes, outperforming single‑vector retrievers on synthetic and multi‑answer datasets. (arxiv.org)
  • Important context and background information
    • Many production queries are multi‑intent; multi‑query embeddings can increase first‑stage recall without switching to heavy late‑interaction for the candidate stage. (arxiv.org)
  • Recent developments or changes
    • Authors report 4× gains on synthetic multimodal targets and consistent in‑domain lifts. Consider AMER‑style candidate generation feeding a re‑ranker when recall is bottlenecked by single‑vector queries. (arxiv.org)
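On the serving side, multiple query vectors plug into a standard first stage: run one top‑k search per vector and fuse the candidate lists before re‑ranking. The sketch below assumes the query vectors are already produced (AMER's autoregressive generator is not reproduced), uses brute‑force dot‑product search in place of an ANN index, and fuses by max score per document; reciprocal‑rank fusion is a common alternative. Function names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk(query: Vector, corpus: List[Vector], k: int) -> List[Tuple[float, int]]:
    """Brute-force top-k by inner product (stand-in for an ANN index)."""
    return heapq.nlargest(k, ((dot(query, v), i) for i, v in enumerate(corpus)))


def multi_query_retrieve(queries: List[Vector], corpus: List[Vector],
                         k_per_query: int, k_final: int) -> List[int]:
    """Union the per-query candidate lists, keeping each doc's best score."""
    best: Dict[int, float] = {}
    for q in queries:
        for score, doc_id in topk(q, corpus, k_per_query):
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k_final]]
```

With a corpus of two well‑separated clusters, a single query vector spends its whole top‑k budget on one cluster, while two query vectors (one per mode) surface a document from each; that is the recall gap multi‑vector queries aim to close before the re‑ranker sees the candidates.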