Retrieval at Scale | Drop for 2026-03-07

TL;DR

Five notable retrieval updates (Feb 27–Mar 7, 2026): (1) Faiss 1.14.x ships new PANORAMA variants, RaBitQ FastScan improvements, ARM SVE distance kernels, and IVF/HNSW refinements; (2) Weaviate 1.36 adds HFresh, a disk‑oriented, SPFresh‑inspired index (preview), and promotes server‑side batching, TTL, and async replication to GA; (3) Azure AI Search’s 2025‑03‑01‑preview API adds multi‑vector embeddings and binary‑quantization rescoring controls; (4) CatapultDB (research) exploits query locality with “catapult” edges to speed DiskANN‑class search; (5) VectorMaton (research) supports vector search with LIKE/substring constraints via an enhanced suffix automaton.

Faiss 1.14.0–1.14.1: more speed/accuracy levers for IVF/HNSW and binary quantization

  • Key facts and current state of the topic
    • Faiss remains a primary ANN workhorse on CPU/GPU; recent cycles focused on verification speed (PANORAMA), binary/multi‑bit RaBitQ, and GPU interop. (github.com)
  • Important context and background information
    • Faster verification and higher‑fidelity quantization raise recall at a fixed latency budget, especially with filters or hybrid pipelines. (github.com)
  • Recent developments or changes
    • v1.14.0 (Mar 2) adds IndexFlatIPPanorama, ARM SVE distance support, InvertedListScanner for IVFRaBitQFastScan, improved k‑means (k‑means++, AFK‑MC²), and SIMD refactors; v1.14.1 (Mar 6) further optimizes multi‑bit RaBitQ inner‑product scoring and fixes build/compat issues. Worth A/B testing against your current PQ/RaBitQ/HNSW settings. (github.com)
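
For the A/B comparison suggested above, a minimal recall@k harness might look like the sketch below. It is pure Python and does not call Faiss; `config_a` and `config_b` are hypothetical stand-ins for whatever two index configurations you wrap as `(query, k) -> ids` callables, and the brute-force scan serves as ground truth.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_topk(query, vectors, k):
    """Brute-force ground truth: ids of the k nearest vectors."""
    order = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))
    return order[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the candidate search returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def compare(search_a, search_b, queries, vectors, k):
    """Mean recall@k for two search callables of the form (query, k) -> ids."""
    totals = {"A": 0.0, "B": 0.0}
    for q in queries:
        truth = exact_topk(q, vectors, k)
        totals["A"] += recall_at_k(search_a(q, k), truth)
        totals["B"] += recall_at_k(search_b(q, k), truth)
    return {name: total / len(queries) for name, total in totals.items()}

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
queries = [[random.random() for _ in range(8)] for _ in range(20)]

def config_a(q, k):
    # Hypothetical "current" configuration: exact search over everything.
    return exact_topk(q, vectors, k)

def config_b(q, k):
    # Hypothetical lossy configuration: only sees the first half of the data.
    return exact_topk(q, vectors[:100], k)

report = compare(config_a, config_b, queries, vectors, k=10)
print(report)
```

In a real comparison you would also time each callable on the same query set, since recall only means something relative to the latency budget it was obtained under.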

Weaviate 1.36: HFresh (preview) + server‑side batching/TTL/async‑replication GA

  • Key facts and current state of the topic
    • Weaviate continues to operationalize filtered‑ANN and multi‑vector pipelines; 1.36 introduces HFresh, a disk‑oriented index inspired by SPFresh. (weaviate.io)
  • Important context and background information
    • HNSW excels in-RAM but becomes costly at billion scale; HFresh partitions vectors into on‑disk postings searched via a centroid HNSW, targeting fresher, lower‑RAM retrieval. (weaviate.io)
  • Recent developments or changes
    • 1.36 (Mar 3) ships HFresh (tech preview) and promotes to GA: server‑side batching (flow‑controlled ingest), object TTL (lifecycle management), async replication (per‑collection controls), “drop inverted indices,” and cancelable backup restore. Useful for cost/QPS and freshness in multi‑stage stacks. (weaviate.io)
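
The partition-then-probe idea described above can be sketched in a few lines. This is a toy illustration and not Weaviate's implementation: postings live in memory rather than on disk, a linear scan over centroids stands in for the centroid HNSW, and `nprobe` is a hypothetical parameter name.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_postings(vectors, centroids):
    """Assign every vector id to the posting list of its nearest centroid."""
    postings = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(vec, centroids[c]))
        postings[nearest].append(vid)
    return postings

def search(query, vectors, centroids, postings, k, nprobe):
    """Probe the nprobe closest centroids, then scan only their postings."""
    # A real system would find these centroids with a graph index, not a sort.
    probe = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in postings[c]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:k]

random.seed(1)
vectors = [[random.random() for _ in range(4)] for _ in range(300)]
centroids = random.sample(vectors, 8)  # crude centroid choice for the demo
postings = build_postings(vectors, centroids)

query = [0.5, 0.5, 0.5, 0.5]
exact = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:5]
full = search(query, vectors, centroids, postings, k=5, nprobe=len(centroids))
narrow = search(query, vectors, centroids, postings, k=5, nprobe=2)
print(exact, full, narrow)
```

Probing every centroid reproduces the exact result; probing fewer trades recall for fewer postings read, which is the knob that matters when postings sit on disk.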

Azure AI Search (Mar preview): multi‑vector embeddings + binary‑rescore support

  • Key facts and current state of the topic
    • Azure AI Search added a new preview data‑plane API (2025‑03‑01‑preview) with vector/RAG improvements relevant to production search. (learn.microsoft.com)
  • Important context and background information
    • Production pipelines increasingly mix late‑/multi‑vector signals and heavy compression; platform‑level support simplifies deployment and tuning. (learn.microsoft.com)
  • Recent developments or changes
    • New: explicit support for multi‑vector embeddings in the REST API and rescoring of binary‑quantized results using full‑precision vectors (enableRescoring / discardOriginals). The preview also lists broader changes across facets and aggregation. (learn.microsoft.com)
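
To make the rescoring idea concrete, here is a minimal sketch of the general technique: shortlist by Hamming distance on 1‑bit codes, then rerank the shortlist with full‑precision inner products. It does not call Azure AI Search, and `oversample` is a hypothetical parameter name that is not meant to mirror the service's settings.

```python
import random

def binarize(vec):
    """One bit per dimension (set when the component is positive), packed into an int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def dot(a, b):
    """Full-precision inner product."""
    return sum(x * y for x, y in zip(a, b))

def search_with_rescoring(query, vectors, codes, k, oversample):
    """Shortlist k * oversample ids by Hamming distance, then rerank by exact dot product."""
    qcode = binarize(query)
    shortlist = sorted(range(len(codes)),
                       key=lambda i: hamming(qcode, codes[i]))[: k * oversample]
    shortlist.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return shortlist[:k]

random.seed(2)
vectors = [[random.gauss(0, 1) for _ in range(32)] for _ in range(100)]
codes = [binarize(v) for v in vectors]
query = [random.gauss(0, 1) for _ in range(32)]

exact = sorted(range(len(vectors)),
               key=lambda i: dot(query, vectors[i]), reverse=True)[:3]
narrow = search_with_rescoring(query, vectors, codes, k=3, oversample=4)
wide = search_with_rescoring(query, vectors, codes, k=3, oversample=34)
print(exact, narrow, wide)
```

With enough oversampling the shortlist covers the whole collection and the reranked result equals the exact top‑k; smaller shortlists are cheaper but can miss true neighbors that the binary codes ranked poorly.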

CatapultDB (research): “catapult” edges exploit query locality for DiskANN‑class search

  • Key facts and current state of the topic
    • Graph ANN typically ignores workload locality, re‑traversing similar paths per query. CatapultDB adds a lightweight “catapult” layer that routes queries to better entry points. (arxiv.org)
  • Important context and background information
    • Approach preserves existing features (filters, dynamic inserts, disk‑resident indexes) by layering on top of the base graph rather than changing its algorithm. (arxiv.org)
  • Recent developments or changes
    • Reported results show up to 2.51× throughput vs. DiskANN at comparable recall and graceful adaptation to workload shifts, which looks promising for high‑QPS, skewed ad/search traffic. (arxiv.org)
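
The entry‑point routing idea can be illustrated with a toy. This is not the paper's algorithm: the graph is a 1‑D chain so greedy search is easy to follow, and `CatapultCache` with its fixed‑width buckets is a hypothetical stand‑in for whatever structure the authors use to map queries to shortcuts.

```python
def greedy_search(neighbors, positions, query, entry):
    """Greedy graph walk: hop to the closest neighbor until none is closer.

    Returns (final_node, hops_taken).
    """
    current, hops = entry, 0
    while True:
        best = min(neighbors[current], key=lambda n: abs(positions[n] - query))
        if abs(positions[best] - query) >= abs(positions[current] - query):
            return current, hops
        current, hops = best, hops + 1

class CatapultCache:
    """Hypothetical shortcut layer: remembers where past queries in a bucket ended."""

    def __init__(self, bucket_width, default_entry):
        self.bucket_width = bucket_width
        self.default_entry = default_entry
        self.shortcuts = {}

    def _bucket(self, query):
        return int(query // self.bucket_width)

    def entry_for(self, query):
        """Entry point for this query: a cached shortcut, else the default."""
        return self.shortcuts.get(self._bucket(query), self.default_entry)

    def record(self, query, result):
        """Remember the node this query converged to."""
        self.shortcuts[self._bucket(query)] = result

# Base graph: 100 nodes on a line, each linked to its immediate neighbors.
n = 100
positions = [float(i) for i in range(n)]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

cache = CatapultCache(bucket_width=10.0, default_entry=0)

# The first query in a region pays the full traversal and leaves a shortcut.
first = greedy_search(neighbors, positions, 90.3, cache.entry_for(90.3))
cache.record(90.3, first[0])

# A later, nearby query can start from that shortcut.
cold = greedy_search(neighbors, positions, 92.6, 0)
warm = greedy_search(neighbors, positions, 92.6, cache.entry_for(92.6))
print(first, cold, warm)  # (90, 90) (93, 93) (93, 3)
```

The base graph and its search routine are untouched; only the choice of entry point changes, which matches the layering described above.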

VectorMaton (Mar 2): pattern‑constrained vector search via enhanced suffix automaton

  • Key facts and current state of the topic
    • Many production queries need vector similarity with sequence/text constraints (e.g., LIKE/CONTAINS). VectorMaton couples pattern filtering with ANN in one index. (arxiv.org)
  • Important context and background information
    • Integrating pattern predicates reduces multi‑stage overhead (vector → text) and can improve precision without extra joins in sequence‑rich domains. (arxiv.org)
  • Recent developments or changes
    • Authors report up to 10× higher query throughput at the same accuracy and up to 18× smaller indexes vs. baselines; worth piloting where substring constraints co‑occur with vector search. (arxiv.org)
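
For readers unfamiliar with the query shape, the sketch below shows what a substring‑constrained nearest‑neighbor query means, implemented as the naive filter‑then‑scan approach that an integrated index aims to improve on. It is not VectorMaton and contains no suffix automaton; the records and the `constrained_search` helper are made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def constrained_search(query_vec, pattern, records, k):
    """Return indices of the k nearest records whose text contains `pattern`.

    `records` is a list of (text, vector) pairs. Every record is scanned,
    so this is a semantic reference rather than an efficient index.
    """
    matching = [i for i, (text, _) in enumerate(records) if pattern in text]
    matching.sort(key=lambda i: sq_dist(query_vec, records[i][1]))
    return matching[:k]

# Illustrative data: short sequences paired with 2-D embeddings.
records = [
    ("ACGTAC", [0.0, 0.1]),
    ("TTGACG", [0.9, 0.8]),
    ("GGACGT", [0.2, 0.0]),
    ("TTTTTT", [0.0, 0.0]),  # nearest by vector, but lacks the pattern
]

hits = constrained_search([0.0, 0.0], "ACG", records, k=2)
print(hits)  # [0, 2]
```

The last record is the closest vector overall yet is excluded by the pattern, which is why applying the constraint after an unconstrained top‑k can return too few results.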