Retrieval at Scale | Drop for 2026-05-03

TL;DR

Five fresh items since your last drop: (1) Milvus 2.6.15 (Apr 24) improves hybrid search and observability, adds requery controls and BM25 stats buffering; (2) Weaviate 1.37.2 (Apr 23) hardens TTL, speeds compressed‑vector rescoring, and tweaks HFresh distance; (3) Elastic Stack 9.3.4 (Apr 30) ships a maintenance/security update across Elasticsearch/Kibana; (4) a new arXiv note (Apr 21) clarifies RaBitQ vs. TurboQuant binary quantization trade‑offs for vector search; (5) Azure AI Search adds a preview knowledgeRetrieval property (late April docs), relevant to RAG/retrieval planning.

Milvus 2.6.15: hybrid‑search tweaks, requery control, and better observability

Key facts and current state of the topic
- Milvus is widely used as a first‑stage ANN engine in hybrid stacks; recent 2.6.x builds have focused on filtering, FP16/BF16 conversion, and stability. (milvus.io)
Important context and background information
- Hybrid (vector + lexical/filter) pipelines benefit from predictable “requery” behavior and accurate latency metrics when rescoring ANN candidates. (milvus.io)
Recent developments or changes
- v2.6.15 (Apr 24, 2026) adds microsecond‑precision metrics, separates requery latencies from user queries, introduces a configurable requery policy, buffers BM25 IDF preloads to cut I/O, and fixes several correctness issues (e.g., offset corruption on disk indexes). Consider upgrading if you rely on hybrid rescoring and attribute filters. (milvus.io)
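
To make the requery idea concrete, here is a minimal Python sketch of two‑stage retrieval that times the first‑stage search and the requery separately, in microseconds. It is not Milvus code and does not use the Milvus API: the in‑memory `VECTORS` table, the rounding‑based approximate scorer, and the `do_requery` and `oversample` parameters are hypothetical stand‑ins chosen for illustration.

```python
import time

# Hypothetical in-memory "collection": id -> full-precision vector.
VECTORS = {
    "a": [0.6, 0.0, 0.0],
    "b": [0.4, 0.4, 0.0],
    "c": [-0.9, 0.1, 0.2],
}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def first_stage(query, k):
    """Approximate search: score against vectors rounded to integers (lossy)."""
    approx = {i: dot(query, [round(x) for x in v]) for i, v in VECTORS.items()}
    return sorted(approx, key=approx.get, reverse=True)[:k]


def requery(query, ids, top_k):
    """Fetch the full vectors for the candidate ids and rescore them exactly."""
    exact = {i: dot(query, VECTORS[i]) for i in ids}
    return sorted(exact, key=exact.get, reverse=True)[:top_k]


def search(query, top_k=1, oversample=2, do_requery=True):
    """Return (ids, metrics); requery latency is reported apart from search."""
    metrics = {}
    k = top_k * oversample if do_requery else top_k

    t0 = time.perf_counter_ns()
    candidates = first_stage(query, k)
    metrics["search_us"] = (time.perf_counter_ns() - t0) / 1_000

    if not do_requery:  # hypothetical policy switch: skip the second stage
        return candidates, metrics

    t1 = time.perf_counter_ns()
    results = requery(query, candidates, top_k)
    metrics["requery_us"] = (time.perf_counter_ns() - t1) / 1_000
    return results, metrics


ids, metrics = search([1.0, 1.0, 0.0])
print(ids, sorted(metrics))
```

With the query `[1.0, 1.0, 0.0]`, the approximate stage alone ranks `a` first, while requerying two candidates promotes `b`, whose exact score is higher. Keeping `search_us` and `requery_us` as separate keys is the point of the sketch: a slow requery no longer hides inside the user‑facing query latency.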

Weaviate 1.37.2: TTL stability, compressed‑vector cache, and HFresh scoring

Key facts and current state of the topic
- Weaviate 1.37 brought Diversity (MMR), per‑query profiling, a built‑in MCP server, and incremental backups; 1.37.x patches continue hardening high‑QPS operations. (github.com)
Important context and background information
- Multi‑vector and compressed‑HNSW setups are sensitive to verification/cache behavior and TTL/replication edge cases. (github.com)
Recent developments or changes
- v1.37.2 (Apr 23, 2026) improves the compressed‑vector index cache, fixes TTL startup and race issues, optimizes zstd usage in replication, and switches HFresh to asymmetric distance computation, which should help tail latencies and preview‑index behavior. (github.com)
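
For readers unfamiliar with the term, the sketch below contrasts symmetric and asymmetric distance computation over quantized vectors. It is a generic Python illustration, not Weaviate's HFresh implementation: the uniform scalar quantizer, the `[-1, 1]` value range, and all function names are assumptions made for the example.

```python
LEVELS, LO, HI = 4, -1.0, 1.0  # assumed quantizer settings
STEP = (HI - LO) / LEVELS


def quantize(v):
    """Uniform scalar quantization: map each component to a bucket index."""
    return [min(LEVELS - 1, max(0, int((x - LO) / STEP))) for x in v]


def dequantize(codes):
    """Reconstruct each component as the centre of its bucket."""
    return [LO + (c + 0.5) * STEP for c in codes]


def squared_l2(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def symmetric_distance(query, codes):
    """Both sides quantized: the query loses precision too."""
    return squared_l2(dequantize(quantize(query)), dequantize(codes))


def asymmetric_distance(query, codes):
    """Only the stored vector is quantized; the query stays full precision."""
    return squared_l2(query, dequantize(codes))


stored_codes = quantize([0.3, -0.7])
q_near = [0.25, -0.75]
q_far = [0.45, -0.55]

# Both queries fall in the same buckets as the stored vector, so the
# symmetric form cannot tell them apart; the asymmetric form can.
print(symmetric_distance(q_near, stored_codes), symmetric_distance(q_far, stored_codes))
print(asymmetric_distance(q_near, stored_codes), asymmetric_distance(q_far, stored_codes))
```

The asymmetric form costs nothing extra in index storage, since only the query side changes; the trade‑off is that distances are computed against a float query rather than code‑to‑code.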

Elastic Stack 9.3.4: maintenance/security update (Apr 30)

Key facts and current state of the topic
- Elasticsearch underpins many hybrid vector + lexical stacks; timely patch releases matter for stability and security posture. (elastic.co)
Important context and background information
- 9.3.0 introduced GPU‑accelerated vector indexing (cuVS) and other AI features; 9.3.4 is a patch release. (infoq.com)
Recent developments or changes
- Elastic announced 9.3.4 GA on April 30, 2026; review release notes before upgrading clusters that serve candidate generation or filtered ANN. (elastic.co)

RaBitQ vs. TurboQuant: clarifying binary quantization choices

Key facts and current state of the topic
- Binary/multi‑bit quantizers (e.g., RaBitQ) are increasingly used to raise recall at fixed latency/memory; TurboQuant is a recent alternative. (github.com)
Important context and background information
- Selecting a quantizer affects both index build time and verification accuracy; prior reports lacked a unified comparison. (github.com)
Recent developments or changes
- A new technical note (Apr 21, 2026) offers a symmetric comparison of RaBitQ and TurboQuant, aligning theory and experiments. Use it to inform PQ/BQ/RaBitQ choices and FastScan/refinement settings in your ANN layer. (arxiv.org)
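
As background for the refinement settings mentioned above, here is a minimal Python sketch of plain sign‑bit binary quantization with a Hamming‑distance shortlist followed by exact rescoring. It implements neither RaBitQ nor TurboQuant; it only shows the generic shortlist‑then‑refine pattern. The `refine_factor` parameter, the function names, and the toy vectors are hypothetical.

```python
def binarize(v):
    """Pack sign bits into an int: bit i is set when v[i] > 0."""
    code = 0
    for i, x in enumerate(v):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def bq_search(query, vectors, k, refine_factor=2):
    """Shortlist by Hamming distance on 1-bit codes, then rescore exactly."""
    codes = {i: binarize(v) for i, v in vectors.items()}  # normally precomputed
    qcode = binarize(query)
    shortlist = sorted(vectors, key=lambda i: hamming(qcode, codes[i]))
    shortlist = shortlist[: k * refine_factor]
    # Refinement: exact inner product against the full-precision vectors.
    return sorted(shortlist, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


VECS = {
    "x": [0.9, 0.1, -0.1, 0.2],
    "y": [0.2, 0.8, 0.3, -0.1],
    "z": [-0.5, -0.4, 0.6, 0.1],
}
QUERY = [1.0, 0.01, 0.01, -0.01]

print(bq_search(QUERY, VECS, k=1, refine_factor=1))  # codes only
print(bq_search(QUERY, VECS, k=1, refine_factor=2))  # codes + rescoring
```

In this toy data the 1‑bit codes alone prefer `y`, whose sign pattern matches the query exactly, while rescoring two candidates recovers `x`, which has the larger exact inner product. How much oversampling a given quantizer needs to reach a target recall is the kind of question a side‑by‑side comparison helps answer.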

Azure AI Search: preview knowledgeRetrieval property for service configs

Key facts and current state of the topic
- Azure AI Search recently expanded vector features (multi‑vector support, binary‑quantization rescoring, strict post‑filter). (learn.microsoft.com)
Important context and background information
- Platform‑level RAG/retrieval knobs reduce glue code and help standardize behavior across environments. (learn.microsoft.com)
Recent developments or changes
- The 2026‑03‑01‑preview API adds a knowledgeRetrieval property on the service definition (docs updated in late April), complementing earlier vector features. Evaluate it if you’re centralizing RAG configuration and want consistent retrieval defaults. (learn.microsoft.com)