Retrieval at Scale | Drop for 2026-05-18

TL;DR

Milvus 2.6.16 ships stability/perf wins (compaction, resource isolation, failover) that help hybrid/vector stacks keep p95/p99 in check.
Weaviate 1.37.4 fixes async‑replication race conditions and adds server‑side usage‑limit guardrails, both useful for multi‑tenant, high‑QPS clusters.
Elastic 9.4.1 is a recommended patch across the stack; apply it if you run Lucene‑based filtered‑ANN or hybrid search.
The OpenSearch‑VL preprint describes an open recipe for multimodal (vision‑language) search agents, with data and code still to be released; its patterns apply to retrieval that spans text and images.
A new paper argues for a unified (Postgres+pgvector) “data layer” for production RAG, with design guidance for ops/scale.

Key facts and current state of the topic
- Milvus is a common first‑stage ANN engine in hybrid pipelines; recent 2.6.x releases focused on filtering, precision formats, and ops stability. (github.com)
Important context and background information
- Compaction, resource isolation, and failover behavior often dominate tail latency and availability at ad‑scale.
Recent developments or changes
- v2.6.16 (May 14) improves L0 compaction (higher deltalog cap), introduces streaming‑node resource‑group isolation with a new inspection endpoint, and hardens proxy query failover; it also fixes delete consistency, replica scaling, and rolling‑upgrade edge cases. Plan staged rollouts on heavy‑ingest shards. (github.com)
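
One way to judge a staged rollout against the tail-latency concern above is to compare percentiles before and after the upgrade on the same shard. The following is a minimal sketch using the nearest-rank percentile definition; the `percentile` helper and the latency samples are hypothetical illustrations, not part of Milvus or its tooling.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: rank = ceil(p/100 * n), converted to a 0-based index.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 97 fast queries and 3 slow ones,
# e.g. requests that landed during a compaction or a failover.
latencies_ms = [10.0] * 97 + [500.0] * 3

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

With these made-up samples, p50 and p95 both stay at 10 ms while p99 jumps to 500 ms, which is why a change that affects only a few percent of requests can be invisible in averages and still break a p99 target.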

Key facts and current state of the topic
- Weaviate 1.37 added MMR diversity, per‑shard query profiling, and incremental backups; 1.37.x patches have focused on high‑QPS stability. (weaviate.io)
Important context and background information
- Replication scheduling and quota/usage limits are frequent sources of p95 spikes and noisy‑neighbor effects in multi‑tenant vector collections.
Recent developments or changes
- 1.37.4 (May 14) fixes race conditions in the async‑replication scheduler, adds “CompareDigests” to replication, and introduces usage‑limit guardrails for objects, collections, tenants, and shards, which helps keep tail latency predictable. (github.com)
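
For readers unfamiliar with the MMR diversity noted above for Weaviate 1.37, here is a minimal sketch of generic maximal marginal relevance re-ranking. It illustrates the technique only; it is not Weaviate's implementation or API, and the function names and the `lambda_` parameter are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Sequence[float],
        candidates: Sequence[Sequence[float]],
        k: int,
        lambda_: float = 0.5) -> List[int]:
    """Greedy MMR: return indices of up to k candidates.

    lambda_ = 1.0 ranks purely by relevance to the query;
    lower values penalize similarity to already selected items.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to anything already picked.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_ * relevance[i] - (1.0 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-d embeddings: items 0 and 1 are near-duplicates, item 2 differs.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
by_relevance = mmr(query, docs, k=2, lambda_=1.0)
with_diversity = mmr(query, docs, k=2, lambda_=0.3)
```

On this toy data, `lambda_=1.0` returns the two near-duplicates (indices 0 and 1), while `lambda_=0.3` swaps the second slot for the less similar item (indices 0 and 2).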

Key facts and current state of the topic
- Elastic 9.4.0 GA brought GPU‑accelerated vector indexing (cuVS), faster DiskBBQ, and new low‑bit quantizers; 9.4.1 is the next patch. (github.com)
Important context and background information
- Many production candidate‑generation stages (filtered HNSW, binary quantization) run atop Elasticsearch/Lucene; minor patches frequently include fixes that affect stability or performance.
Recent developments or changes
- On May 13, Elastic published 9.4.1 and recommends upgrading; consult product‑level notes before rolling through managed/self‑hosted clusters. (elastic.co)
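
The binary-quantization candidate stage mentioned above can be sketched as a cheap Hamming-distance shortlist followed by exact rescoring. This is a generic sign-bit quantizer written for illustration under stated assumptions; it is not Elastic's BBQ/DiskBBQ or Lucene's implementation, and all function names and the `oversample` parameter are hypothetical.

```python
from typing import List, Sequence


def binarize(vec: Sequence[float]) -> int:
    """Pack one sign bit per dimension into an int (bit i is set if vec[i] > 0)."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(query: Sequence[float],
           vectors: Sequence[Sequence[float]],
           codes: Sequence[int],
           k: int,
           oversample: int = 3) -> List[int]:
    """Shortlist k * oversample items by Hamming distance, then rescore by exact dot product."""
    qcode = binarize(query)
    shortlist_size = min(len(vectors), k * oversample)
    shortlist = sorted(
        range(len(vectors)), key=lambda i: hamming(qcode, codes[i])
    )[:shortlist_size]
    # Exact rescoring on full-precision vectors recovers ordering lost to quantization.
    return sorted(shortlist, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


# Hypothetical 4-d vectors; a real index would use hundreds of dimensions.
vectors = [
    [0.9, 0.8, -0.1, 0.2],
    [-0.9, -0.8, 0.1, -0.2],
    [0.1, 0.9, -0.8, 0.3],
    [1.0, 0.7, -0.2, 0.1],
]
codes = [binarize(v) for v in vectors]
top = search([1.0, 0.8, -0.1, 0.1], vectors, codes, k=2, oversample=2)
```

Three of the four toy vectors share the same sign pattern as the query, so the binary codes alone cannot order them; oversampling the shortlist and rescoring on the original vectors is what separates them. The oversampling factor trades extra exact comparisons for recall.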

Key facts and current state of the topic
- Multimodal (vision‑language) retrieval is increasingly relevant for commerce and ads use cases (images + text).
Important context and background information
- Agentic planners paired with strong retrieval can lift recall on complex, cross‑modal tasks; open recipes lower the barrier to replicating results.
Recent developments or changes
- A May 6 preprint details “OpenSearch‑VL,” an open recipe (with data/code to be released) for building multimodal search agents on OpenSearch; its patterns can be ported into existing retrieval stacks. (arxiv.org)

Key facts and current state of the topic
- Many RAG systems sprawl across bespoke vector stores and ETL; operational complexity often gates scale/freshness.
Important context and background information
- Postgres + pgvector are increasingly used for retrieval, filters, and joins; recent security/robustness updates (e.g., pgvector 0.8.2) have targeted production readiness. (postgresql.org)
Recent developments or changes
- A May 5 paper proposes a unified data layer built on Postgres with native vector search (pgvector/HNSW), discusses the trade‑offs, and reports competitive performance for production RAG. It offers actionable guidance if you are consolidating infrastructure. (arxiv.org)