Retrieval at Scale | Drop for 2026-05-28

TL;DR

- Weaviate shipped two stability patches (1.37.5 and 1.37.6) and opened 1.38.0-rc.0, which promotes HFresh to GA and adds Namespaces and Nested Object Filtering in Preview. Useful for fresher, disk-oriented vector search at scale.
- Milvus 2.6.17 improves load/search isolation, adds array partial-update operators, and fixes several routing/replica bugs. Good for steadier tail latency in hybrid pipelines.
- Qdrant 1.18.1 lands post-GA fixes (incl. I/O and security hardening); 1.18 itself introduced TurboQuant, a low-bit quantizer with improved recall vs. 1-bit binary at a similar footprint.
- U-HNSW proposes a graph ANN method that supports "universal" Lp metrics, with large speedups over LSH and competitive results vs. HNSW at fixed p. Promising for non-cosine/dot similarity.
- Two plug-in routes toward late interaction without heavy reindexing: SMART (uses hidden states of single-vector encoders for multi-vector scoring) and Spectral Retrieval (multi-scale sinc over token embeddings), both showing sizable gains in early results.

Key facts and current state of the topic
- Weaviate 1.37.x focuses on high-QPS stability (MMR diversity, profiling, incremental backups). 1.37.5 reduces shard locking and improves backup behavior; 1.37.6 fixes an HNSW panic and backup/compression edge cases. The new 1.38.0-rc.0 release candidate moves HFresh to GA and adds Namespaces and Nested Object Filtering, both in Preview. (github.com)
Important context and background information
- HFresh (disk‑oriented, SPFresh‑inspired) targets fresher, lower‑RAM retrieval; Namespaces and Nested Object Filtering simplify large multi‑tenant and semi‑structured workloads common in ads/search. (github.com)
Recent developments or changes
- Dated May 26–27, 2026: v1.37.5 “HFresh task priorities, reduced shard locking” and v1.37.6 “Core Stability, Backup, and Compression Fixes”; 1.38.0‑rc.0 (May 27) lists HFresh (GA), Namespaces (Preview), and Nested Object Filtering (Preview). Plan canary upgrades and RC testing for feature adoption. (github.com)

Key facts and current state of the topic
- Milvus is widely used as the ANN candidate stage in hybrid pipelines; recent 2.6.x work focused on filtering, memory, and recovery stability. (github.com)
Important context and background information
- Tail latency often comes from segment loads and query routing under churn; isolating executors and hardening routing reduces p95/p99 spikes. (github.com)
Recent developments or changes
- Milvus 2.6.17, released May 22, 2026, adds separate executor pools for load/search, async segment operations with proper cancellation, and ARRAY_APPEND/ARRAY_REMOVE partial-update ops, and fixes stale routing/replica state and use-after-free paths. Upgrade recommended for hybrid re-rank setups. (github.com)
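Why separate executor pools help tail latency can be shown with a toy model. The sketch below is not Milvus code: `load_segment`, `search`, and `searches_finish_after` are hypothetical names, the sleep durations are made up, and Python thread pools stand in for whatever executors the engine actually uses. It only illustrates that when slow loads and fast searches share one FIFO pool, searches queue behind loads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_segment(segment_id: int) -> str:
    """Hypothetical slow task standing in for a segment load."""
    time.sleep(0.2)
    return f"segment-{segment_id} loaded"


def search(query_id: int) -> str:
    """Hypothetical fast task standing in for an ANN search."""
    time.sleep(0.01)
    return f"query-{query_id} done"


def searches_finish_after(isolated: bool, num_loads: int = 4, num_queries: int = 4) -> float:
    """Seconds until every search completes while loads are also in flight."""
    load_pool = ThreadPoolExecutor(max_workers=2)
    # Isolated: searches get their own workers. Shared: they queue behind loads.
    search_pool = ThreadPoolExecutor(max_workers=2) if isolated else load_pool
    try:
        start = time.perf_counter()
        loads = [load_pool.submit(load_segment, i) for i in range(num_loads)]
        searches = [search_pool.submit(search, i) for i in range(num_queries)]
        for future in searches:
            future.result()
        elapsed = time.perf_counter() - start
        for future in loads:
            future.result()
        return elapsed
    finally:
        load_pool.shutdown()
        if search_pool is not load_pool:
            search_pool.shutdown()


if __name__ == "__main__":
    print("shared pool  :", round(searches_finish_after(isolated=False), 3), "s")
    print("isolated pool:", round(searches_finish_after(isolated=True), 3), "s")
```

In the shared case the searches cannot start until the queued loads drain, which is the kind of p95/p99 spike the isolation change is meant to avoid.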

Key facts and current state of the topic
- Qdrant 1.18 added TurboQuant (a Hadamard‑rotation‑based quantizer) plus collection memory monitoring and schema‑level vector add/remove; 1.18.1 (May 22) follows with fixes and security/validation improvements. (qdrant.tech)
Important context and background information
- TurboQuant aims to beat 1‑bit binary recall at similar footprint and approach scalar‑quantization recall at 2× compression—relevant where memory is the bottleneck for probe budgets. (qdrant.tech)
Recent developments or changes
- 1.18.1 includes io_uring‑ready changes to quantized multi‑vector scorers, stricter validation, and snapshot‑upload authorization. If you trialed TurboQuant, pick up .1 for stability. (github.com)
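The general idea behind rotation-based quantizers can be sketched in a few lines. This is not Qdrant's TurboQuant implementation; it is a generic rotate-then-quantize illustration that assumes a random sign flip followed by an orthonormal Walsh-Hadamard transform, then a plain uniform scalar grid. The function names (`fwht`, `rotate`, `quantize`, `dequantize`) are hypothetical.

```python
import math
import random


def fwht(x: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform norm-preserving
    return [v * scale for v in out]


def rotate(x: list[float], signs: list[float]) -> list[float]:
    """Random sign flip followed by a Hadamard transform (an orthogonal rotation)."""
    return fwht([v * s for v, s in zip(x, signs)])


def quantize(x: list[float], bits: int) -> tuple[list[int], float, float]:
    """Uniform scalar quantization to 2**bits levels over [min(x), max(x)]."""
    lo, hi = min(x), max(x)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in x]
    return codes, lo, step


def dequantize(codes: list[int], lo: float, step: float) -> list[float]:
    """Map integer codes back to approximate rotated coordinates."""
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    spiky = [10.0] + [0.0] * (dim - 1)  # all energy in one coordinate
    rotated = rotate(spiky, signs)
    codes, lo, step = quantize(rotated, bits=2)
    print("rotated:", [round(v, 3) for v in rotated])
    print("codes  :", codes)
```

Because the rotation is orthogonal, norms and inner products are preserved, while energy concentrated in a few coordinates is spread across all of them. That makes a coarse per-coordinate grid less wasteful, which is the intuition for why low-bit codes can retain more recall after rotation.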

Key facts and current state of the topic
- Many production stacks standardize on cosine or inner-product similarity, but some domains need Lp distances whose p varies by task. U-HNSW is presented as the first graph-based approach targeting "universal" Lp (0<p≤2) without separate per-p indexes. (arxiv.org)
Important context and background information
- Prior universal-Lp solutions were LSH-based with poor latency. U-HNSW instead reuses HNSW over L1/L2 to propose candidates, then verifies them with early-terminating Lp checks, which is a pragmatic compromise. (arxiv.org)
Recent developments or changes
- The paper, dated May 3, 2026, reports up to 2670× faster queries than MLSH (RAM-disk baseline) and competitive results vs. standard HNSW at fixed p. Worth tracking if your similarity metric shifts across tasks or features. (arxiv.org)
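The candidate-then-verify step described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes candidates have already been proposed by some graph index, and `lp_partial` and `verify_candidates` are hypothetical names. The early termination relies only on the fact that the running sum of |q_i - x_i|^p never decreases, so a candidate can be dropped as soon as it exceeds the current k-th best.

```python
import heapq
from typing import Hashable, Iterable, Optional, Sequence


def lp_partial(query: Sequence[float], point: Sequence[float],
               p: float, bound: float) -> Optional[float]:
    """Return sum(|q_i - x_i|**p), or None as soon as the running sum exceeds bound."""
    total = 0.0
    for q, x in zip(query, point):
        total += abs(q - x) ** p
        if total > bound:
            return None  # cannot beat the current k-th best; stop early
    return total


def verify_candidates(query: Sequence[float],
                      candidates: Iterable[tuple[Hashable, Sequence[float]]],
                      p: float, k: int) -> list[tuple[float, Hashable]]:
    """Exact top-k under Lp among the given (id, vector) candidates."""
    # Max-heap on the powered sum (stored negated). The p-th root is monotone,
    # so ranking by powered sums matches ranking by true Lp distance.
    heap: list[tuple[float, Hashable]] = []
    for cand_id, vec in candidates:
        bound = -heap[0][0] if len(heap) == k else float("inf")
        powered = lp_partial(query, vec, p, bound)
        if powered is None:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-powered, cand_id))
        else:
            heapq.heapreplace(heap, (-powered, cand_id))
    return sorted(((-neg) ** (1.0 / p), cand_id) for neg, cand_id in heap)


if __name__ == "__main__":
    query = [0.0, 0.0]
    cands = [("a", [1.0, 1.0]), ("b", [0.0, 3.0]), ("c", [0.5, 0.0])]
    print(verify_candidates(query, cands, p=1.0, k=2))
```

The same candidate list can be re-verified under a different p at query time, which is the property that avoids building one index per metric.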

Key facts and current state of the topic
- Classic late interaction (e.g., ColBERT) boosts quality but increases index/storage and serving costs. Two new works show drop‑in reranking ideas that harvest token‑level signal without building full multi‑vector indexes. (arxiv.org)
Important context and background information
- SMART exposes multi‑vector structure from a single‑vector encoder’s hidden states and applies late‑interaction at inference; Spectral Retrieval uses multi‑scale sinc convolution over token embeddings to interpolate between pooled and per‑token matching. (arxiv.org)
Recent developments or changes
- May 23–24, 2026 preprints: SMART reports consistent lifts on multimodal/visual‑doc retrieval; Spectral Retrieval shows large improvements on controlled and small real benchmarks without retraining. Low‑risk to prototype as a re‑ranker atop existing dense/LSR stages. (arxiv.org)
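For prototyping a re-ranker of this kind, the classic late-interaction score is a useful baseline. The sketch below implements ColBERT-style MaxSim over token vectors; it is not SMART's or Spectral Retrieval's scoring function, and it assumes token-level vectors (for example, an encoder's hidden states) are already available for the query and each first-stage candidate. `normalize`, `maxsim`, and `rerank` are hypothetical helper names.

```python
import math
from typing import Hashable, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum over query tokens of the best cosine match among the document tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qt, dt)) for dt in d)
        for qt in q
    )


def rerank(query_tokens: Sequence[Vector],
           candidates: Sequence[tuple[Hashable, Sequence[Vector]]]) -> list[Hashable]:
    """Order first-stage candidates (doc_id, token_vectors) by descending MaxSim."""
    scored = [(maxsim(query_tokens, toks), doc_id) for doc_id, toks in candidates]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    candidates = [
        ("pooled_like", [[1.0, 1.0]]),               # one averaged vector
        ("token_level", [[1.0, 0.0], [0.0, 1.0]]),   # one vector per token
    ]
    print(rerank(query, candidates))
```

Because the score is computed only over the candidates returned by the existing dense or LSR stage, no multi-vector index is needed; the cost is fetching or recomputing token vectors for the short list.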