Blog
Short pieces on hybrid search, embeddings, reranking, and retrieval evaluation. Each post relates to a chapter of Designing Hybrid Search Systems.
Users pick the same word for the same concept less than 20% of the time. Keyword search cannot bridge that gap, and most systems never measure it.
April 20, 2026
A nearest neighbor always exists, so vector search never returns zero results. That is precisely why its failures are so hard to notice.
April 13, 2026
Hybrid retrieval consistently outperforms keyword-only and vector-only on public benchmarks, but the case for hybrid is about complementary failure sets, not averages.
April 6, 2026
Parallel, sequential, and unified hybrid search architectures are not interchangeable. They make different bets on latency, complexity, and debuggability.
March 30, 2026
Treating every query identically wastes compute on easy queries and under-serves hard ones. Query classification decides which retrieval path to invoke.
March 23, 2026
A good reranker can fix a mediocre first-stage retriever, but only if it fits inside a tight latency budget. Pick the candidate set and the model together.
March 16, 2026
Every search platform now advertises hybrid support. The implementations behind those APIs differ, and so does the platform that is right for your team.
March 9, 2026
Benchmark leaderboards are a starting point for picking an embedding model, not a decision. Domain fit matters more than MTEB rank, and the commitment is harder to reverse than it looks.
March 2, 2026
Given fixed positives, the choice of negatives is the highest-leverage lever left in embedding fine-tuning, and getting it wrong quietly poisons retrieval quality.
February 23, 2026
Distilling a large cross-encoder into a smaller model approximates its quality at a fraction of the serving cost. It is how most production rerankers actually get deployed.
February 16, 2026
NDCG, MRR, and recall do not measure the same thing. Picking a single metric to optimize guarantees you will ship something that regresses on the others.
February 9, 2026
LLM judges rank far-apart systems reliably but collapse exactly where leaderboard rank matters most. The evidence, and what it means for offline evaluation.
February 2, 2026
Interleaving experiments detect ranking quality differences with far fewer users than a standard A/B test, which matters when your traffic is limited or your effects are small.
January 26, 2026
In a hybrid pipeline at scale, embedding computation and the vector index dominate cost. Stale embeddings are a second, quieter bill your users pay in quality.
January 19, 2026
The three HNSW knobs (M, efConstruction, efSearch) move your recall-latency curve more than most teams realize. Pick defaults deliberately, not because the library shipped them.
January 12, 2026
A search system's quality can degrade without latency, error rates, or any other standard dashboard metric noticing. Embedding drift monitoring is one of the pieces that standard ML observability tends to miss.
January 5, 2026
Stacked compression can cut vector index RAM by up to 192x, but the quality losses are non-additive. A validation workflow is the only way to find the Pareto point.
December 29, 2025
Upgrading the LLM is the most visible decision in a RAG pipeline. Upgrading retrieval usually moves output quality more, and for less money.
December 22, 2025
A product query like 'red running shoes under $120' mixes semantic intent with exact filtering. Neither pure keyword nor pure vector search handles that, and hybrid is only a partial answer.
December 15, 2025
Enforcing document-level access control inside a vector search index is an architectural decision, not a runtime filter. Getting it wrong leaks data in subtle ways.
December 8, 2025