Vector Search Always Returns Something, Even When It Should Not
A nearest neighbor always exists, so vector search never returns zero results. That is precisely why its failures are so hard to notice.
One of the quietly dangerous properties of a vector index is that it cannot return nothing. There is always a closest point in the embedding space, no matter how malformed, adversarial, or off-topic the query. That property makes vector search feel robust on the surface and makes its failures almost impossible to detect with the monitoring practices teams inherit from the keyword-search era. The rest of the book treats this as a design constraint, not a footnote.
The nearest neighbor is not always a relevant neighbor
Dense retrieval decisively beats BM25 on aggregate benchmarks. On the 15-dataset BEIR subset of MTEB, NV-Embed-v2 reaches an average nDCG@10 of about 62.65, roughly 20 points above a BM25 baseline on the same datasets (Lee et al., 2025; Kamalloo et al., 2024). The aggregate gap has not just closed; it has reversed. That success hides a different failure surface.
The first failure is exact-match degradation. On EntityQuestions, a benchmark of simple entity-rich questions, DPR drastically underperformed BM25 overall (Sciavolino et al., 2021). Within that gap, performance dropped sharply for rare or uncommon entities, while BM25 stayed robust regardless of entity frequency, and data augmentation did not close the generalization gap. Scaling up only partially helps. On CapRetrieval, BM25 still beats dense embeddings on about 40% of singleton-entity queries, even when the aggregate metric favors the dense model (Xu et al., 2025). Embeddings compress the entire query into a fixed-length vector, and precise identifiers lose their distinctiveness in that compression.
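The dilution mechanism can be made concrete with a toy mean-pooled embedding. The sketch below is a hypothetical illustration, not a real encoder: it assigns each word a deterministic pseudo-random vector and averages them, as mean-pooling sentence encoders do. The cosine between a rare identifier's vector and the pooled sentence vector shrinks as context grows (roughly as 1/√n for these random vectors), which is the sense in which a precise identifier loses its distinctiveness in compression.

```python
import math
import random

DIM = 128

def word_vector(word, dim=DIM):
    # Deterministic pseudo-random stand-in for a learned word embedding
    # (toy assumption; a real model would use trained weights).
    rng = random.Random(word)
    return [rng.gauss(0, 1) for _ in range(dim)]

def mean_pool(words):
    # Average the word vectors, as many sentence encoders do.
    vecs = [word_vector(w) for w in words]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

entity = "zelazny1967"  # hypothetical rare, precise identifier
filler = [f"filler{i}" for i in range(64)]  # generic context words

for n in (1, 4, 16, 64):
    sentence = [entity] + filler[:n]
    sim = cosine(word_vector(entity), mean_pool(sentence))
    print(f"context words: {n:3d}  entity signal in pooled vector: {sim:.2f}")
```

The entity's share of the pooled vector drops monotonically as filler words are added; a lexical matcher like BM25, by contrast, scores the exact token identically no matter how long the document is.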
Negation, confusion, and the illusion of similarity
The second family of failures is semantic, not lexical. Most production embedding models are insensitive to negation: the cosine similarity between "The treatment improved patient outcomes" and "The treatment did not improve patient outcomes" has been measured at 0.96, and the original and negated forms of a sentence sit "very close to each other in the representation space" in state-of-the-art text embedding models (Cao, 2025; Sack, 2025). Numerical constraints fare no better. A query for hotels under a given nightly price can confidently return $400 luxury suites, because the embedding captures the topic (hotels, pricing) while discarding the "less than" relationship between the numbers; surrounding context tends to attenuate, not strengthen, those fine-grained numerical signals (Borkakoty, 2025). Polysemy shows the same pattern from a different angle: "Apple stock price" and "apple nutrition facts" share a surface form, and in a bi-encoder the query is compressed into a single vector before it ever sees a document, so whichever sense dominates the training data pulls the embedding in its direction.
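The negation failure can be reproduced in miniature with the same kind of toy model. The sketch below is a hypothetical illustration, not a real embedding model: it averages deterministic pseudo-random word vectors, so flipping one or two tokens barely moves the sentence-level mean, and the original and negated sentences stay far closer to each other than either is to an unrelated sentence.

```python
import math
import random

DIM = 64

def word_vector(word, dim=DIM):
    # Deterministic pseudo-random stand-in for a learned word embedding
    # (toy assumption; a real model would use trained weights).
    rng = random.Random(word)
    return [rng.gauss(0, 1) for _ in range(dim)]

def embed(sentence):
    # Mean pooling over word vectors, as many sentence encoders do.
    words = sentence.lower().split()
    vecs = [word_vector(w) for w in words]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

s1 = "the treatment improved patient outcomes"
s2 = "the treatment did not improve patient outcomes"
s3 = "quarterly revenue grew in the european market"

print(f"original vs negated:   {cosine(embed(s1), embed(s2)):.2f}")
print(f"original vs unrelated: {cosine(embed(s1), embed(s3)):.2f}")
```

Real trained encoders are far better than random vectors, but the published 0.96 figure suggests the same structural bias survives training: shared topical tokens dominate the representation, and "not" contributes almost nothing.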
The third failure is what Xu et al. (2025) call the granularity dilemma: training an encoder for fine-grained keyword matching damages its grasp of overall semantics, and training for semantic matching costs fine-grained discrimination. This is not a tuning problem. It is an inherent tension between the capacity of a fixed-length vector and the diversity of information it must encode. Bigger models do not automatically win.
What this means for your system
A search system that only returns lexical misses and a search system that only returns semantic misses look very different in the logs. Pure keyword search has a visible zero-result rate. Pure vector search hides its failures inside confidently ranked, subtly wrong results. If your dashboards only track empty result sets, you will miss the second class entirely.
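One mitigation follows directly from the framing above: if the index cannot return nothing, impose a floor yourself. The sketch below is a minimal, hypothetical illustration (the `Hit` type, the `filtered_search` name, and the 0.55 cutoff are all assumptions; a real threshold must be calibrated per embedding model and corpus). It turns low-confidence nearest neighbors into an explicit empty result, giving vector search a measurable zero-result rate again.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    doc_id: str
    score: float  # cosine similarity in [-1, 1]

# Hypothetical cutoff; calibrate on labeled queries for your
# embedding model and corpus before trusting any fixed value.
MIN_SIMILARITY = 0.55

def filtered_search(raw_hits, min_sim=MIN_SIMILARITY):
    """Drop neighbors below the similarity floor so that an
    off-topic query can yield zero results, which dashboards
    can then track just like a keyword zero-result rate."""
    return [h for h in raw_hits if h.score >= min_sim]

raw = [Hit("doc-17", 0.82), Hit("doc-03", 0.61), Hit("doc-99", 0.31)]
kept = filtered_search(raw)
print([h.doc_id for h in kept])  # the 0.31 neighbor is dropped
print("zero_result" if not kept else "ok")
```

A floor does not catch confidently wrong neighbors (the negation case scores high), but it converts the easiest class of silent misses into a metric that existing keyword-era monitoring already knows how to alert on.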
The deeper problem is that the two approaches fail in complementary, mutually invisible ways. Keyword search is blind to synonymy and paraphrase; vector search is blind to exact identifiers, negation, and numerical constraints, and it hides that blindness behind plausible-looking neighbors. Even with modern embeddings dominating aggregate benchmarks, BM25 still wins on roughly 40% of individual entity-centric queries (Xu et al., 2025), and each of those losses is a user-facing error the vector system will not flag on its own.
Related chapter
Chapter 2: The Limits of Vector Search
Vector search promises to solve the vocabulary mismatch problem by matching meaning instead of words, and on aggregate benchmarks it has decisively overtaken BM25. In exchange, it introduces a different category of failures (entity confusion, hallucinated similarity, and blindness to negation) whose silent nature often makes them more difficult to surface and correct than the ones BM25 creates.
You will receive the introduction and the first two chapters in PDF.
Laszlo Csontos
Author of Designing Hybrid Search Systems. Works on search and retrieval systems, and writes about the engineering trade-offs involved in combining keyword and vector search.
Related Posts
Users pick the same word for the same concept less than 20% of the time. Keyword search cannot bridge that gap, and most systems never measure it.
April 20, 2026
Hybrid retrieval consistently outperforms keyword-only and vector-only on public benchmarks, but the case for hybrid is about complementary failure sets, not averages.
April 6, 2026