Vector Search Always Returns Something, Even When It Should Not
A nearest neighbor always exists, so vector search never returns zero results. That is precisely why its failures are so hard to notice.
One of the quietly dangerous properties of a vector index is that it cannot return nothing. There is always a closest point in the embedding space, no matter how malformed, adversarial, or off-topic the query. That property makes vector search feel robust on the surface and makes its failures almost impossible to detect with the monitoring practices teams inherit from the keyword-search era. The rest of the book treats this as a design constraint, not a footnote.
The nearest neighbor is not always a relevant neighbor
Dense retrieval decisively beats BM25 on aggregate benchmarks. On the 15-dataset BEIR subset of MTEB, NV-Embed-v2 reaches an average nDCG@10 of about 62.65, roughly 20 points above a BM25 baseline on the same datasets (Lee et al., 2025; Kamalloo et al., 2024). The aggregate gap has not just closed; it has reversed. That success hides a different failure surface.
The first failure is exact-match degradation. On EntityQuestions, a benchmark of simple entity-rich questions, DPR drastically underperformed BM25 overall (Sciavolino et al., 2021). Within that gap, performance dropped sharply for rare or uncommon entities, while BM25 stayed robust regardless of entity frequency, and data augmentation did not close the generalization gap. Scaling up only partially helps. On CapRetrieval, BM25 still beats dense embeddings on about 40% of singleton-entity queries, even when the aggregate metric favors the dense model (Xu et al., 2025). Embeddings compress the entire query into a fixed-length vector, and precise identifiers lose their distinctiveness in that compression.
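The dilution mechanism can be made concrete with a toy mean-pooled embedding. The sketch below is a hypothetical illustration, not a real encoder: it assigns each word a deterministic pseudo-random vector and averages them, as mean-pooling sentence encoders do. The cosine between a rare identifier's vector and the pooled sentence vector shrinks as context grows (roughly as 1/√n for these random vectors), which is the sense in which a precise identifier loses its distinctiveness in compression.

```python
import math
import random

DIM = 128

def word_vector(word, dim=DIM):
    # Deterministic pseudo-random stand-in for a learned word embedding
    # (toy assumption; a real model would use trained weights).
    rng = random.Random(word)
    return [rng.gauss(0, 1) for _ in range(dim)]

def mean_pool(words):
    # Average the word vectors, as many sentence encoders do.
    vecs = [word_vector(w) for w in words]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

entity = "zelazny1967"  # hypothetical rare, precise identifier
filler = [f"filler{i}" for i in range(64)]  # generic context words

for n in (1, 4, 16, 64):
    sentence = [entity] + filler[:n]
    sim = cosine(word_vector(entity), mean_pool(sentence))
    print(f"context words: {n:3d}  entity signal in pooled vector: {sim:.2f}")
```

The entity's share of the pooled vector drops monotonically as filler words are added; a lexical matcher like BM25, by contrast, scores the exact token identically no matter how long the document is.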
Negation, confusion, and the illusion of similarity
The second family of failures is semantic, not lexical. Most production embedding models are insensitive to negation: the cosine similarity between "The treatment improved patient outcomes" and "The treatment did not improve patient outcomes" has been measured at 0.96, and the original and negated forms of a sentence sit "very close to each other in the representation space" in state-of-the-art text embedding models (Cao, 2025; Sack, 2025). Numerical constraints fare no better. A query for hotels under a given nightly price can confidently return $400 luxury suites, because the embedding captures the topic (hotels, pricing) while discarding the "less than" relationship between the numbers; surrounding context tends to attenuate, not strengthen, those fine-grained numerical signals (Borkakoty, 2025). Polysemy shows the same pattern from a different angle: "Apple stock price" and "apple nutrition facts" share a surface form, and in a bi-encoder the query is compressed into a single vector before it ever sees a document, so whichever sense dominates the training data pulls the embedding in its direction.
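The negation failure can be reproduced in miniature with the same kind of toy model. The sketch below is a hypothetical illustration, not a real embedding model: it averages deterministic pseudo-random word vectors, so flipping one or two tokens barely moves the sentence-level mean, and the original and negated sentences stay far closer to each other than either is to an unrelated sentence.

```python
import math
import random

DIM = 64

def word_vector(word, dim=DIM):
    # Deterministic pseudo-random stand-in for a learned word embedding
    # (toy assumption; a real model would use trained weights).
    rng = random.Random(word)
    return [rng.gauss(0, 1) for _ in range(dim)]

def embed(sentence):
    # Mean pooling over word vectors, as many sentence encoders do.
    words = sentence.lower().split()
    vecs = [word_vector(w) for w in words]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

s1 = "the treatment improved patient outcomes"
s2 = "the treatment did not improve patient outcomes"
s3 = "quarterly revenue grew in the european market"

print(f"original vs negated:   {cosine(embed(s1), embed(s2)):.2f}")
print(f"original vs unrelated: {cosine(embed(s1), embed(s3)):.2f}")
```

Real trained encoders are far better than random vectors, but the published 0.96 figure suggests the same structural bias survives training: shared topical tokens dominate the representation, and "not" contributes almost nothing.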
The third failure is what Xu et al. (2025) call the granularity dilemma: training an encoder for fine-grained keyword matching damages its grasp of overall semantics, and training for semantic matching costs fine-grained discrimination. This is not a tuning problem. It is an inherent tension between the capacity of a fixed-length vector and the diversity of information it must encode. Bigger models do not automatically win.
What this means for your system
A search system that only returns lexical misses and a search system that only returns semantic misses look very different in the logs. Pure keyword search has a visible zero-result rate. Pure vector search hides its failures inside confidently ranked, subtly wrong results. If your dashboards only track empty result sets, you will miss the second class entirely.
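One mitigation follows directly from the framing above: if the index cannot return nothing, impose a floor yourself. The sketch below is a minimal, hypothetical illustration (the `Hit` type, the `filtered_search` name, and the 0.55 cutoff are all assumptions; a real threshold must be calibrated per embedding model and corpus). It turns low-confidence nearest neighbors into an explicit empty result, giving vector search a measurable zero-result rate again.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    doc_id: str
    score: float  # cosine similarity in [-1, 1]

# Hypothetical cutoff; calibrate on labeled queries for your
# embedding model and corpus before trusting any fixed value.
MIN_SIMILARITY = 0.55

def filtered_search(raw_hits, min_sim=MIN_SIMILARITY):
    """Drop neighbors below the similarity floor so that an
    off-topic query can yield zero results, which dashboards
    can then track just like a keyword zero-result rate."""
    return [h for h in raw_hits if h.score >= min_sim]

raw = [Hit("doc-17", 0.82), Hit("doc-03", 0.61), Hit("doc-99", 0.31)]
kept = filtered_search(raw)
print([h.doc_id for h in kept])  # the 0.31 neighbor is dropped
print("zero_result" if not kept else "ok")
```

A floor does not catch confidently wrong neighbors (the negation case scores high), but it converts the easiest class of silent misses into a metric that existing keyword-era monitoring already knows how to alert on.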
The deeper problem is that the two approaches fail in complementary, mutually invisible ways. Keyword search is blind to synonymy and paraphrase; vector search is blind to exact identifiers, negation, and numerical constraints, and it hides that blindness behind plausible-looking neighbors. Even with modern embeddings dominating aggregate benchmarks, BM25 still wins on roughly 40% of individual entity-centric queries (Xu et al., 2025), and each of those losses is a user-facing error the vector system will not flag on its own.
Related chapter
Chapter 2: The Limits of Vector Search
Vector search promises to solve the vocabulary mismatch problem by matching meaning instead of words, and on aggregate benchmarks it has decisively overtaken BM25. In exchange, it introduces a different category of failures (entity confusion, hallucinated similarity, and blindness to negation) whose silent nature often makes them more difficult to surface and correct than the ones BM25 creates.
You will receive the introduction and the first two chapters in PDF.
Laszlo Csontos
Author of Designing Hybrid Search Systems. Works on search and retrieval systems, and writes about the engineering trade-offs involved in combining keyword and vector search.
Related Posts
Users pick the same word for the same concept less than 20% of the time. Keyword search cannot bridge that gap, and most systems never measure it.
April 20, 2026
Hybrid retrieval consistently outperforms keyword-only and vector-only on public benchmarks, but the case for hybrid is about complementary failure sets, not averages.
April 6, 2026