Query Understanding: Why One Retrieval Path Is Never Enough
Treating every query identically wastes compute on easy queries and under-serves hard ones. Query classification decides which retrieval path to invoke.
A user types "apple m2 macbook air 15 under $1200" and your pipeline hands the reranker a bag of tokens with an embedding stapled to it. The structured intent (brand, product line, chipset, screen size, price ceiling) has already been flattened, and the reranker is left to patch up mistakes that should never have reached it. That is the failure mode query understanding exists to prevent, and it is why every serious hybrid system treats the layer between the raw string and retrieval as load-bearing rather than decorative.
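To make the failure concrete, here is a minimal sketch of what preserving that structure might look like. The attribute names and extraction patterns are illustrative, not from any particular system; a production extractor would use a gazetteer or a trained tagger rather than hand-written regexes:

```python
import re

def parse_product_query(q: str) -> dict:
    """Pull structured attributes out of a raw product query.
    Patterns are illustrative; a real extractor would be learned."""
    parsed = {"raw": q}
    # Price ceiling: "under $1200", "below 1200"
    m = re.search(r"(?:under|below|<)\s*\$?(\d+)", q, re.I)
    if m:
        parsed["price_max"] = int(m.group(1))
        q = q[:m.start()] + q[m.end():]  # drop it so "1200" isn't re-read
    # Screen size: a bare 11-17 inch number left in the query
    m = re.search(r"\b(1[1-7])\b", q)
    if m:
        parsed["screen_inches"] = int(m.group(1))
    # Known brands / chipsets from a tiny illustrative gazetteer
    for token, attr in [("apple", "brand"), ("m2", "chipset"),
                        ("macbook air", "product_line")]:
        if token in q.lower():
            parsed[attr] = token
    return parsed

print(parse_product_query("apple m2 macbook air 15 under $1200"))
```

The point is not the regexes; it is that brand, chipset, screen size, and price ceiling become named fields a downstream filter can act on, instead of tokens the reranker has to reverse-engineer.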
No single retrieval strategy dominates
The empirical case for routing rests on a boring but well-replicated result: no retrieval approach consistently wins across query types. On BEIR's 18 datasets, BM25 was more robust out-of-domain while dense retrievers excelled in-domain, and updated reproducible baselines confirmed that no single model dominated all datasets (Thakur et al., 2021; Kamalloo et al., 2024). A static retrieval strategy is suboptimal for any system that serves diverse queries.
A BERT-based classifier can select between sparse-only, dense-only, or hybrid retrieval per query using only the query text, improving both efficiency and effectiveness compared to always using one strategy (Arabzadeh et al., 2021). More ambitious routers in the RAG setting predict query complexity and send easy queries to no-retrieval, medium queries to single-step retrieval, and hard queries to iterative multi-step retrieval (Jeong et al., 2024). In all of these, the gain comes from matching the pipeline to the query, not from running a bigger model.
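A heuristic stand-in makes the shape of such a router visible. Arabzadeh et al. train a BERT classifier; the surface features and thresholds below are my own illustrative placeholders, not theirs:

```python
def route_query(query: str) -> str:
    """Pick a retrieval path from the query text alone.
    A stand-in for a learned classifier; thresholds are illustrative."""
    tokens = query.lower().split()
    has_identifier = any(any(c.isdigit() for c in t) for t in tokens)
    if has_identifier or len(tokens) <= 2:
        # Exact identifiers and short keyword lookups favor lexical match.
        return "sparse"
    if len(tokens) >= 8 or query.endswith("?"):
        # Long natural-language questions favor semantic match.
        return "dense"
    return "hybrid"

for q in ["RFC 9110",
          "how do I reset a forgotten admin password?",
          "laptop battery replacement"]:
    print(q, "->", route_query(q))
```

A learned router replaces the hand-set thresholds with a model trained on per-query outcomes, but the interface is the same: query text in, path out, decided before any retriever runs.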
More than one knob to turn
Query understanding is not a single NLP model. It is a small collection of cooperating components, some of which clean the input, some of which extract structure, some of which enrich vocabulary, and one of which collapses the accumulated signal into a routing decision. The mix and the emphasis shift by domain. A product catalog leans on attribute extraction and exact identifiers. A knowledge base leans on expansion and intent signal. The useful framing is not a fixed list but a design surface: several independent decisions sit between the raw query and retrieval, and each one can be tuned, measured, and replaced.
What matters architecturally is that these decisions are first-class. They have inputs, outputs, and failure modes that can be logged and evaluated. Folding them into a single opaque preprocessing step is how teams end up unable to explain why a query that worked last quarter stopped working this one.
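One way to keep those decisions first-class is to run each component as a named stage over a shared context object, so every input, output, and decision is loggable. This is a structural sketch under my own assumptions, not a prescribed design; the stage bodies are deliberately trivial:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class QueryContext:
    """Accumulates what each stage learned; every field is loggable."""
    raw: str
    cleaned: str = ""
    attributes: dict = field(default_factory=dict)
    expansions: list = field(default_factory=list)
    route: str = ""
    trace: list = field(default_factory=list)  # per-stage audit trail

def run_pipeline(raw: str, stages: list[tuple[str, Callable]]) -> QueryContext:
    ctx = QueryContext(raw=raw, cleaned=raw)
    for name, stage in stages:
        stage(ctx)
        ctx.trace.append(name)  # each decision is recorded, not folded away
    return ctx

# Illustrative stages; real ones would be spell correction, NER, expansion...
def clean(ctx):   ctx.cleaned = " ".join(ctx.raw.split())
def extract(ctx): ctx.attributes = {"has_digits": any(c.isdigit() for c in ctx.cleaned)}
def expand(ctx):  ctx.expansions = [ctx.cleaned.lower()]
def route(ctx):   ctx.route = "sparse" if ctx.attributes["has_digits"] else "hybrid"

ctx = run_pipeline("  Apple  M2 ", [("clean", clean), ("extract", extract),
                                    ("expand", expand), ("route", route)])
print(ctx.route, ctx.trace)
```

When a query regresses, the trace and the intermediate fields tell you which stage changed its mind, which is exactly what an opaque preprocessing blob cannot do.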
The question that remains
Accepting that no retrieval approach dominates and that multiple query-understanding signals exist does not, by itself, tell you how to build the system. It raises a sharper question, the one a production team actually has to answer.
How do you decide which retrieval path to invoke for a given query? Where does that decision live in the pipeline: inline with the retrievers, as a dedicated stage upstream, or distributed across the components that enrich the query? And once the decision exists, how do you evaluate it independently of the retrievers it routes to, so that a regression in the router is not silently blamed on the dense index, and a regression in the dense index is not silently blamed on the router?
These are not rhetorical questions. They have concrete answers, and the answers shape the rest of the pipeline. A router that lives inside the dense retriever is a different system from a router that lives in front of both retrievers, and a router you cannot evaluate in isolation is a router you cannot improve. The routing classifier, whatever form it takes, is the architectural hinge between query understanding and retrieval, and the choices around its placement, its inputs, and its evaluation harness deserve more attention than they usually get.
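Evaluating the router in isolation is simpler than it sounds once you have offline per-path scores. The sketch below assumes you have measured each query's quality (say, nDCG@10) under every path once, offline; the metric names and data shapes are my own:

```python
def evaluate_router(decisions: dict[str, str],
                    per_path_scores: dict[str, dict[str, float]]) -> dict:
    """Score a router independently of any single retriever.
    per_path_scores[qid][path] is an offline quality score (e.g. nDCG@10)
    for that query under that retrieval path. The router is judged on how
    often it picks the best path, and how much quality it leaves behind
    (regret) when it does not."""
    hits, regret = 0, 0.0
    for qid, chosen in decisions.items():
        scores = per_path_scores[qid]
        best = max(scores, key=scores.get)
        hits += chosen == best
        regret += scores[best] - scores[chosen]
    n = len(decisions)
    return {"oracle_agreement": hits / n, "mean_regret": regret / n}

# Toy example: two queries, three paths each.
scores = {"q1": {"sparse": 0.62, "dense": 0.48, "hybrid": 0.60},
          "q2": {"sparse": 0.30, "dense": 0.55, "hybrid": 0.52}}
print(evaluate_router({"q1": "sparse", "q2": "hybrid"}, scores))
```

A drop in oracle agreement with flat per-path scores implicates the router; a drop in a path's scores with flat agreement implicates that retriever. That separation is the whole point of evaluating the two independently.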
That is the problem worth staring at.
Related chapter
Chapter 5: Query Understanding
Everything a retrieval pipeline does downstream is bounded by how well the system interprets the query up front. This chapter lays out the query understanding layer piece by piece, covering retrieval routing, intent classification, entity recognition, expansion, spell correction, and synonym handling, and then reframes the raw query log as a product-level feedback signal.
Laszlo Csontos
Author of Designing Hybrid Search Systems. Works on search and retrieval systems, and writes about the engineering trade-offs involved in combining keyword and vector search.