Use when retrieved results are irrelevant, search returns too many or too few results, deciding how many results to retrieve for RAG, adding filters to retrieval, using hybrid search for RAG, setting a score threshold, query expansion isn't helping, retrieval misses obvious matches, or choosing between dense and sparse retrieval for a RAG use case.
Dense retrieval alone fails on exact-match queries (product codes, names, version numbers). Sparse retrieval alone fails on semantic or paraphrased queries. Most RAG pipelines need both. Start with the failure mode, not the architecture.
Use when: answers are wrong and you aren't sure if retrieval or generation is the problem.
Use when: semantic search isn't finding obvious matches.
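One way to tell a missing match from a low-ranked one is to retrieve a wider window and look up where the expected chunk lands. The following is a minimal sketch in plain Python, assuming the search response has already been reduced to a best-first list of (id, score) pairs; `find_rank` and the sample data are hypothetical and not part of any client library.

```python
from typing import Optional, Sequence, Tuple

Hit = Tuple[str, float]  # (chunk id, similarity score), ordered best first


def find_rank(hits: Sequence[Hit], expected_id: str) -> Optional[int]:
    """Return the 1-based rank of expected_id in hits, or None if absent."""
    for position, (chunk_id, _score) in enumerate(hits, start=1):
        if chunk_id == expected_id:
            return position
    return None


# Hypothetical top-20 response in which the expected chunk is present but low.
hits = [(f"chunk-{i}", 1.0 - 0.03 * i) for i in range(1, 21)]

print(find_rank(hits, "chunk-15"))      # rank inside the wide window
print(find_rank(hits[:3], "chunk-15"))  # None: a limit of 3 would hide it
```

A rank that falls between the current limit and the wider window suggests a ranking or limit problem; `None` suggests the chunk was not retrieved within that window at all.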
Set limit to 20 and inspect rank positions before cutting back; the correct chunk may exist but rank at position 15.
Enable with_payload: true on search responses so you can inspect which documents are being retrieved.
Use when: queries include product names, IDs, acronyms, or technical terms that dense search misses.
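Reciprocal Rank Fusion (RRF), a common merge step for hybrid search, scores each document by summing 1 / (rank_constant + rank) over every ranked list it appears in. The following is a minimal sketch in plain Python of that formula, not the search engine's own implementation; `rrf_merge` and the sample rankings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_merge(rankings: Sequence[Sequence[str]], rank_constant: int = 60) -> List[str]:
    """Fuse best-first id lists with score(d) = sum of 1 / (rank_constant + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by id to keep output stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]   # hypothetical dense (semantic) ranking
sparse = ["c", "a", "d"]  # hypothetical sparse (keyword) ranking

print(rrf_merge([dense, sparse]))  # ['a', 'c', 'b', 'd']
```

Documents that appear in both lists ("a" and "c") outrank documents that appear in only one, which is the behavior that lets exact-match hits from the sparse side surface alongside semantic hits.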
Use query_by_interface: prefetch to run both vector searches in parallel, then merge results with Reciprocal Rank Fusion (RRF).
Set rank_constant to 60 as a starting point; lower values weight top results more heavily.
Use when: results span multiple tenants, document types, or time periods that shouldn't mix.
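The difference between filtering before and after the vector search shows up even on toy data. The following is a minimal sketch using brute-force scoring in plain Python, not Qdrant's filtered HNSW traversal; the corpus, tenant labels, and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = Tuple[float, float]

# Hypothetical corpus: (id, tenant, embedding). Tenant "b" sits closest to the query.
POINTS: List[Tuple[str, str, Vector]] = [
    ("b1", "b", (1.0, 0.0)),
    ("b2", "b", (0.9, 0.1)),
    ("b3", "b", (0.8, 0.2)),
    ("a1", "a", (0.5, 0.5)),
    ("a2", "a", (0.1, 0.9)),
]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def search(query: Vector, limit: int, tenant: Optional[str] = None) -> List[str]:
    """Top-`limit` ids by dot product; `tenant` restricts candidates before ranking."""
    candidates = [p for p in POINTS if tenant is None or p[1] == tenant]
    ranked = sorted(candidates, key=lambda p: dot(query, p[2]), reverse=True)
    return [p[0] for p in ranked[:limit]]


def post_filter(query: Vector, limit: int, tenant: str) -> List[str]:
    """Search every tenant first, then drop foreign tenants from the top-`limit`."""
    tenant_of = {p[0]: p[1] for p in POINTS}
    return [i for i in search(query, limit) if tenant_of[i] == tenant]


query = (1.0, 0.0)
print(search(query, limit=3, tenant="a"))       # ['a1', 'a2']
print(post_filter(query, limit=3, tenant="a"))  # []
```

Filtering after the search leaves tenant "a" with no results because the top three slots were already taken by tenant "b".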
Apply must filter conditions before vector search, not after; Qdrant evaluates filters during HNSW traversal (see qdrant.tech/documentation/concepts/filtering).
Use should filters for optional boosts (e.g., prefer recent documents), not must.
Use when: low-relevance results are making it into the LLM context.
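Before fixing a cutoff, it can help to see how candidate thresholds split relevant from irrelevant hits on labeled queries. The following is a minimal sketch in plain Python, assuming each hit has been hand-labeled as relevant or not; `threshold_report` and the sample scores are hypothetical, and real score ranges depend on the embedding model and distance metric.

```python
from typing import Iterable, List, Tuple

Labeled = Tuple[float, bool]  # (similarity score, judged relevant?)


def threshold_report(
    hits: Iterable[Labeled], thresholds: Iterable[float]
) -> List[Tuple[float, float, float]]:
    """For each threshold return (threshold, share of relevant hits kept,
    share of irrelevant hits kept). Assumes both classes are non-empty."""
    hits = list(hits)
    relevant = [score for score, ok in hits if ok]
    irrelevant = [score for score, ok in hits if not ok]
    report = []
    for t in thresholds:
        kept_relevant = sum(s >= t for s in relevant) / len(relevant)
        kept_irrelevant = sum(s >= t for s in irrelevant) / len(irrelevant)
        report.append((t, kept_relevant, kept_irrelevant))
    return report


# Hypothetical labeled hits collected from representative queries.
labeled_hits = [
    (0.82, True), (0.74, True), (0.61, True), (0.55, True),
    (0.66, False), (0.58, False), (0.40, False), (0.35, False),
]

for row in threshold_report(labeled_hits, [0.5, 0.6, 0.7]):
    print(row)
```

In this sample, raising the threshold from 0.5 to 0.7 removes all irrelevant hits but also discards half of the relevant ones, which is the trade-off worth inspecting on production-like queries.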
Set score_threshold only after measuring score distributions on representative queries; a threshold that works on test data often fails on production queries.
Use when: short or ambiguous queries produce poor results.
Don't cut limit to 3 without measuring whether the correct chunk is in positions 4–10.
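Whether a small limit is safe can be measured across a set of queries with known answers. The following is a minimal sketch in plain Python, assuming one expected chunk per query and best-first result lists of ten ids; `hit_rate_at_k` and the sample data are hypothetical.

```python
from typing import Dict, Sequence


def hit_rate_at_k(
    results: Dict[str, Sequence[str]], expected: Dict[str, str], k: int
) -> float:
    """Share of queries whose expected chunk appears in the top-k results."""
    hits = sum(expected[q] in ranked[:k] for q, ranked in results.items())
    return hits / len(results)


# Hypothetical evaluation set: expected chunks sit at ranks 1, 5, 8, and 2.
results = {
    "q1": ["a", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
    "q2": ["x1", "x2", "x3", "x4", "b", "x5", "x6", "x7", "x8", "x9"],
    "q3": ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "c", "x8", "x9"],
    "q4": ["x1", "d", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
}
expected = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}

print(hit_rate_at_k(results, expected, k=3))   # 0.5
print(hit_rate_at_k(results, expected, k=10))  # 1.0
```

A gap like the one between the two numbers here means a limit of 3 would drop the correct chunk for half of the queries.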