Retrieval-Augmented Generation pipelines — ingestion, chunking, embedding, vector stores, retrieval, evaluation. Use when building a RAG pipeline, choosing chunking strategies or embedding models, debugging retrieval quality or hallucinations, evaluating an existing RAG system, or scaling/migrating vector stores.
A RAG pipeline has two phases: indexing (offline) and retrieval+generation (online). Decisions cascade — bad chunking ruins retrieval no matter how good your embeddings are, and bad retrieval ruins generation no matter how strong your LLM is. Start from the data and work forward, and measure with a golden eval set from day one.
Indexing: Documents -> Parse -> Chunk -> Embed -> Vector DB
Query: Question -> Embed -> Retrieve top-k -> Prompt -> LLM
Parse source documents into clean text with metadata preserved. PDFs: pymupdf4llm or unstructured; preserve headings, tables, page numbers, with an OCR fallback for scanned docs via tesseract. HTML: trafilatura or readability-lxml.

Chunking: pick a strategy based on document structure and query patterns. See references/chunking-strategies.md for code and trade-offs.
Default to recursive splitting: split on \n\n, then \n, then space. Rule of thumb: chunk size should match expected answer granularity. FAQ -> small chunks. Long technical explanations -> larger chunks.
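A minimal recursive splitter can be sketched in a few lines (character counts stand in for token counts; `recursive_split` and its defaults are illustrative, not a library API):

```python
def recursive_split(text, max_len=512, separators=("\n\n", "\n", " ")):
    """Greedily pack text into chunks <= max_len, splitting on the
    coarsest separator first and recursing on finer ones as needed."""
    if len(text) <= max_len:
        return [text]
    for idx, sep in enumerate(separators):
        parts = text.split(sep)
        if len(parts) == 1:
            continue  # separator absent; try a finer one
        chunks, current = [], ""
        for part in parts:
            candidate = current + sep + part if current else part
            if len(candidate) <= max_len:
                current = candidate
                continue
            if current:
                chunks.append(current)
            if len(part) > max_len:
                # this piece alone is still too big: recurse with finer separators
                chunks.extend(recursive_split(part, max_len, separators[idx + 1:]))
                current = ""
            else:
                current = part
        if current:
            chunks.append(current)
        return chunks
    # no separator left: hard split
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Production splitters (e.g. LangChain's RecursiveCharacterTextSplitter) add overlap and token-based length functions on top of the same idea.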
Pick on domain match, dimensionality, speed, and cost.
| Model | Dims | Context | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | Good default, cheap |
| text-embedding-3-large | 3072 | 8191 | Better quality, 2x cost |
| nomic-embed-text | 768 | 8192 | Open source, strong MTEB |
| bge-large-en-v1.5 | 1024 | 512 | Open source, good for code |
| voyage-code-2 | 1536 | 16000 | Best for code retrieval |
Always benchmark on your queries before committing. MTEB leaderboard scores don't predict domain-specific performance. For significant gains on a tight domain, fine-tune with synthetic query/passage pairs and contrastive loss — even 1000 pairs helps.
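As a sketch of that benchmark, recall@k over a golden query set reduces to a few lines of numpy (`embed` is a stand-in for whichever embedding model is under test; one known-relevant doc per query is assumed for simplicity):

```python
import numpy as np

def recall_at_k(queries, relevant_ids, doc_ids, doc_vecs, embed, k=5):
    """Fraction of queries whose known-relevant doc appears in the top-k
    by cosine similarity. doc_vecs: (n_docs, dim) array of doc embeddings."""
    hits = 0
    for query, rel_id in zip(queries, relevant_ids):
        q = embed(query)
        sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        top_k = [doc_ids[i] for i in np.argsort(-sims)[:k]]
        hits += rel_id in top_k
    return hits / len(queries)
```

Run this once per candidate model on the same query set; the model with the best recall@k on your data wins, regardless of MTEB rank.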
For most projects: start with Chroma locally, move to Qdrant or pgvector for production.
Rerank retrieved candidates with a cross-encoder (bge-reranker-v2-m3, Cohere Rerank). See references/retrieval-patterns.md for the architecture diagram and per-pattern code.
The retrieval-to-generation handoff is where many RAG systems fail.
Number chunks in the prompt as [1], [2]; instruct the LLM to cite; verify citations in post-processing.

Key metrics: context precision (are retrieved chunks relevant?), context recall (are all needed chunks retrieved?), faithfulness (is the answer grounded in context?), answer relevance (does it address the question?).
Use the RAGAS framework. Build a golden dataset of 50–100
question/ideal-context/reference-answer triples from your real
documents. See references/evaluation.md for metric details and
debugging workflow.
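For intuition, the two retrieval-side metrics reduce to set overlap when relevance labels are binary — a simplification of what RAGAS computes with LLM judgments, useful for quick sanity checks against the golden dataset:

```python
def context_precision(retrieved, relevant):
    """Share of retrieved chunk IDs that are in the golden relevant set."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def context_recall(retrieved, relevant):
    """Share of golden relevant chunk IDs that were actually retrieved."""
    if not relevant:
        return 1.0
    return len(set(retrieved) & set(relevant)) / len(relevant)
```

Low precision means noise in the context window; low recall means the answer's evidence never reached the LLM. They fail for different reasons and need different fixes.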
Reciprocal Rank Fusion is the standard combiner for hybrid retrieval — no score-weight tuning needed:
RRF(d) = sum( 1 / (k + rank_i(d)) ) for each retrieval method i
(k typically 60)
Hybrid + RRF outperforms either dense or sparse alone on nearly every benchmark and should be the default for production. Vector stores with native hybrid support: Qdrant, Weaviate, Elasticsearch. For others, run both searches and fuse in application code.
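Fusing in application code is short enough to show in full; this follows the formula above, assuming each retrieval method returns a ranked list of document IDs:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each method contributes 1/(k + rank) per doc.
    rankings: one ranked list of doc IDs per retrieval method.
    Returns doc IDs sorted by fused score, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Note that a doc ranked moderately by both methods beats a doc ranked highly by only one — that is the behavior that makes RRF robust without score normalization.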
Bi-encoders (embedding models) score query and document
independently. Cross-encoders score (query, document) jointly —
much more accurate but too slow for full-corpus search.
Stage 1: Retrieve top-50 with fast method (dense, hybrid)
Stage 2: Rerank to top-5 with cross-encoder
Models: bge-reranker-v2-m3 (open, strong), Cohere Rerank (API,
easy), ms-marco-MiniLM-L-12-v2 (fast, lighter). Latency is
typically 100–500ms for 50 candidates.
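The two-stage pattern itself is model-agnostic; sketched here with the retriever and cross-encoder abstracted as callables (`retrieve` and `score_fn` are illustrative names — in practice `score_fn` would wrap e.g. a bge-reranker `predict` call):

```python
def retrieve_then_rerank(query, retrieve, score_fn, fetch_k=50, final_k=5):
    """Stage 1: cheap retrieval of fetch_k candidates.
    Stage 2: score each (query, doc) pair jointly, keep the best final_k.

    retrieve(query, k) -> list of docs; score_fn(query, doc) -> float.
    """
    candidates = retrieve(query, fetch_k)
    scored = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return scored[:final_k]
```

The fetch_k/final_k split is the main tuning knob: larger fetch_k improves recall into the reranker at the cost of reranking latency.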
MMR = argmax[ lambda * sim(q, d) - (1 - lambda) * max(sim(d, d_selected)) ]
Greedy: pick the highest-scoring candidate, repeat. Lambda 1.0 = pure relevance, 0.0 = pure diversity. Start at 0.6. Lower lambda when chunks are highly redundant.
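The greedy loop above, implemented directly (`sim` is any similarity function, e.g. cosine over embeddings; returns candidate indices in selection order):

```python
def mmr_select(query_vec, cand_vecs, sim, k=5, lam=0.6):
    """Greedy Maximal Marginal Relevance: each step picks the candidate
    maximizing lam * relevance - (1 - lam) * redundancy vs. already-selected."""
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = sim(query_vec, cand_vecs[i])
            redundancy = max(
                (sim(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```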
Embed at 256 tokens (precise matching) but store a mapping to a 1024-token parent (richer context). Match on child embeddings, return the parent. Small chunks find precise matches; large chunks give the LLM enough context to answer well.
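A sketch of the child-to-parent mapping, with character offsets standing in for token counts and `match_fn` standing in for the embedding search (all names illustrative):

```python
def build_parent_index(parents, child_size=256):
    """Split each parent doc into small children; record child -> parent."""
    children, child_to_parent = [], []
    for pid, text in enumerate(parents):
        for start in range(0, len(text), child_size):
            children.append(text[start:start + child_size])
            child_to_parent.append(pid)
    return children, child_to_parent

def retrieve_parents(query, children, child_to_parent, parents, match_fn, k=3):
    """Match on small children, return deduplicated parents, order preserved.
    match_fn(query, children) -> child indices ranked best-first."""
    ranked_child_ids = match_fn(query, children)[:k]
    seen, out = set(), []
    for cid in ranked_child_ids:
        pid = child_to_parent[cid]
        if pid not in seen:
            seen.add(pid)
            out.append(parents[pid])
    return out
```

In a real pipeline the children are what get embedded into the vector store, and `child_to_parent` lives in chunk metadata.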
Use a small/fast LLM (Haiku, GPT-4o-mini) to rewrite the query into 3–5 variants, retrieve top-10 for each, deduplicate by document ID, fuse with RRF. Adds ~200–500ms of latency. Best for ambiguous or broad queries.
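The flow in outline form, with the LLM rewriter, retriever, and fusion step abstracted as callables (all names illustrative):

```python
def multi_query_retrieve(query, rewrite, retrieve, fuse, k=10):
    """Rewrite the query into variants, retrieve for each, then fuse.

    rewrite(query) -> list of rephrasings (a small/fast LLM in practice);
    retrieve(q, k) -> ranked doc IDs; fuse(rankings) -> fused ranking (e.g. RRF).
    """
    variants = [query] + rewrite(query)
    rankings = [retrieve(v, k) for v in variants]
    return fuse(rankings)
```

Fusing the per-variant rankings also handles deduplication, since each document ID collapses to one fused score.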
Pre-filter candidates before or during vector search to reduce search space and improve precision.
```python
results = vector_store.similarity_search(
    query_embedding, k=10,
    filter={"source": "api-docs", "version": ">=3.0"},
)
```
Useful filters: document type, date range, language, author, product version, access control. Pre-filter > post-filter when the index supports it.
Prepend "Title: {title}\nSection: {heading}\n\n" to chunk text before embedding so retrieval matches queries that reference titles or sections.

Quality targets:

| Metric | Target |
|---|---|
| Context precision | > 0.8 |
| Context recall | > 0.7 |
| Faithfulness | > 0.9 (non-negotiable for production) |
| Answer relevancy | > 0.8 |
| Symptom | Likely cause | Fix |
|---|---|---|
| Wrong docs retrieved | Vocabulary mismatch | Add hybrid search |
| Right doc exists, not retrieved | Chunk too large, answer buried | Smaller chunks, parent-doc retrieval |
| Top-1 right, rest noise | No reranking | Add cross-encoder reranker |
| Same info repeated | Overlapping chunks, no dedup | MMR or dedup pass |
| Answer contradicts context | Hallucination / model prior | Stronger grounding prompt, better model |
| Vague generic answer | Context not used effectively | Reorder context, improve template |
| Refuses despite good context | Over-cautious system prompt | Relax prompt, check conflicts |
| Answers part of question | Low context recall | Multi-query, smaller chunks |
Golden rule: never tune the prompt to fix a retrieval problem, and never tune retrieval to fix a prompt problem. Isolate the variable.
Basic RAG over internal docs. Recursive chunking at 512
tokens, text-embedding-3-small, Chroma. Dense retrieval, top-5.
Build an eval set from real user questions. Iterate from this
baseline.
RAG misses relevant info. Diagnose context recall — what was retrieved vs. what should have been? Add hybrid search (BM25 + dense), add cross-encoder reranking, try smaller chunk sizes. Test each change against the eval set.
Benchmarking a RAG system. Build a golden dataset: 50 questions, manually identified ideal context passages, reference answers. Run RAGAS. Identify the weakest metric and focus there. Set up automated eval in CI to catch regressions.