Odyssey RAG project-specific patterns, anti-patterns, and implementation guides. Use when working on the Odyssey RAG retrieval system, ingestion pipeline, MCP server, or any component of the knowledge retrieval platform.
Project-specific patterns discovered during a comprehensive audit and implementation (2026-04-17).

- Use "odyssey-rag"; it is the canonical contract name.
- Use c.tsvector_content for BM25 queries (not to_tsvector()).
- Run blocking model calls through run_in_executor.
- Include WHERE d.is_current = TRUE in all retrieval queries.

When adding optional filters to SQL queries, use the CAST/IS NULL pattern:
AND (CAST(:param AS TEXT) IS NULL OR d.column = CAST(:param AS TEXT))
Pass None for unset filters. The clause then evaluates to TRUE, so the filter has no effect on the result set.
Used in: vector_search.py, bm25_search.py
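
A minimal sketch of the CAST/IS NULL pattern follows. It uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs without a server; the simplified `document` table, the sample rows, and the `find_documents` helper are assumptions made for illustration, not the project's actual code.

```python
import sqlite3

# One query serves both the filtered and the unfiltered case.
QUERY = """
SELECT d.source_path
FROM document d
WHERE d.is_current = 1
  AND (CAST(:source_type AS TEXT) IS NULL
       OR d.source_type = CAST(:source_type AS TEXT))
ORDER BY d.source_path
"""


def make_demo_db():
    """Build a tiny in-memory table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document (source_path TEXT, source_type TEXT, is_current INTEGER)"
    )
    conn.executemany(
        "INSERT INTO document VALUES (?, ?, ?)",
        [
            ("a.md", "generic_text", 1),
            ("b.xsd", "xsd_schema", 1),
            ("c.md", "generic_text", 0),  # superseded version, never returned
        ],
    )
    return conn


def find_documents(conn, source_type=None):
    """Passing None binds NULL, which makes the optional clause TRUE for every row."""
    rows = conn.execute(QUERY, {"source_type": source_type}).fetchall()
    return [row[0] for row in rows]
```

Calling `find_documents(conn)` returns every current document, while `find_documents(conn, "xsd_schema")` narrows the result. The SQL text stays the same in both cases, which is the point of the pattern.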
PostgreSQL maintains tsvector_content via a DB trigger with weighted sections.
Query c.tsvector_content so that the GIN index idx_chunk_tsvector is used.
Anti-pattern: to_tsvector('english', c.content) bypasses the index entirely.

Blocking reranker inference is moved off the event loop with run_in_executor:
loop = asyncio.get_running_loop()
scores = await loop.run_in_executor(None, self._model.predict, pairs)
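
The two lines above can be tried in isolation with the sketch below. `SlowModel`, `Reranker`, and the heartbeat task are hypothetical stand-ins: the model simply sleeps to imitate blocking inference, and the heartbeat shows that the event loop keeps running while the prediction executes in a worker thread.

```python
import asyncio
import time


class SlowModel:
    """Hypothetical stand-in for a cross-encoder; predict() blocks the calling thread."""

    def predict(self, pairs):
        time.sleep(0.05)  # imitate CPU-bound inference
        return [float(len(query) + len(doc)) for query, doc in pairs]


class Reranker:
    def __init__(self, model):
        self._model = model

    async def score(self, pairs):
        loop = asyncio.get_running_loop()
        # None selects the loop's default ThreadPoolExecutor.
        return await loop.run_in_executor(None, self._model.predict, pairs)


async def demo():
    reranker = Reranker(SlowModel())
    ticks = 0

    async def heartbeat():
        # Counts how often the event loop gets a turn while the model runs.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    scores = await reranker.score([("q", "doc"), ("q", "longer doc")])
    task.cancel()
    return scores, ticks
```

If `predict` were called directly inside `score`, the heartbeat would not advance until inference finished. That stall is what the executor call avoids.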
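When changing sync → async:
- Update every call site (add await).
- Update every implementation of the interface (including PassthroughReranker).

SOURCE_TYPE_RULES is an ordered list of (regex, type) tuples. First match wins.
- Specific patterns (e.g. IPS_Annex_B) come first.
- Generic patterns (e.g. \.md$) come last.
- Cover spelling variants in the regex, e.g. payss?ett? for "Payset/Paysett/Paysset".

When adding a new filter parameter, wire it through the full stack: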
MCP tool (@mcp.tool) → handler (tools/search.py) → engine (tool_context) →
all_filters dict → vector_search() AND bm25_search()
Both search paths MUST receive the filter. Missing one creates silent inconsistency.
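
As a rough illustration of the wiring rule above, the sketch below builds the filter dict once and hands the same dict to both search paths. The function bodies are placeholders and the `source_type` and `integration` parameters are assumed filter names; the real handlers and engine will look different.

```python
from typing import Optional

Filters = dict[str, Optional[str]]


def vector_search(query: str, filters: Filters) -> dict:
    # Placeholder: a real implementation would bind `filters` into the pgvector query.
    return {"path": "vector", "query": query, "filters": dict(filters)}


def bm25_search(query: str, filters: Filters) -> dict:
    # Placeholder: a real implementation would bind `filters` into the tsvector query.
    return {"path": "bm25", "query": query, "filters": dict(filters)}


def retrieve(
    query: str,
    *,
    source_type: Optional[str] = None,
    integration: Optional[str] = None,
) -> list[dict]:
    # Build the dict once so both paths see identical filters; None means "unset".
    all_filters: Filters = {"source_type": source_type, "integration": integration}
    return [vector_search(query, all_filters), bm25_search(query, all_filters)]
```

Keeping a single `all_filters` dict makes it harder to add a parameter to one path and forget the other.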
For HTTP services with auth middleware, insert health routes BEFORE the middleware:
routes = [Route("/health", health_handler)] + existing_routes
Docker healthcheck pattern: httpx.get("http://localhost:PORT/health")
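
The ordering idea can be shown without any framework. The sketch below is plain ASGI rather than the project's actual route list: `HealthFirst`, `AuthMiddleware`, `protected_app`, and the `call` helper are all hypothetical names, and the health check is answered before the auth layer ever sees the request.

```python
import asyncio


async def send_text(send, status, body):
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


class AuthMiddleware:
    """Hypothetical auth layer: rejects requests without an Authorization header."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        headers = dict(scope.get("headers", []))
        if b"authorization" not in headers:
            await send_text(send, 401, b"unauthorized")
            return
        await self.app(scope, receive, send)


class HealthFirst:
    """Answers /health itself and forwards everything else to the protected app."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope["path"] == "/health":
            await send_text(send, 200, b"ok")
            return
        await self.app(scope, receive, send)


async def protected_app(scope, receive, send):
    await send_text(send, 200, b"secret")


# The health check wraps the auth layer, so it is evaluated first.
app = HealthFirst(AuthMiddleware(protected_app))


async def call(asgi_app, path, headers=()):
    """Drive the ASGI app once and return the response status code."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": list(headers)}
    await asgi_app(scope, receive, send)
    return messages[0]["status"]
```

An unauthenticated Docker healthcheck would receive 200 from `/health`, while any other path still returns 401 without credentials.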
supersede(source_path) marks the old document is_current=False, then inserts the new document.

Tool-specific retrieval behavior is defined in tool_strategies.py:
from dataclasses import dataclass

@dataclass
class ToolStrategy:
    metadata_filters: dict[str, str]
    source_type_boosts: dict[str, float]
    require_source_types: list[str]
    focus_filters: dict[str, dict[str, str]]
    bm25_boost_terms: list[str]
Strategies are data-driven, not if/else chains. Add new strategies as dataclass instances.
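
To illustrate the data-driven approach, here is a small sketch in which strategies live in a dict and one generic function applies them. The tool names, the source type names, the default values, and the treatment of boosts as score multipliers are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolStrategy:
    metadata_filters: dict[str, str] = field(default_factory=dict)
    source_type_boosts: dict[str, float] = field(default_factory=dict)
    require_source_types: list[str] = field(default_factory=list)
    focus_filters: dict[str, dict[str, str]] = field(default_factory=dict)
    bm25_boost_terms: list[str] = field(default_factory=list)


# Hypothetical registry: adding a tool means adding an entry, not a new branch.
STRATEGIES = {
    "find_message_type": ToolStrategy(source_type_boosts={"xsd_schema": 1.5}),
    "search": ToolStrategy(),
}


def apply_boosts(results, strategy):
    """Multiply each score by its source-type boost (assumed semantics) and re-rank.

    `results` is a list of (chunk_id, source_type, score) tuples.
    """
    boosted = [
        (chunk_id, score * strategy.source_type_boosts.get(source_type, 1.0))
        for chunk_id, source_type, score in results
    ]
    return sorted(boosted, key=lambda item: item[1], reverse=True)
```

The same `apply_boosts` function serves every tool; only the data in `STRATEGIES` differs.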
| Table | Key Columns | Notes |
|---|---|---|
| document | id, source_path, source_type, integration, is_current, file_hash | One row per version |
| chunk | id, document_id, content, tsvector_content, section, subsection | GIN index on tsvector |
| chunk_embedding | chunk_id, embedding (Vector 768) | HNSW index |
| chunk_metadata | chunk_id, message_type, source_type, module_path | Domain-specific filtering |
| ingest_job | id, source_path, status, chunks_created | Pipeline tracking |
| feedback | id, query, chunk_ids, rating | Quality feedback |
- Unit tests: PYTHONPATH=src .venv/bin/python -m pytest tests/unit/ -x
- Lint: .venv/bin/ruff check src/ tests/
- Build and start: docker compose build && docker compose up -d

- Apply filter changes to vector_search.py AND bm25_search.py.
- Use payss?ett? not paysett? for brands with spelling variants.

User/AI → MCP Server (stdio/HTTP) → RetrievalEngine
├→ QueryProcessor
├→ Vector Search (pgvector HNSW)
├→ BM25 Search (PostgreSQL tsvector GIN)
├→ RRF Fusion (k=60)
├→ Tool Strategy (boosts/filters)
├→ Cross-Encoder Reranker
└→ ResponseBuilder → {evidence, gaps, followups}
Admin → FastAPI API → Ingestion Pipeline → DB
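
The RRF Fusion step in the diagram can be sketched with the standard Reciprocal Rank Fusion formula, where each result list contributes 1 / (k + rank) per item. This is the textbook form with 1-based ranks and k=60; whether the project's fusion code matches it exactly is an assumption, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one list of (chunk_id, score)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["a", "b", "c"]  # hypothetical vector search order
bm25_hits = ["b", "d", "a"]    # hypothetical BM25 order
fused = rrf_fuse([vector_hits, bm25_hits])
```

Chunks that appear in both lists accumulate two contributions, so they rise above chunks found by only one search path.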
Created: 2026-04-17 | Last updated: 2026-04-17