Standards for tracing, monitoring, and evaluating AI applications in production.
You cannot improve what you cannot measure. Observability is the "Black Box Recorder" for your AI system.
Tracing visualizes the entire execution chain behind each AI response: retrieval, prompt construction, LLM calls, and post-processing.
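As an illustration of the concept only (not the Phoenix API), a trace can be modeled as a tree of timed spans; the `span` helper below is a hypothetical sketch:

```python
import time
from contextlib import contextmanager

TRACE = []  # collected spans as (depth, name, duration_ms)
_depth = 0

@contextmanager
def span(name):
    """Record a timed span; nesting depth mirrors the call chain."""
    global _depth
    start = time.perf_counter()
    _depth += 1
    try:
        yield
    finally:
        _depth -= 1
        TRACE.append((_depth, name, (time.perf_counter() - start) * 1000))

# A RAG request decomposed into nested spans:
with span("rag_query"):
    with span("retrieve"):
        time.sleep(0.01)
    with span("llm_call"):
        time.sleep(0.02)
```

Child spans close before their parent, so `TRACE` ends with the root `rag_query` span at depth 0.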
| Metric | Definition | Why it matters |
|---|---|---|
| TTFT (Time to First Token) | Time from user submission to the first token appearing. | User perception of speed. Target < 1.5s. |
| Total Latency | Time to complete the full answer. | Overall system performance. |
| Token Usage | Input + Output tokens per query. | Direct cost correlation. |
| Retrieval Score | Similarity score of top chunk. | Low score = "I don't know" or missing data. |
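A minimal sketch of measuring TTFT, total latency, and output token count over a streaming response; `fake_stream` is a stand-in for a real LLM stream:

```python
import time

def fake_stream():
    """Stand-in for a streaming LLM response."""
    for token in ["The", " answer", " is", " 42", "."]:
        time.sleep(0.005)
        yield token

start = time.perf_counter()
ttft = None
output_tokens = 0

for token in fake_stream():
    if ttft is None:
        # First token arrived: this gap is what the user perceives as "speed".
        ttft = time.perf_counter() - start
    output_tokens += 1

total_latency = time.perf_counter() - start
print(f"TTFT: {ttft * 1000:.1f} ms, "
      f"total: {total_latency * 1000:.1f} ms, "
      f"output tokens: {output_tokens}")
```

In production these numbers would be attached to the trace rather than printed, so latency spikes can be correlated with the spans that caused them.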
The most valuable evaluation data is explicit user feedback (thumbs up/down, corrections) tied back to the trace that produced the answer.
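One hypothetical way to make that feedback actionable: store each signal against its trace ID, so a thumbs-down can be replayed against the full execution chain (names below are illustrative, not a specific library's API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Feedback:
    trace_id: str
    score: int          # +1 thumbs up, -1 thumbs down
    comment: str = ""
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

feedback_log: list[Feedback] = []

def record_feedback(trace_id: str, score: int, comment: str = "") -> None:
    """Attach a user signal to the trace that produced the answer."""
    feedback_log.append(Feedback(trace_id, score, comment))

record_feedback("trace-abc123", -1, "answer cited the wrong document")

# Triage queue: every trace that got a thumbs-down.
negatives = [f.trace_id for f in feedback_log if f.score < 0]
```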
```python
# Phoenix OTLP setup: register a tracer provider that exports
# spans to a locally running Phoenix instance.
from phoenix.otel import register

tracer_provider = register(
    project_name="my-rag-app",
    endpoint="http://localhost:6006/v1/traces",
)
```
| Need | Skill |
|---|---|
| RAG pipeline to observe | rag-patterns |
| Evaluation framework (offline metrics) | rag-evaluation |
| Cache hit/miss monitoring | semantic-cache |
| Security alerting | ai-security |