Real-time latency monitoring and performance dashboard for PipeCat voice agent. Use when: (1) Monitoring active calls, (2) Debugging slow responses, (3) Analyzing latency patterns, (4) Identifying bottlenecks, (5) Reviewing historical performance
Real-time latency monitoring, bottleneck identification, and performance optimization.
Commands:

- `/latency-report` — Summary of recent call latencies
- `/latency-report [call_id]` — Latency breakdown for a specific call
- `/latency-report --live` — Real-time latency during an active call
- `/latency-report --daily` — Daily latency summary
- `/latency-bottleneck` — Identify the current bottleneck component
- `/latency-tune` — Get tuning recommendations

References:

- `references/latency-thresholds.md` — Alert thresholds and optimization targets

The voice agent pipeline has four main latency components:
```
User speaks → [STT] → [LLM] → [TTS] → Bot speaks
                 ↓        ↓       ↓
              ~150ms   ~500ms  ~100ms
```

(VAD and transport overhead make up the fourth component, adding roughly another 100ms per turn.)
| Component | Service | Target | Warning | Fail |
|---|---|---|---|---|
| STT | AssemblyAI Universal-2 | <150ms | <250ms | >400ms |
| LLM | Claude Opus 4.5 | <500ms | <800ms | >1200ms |
| TTS | ElevenLabs Flash v2.5 | <100ms | <200ms | >350ms |
| Total | End-to-end | <1000ms | <1500ms | >2000ms |
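The thresholds above can be applied mechanically. A minimal sketch, with the table's bounds hardcoded; folding the unspecified band between the warning and fail bounds into FAIL is an assumption:

```python
# (target, warning) bounds in ms, copied from the thresholds table.
THRESHOLDS_MS = {
    "stt":   (150, 250),
    "llm":   (500, 800),
    "tts":   (100, 200),
    "total": (1000, 1500),
}

def classify(component: str, latency_ms: float) -> str:
    """Classify one latency sample: below target -> PASS,
    below warning -> WARN, everything else -> FAIL (assumed)."""
    target, warn = THRESHOLDS_MS[component]
    if latency_ms < target:
        return "PASS"
    if latency_ms < warn:
        return "WARN"
    return "FAIL"
```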
```
User: /latency-report

Claude: Fetching latency data from the last 24 hours...

PipeCat Latency Report (Last 24 Hours)
═══════════════════════════════════════
Total Calls:   87
Avg Response:  847ms

Component Breakdown:
  STT (AssemblyAI)   142ms avg  [██████████░░░░░░░░░░] PASS
  LLM (Claude)       512ms avg  [████████████████░░░░] PASS
  TTS (ElevenLabs)    98ms avg  [████████░░░░░░░░░░░░] PASS
  VAD/Other           95ms avg  [████████░░░░░░░░░░░░]

Percentiles:
  P50:   725ms  (Excellent)
  P90: 1,102ms  (Good)
  P99: 1,892ms  (Warning)

No latency alerts in the last 24 hours.
```
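The percentile summary above can be reproduced from raw per-turn latencies. A sketch using the nearest-rank method (the dashboard's actual interpolation method is not specified, so this is an assumption):

```python
import math

def percentile(latencies_ms, q):
    """Nearest-rank percentile: q in 0..100 over a list of
    per-turn response latencies in milliseconds."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]
```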
```
User: /latency-report call_abc123

Claude: Analyzing latency for call_abc123...

Call Latency Report: call_abc123
════════════════════════════════
Duration:      2:34
Turns:         12
Avg Response:  892ms

Per-Turn Latency:
  Turn 1: 1,245ms  [STT: 165  LLM: 890  TTS: 108  Other: 82]  ← First response
  Turn 2:   712ms  [STT: 132  LLM: 445  TTS:  89  Other: 46]
  Turn 3:   856ms  [STT: 148  LLM: 567  TTS: 102  Other: 39]
  Turn 4:   698ms  [STT: 128  LLM: 432  TTS:  94  Other: 44]
  Turn 5: 1,105ms  [STT: 142  LLM: 812  TTS:  98  Other: 53]  ← Spike
  Turn 6:   723ms  [STT: 135  LLM: 456  TTS:  91  Other: 41]
  ...

Anomalies Detected:
  ⚠ Turn 1: First-response latency is expected to be higher
  ⚠ Turn 5: LLM spike (812ms vs 485ms avg)

First Response: 1,245ms (Target: <1,200ms) ⚠

Component Summary:
  STT: 142ms avg  (PASS - target <150ms)
  LLM: 572ms avg  (WARN - target <500ms)
  TTS:  96ms avg  (PASS - target <100ms)

Recommendation: LLM latency is slightly elevated. Consider:
  1. Reducing max_tokens from 300 to 200
  2. Simplifying the system prompt (currently 3,456 chars)
```
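Spike detection like the Turn 5 flag above can be sketched with a simple rule: flag any turn whose LLM latency exceeds a multiple of the call's median. The 1.5× median threshold is an illustrative assumption, not the dashboard's actual detector:

```python
from statistics import median

def llm_spikes(llm_latencies_ms, factor=1.5):
    """Return 0-based turn indices whose LLM latency exceeds
    factor x the call's median LLM latency."""
    baseline = median(llm_latencies_ms)
    return [i for i, ms in enumerate(llm_latencies_ms)
            if ms > factor * baseline]
```

On the six turns shown above this flags turns 1 and 5 (indices 0 and 4), matching the report.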
```
User: /latency-report --live

Claude: Starting live latency monitor...

🔴 LIVE - Call: call_xyz789
═══════════════════════════
Turn 1 [0:03]  Response: 1,156ms
  STT: 145ms  LLM: 865ms  TTS: 98ms
Turn 2 [0:12]  Response: 734ms
  STT: 138ms  LLM: 467ms  TTS: 91ms
Turn 3 [0:20]  Response: 812ms
  STT: 152ms  LLM: 523ms  TTS: 94ms

Avg so far: 901ms
Status: GOOD

[Press Ctrl+C to stop monitoring]
```
```
User: /latency-report --daily

Claude: Generating daily latency summary...

Daily Latency Summary: 2024-01-15
═════════════════════════════════
Calls:          127
Total Minutes:  312

Hourly Breakdown:
  08:00  23 calls  Avg:   823ms  [████████░░]
  09:00  31 calls  Avg:   856ms  [████████░░]
  10:00  28 calls  Avg:   912ms  [█████████░]
  11:00  19 calls  Avg:   798ms  [████████░░]
  12:00  12 calls  Avg:   745ms  [███████░░░]
  13:00   8 calls  Avg:   921ms  [█████████░]
  14:00   6 calls  Avg: 1,045ms  [██████████]  ← Peak

Peak Hour: 14:00 (1,045ms avg)
  Likely cause: Claude API load
Best Hour: 12:00 (745ms avg)

Trend: Stable (no significant change from yesterday)
```
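The hourly rollup above is a straightforward group-by. A minimal sketch, assuming each call is reduced to an `(hour, avg_latency_ms)` pair before aggregation:

```python
from collections import defaultdict

def hourly_averages(calls):
    """calls: iterable of (hour, avg_latency_ms) pairs, one per call.
    Returns {hour: (call_count, avg_ms)}."""
    buckets = defaultdict(list)
    for hour, ms in calls:
        buckets[hour].append(ms)
    return {h: (len(v), sum(v) / len(v)) for h, v in buckets.items()}
```

The peak hour is then just the bucket with the highest average, e.g. `max(result, key=lambda h: result[h][1])`.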
```
User: /latency-bottleneck

Claude: Analyzing bottleneck patterns...

Bottleneck Analysis (Last 100 Calls)
════════════════════════════════════
Primary Bottleneck: LLM (Claude)
  Accounts for 58% of total latency
  Average: 512ms (target: <500ms)

Secondary: STT (AssemblyAI)
  Accounts for 17% of total latency
  Average: 142ms (target: <150ms)

Distribution:
  LLM:   [██████████████████████████████] 58%
  STT:   [█████████░░░░░░░░░░░░░░░░░░░░░] 17%
  Other: [███████░░░░░░░░░░░░░░░░░░░░░░░] 13%
  TTS:   [██████░░░░░░░░░░░░░░░░░░░░░░░░] 12%

LLM Latency Breakdown:
  First token (TTFT):  380ms avg
  Token generation:    132ms avg

Optimization Priority:
  1. Reduce system prompt length
  2. Lower max_tokens if possible
  3. Consider a model downgrade for non-critical calls
```
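Identifying the primary bottleneck reduces to finding the component with the largest share of total latency. A sketch over per-component averages (illustrated here with the 24-hour numbers from earlier in this document, not the 100-call window above):

```python
def bottleneck(component_avgs_ms):
    """component_avgs_ms: e.g. {"stt": 142, "llm": 512, ...}.
    Returns (primary_component, share_of_total)."""
    total = sum(component_avgs_ms.values())
    name, ms = max(component_avgs_ms.items(), key=lambda kv: kv[1])
    return name, ms / total
```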
```
User: /latency-tune

Claude: Analyzing current configuration for optimization...

Tuning Recommendations
══════════════════════
Current Config:
  Model:          claude-opus-4-5-20251101
  Max tokens:     300
  System prompt:  3,456 chars
  VAD stop_secs:  0.3

Recommendations:

1. SYSTEM PROMPT LENGTH (Medium Impact)
   Current: 3,456 chars → Suggested: <2,500 chars
   Expected savings: 50-100ms per turn
   Action: Remove redundant examples, condense phase guidance

2. MAX_TOKENS (Low Impact)
   Current: 300 → Suggested: 200
   Expected savings: 20-50ms per turn
   Risk: May truncate longer responses
   Action: Test with 200, monitor truncation rate

3. VAD STOP_SECS (UX Impact)
   Current: 0.3s → Keep current
   Note: Lower values cause more interruptions
   Recommendation: Keep at 0.3s for natural turn-taking

4. MODEL SELECTION (Trade-off)
   Current: Opus 4.5 → Could use Sonnet 4
   Expected savings: 200-300ms per turn
   Trade-off: Lower reasoning quality for complex scenarios
   Recommendation: A/B test with Sonnet for simple phases

Estimated Total Savings: 70-150ms per turn
  Current: 847ms avg → Projected: ~700-780ms avg
```
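The projected range is just the current average minus the summed best- and worst-case savings of the recommendations actually applied. A sketch (the two ranges below are items 1 and 2 above; item 4 would be added the same way):

```python
def project(current_avg_ms, savings):
    """savings: list of (low_ms, high_ms) per-turn savings estimates.
    Returns the projected (best_case, worst_case) average."""
    lo = sum(s[0] for s in savings)
    hi = sum(s[1] for s in savings)
    return current_avg_ms - hi, current_avg_ms - lo
```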
The dashboard monitors for latency issues and generates alerts:
| Alert Type | Condition | Action |
|---|---|---|
| Latency Spike | 3+ consecutive >1500ms | Notify, log |
| Component Failure | Service timeout >5s | Notify, failover |
| Degradation | P90 increases >30% | Notify, investigate |
| First Response | >2500ms | Log for review |
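The "Latency Spike" rule in the table (3+ consecutive turns over 1,500ms) can be checked with a running counter. A minimal sketch:

```python
def spike_alert(latencies_ms, threshold_ms=1500, streak=3):
    """Return True once `streak` consecutive per-turn latencies
    exceed threshold_ms; a single fast turn resets the counter."""
    run = 0
    for ms in latencies_ms:
        run = run + 1 if ms > threshold_ms else 0
        if run >= streak:
            return True
    return False
```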
```bash
# Quick summary
python skills/pipecat-dashboard/scripts/latency_report.py

# Specific call
python skills/pipecat-dashboard/scripts/latency_report.py --call-id call_abc123

# Daily report
python skills/pipecat-dashboard/scripts/latency_report.py --daily

# JSON output
python skills/pipecat-dashboard/scripts/latency_report.py --json

# Bottleneck analysis
python skills/pipecat-dashboard/scripts/latency_report.py --bottleneck
```
Latency metrics are stored in `conversation_transcripts.metrics`:

```json
{
  "avg_response_latency_ms": 847,
  "first_response_latency_ms": 1245,
  "response_latencies_ms": [1245, 712, 856, ...],
  "stt_latencies_ms": [165, 132, 148, ...],
  "llm_latencies_ms": [890, 445, 567, ...],
  "tts_latencies_ms": [108, 89, 102, ...],
  "interruption_count": 2
}
```
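A metrics blob in this shape can be loaded and sanity-checked directly. A sketch using a truncated three-turn example (so the recomputed average will not match a full call's stored `avg_response_latency_ms`):

```python
import json

raw = """{
  "avg_response_latency_ms": 847,
  "first_response_latency_ms": 1245,
  "response_latencies_ms": [1245, 712, 856],
  "interruption_count": 2
}"""

metrics = json.loads(raw)
turns = metrics["response_latencies_ms"]
# Recompute the per-turn average from the raw list.
computed_avg = sum(turns) / len(turns)
```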