[read-only] Analyze trends across pipeline runs -- quality trajectory, agent effectiveness, cost analysis, convergence patterns, memory health. Use when you want to understand how pipeline quality has evolved, identify cost optimization opportunities, or review agent and memory effectiveness across runs.
Analyze trends across pipeline runs to surface actionable insights about quality, cost, agent behavior, and pipeline health.
See shared/skill-contract.md for the standard exit-code table.
Before any action, verify:
- `git rev-parse --show-toplevel 2>/dev/null` succeeds. If it fails: report "Not a git repository. Navigate to a project directory." and STOP.
- `.claude/forge.local.md` exists. If not: report "Forge not initialized. Run /forge-init first." and STOP.
- At least one data source exists:
  - `.forge/reports/` with report files
  - `.forge/state.json` with telemetry data
  - `.claude/forge-log.md` with run entries
If none exist: report "No pipeline run data found. Run /forge-run to generate data, then try again." and STOP.

Read all available data sources:
- **Run reports** (`.forge/reports/*.json` or `.forge/reports/*.md`): per-run summaries including scores, findings, timings, agent dispatches.
- **Telemetry** (`.forge/state.json` → `telemetry`): current/last run metrics — token usage, wall time, stage durations, agent dispatch counts.
- **Run log** (`.claude/forge-log.md`): human-readable run history with dates, requirements, scores, verdicts, and retrospective notes.
- **Learnings** (`shared/learnings/` and `.forge/learnings/`): accumulated patterns, PREEMPT items, agent effectiveness records.
- **Score history** (`.forge/state.json` → `score_history`): per-iteration score progression within the current/last run.

If a source is unavailable, skip it and note which categories will have incomplete data.
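A minimal sketch of the gathering step, using the paths listed above (the helper name and return shape are illustrative, not part of the forge spec):

```python
import glob
import json
from pathlib import Path

def gather_sources(root="."):
    """Collect whichever data sources exist; record which are missing."""
    root = Path(root)
    sources, missing = {}, []

    # Run reports: any files under .forge/reports/
    reports = sorted(glob.glob(str(root / ".forge/reports/*")))
    if reports:
        sources["reports"] = reports
    else:
        missing.append(".forge/reports/")

    # Telemetry and score history both live in state.json
    state_path = root / ".forge/state.json"
    if state_path.exists():
        state = json.loads(state_path.read_text())
        sources["telemetry"] = state.get("telemetry")
        sources["score_history"] = state.get("score_history")
    else:
        missing.append(".forge/state.json")

    # Human-readable run log
    log_path = root / ".claude/forge-log.md"
    if log_path.exists():
        sources["log"] = log_path.read_text()
    else:
        missing.append(".claude/forge-log.md")

    return sources, missing
```

Missing entries map directly to the "note which categories will have incomplete data" rule.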
Analyze score trends across runs:
### Quality Trajectory
| Run | Date | Score | Verdict | CRITICALs | WARNINGs |
|-----|------|-------|---------|-----------|----------|
| {n} | {date} | {score} | {verdict} | {count} | {count} |
**Trend:** {improving/declining/stable} ({delta} over {n} runs)
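The trend cell could be derived along these lines (the stable-band threshold is an assumption, not spec'd by this doc):

```python
def score_trend(scores, flat_band=1.0):
    """Classify the run-over-run score trajectory.

    scores: chronological list of final run scores.
    flat_band: deltas within +/- this value count as stable (assumed cutoff).
    Returns (label, delta over the analyzed runs).
    """
    if len(scores) < 2:
        return "stable", 0.0
    delta = scores[-1] - scores[0]
    if delta > flat_band:
        return "improving", delta
    if delta < -flat_band:
        return "declining", delta
    return "stable", delta
```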
**Recurring Findings (3+ runs):**
| Category | Occurrences | Last Seen | Suggestion |
|----------|-------------|-----------|------------|
| {cat} | {n} | {date} | {codify as convention / investigate root cause} |
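The 3+ runs rule can be sketched as (input shape is an assumption; adapt to the actual report format):

```python
from collections import Counter

def recurring_findings(runs, min_runs=3):
    """Flag finding categories that appear in min_runs or more runs.

    runs: chronological list of per-run finding-category lists.
    """
    seen_in = Counter()
    for categories in runs:
        for cat in set(categories):  # count each category once per run
            seen_in[cat] += 1
    return {cat: n for cat, n in seen_in.items() if n >= min_runs}
```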
Analyze which agents contribute most to quality improvement:
### Agent Effectiveness
| Agent | Dispatches | Avg Findings | Score Impact | FP Rate |
|-------|-----------|-------------|-------------|---------|
| {agent} | {n} | {avg} | {delta} | {pct}% |
**Most impactful:** {agent} — avg {delta} point improvement per dispatch
**Least triggered:** {agent} — {n} findings across {m} runs
**Mutation kill rate:** {pct}% (trend: {direction})
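Aggregating dispatch records into the table above might look like this (the record keys are assumptions about the report format, not guaranteed fields):

```python
def agent_effectiveness(dispatches):
    """Roll up per-dispatch records into per-agent stats.

    dispatches: list of dicts with keys agent, findings, score_delta
    (shape assumed; adapt to the actual per-run report schema).
    """
    by_agent = {}
    for d in dispatches:
        rec = by_agent.setdefault(d["agent"], {"n": 0, "findings": 0, "delta": 0.0})
        rec["n"] += 1
        rec["findings"] += d["findings"]
        rec["delta"] += d["score_delta"]
    return {
        agent: {
            "dispatches": rec["n"],
            "avg_findings": rec["findings"] / rec["n"],
            "score_impact": rec["delta"],
        }
        for agent, rec in by_agent.items()
    }
```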
Analyze resource consumption and cost efficiency:
Compute cost-per-quality-point using `.forge/trust.json` `model_efficiency`. Stages without score impact are reported separately as overhead.

Sources: `state.json.cost`, `state.json.tokens`, `.forge/trust.json` `model_efficiency`, `state.json.cost_alerting`.
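The overhead rule above can be sketched as follows (per-stage score deltas as a dict are an assumed input shape):

```python
def cost_per_point(stage_tokens, stage_score_delta):
    """Tokens spent per quality point gained, per stage.

    Stages with no score impact go into a separate overhead bucket,
    mirroring the rule above rather than dividing by zero.
    """
    efficiency, overhead = {}, {}
    for stage, tokens in stage_tokens.items():
        delta = stage_score_delta.get(stage, 0)
        if delta > 0:
            efficiency[stage] = tokens / delta
        else:
            overhead[stage] = tokens
    return efficiency, overhead
```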
Recommendation generation:
### Cost Analysis
#### Per-Run Cost Trend
| Run | Date | Tokens | Est. Cost | Score | Cost/Point | Budget Used |
|-----|------|--------|-----------|-------|------------|-------------|
#### Per-Stage Cost Breakdown
| Stage | Avg Tokens | Avg Cost | % of Total | Trend |
|-------|-----------|----------|-----------|-------|
#### Cost-Per-Quality-Point (Efficiency)
| Stage | Tier | Tokens/Point | Runs | Suggestion |
|-------|------|-------------|------|------------|
#### Model Tier Distribution
| Tier | Dispatches | Tokens | % of Total | Avg Cost |
|------|-----------|--------|-----------|----------|
#### Budget Utilization
| Run | Ceiling | Used | % | Alerts Triggered |
|-----|---------|------|---|-----------------|
#### Top-3 Cost Recommendations
| # | Recommendation | Expected Savings | Confidence |
|---|---------------|-----------------|------------|
Analyze how efficiently the pipeline converges to shipping quality:
### Convergence Patterns
| Metric | Value |
|--------|-------|
| Avg iterations to ship | {n} |
| First-pass success rate | {pct}% |
| Safety gate failure rate | {pct}% |
| Most common plateau cause | {cause} |
**Iteration Distribution:**
| Iterations | Runs | % |
|-----------|------|---|
| 1-2 | {n} | {pct}% |
| 3-5 | {n} | {pct}% |
| 6+ | {n} | {pct}% |
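The distribution buckets can be computed like this (a sketch; iteration counts per run are assumed to be available from the run reports):

```python
def iteration_distribution(iters):
    """Bin runs by iterations-to-ship into the 1-2 / 3-5 / 6+ buckets above."""
    bins = {"1-2": 0, "3-5": 0, "6+": 0}
    for n in iters:
        if n <= 2:
            bins["1-2"] += 1
        elif n <= 5:
            bins["3-5"] += 1
        else:
            bins["6+"] += 1
    total = len(iters) or 1  # avoid division by zero on empty input
    return {k: (v, round(100 * v / total)) for k, v in bins.items()}
```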
Analyze the accumulated knowledge base:
### Memory Health
**PREEMPT Items:**
| Priority | Active | Applied (last 5 runs) | Decay Candidates |
|----------|--------|-----------------------|------------------|
| HIGH | {n} | {n} | {n} |
| MEDIUM | {n} | {n} | {n} |
| LOW | {n} | {n} | {n} |
| ARCHIVED | {n} | — | — |
**Pattern Discovery:**
- Total auto-discovered patterns: {n}
- Applied in subsequent runs: {n} ({pct}%)
- Never applied: {n} (review for removal)
**Learnings Growth:**
- Total learnings files: {n}
- New entries (last 5 runs): {n}
- Most active category: {category}
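The pattern-discovery stats above could be derived along these lines (the `applied_runs` field is an assumed shape for learnings entries):

```python
def pattern_application_rate(patterns):
    """Summarize auto-discovered patterns: applied vs never applied.

    patterns: list of dicts, each with an 'applied_runs' list
    (shape assumed; adapt to the actual learnings file format).
    """
    applied = [p for p in patterns if p.get("applied_runs")]
    never = [p for p in patterns if not p.get("applied_runs")]
    total = len(patterns) or 1
    return {
        "total": len(patterns),
        "applied": len(applied),
        "applied_pct": round(100 * len(applied) / total),
        "never_applied": len(never),  # candidates for removal review
    }
```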
Analyze token savings and compression compliance:
- Compare `state.json.tokens.output_tokens_per_agent` against the expected range for each agent's stage compression level. Estimate tokens saved relative to a verbose baseline using the stage-level token ranges from `shared/output-compression.md`: verbose 800-2000, standard 800-2000, terse 400-1200, minimal 100-600.
- Check `state.json.tokens.compression_level_distribution`. Highlight if the distribution is skewed (e.g., 90% verbose suggests misconfiguration or `output_compression.enabled: false`).
- Flag drift: an agent at `terse` (expected 400-1200 tokens) producing 1800 tokens is drifting.
- If `/forge-compress` has been run (detect via `*.original.md` backup files in `agents/`), compute before/after line counts and estimated token savings using `wc -l`.
- Report whether caveman mode is active (`.forge/caveman-mode`), which level, and how many sessions used it (from `.forge/events.jsonl` if available).

### Compression Effectiveness
**Output Compression:**
| Metric | Value |
|--------|-------|
| Dispatches at verbose | {n} |
| Dispatches at standard | {n} |
| Dispatches at terse | {n} |
| Dispatches at minimal | {n} |
| Estimated output tokens saved | {n} ({pct}% vs all-verbose baseline) |
**Drift Alerts:**
| Agent | Stage Level | Expected Range | Actual Tokens | Status |
|-------|------------|----------------|---------------|--------|
| {agent} | terse | 400-1200 | {n} | DRIFT / OK |
**Input Compression:**
| Scope | Files | Before (lines) | After (lines) | Reduction |
|-------|-------|-----------------|---------------|-----------|
| agents/ | {n} | {n} | {n} | {pct}% |
**Caveman Mode:** {off/lite/full/ultra}
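The drift check and baseline savings estimate described above can be sketched as (using the quoted ranges; taking the verbose midpoint as the baseline is an assumption):

```python
# Ranges quoted above from shared/output-compression.md.
RANGES = {
    "verbose": (800, 2000),
    "standard": (800, 2000),
    "terse": (400, 1200),
    "minimal": (100, 600),
}

def drift_and_savings(tokens_per_agent, level_per_agent):
    """Flag drifting agents and estimate savings vs an all-verbose baseline.

    Both inputs are dicts keyed by agent name. The verbose-range midpoint
    serves as the per-dispatch baseline estimate (an assumption).
    """
    verbose_mid = sum(RANGES["verbose"]) / 2
    drift, saved = [], 0.0
    for agent, tokens in tokens_per_agent.items():
        lo, hi = RANGES[level_per_agent[agent]]
        if not lo <= tokens <= hi:
            drift.append(agent)  # outside the expected range for its level
        saved += verbose_mid - tokens
    return drift, saved
```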
Synthesize the six categories into actionable recommendations:
## Pipeline Insights Report
**Project:** {project name}
**Runs analyzed:** {count}
**Date range:** {earliest} to {latest}
{Category 1-6 sections as above}
### Recommendations
| Priority | Action | Category | Expected Impact |
|----------|--------|----------|-----------------|
| {1-N} | {specific action} | {category} | {what improves} |
Prioritize recommendations by expected impact.
Write the full report to .forge/reports/insights-{date}.md where {date} is today in YYYY-MM-DD format. If the reports directory does not exist, create it.
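The write step is small enough to sketch directly (a minimal version; the helper name is illustrative):

```python
import datetime
from pathlib import Path

def write_report(markdown, root="."):
    """Write the insights report to .forge/reports/insights-YYYY-MM-DD.md."""
    reports = Path(root) / ".forge" / "reports"
    reports.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    out = reports / f"insights-{datetime.date.today():%Y-%m-%d}.md"
    out.write_text(markdown)
    return out
```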
| Condition | Action |
|---|---|
| Prerequisites fail | Report specific error message and STOP |
| No run data available | Report "No pipeline run data found. Run /forge-run to generate data, then try again." and STOP |
| Only one data source available | Generate partial report and note which categories have insufficient data |
| Fewer than 3 runs | Note that trend analysis requires more data points. Focus on single-run metrics |
| Report directory does not exist | Create .forge/reports/ before writing the report |
| Data source unparseable | Skip the malformed source, log WARNING, continue with remaining sources |
| State corruption | This skill reads state.json for telemetry but does not depend on valid pipeline state |
- /forge-history -- View run history with scores and verdicts (simpler than insights)
- /forge-profile -- Detailed performance profiling of a single pipeline run
- /forge-status -- Check current pipeline run state
- /forge-recover diagnose -- Diagnose pipeline health issues for the current run