Use when experimental results are unexpected, confusing, or need interpretation
You are diagnosing empirical results — finding patterns in errors, generating hypotheses about root causes, and assessing whether the results mean what they appear to mean. This is the analytical complement to /synthesize (which works across accumulated findings); /diagnose works within one result set.
The argument is a path to results (CSV, log entry, analysis output) or a description of what to examine. Read the data first.
- Use /diagnose when you have empirical results (CSVs, metrics, error logs) and want to understand what they mean: error patterns, root-cause hypotheses, validity assessment.
- Use /postmortem when the problem is not "what do the results mean?" but "why did an agent report flawed results as correct?" Postmortem analyzes reasoning failures; diagnose analyzes data.
- Use /review metrics when you suspect the metrics themselves may be degenerate or misleading before interpreting the results. /review checks whether results are interpretable; /diagnose interprets them.

Do not start with individual examples. Start with the distribution:
If the data is in CSV or another structured format, use Python to compute breakdowns rather than eyeballing rows.
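A minimal sketch of that first step, using only the standard library. The column names (`status`, `error_type`) and the inline sample are hypothetical; substitute whatever schema the result set actually uses.

```python
# Compute an error-rate and per-type breakdown from a results CSV
# instead of eyeballing individual rows.
import csv
import io
from collections import Counter

def error_breakdown(csv_text: str) -> dict:
    """Return total rows, overall error rate, and counts by error type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = len(rows)
    errors = [r for r in rows if r["status"] != "ok"]  # "status" is an assumed column
    by_type = Counter(r["error_type"] for r in errors)
    return {
        "total": total,
        "error_rate": len(errors) / total if total else 0.0,
        "by_type": dict(by_type.most_common()),  # sorted, most frequent first
    }

sample = """status,error_type
ok,
error,timeout
error,timeout
error,parse
ok,
"""
print(error_breakdown(sample))
# -> {'total': 5, 'error_rate': 0.6, 'by_type': {'timeout': 2, 'parse': 1}}
```

Sorting the breakdown by frequency (via `most_common()`) makes the dominant error pattern visible immediately, which is exactly what the next step's pattern search needs.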
For each systematic error pattern found in Step 2, generate candidate explanations. For each hypothesis:
Resist the temptation to attribute everything to the model (L1). Most errors in automated systems come from workflow (L2), interface (L3), or methodology (L4).
Before interpreting the results as meaningful, check:
Based on the diagnosis, recommend concrete actions:
Where experiments are needed, see /design for designing them.

If any root-cause hypothesis attributed to L1 (Model) is rated "high" plausibility, record it in the same turn.
Openakari does not ship a shared model capability registry. Record model-specific limits in one of:
If the L1 hypothesis is only "medium" plausibility, note it in the diagnosis output but avoid turning it into a hard rule until confirmed.
Skip this step if no root-cause hypothesis involves L1, or if all L1 hypotheses are low plausibility.
## Diagnosis: <what was examined>
CI layers involved: <L1-L5>
Date: YYYY-MM-DD
### Error distribution
<rates, breakdowns, error type categorization — with specific numbers>
### Systematic patterns
<numbered list of patterns with evidence>
### Root-cause hypotheses
#### Hypothesis 1: <testable claim>
Layer: <L1-L5>
Evidence for: <what supports this>
Evidence against: <what contradicts this>
Test: <what experiment would confirm/refute>
Plausibility: high | medium | low
[repeat for each hypothesis]
### Validity assessment
- Construct: <assessment>
- Statistical: <assessment>
- External: <assessment>
- Ground truth: <assessment>
### Recommended actions
- Quick wins: <bulleted>
- Experiments needed: <bulleted, with enough detail to feed into /design>
- Validity concerns: <bulleted>
- Avoid: <what not to do and why>
### Model-limit notes
<"Recorded model-specific limit: <what was added/changed>" or "No confirmed L1 root cause — skip">
Prioritize depth over breadth. One well-grounded hypothesis with clear evidence is worth more than five speculative ones.
Write the diagnosis to projects/<project>/diagnosis/diagnosis-<brief-slug>-YYYY-MM-DD.md. Create the diagnosis/ directory if it doesn't exist yet. Use the project whose results are being diagnosed.
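The save step can be sketched as follows. The project name and slug here are hypothetical placeholders; use the project whose results are being diagnosed and a brief slug describing what was examined.

```python
# Build the diagnosis path and create diagnosis/ if it doesn't exist yet.
from datetime import date
from pathlib import Path

project = "demo"          # placeholder: the project being diagnosed
slug = "tool-timeouts"    # placeholder: brief slug for this diagnosis

out_dir = Path("projects") / project / "diagnosis"
out_dir.mkdir(parents=True, exist_ok=True)  # no-op if the directory exists

# date.today().isoformat() yields YYYY-MM-DD, matching the naming convention.
out_path = out_dir / f"diagnosis-{slug}-{date.today().isoformat()}.md"
out_path.write_text(f"## Diagnosis: {slug}\n")
print(out_path)
```

`exist_ok=True` makes the step idempotent, so repeated diagnoses in the same project never fail on directory creation.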
After saving the diagnosis to disk, convert actionable recommendations to tasks:
- [fleet-eligible] or [requires-opus], per the fleet-eligibility checklist
- a [skill: ...] tag matching the work type
- Done when: taken from the recommendation's verification criteria
- Why: referencing this diagnosis file path
- /design for methodology

This ensures diagnosis insights enter the task pipeline for fleet consumption rather than dying as ephemeral session output.
Follow docs/sops/commit-workflow.md. Commit message: diagnosis: <brief summary of what was diagnosed>