Generate a structured comparison report for Adapter Arena benchmark results. Use this skill after all runner agents have completed their benchmark tasks and posted their structured results as task comments, so the results can be compiled into a single comparative analysis.
# Adapter Arena Report
Generated: <timestamp>
Tasks completed: <N> / <total>
Adapters evaluated: <list>
| Category | Claude | Codex | Cursor | Gemini | OpenCode | Pi | OpenClaw |
|----------|--------|-------|--------|--------|----------|----|----------|
| Code Generation | X | X | X | X | X | X | X |
| Bug Fixing | X | X | X | X | X | X | X |
| Refactoring | X | X | X | X | X | X | X |
| Code Review | X | X | X | X | X | X | X |
| Test Writing | X | X | X | X | X | X | X |
| Documentation | X | X | X | X | X | X | X |
| Multi-file Changes | X | X | X | X | X | X | X |
| Debugging | X | X | X | X | X | X | X |
| Architecture | X | X | X | X | X | X | X |
| Performance Optimization | X | X | X | X | X | X | X |
| **Average** | **X.X** | **X.X** | **X.X** | **X.X** | **X.X** | **X.X** | **X.X** |
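The **Average** row above can be derived mechanically from the per-category scores. A minimal sketch, assuming the collected results have been parsed into a `{adapter: {category: score}}` mapping (the data below is hypothetical placeholder input, not real benchmark output):

```python
# Hypothetical parsed results: adapter -> category -> score out of 5.
scores = {
    "Claude": {"Code Generation": 4, "Bug Fixing": 5},
    "Codex": {"Code Generation": 3, "Bug Fixing": 4},
}

def average_row(scores):
    """Return each adapter's mean score, rounded to one decimal place."""
    return {
        adapter: round(sum(by_cat.values()) / len(by_cat), 1)
        for adapter, by_cat in scores.items()
    }

print(average_row(scores))  # {'Claude': 4.5, 'Codex': 3.5}
```

Rounding to one decimal matches the `X.X` format used in the table.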
For each of the 10 categories, include:
## <Category Name>
**Winner**: <adapter name> (score: X/5)
### Results by adapter
| Adapter | Score | Status | Notes |
|---------|-------|--------|-------|
| Claude | X/5 | PASS | ... |
| Codex | X/5 | PASS | ... |
| Cursor | X/5 | PASS | ... |
| Gemini | X/5 | PASS | ... |
| OpenCode | X/5 | PASS | ... |
| Pi | X/5 | PASS | ... |
| OpenClaw | X/5 | PASS | ... |
### Observations
<qualitative comparison of approaches taken by different adapters>
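Selecting the **Winner** line for a category can be sketched as follows, assuming per-adapter scores for that category are available as a dict (hypothetical values shown; ties are surfaced rather than silently broken):

```python
# Hypothetical per-category scores gathered from runner task comments.
category_scores = {"Claude": 4, "Codex": 5, "Cursor": 5, "Gemini": 3}

def winner(category_scores):
    """Highest score wins; tied adapters are reported together."""
    best = max(category_scores.values())
    tied = [a for a, s in category_scores.items() if s == best]
    return tied, best

print(winner(category_scores))  # (['Codex', 'Cursor'], 5)
```

Reporting ties explicitly avoids an arbitrary winner when two adapters score identically.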
## Overall Rankings
| Rank | Adapter | Average Score | Strengths | Weaknesses |
|------|---------|---------------|-----------|------------|
| 1 | ... | X.X | ... | ... |
| 2 | ... | X.X | ... | ... |
| 3 | ... | X.X | ... | ... |
| 4 | ... | X.X | ... | ... |
| 5 | ... | X.X | ... | ... |
| 6 | ... | X.X | ... | ... |
| 7 | ... | X.X | ... | ... |
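The rank ordering above follows directly from the per-adapter averages. A short sketch, using hypothetical average values, of producing the table rows in descending-score order:

```python
# Hypothetical adapter averages computed from the summary table.
averages = {"Claude": 4.2, "Codex": 3.9, "Cursor": 4.5}

# Sort by average score, highest first, then emit Markdown table rows.
ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
for rank, (adapter, avg) in enumerate(ranked, start=1):
    print(f"| {rank} | {adapter} | {avg:.1f} | ... | ... |")
```

The Strengths and Weaknesses columns remain qualitative and are filled in from the per-category observations.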
Apply these criteria uniformly across all tasks and adapters:
For each task result, consider: