Semantic evaluation of slide plans against the source paper and content analysis. Covers contribution coverage, narrative flow, redundancy detection, PMRC arc coherence, and actionable improvement directions. Produces a scored assessment that feeds back to the planner for revision.
Evaluate a slide_outline.json holistically by reading the original paper (document.md), the structured content analysis (content_analysis.md), and the proposed plan. Produce a scored assessment with per-dimension reasoning and, when the score is below threshold, actionable improvement directions that the planner can use to revise.
This is an LLM-driven semantic evaluation — it judges meaning, not syntax.
Structural integrity checks (asset ID existence, figure/table separation) are
handled separately by the lightweight verify_plan tool; you do NOT need to
repeat those here.
| File | Purpose |
|---|---|
| /docs/document.md | The full parsed paper — ground truth for claims and evidence |
| /docs/content_analysis.md | PMRC-aligned analysis from the research agent |
| /docs/slide_outline.json | The slide plan to evaluate |
| /docs/assets_manifest.json | Available figures, tables, equations (for context) |
Read ALL four files before beginning evaluation.
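The pre-flight read can be sketched mechanically. This is an illustrative sketch only: the `/docs` paths come from the table above, but the `load_inputs` helper name and the fail-fast behavior are assumptions, not part of the spec.

```python
import json
from pathlib import Path

DOCS = Path("/docs")
REQUIRED = [
    "document.md",
    "content_analysis.md",
    "slide_outline.json",
    "assets_manifest.json",
]

def load_inputs(root: Path = DOCS) -> dict:
    """Read all four evaluation inputs; fail fast if any is missing."""
    inputs = {}
    for name in REQUIRED:
        path = root / name
        if not path.exists():
            raise FileNotFoundError(f"required input missing: {path}")
        text = path.read_text(encoding="utf-8")
        # Parse the JSON inputs; keep the markdown ones as raw text.
        inputs[name] = json.loads(text) if name.endswith(".json") else text
    return inputs
```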
### 1. Contribution Coverage

Question: Does the plan cover every key contribution identified in content_analysis.md, and does it do so with appropriate depth?

Evaluation steps:

Scoring guide:
### 2. Narrative Flow

Question: Does the presentation tell a coherent story that builds understanding progressively, or does it feel like a disconnected list of facts?

Evaluation steps:

Scoring guide:
### 3. Redundancy

Question: Does every slide teach the audience something NEW, or are there slides that repeat information from other slides?

Evaluation steps:

Scoring guide:
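Redundancy is ultimately a semantic judgment, but a crude mechanical screen can surface candidate pairs to examine. The sketch below assumes each slide in slide_outline.json carries a `bullets` list of strings; that shape, the `flag_overlaps` name, and the 0.5 threshold are all illustrative assumptions.

```python
def bullet_words(slide: dict) -> set:
    """Lowercased word set of a slide's bullet text (assumed 'bullets' key)."""
    text = " ".join(slide.get("bullets", []))
    return {w.strip(".,;:").lower() for w in text.split() if w}

def flag_overlaps(slides: list, threshold: float = 0.5) -> list:
    """Return (i, j, jaccard) for slide pairs whose word overlap exceeds threshold."""
    flagged = []
    for i in range(len(slides)):
        for j in range(i + 1, len(slides)):
            a, b = bullet_words(slides[i]), bullet_words(slides[j])
            if not a or not b:
                continue
            jaccard = len(a & b) / len(a | b)
            if jaccard >= threshold:
                flagged.append((i, j, round(jaccard, 2)))
    return flagged
```

Flagged pairs are only candidates; a slide may legitimately revisit an idea at greater depth, so the final call stays with the semantic evaluation.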
### 4. PMRC Arc

Question: Does the plan follow the Problem → Method → Results → Conclusion framework with appropriate section allocation?

Evaluation steps:

Scoring guide:
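The phase-ordering part of this check can be expressed as a monotonicity test. A minimal sketch, assuming each slide carries a `section` field with one of the four PMRC values (that field name and its values are assumptions about the plan schema):

```python
PMRC = ["problem", "method", "results", "conclusion"]

def pmrc_order_ok(slides: list) -> bool:
    """True if slide sections never move backwards through the PMRC arc."""
    ranks = [PMRC.index(s["section"]) for s in slides if s.get("section") in PMRC]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

Allocation balance (e.g. whether Results gets enough slides) is a separate judgment this check does not cover.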
### 5. Audience Clarity

Question: Would a conference audience understand the presentation? Are slides well-designed for knowledge transfer?

Evaluation steps:

Scoring guide:
Compute the average of all 5 dimension scores (rounded to 1 decimal).
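The aggregation rule above is simple enough to state as code; a minimal sketch (the `overall_score` helper name is illustrative):

```python
def overall_score(dimension_scores: dict) -> float:
    """Average the five dimension scores, rounded to one decimal place."""
    if len(dimension_scores) != 5:
        raise ValueError("expected exactly 5 dimension scores")
    return round(sum(dimension_scores.values()) / 5, 1)
```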
Write your evaluation to /docs/plan_evaluation.md with this structure:
# Slide Plan Evaluation
## Overall Score: X.X / 10
## Dimension Scores
| Dimension | Score | Summary |
|---|---|---|
| Contribution Coverage | X/10 | [one sentence] |
| Narrative Flow | X/10 | [one sentence] |
| Redundancy | X/10 | [one sentence] |
| PMRC Arc | X/10 | [one sentence] |
| Audience Clarity | X/10 | [one sentence] |
## Detailed Reasoning
### Contribution Coverage
[2-3 sentences with specific references to which contributions are/aren't covered]
### Narrative Flow
[2-3 sentences about the story arc quality]
### Redundancy
[2-3 sentences identifying any specific duplications]
### PMRC Arc
[2-3 sentences about phase ordering and allocation]
### Audience Clarity
[2-3 sentences about slide design quality]
## Improvement Directions (only if score < 7)
[Numbered list of specific, actionable changes the planner should make.
Each direction should reference specific slide numbers and what to change.]
1. ...
2. ...
Reading order:

1. /docs/content_analysis.md first — this is your reference for what the plan SHOULD cover (key_contributions, central_message, evidence_proof).
2. /docs/slide_outline.json — this is what you're evaluating.
3. /docs/document.md — only if you need to verify a specific claim or check whether the plan misrepresents something.
4. /docs/assets_manifest.json — to understand what visuals are available and whether the plan uses them well.

When score < 7, improvement directions must be specific, actionable, and tied to concrete slide numbers, following the template above.
Return a concise summary (≤ 200 words) to the orchestrator stating the overall score, whether it cleared the score-7 threshold, and, if it did not, the gist of the improvement directions.
Do NOT return the full evaluation — the orchestrator can read the file if needed.