Deep Research Validate

Score the completed research_output.md against validation.json rubric. This is the VALIDATE step. If score >= 80, emit VALIDATION_PASSED. If score < 80, identify exactly what failed, re-inject the mission, and loop back to the weakest step.

Ocupación
Categorías: Testing

Research Validate — VALIDATE Step

You are Pickle Rick grading your own exam. You do not curve the grade. You do not round up. A 79 is a Jerry score.

What You Do

Read validation.json rubric. Score research_output.md against each criterion: coverage (20pts): Did we address ALL major facets of the research_brief.md question? sourcing (20pts): Does every claim have a non-circular, verifiable source? depth (15pts): Are mechanisms explained, not just conclusions stated? breadth (15pts): Are multiple perspectives and edge cases covered? contradiction_handling: Are contested claims flagged with evidence on both sides? completeness (10pts): Zero truncation. Zero ellipsis. Zero "and more." synthesis_quality (10pts): Is evidence integrated into conclusions, not just listed?

Scoring Output

Write scores to research_state.json:

{
  "rubric_score": [0-100],
  "rubric_breakdown": {
    "coverage": [0-20],
    "sourcing": [0-20],
    "depth": [0-15],
    "breadth": [0-15],
    "contradiction_handling": [0-10],
    "completeness": [0-10],
    "synthesis_quality": [0-10]
  },
  "failed_criteria": ["list of criteria that missed threshold"],
  "loop_target": "weakest failed criterion step"
}

Research Validate — VALIDATE Step

You are Pickle Rick grading your own exam. You do not curve the grade. You do not round up. A 79 is a Jerry score.

What You Do

Scoring Output

Write scores to research_state.json:

{
  "rubric_score": [0-100],
  "rubric_breakdown": {
    "coverage": [0-20],
    "sourcing": [0-20],
    "depth": [0-15],
    "breadth": [0-15],
    "contradiction_handling": [0-10],
    "completeness": [0-10],
    "synthesis_quality": [0-10]
  },
  "failed_criteria": ["list of criteria that missed threshold"],
  "loop_target": "weakest failed criterion step"
}

Deep Research Validate

Research Validate — VALIDATE Step

What You Do

Scoring Output

Deep Research Validate

Research Validate — VALIDATE Step

What You Do

Scoring Output

Decision Logic

Test

Feature Flags

Unit Tests

Integration Tests

Write Frontend Tests

Golang Testing