Grades Lingliang primary school math exam papers. Handles per-question rubric extraction from 37 individual rubric files, reference calibration from 5 students with known scores, per-student grading of 20 students with per-question mark allocation, and score compilation. No level division (raw scores only).
Data is symlinked into this workspace under data/.
Masked data: data/masked_data/lingliang/
Reference data: data/reference/lingliang/
Masked data path: data/masked_data/lingliang/
- student_answers/student_01.pdf … student_26.pdf (20 grading students; excludes reference students 07, 08, 11, 17, 23)
- rubrics/rubrics_q-{N}.pdf and rubrics/rubrics_q-{N}.jpg (per-question, 37 sets)
- question/q-{N}.txt (extracted text) and question/q-{N}.jpg (question image) for each question
- question_manifest.csv

Reference data path: data/reference/lingliang/
- student_answers/student_07.pdf, student_08.pdf, student_11.pdf, student_17.pdf, student_23.pdf
- reference_mapping.json — contains total_score, max_score, score_percentage per reference student (NOT DSE levels)

Answer page mapping: Use answer_page_map.json (in source data at data/z_test_data/lingliang_prework/answer_page_map.json) to map each question number to the page in each student's 6-page PDF.
Grading notes:
The full grading workflow follows the unified skill adapted for this subject. The complete content is reproduced below for reference.
This skill grades Lingliang primary school math exam papers by extracting rubrics from per-question PDF/JPG pairs, calibrating against reference data with known scores, then grading each student's scanned handwritten answer PDF on a per-question basis. It produces per-student JSON results and compiles them into final raw scores.
Subject: Lingliang (小學數學考試) — primary school math, 37 questions, 100 total marks, 25 students (5 reference + 20 grading targets).
Key differences from HKDSE subject grading:
- question_manifest.csv drives rubric iteration
- answer_page_map.json maps questions to pages in each student's 6-page PDF
- Sub-agents are mandatory. Each student MUST be graded by a dedicated sub-agent. No batch grading of multiple students in a single sub-agent call.
Model restrictions:
- Do not pass a model parameter override to sub-agents — let them inherit the main agent's model.
- Use gemini-3.1-flash-lite-preview via the Gemini route exclusively. NO OTHER VLM MODELS OR ROUTES ARE ALLOWED.
- Gemini only (for both added chains): the extraction skill and model-service skill must both use Gemini for API calls. Read GEMINI_BASE_URL and VERTEX_API_KEY from env.txt. Model: gemini-3.1-flash-lite-preview.
- model-service skill: run a config-based route sanity check that confirms route=gemini is callable and non-gemini routes are ignored.

PDF rendering DPI: When converting PDF pages to images (e.g., for VLM extraction), default to 150 DPI to ensure text is legible. If processing speed is a concern, lowering DPI is acceptable as a trade-off, but do not go below 100 DPI.
Image preservation is mandatory when flagged by extraction. When a page's
meaning depends on layout or visuals — e.g., geometric figures, handwritten
mathematical notation with spatial arrangement (fractions, long division layouts),
graphs, tables with spatial meaning, arrows/labels, or annotated drawings —
the extraction API output must explicitly signal this (via
needs_visual_reference + visual_reference_reason). The agent MUST inspect those
returned hints and export the whole page image as a PNG artifact for every flagged
page. This applies to both rubric/question pages AND student answer pages. Do not
crop at this stage.
No LLM/VLM API calls for generating feedback or analysis text. VLM is permitted only for tasks requiring genuine visual inspection — text extraction from scanned PDFs where PyMuPDF fails, and interpreting handwritten mathematical notation (fractions, division symbols, etc.). VLM must NOT be used for any dimension assessable from extracted text.
Chained skills: the extraction skill and the model-service skill (YAML-only, no .py edits). Configure the model (gemini-3.1-flash-lite-preview) and read URL/API key values from env.txt; missing values mean that provider/API is unsupported in the current environment.

Use this grading skill as the orchestrator and explicitly chain the migrated skills:
- Extraction (extraction skill)
- Model routing (model-service skill, pinned to gemini-3.1-flash-lite-preview)

extraction and model-service are config-driven execution skills; do not modify their underlying Python scripts inside this grading workflow.

Workspace layout:

lingliang-grading-workspaces/grading_workspace/
├── .github/skills/lingliang-grading/SKILL.md
├── START.md
├── WARNING.md
├── env.txt # Local credentials (never commit)
├── env.txt.example # Template
├── pyproject.toml
├── uv.lock
├── start.sh
├── data/ # Local data (symlinked into workspace)
│ ├── reference/lingliang/
│ │ ├── student_answers/student_07.pdf, student_08.pdf, student_11.pdf, student_17.pdf, student_23.pdf
│ │ └── reference_mapping.json
│ ├── masked_data/lingliang/
│ │ ├── student_answers/student_01.pdf … student_26.pdf (excludes 07, 08, 11, 17, 23)
│ │ ├── rubrics/rubrics_q-{N}.pdf + rubrics_q-{N}.jpg (37 per-question pairs)
│ │ ├── question/q-{N}.txt + q-{N}.jpg (37 per-question materials)
│ │ └── question_manifest.csv
│ └── z_test_data/lingliang_prework/
│ └── answer_page_map.json
├── rubric/ # Grading artifacts (no year nesting)
│ ├── grading_guide.md
│ ├── reference_calibration.md
│ ├── page_images/ # Full-page PNG artifacts for rubric/question pages (MANDATORY when visual content detected)
│ │ ├── rubrics/page_{P}.png # Rubric pages with diagrams, reference answer figures
│ │ └── question/page_{P}.png # Question pages with diagrams, graphs, geometric figures
│ ├── reference_scores.json
│ └── calibration/ # Intermediate calibration artifacts
│ └── (per-student reference grading outputs, draft rubrics, score calculations)
├── rubric/reference_data_analysis/ # Insights from reference data (Phase 3)
│ ├── score_correlation_analysis.md # Observed patterns across score ranges
│ ├── rubric_gaps.md # Gaps/ambiguities in rubric
│ └── rubric_refinements.md # Supplementary rubric guidance
├── scripts/ # Python helper scripts
│ ├── generate_class_report.py
│ ├── generate_student_reports.py
│ ├── validate_extraction.py
│ ├── validate_grading_output.py
│ └── validate_reports.py
├── extracted/ # Extracted student text (no year nesting)
│ └── students/
│ ├── student_{NN}.txt
│ └── page_images/student_{NN}/page_{P}.png # Visual-page artifacts (MANDATORY when [IMAGE_DATA] present)
└── output/ # Grading output (no year nesting)
├── student_{NN}.json
└── final_scores.json
source env.txt
uv sync
No GRADING_YEAR env var is needed — paths are fixed with no year-based structure.
Confirm the following exist:
data/masked_data/lingliang/student_answers/student_01.pdf … student_26.pdf (20 grading students, excluding 07, 08, 11, 17, 23)data/masked_data/lingliang/rubrics/rubrics_q-1.pdf + rubrics_q-1.jpg … through all 37 questionsdata/masked_data/lingliang/question/q-1.txt + q-1.jpg … through all 37 questionsdata/masked_data/lingliang/question_manifest.csvdata/z_test_data/lingliang_prework/answer_page_map.jsondata/reference/lingliang/reference_mapping.jsondata/reference/lingliang/student_answers/student_07.pdf, student_08.pdf, student_11.pdf, student_17.pdf, student_23.pdfRead BATCH_SIZE from env.txt (default: 5) for parallel sub-agent control.
Read data/masked_data/lingliang/question_manifest.csv to get the full list of all 37
questions. The CSV contains columns:
- question — question identifier (e.g., "1", "3.a", "9.decimal", "15.b")
- question_page — page number in the student answer PDF
- rubric_page — page in the rubric PDF
- question_text — the question text
- question_text_path — path to the extracted question text file (e.g., question/q-1.txt)
- question_image_path — path to the question image (e.g., question/q-1.jpg)
- rubric_pdf_path — path to the rubric PDF (e.g., rubrics/rubrics_q-1.pdf)
- rubric_image_path — path to the rubric image (e.g., rubrics/rubrics_q-1.jpg)

⛔ Do NOT look for or read any gold_csv_path or *-gold.csv files during Phase 2. Those files contain ALL students' ground truth scores and must not be used for rubric extraction. Phase 2 uses ONLY rubric_pdf_path, rubric_image_path, question_text_path, and question_image_path.
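Loading the manifest and failing fast on missing files (as the pre-flight checks later require) can be sketched as below. The helper name load_manifest and the assumption that manifest paths are relative to the manifest's own directory are illustrative, not part of the workflow spec:

```python
import csv
from pathlib import Path

def load_manifest(path: str = "data/masked_data/lingliang/question_manifest.csv") -> list[dict]:
    """Read all question rows; raise with a clear listing if referenced files are missing."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    base = Path(path).parent  # assumption: paths are relative to the manifest directory
    missing = []
    for row in rows:
        for col in ("question_text_path", "question_image_path",
                    "rubric_pdf_path", "rubric_image_path"):
            rel = row.get(col, "")
            if rel and not (base / rel).exists():
                missing.append(f"{row['question']}: {rel}")
    if missing:
        raise FileNotFoundError("Manifest references missing files:\n" + "\n".join(missing))
    return rows
```

Note that gold_csv_path is deliberately excluded from the checked columns — Phase 2 must never touch gold files.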
For EACH of the 37 question rows in the manifest, extract the rubric by:
- rubric_pdf_path (e.g., rubrics/rubrics_q-1.pdf) — this is the primary source for marking criteria and acceptable answers
- rubric_image_path (e.g., rubrics/rubrics_q-1.jpg) — use this as a visual fallback or cross-reference if the PDF extraction is incomplete, or if the rubric contains handwritten annotations, diagrams, or mathematical notation that PyMuPDF cannot render
- question_text_path (e.g., question/q-1.txt) — this provides the extracted question text for context
- question_image_path (e.g., question/q-1.jpg) — use this to understand the visual layout of the question, especially for questions involving diagrams, graphs, tables, or geometric figures

For each question, extract and record:
Use PyMuPDF (fitz) for text extraction from rubric PDFs. Fall back to VLM only if
text extraction yields empty or garbled content (common for scanned rubrics with
handwritten annotations or mathematical notation).
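The empty-or-garbled fallback decision could be sketched with a simple printable-character heuristic; the 0.5 threshold and the helper name are illustrative assumptions, not values from this workflow:

```python
def needs_vlm_fallback(text: str) -> bool:
    """True when PyMuPDF output is empty or mostly non-printable (likely a scanned page)."""
    stripped = text.strip()
    if not stripped:
        return True  # no embedded text layer at all
    printable = sum(ch.isprintable() and not ch.isspace() for ch in stripped)
    # Assumption: below 50% printable characters, treat the extraction as garbled
    # and fall back to VLM on the JPG version of the same rubric.
    return printable / len(stripped) < 0.5
```

In practice the input would come from fitz (PyMuPDF) via page.get_text() on the rubric PDF.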
Chain to extraction skill: perform this phase through the extraction skill's
YAML-driven pipeline. Since there are 37 individual rubric files (not one large PDF),
configure the extraction to process each rubric file independently. Force Gemini
routing (model_routes -> gemini) with env-backed provider config only
(GEMINI_BASE_URL, VERTEX_API_KEY).
Note on rubric extraction: Each rubric file is a single page covering one question. The JPG version is a rendered image of the same content — use it when PDF text extraction fails or when mathematical notation needs visual interpretation.
Processing strategy for 37 rubrics: Process rubric files in batches (e.g., 5–10 at a time) to balance parallelism with resource constraints. For each batch:
Create rubric/grading_guide.md with:
- the question text (from question_text_path) for context

Grading guide structure: Organize the guide by question number in ascending order. For each question, use the following template:
### Question {N} ({max_marks} marks)
**Question text:** {from question_text_path}
**Marking criteria:**
- Criterion 1: {description} — {marks} mark(s)
- Criterion 2: {description} — {marks} mark(s)
- ...
**Acceptable answers:** {list of acceptable forms}
**Common errors:** {known incorrect approaches that should NOT earn marks}
**Special notes:** {any rubric-specific instructions}
For sub-questions (e.g., 3.a, 3.b, 9.decimal, 9.fraction), nest them within their parent question section with clear sub-headings.
Read data/reference/lingliang/reference_mapping.json to get the mapping of student
IDs to known scores.
Format:
{
"mappings": [
{"student_id": 7, "filename": "student_07.pdf", "total_score": 100, "max_score": 100, "score_percentage": 100.0},
{"student_id": 8, "filename": "student_08.pdf", "total_score": 74, "max_score": 100, "score_percentage": 74.0},
{"student_id": 11, "filename": "student_11.pdf", "total_score": 56, "max_score": 100, "score_percentage": 56.0},
{"student_id": 17, "filename": "student_17.pdf", "total_score": 87, "max_score": 100, "score_percentage": 87.0},
{"student_id": 23, "filename": "student_23.pdf", "total_score": 30, "max_score": 100, "score_percentage": 30.0}
]
}
Note: Reference files use student_{NN}.pdf naming (same format as grading students).
The known ground-truth scores are: student_07 (100/100), student_08 (74/100),
student_11 (56/100), student_17 (87/100), student_23 (30/100). These span a wide range
from 30% to 100%, providing good calibration anchors.
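Loading the mapping shown above might look like this (the helper name is illustrative):

```python
import json
from pathlib import Path

def load_reference_mapping(path: str = "data/reference/lingliang/reference_mapping.json") -> dict[int, dict]:
    """Index the reference students by student_id for quick score lookup."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {m["student_id"]: m for m in data["mappings"]}
```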
Extract text from ALL reference student answer PDFs in
data/reference/lingliang/student_answers/.
Chain to extraction skill: use extraction pipeline in student-PDF mode
(pdf_dir + optional students) to produce structured extraction outputs that
feed reference calibration. Force Gemini routing for this chain.
Note: These are scanned handwritten math answers (6 pages each). VLM extraction is likely necessary for mathematical notation (fractions, division, etc.).
The rubrics for reference grading are the SAME as for grading — use the per-question
rubric files already extracted in Phase 2 (from data/masked_data/lingliang/rubrics/).
There is no separate reference-year rubric directory.
Grade ALL 5 reference students using the same rubric and grading guide (from Phase 2).
For each reference student, apply the per-question mark allocation to produce a total
raw score and percentage — exactly as you would for a masked student. Record these
scores alongside their known total scores from reference_mapping.json.
This step is critical: the resulting scores provide empirical calibration — verifying that the rubric as interpreted produces scores consistent with known ground-truth scores. Launch sub-agents for reference students following the same rules as Phase 5 (one sub-agent per student, same model restriction).
Use answer_page_map.json to determine which page(s) of each student's 6-page PDF
contain the answer for each question. This is essential for accurate per-question grading
of scanned handwritten answers.
Save reference grading results to rubric/reference_scores.json:
{
"scores": [
{"student_id": 7, "total_score": 100, "ai_total_raw_score": 98, "ai_total_max_score": 100, "ai_percentage": 98.0, "score_percentage": 100.0},
{"student_id": 8, "total_score": 74, "ai_total_raw_score": 71, "ai_total_max_score": 100, "ai_percentage": 71.0, "score_percentage": 74.0},
{"student_id": 11, "total_score": 56, "ai_total_raw_score": 54, "ai_total_max_score": 100, "ai_percentage": 54.0, "score_percentage": 56.0},
{"student_id": 17, "total_score": 87, "ai_total_raw_score": 85, "ai_total_max_score": 100, "ai_percentage": 85.0, "score_percentage": 87.0},
{"student_id": 23, "total_score": 30, "ai_total_raw_score": 32, "ai_total_max_score": 100, "ai_percentage": 32.0, "score_percentage": 30.0}
]
}
The rubric must not be finalised from the official PDF alone. The agent must learn from the reference student data to derive an effective grading rubric — one that actually produces scores consistent with known ground-truth scores. This is an iterative loop that continues until the rubric reliably reproduces expected scores.
This is especially important for primary school math where handwritten answers may use non-standard notation, alternative solution methods, or ambiguous layouts. The rubric must account for these realities — reference data analysis reveals what students actually write and how it maps to marks.
All analytical outputs from this step are saved to rubric/reference_data_analysis/.
Iterative loop:
1. Write rubric/reference_data_analysis/score_correlation_analysis.md documenting
the correlation between AI scores and known scores
2. Where AI scores diverge from known scores:
a. Identify gaps in rubric/reference_data_analysis/rubric_gaps.md — gaps or
ambiguities found in the rubric when applied to reference students
(e.g., criteria that fail to distinguish partial credit correctly, missing guidance
for common student errors, mark schemes that reward or penalise inconsistently,
unclear treatment of alternative solution methods)
b. Derive refinements in rubric/reference_data_analysis/rubric_refinements.md
— supplementary rubric interpretations and practical guidance that resolve the
gaps above (e.g., clarifying what constitutes a "correct method" for partial credit,
specifying how to handle equivalent mathematical expressions, defining acceptable
rounding behavior, handling crossed-out work vs. final answers)
c. Revise rubric/grading_guide.md to incorporate these
refinements — the grading guide is not simply extracted from the official rubric
PDFs; it must incorporate insights from the reference data analysis
d. Re-grade affected reference students with the improved rubric
e. Repeat from step 2 until the discrimination check passes

rubric/grading_guide.md must incorporate insights from
rubric/reference_data_analysis/ — it is not simply a transcription of the
official rubric PDFs.

Discrimination check criteria — verify ALL of the following:
Only proceed to Phase 4 (actual student grading) once the discrimination check
passes. Document the outcome in rubric/reference_calibration.md.
Temp file organization: All intermediate artifacts from calibration — including
reference student grading outputs, draft rubrics, and intermediate score calculations
— must be saved under rubric/calibration/ (a dedicated subdirectory).
Final reference artifacts (reference_scores.json) are saved under rubric/.
Final grading artifacts (grading_guide.md, reference_calibration.md) are saved
under rubric/.
Using the reference scores from Step 3.4, verify that the AI grading produces scores that correlate with known ground-truth scores. Specifically:
These correlations validate that the rubric interpretation is reasonable. If the correlation is poor, return to Step 3.4a for further rubric refinement.
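One way to quantify this correlation is a hand-rolled Pearson coefficient over rubric/reference_scores.json; the 0.9 pass threshold below is an illustrative placeholder, not a value from this workflow:

```python
import json
from math import sqrt
from pathlib import Path

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between AI scores and known ground-truth scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def calibration_check(path: str = "rubric/reference_scores.json",
                      threshold: float = 0.9) -> bool:
    """True when AI totals track the known totals closely enough to proceed."""
    scores = json.loads(Path(path).read_text(encoding="utf-8"))["scores"]
    ai = [s["ai_total_raw_score"] for s in scores]
    gt = [s["total_score"] for s in scores]
    return pearson(ai, gt) >= threshold
```

With only 5 reference points, correlation is a coarse signal — it complements, not replaces, the per-question discrepancy analysis.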
Create rubric/reference_calibration.md with TWO sections:
Section 1 — Qualitative Calibration: Summarise observations from grading the reference students across the score spectrum:
Focus on the main question types for this exam (arithmetic, fractions, word problems, geometry). Include specific examples of how reference students at different score levels approached key questions.
Section 2 — Quantitative Score Correlation: Include the empirical AI vs. ground-truth comparison from Step 3.4:
This calibration data provides confidence that the rubric interpretation is sound before grading the 20 target students.
CRITICAL: Student answer extraction MUST be done 1 page at a time to ensure extraction quality. Do NOT batch multiple pages into a single extraction call.
For each of the 20 grading students in data/masked_data/lingliang/student_answers/:
- extract each page and save the combined text to extracted/students/student_{NN}.txt

Note: Student PDFs are scanned handwritten math work (6 pages each). VLM extraction
is very likely needed for most or all pages, since PyMuPDF typically cannot extract
handwritten text. Use answer_page_map.json during grading (Phase 5) to locate which
page contains the answer for each question.
Handwritten math extraction challenges:
- … / symbol — VLM must interpret these correctly

Chain to extraction skill: this phase must be implemented via the extraction skill
with request_mode=page_by_page (or page_by_page_with_prev only when boundary context
is necessary) to enforce the one-page extraction quality requirement, and route via
Gemini only.
Note: This 1-page-at-a-time rule applies to student answer PDFs. For rubric/criteria PDFs (Phase 2), multi-page extraction is acceptable since those are typeset documents with cleaner formatting.
After extraction, inspect the extraction results for visual-content hints. If any
student's extraction output includes needs_visual_reference: true or [IMAGE_DATA]
placeholders, export the corresponding full-page PNG images from the student's
PDF to extracted/students/page_images/student_{NN}/page_{P}.png.
This is especially important for handwritten math answers containing:
Use PyMuPDF at 150 DPI (or the DPI set in Rule 4) to render each flagged page. These page images allow Phase 5 grading sub-agents to view the original handwritten content when the extracted text is insufficient.
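A minimal sketch of the flagged-page export, assuming PyMuPDF (fitz) is installed; in PyMuPDF, zoom 1.0 corresponds to 72 DPI, so the conversion is dpi / 72. The helper names are illustrative:

```python
def dpi_to_zoom(dpi: int = 150) -> float:
    """PyMuPDF renders at 72 DPI for zoom 1.0, so zoom = dpi / 72."""
    if dpi < 100:
        raise ValueError("Rule 4: do not render below 100 DPI")
    return dpi / 72.0

def export_flagged_page(pdf_path: str, page_no: int, out_png: str, dpi: int = 150) -> None:
    """Render one 1-based page of a student PDF to a PNG artifact."""
    import fitz  # PyMuPDF
    zoom = dpi_to_zoom(dpi)
    with fitz.open(pdf_path) as doc:
        pix = doc[page_no - 1].get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        pix.save(out_png)
```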
Confirm all 20 student text files exist and have non-trivial content. For each student:
Write a script scripts/validate_extraction.py that:
- checks that all 20 extracted/students/student_{NN}.txt files exist and have non-trivial content
- if extracted/students/page_images/student_{NN}/ exists, verifies it contains at least one .png file and reports the page-image count
- if a .txt file contains [IMAGE_DATA], verifies that extracted/students/page_images/student_{NN}/ exists AND contains at least one .png file; fails validation if [IMAGE_DATA] is present but no page images exist — this means Step 4.1a was skipped
- if [VLM_DESCRIPTION: tags exist, reports their count (supplementary, not blocking)

Run the script and confirm all students pass before proceeding to Phase 5. If any fail, re-run extraction for those students.
For each of the 20 grading students, launch a dedicated sub-agent that:
Reads rubric/grading_guide.md
Reads rubric/reference_calibration.md (both qualitative and
quantitative sections — including score correlation data)
Calibration note: The reference_calibration.md data validates that the rubric interpretation is consistent. Use it as a sanity check — if a student's score seems unusually high or low, verify the per-question grading is correct against the rubric.
Reads the student's extracted text (extracted/students/student_{NN}.txt)
Reads answer_page_map.json to know which page(s) contain the answer for each question
Grades EVERY question (all 37) against the rubric — focus on accurate per-question mark allocation, applying the rubric criteria precisely
Computes total score and percentage from per-question marks
Outputs a JSON file to output/student_{NN}.json
Handling visual content with [IMAGE_DATA] (MANDATORY — direct image viewing):
When the student's extracted text contains [IMAGE_DATA] for any question, the
sub-agent MUST:
- Use the view tool on the PNG file at extracted/students/page_images/student_{NN}/page_{P}.png
- View the question image (data/masked_data/lingliang/question/q-{N}.jpg) to understand the question's visual context (diagrams, geometric figures, graphs)
- View the rubric image (data/masked_data/lingliang/rubrics/rubrics_q-{N}.jpg) to understand what the correct visual answer looks like
- Treat [VLM_DESCRIPTION] text as supplementary context, but do NOT rely on it as the sole basis for grading visual content
- Record what was observed in the evidence field (e.g., "student drew correct geometric construction with compass marks visible")

IMPORTANT: Sub-agents must NOT use hard score restrictions to constrain their grading. The correct approach is: grade each question purely on rubric merit → sum the scores → report the total. Do NOT adjust per-question marks to force a particular total outcome.
Math-specific grading guidance for sub-agents:
- When handwriting is ambiguous, note the ambiguity in the evidence field. Grade based on
the most reasonable interpretation, but flag it for review.

Control parallelism with BATCH_SIZE — launch up to BATCH_SIZE sub-agents at a time.
Each output/student_{NN}.json MUST follow this schema:
{
"student_id": "NN",
"questions": [
{
"question_id": "1",
"max_marks": 2,
"awarded_marks": 2.0,
"reasoning": "Step-by-step analysis citing rubric criteria: (1) [Rubric criterion] — student answer satisfies this because [specific part of answer]. Therefore 2/2 marks awarded.",
"evidence": "Direct quote from student answer: '[verbatim student text]'."
},
{
"question_id": "3.a",
"max_marks": 3,
"awarded_marks": 2.0,
"reasoning": "Step-by-step analysis: (1) Correct method used — 2 marks; (2) Final answer has arithmetic error — 0 marks for answer mark. Therefore 2/3 marks awarded.",
"evidence": "Student work: '[verbatim calculation steps]'. Final answer: '[student's incorrect answer]' vs. expected '[correct answer]'."
}
],
"total_raw_score": 78,
"total_max_score": 100,
"percentage": 78.0
}
Fields:
- student_id: Student number string (e.g., "01", "14", "26")
- questions: Array of per-question grading objects (37 entries)
  - question_id: String identifier matching the manifest (e.g., "1", "3.a", "9.decimal", "15.b")
  - max_marks: Maximum marks for this question
  - awarded_marks: Marks awarded (float, can be 0.5 increments if rubric allows)
  - reasoning: Step-by-step analysis of mark allocation, always grounded in the rubric first. For each marking point in the grading guide: (a) state the rubric criterion being assessed, (b) explain whether and how the student's answer satisfies it, (c) for partial marks, specify exactly what was present and what was absent. For math questions, distinguish between method marks and answer marks where applicable. General subject-knowledge conventions may supplement the rubric only where the rubric is genuinely silent — and this must be stated explicitly (e.g., "Rubric does not specify, but standard practice requires…").
  - evidence: Direct verbatim quote(s) from the student's answer that demonstrate satisfaction (or non-satisfaction) of each rubric criterion. Do not paraphrase — quote the student answer exactly. For math questions, include the student's working steps and final answer.
- total_raw_score: Sum of all awarded_marks (should be out of 100)
- total_max_score: Sum of all max_marks (should equal 100)
- percentage: (total_raw_score / total_max_score) × 100

Grading methodology — sub-agents MUST follow this sequence:
1. Consult answer_page_map.json to locate the correct page, then find the portion of the extracted text that responds to this question.
2. The reasoning field must reference the specific rubric criterion and the specific student text (verbatim in evidence) that supports or fails each mark.

After each sub-agent completes:
Write a script scripts/validate_grading_output.py that:
- checks all 20 output/student_{NN}.json files
- validates required fields (student_id, questions, total_raw_score, total_max_score, percentage), value ranges (percentage ∈ [0,100], total_max_score == 100), and that total_raw_score equals the sum of all awarded_marks

Run the script. Re-run the sub-agent for any failing students. Re-run the script until all pass.
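The core per-file check could be sketched as below; field names follow the schema above, while the float tolerance is an assumption:

```python
import json
from pathlib import Path

def validate_grading_json(path: Path) -> list[str]:
    """Schema and consistency checks for one output/student_{NN}.json (sketch)."""
    errors = []
    data = json.loads(path.read_text(encoding="utf-8"))
    for field in ("student_id", "questions", "total_raw_score",
                  "total_max_score", "percentage"):
        if field not in data:
            errors.append(f"{path.name}: missing field '{field}'")
    if errors:
        return errors
    if data["total_max_score"] != 100:
        errors.append(f"{path.name}: total_max_score != 100")
    if not 0 <= data["percentage"] <= 100:
        errors.append(f"{path.name}: percentage out of range")
    awarded = sum(q["awarded_marks"] for q in data["questions"])
    if abs(awarded - data["total_raw_score"]) > 1e-6:  # tolerance is an assumption
        errors.append(f"{path.name}: total_raw_score != sum of awarded_marks")
    return errors
```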
Read all 20 output/student_{NN}.json files and compile into
output/final_scores.json:
{
"subject": "lingliang",
"total_students": 20,
"max_possible_score": 100,
"students": [
{
"student_id": "01",
"total_raw_score": 78,
"total_max_score": 100,
"percentage": 78.0
}
],
"statistics": {
"mean_score": 65.3,
"median_score": 67.0,
"std_dev": 18.4,
"min_score": 22,
"max_score": 95,
"mean_percentage": 65.3
}
}
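Compilation could be sketched as follows, assuming Python's statistics module and sample standard deviation (the spec does not say sample vs. population):

```python
import json
import statistics
from pathlib import Path

def compile_final_scores(out_dir: str = "output") -> dict:
    """Aggregate all output/student_{NN}.json files into the final_scores structure."""
    students = []
    for p in sorted(Path(out_dir).glob("student_*.json")):
        d = json.loads(p.read_text(encoding="utf-8"))
        students.append({"student_id": d["student_id"],
                         "total_raw_score": d["total_raw_score"],
                         "total_max_score": d["total_max_score"],
                         "percentage": d["percentage"]})
    scores = [s["total_raw_score"] for s in students]
    return {
        "subject": "lingliang",
        "total_students": len(students),
        "max_possible_score": 100,
        "students": students,
        "statistics": {
            "mean_score": round(statistics.mean(scores), 1),
            "median_score": float(statistics.median(scores)),
            "std_dev": round(statistics.stdev(scores), 1) if len(scores) > 1 else 0.0,
            "min_score": min(scores),
            "max_score": max(scores),
            "mean_percentage": round(statistics.mean(s["percentage"] for s in students), 1),
        },
    }
```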
Optionally, compute per-question statistics to identify questions that were particularly easy or difficult across the cohort:
This per-question analysis is informational — it does not affect individual student scores but provides useful diagnostic data.
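A possible per-question pass over the same JSON files; "facility" (mean awarded / max marks) is an illustrative metric name, not one the workflow mandates:

```python
import json
import statistics
from collections import defaultdict
from pathlib import Path

def per_question_stats(out_dir: str = "output") -> dict[str, dict]:
    """Mean awarded marks per question across the cohort (diagnostic only)."""
    marks = defaultdict(list)
    maxes = {}
    for p in sorted(Path(out_dir).glob("student_*.json")):
        for q in json.loads(p.read_text(encoding="utf-8"))["questions"]:
            marks[q["question_id"]].append(q["awarded_marks"])
            maxes[q["question_id"]] = q["max_marks"]
    return {qid: {"mean_awarded": round(statistics.mean(v), 2),
                  "max_marks": maxes[qid],
                  "facility": round(statistics.mean(v) / maxes[qid], 2)}
            for qid, v in marks.items()}
```

Questions with facility near 1.0 were easy for the cohort; values near 0.0 flag questions worth a manual rubric recheck.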
⚠️ DO NOT EXECUTE — Phase 7 is disabled for this subject. Lingliang is a primary school math exam with no level classification system. The grading output is raw scores only. Skip Phase 7 entirely.
⚠️ DO NOT EXECUTE — This phase is disabled for now. Skip Phase 8 entirely. Do not run any steps in this phase.
After score compilation, review the overall score distribution:
Randomly select 2–3 students and manually verify their grading:
Compare students with similar total scores:
Confirm all output files are complete and valid:
Output completeness:
- student_{NN}.json files exist with valid JSON (20 students)
- final_scores.json contains all 20 students with correct statistics

Input data (data/masked_data/lingliang/):
- rubrics/rubrics_q-{N}.pdf + rubrics/rubrics_q-{N}.jpg (37 per-question pairs)
- question/q-{N}.txt + question/q-{N}.jpg
- question_manifest.csv

Input data (data/reference/lingliang/):
- reference_mapping.json with known total scores

Checklist:
- VLM extraction routed via Gemini (gemini-3.1-flash-lite-preview, 1 page per call)
- If question_manifest.csv references files that don't exist on disk, fail with a clear listing of missing files before proceeding
- Dependencies installed (uv sync)
- model-service skill run at least once to confirm Gemini (gemini-3.1-flash-lite-preview) route availability from env.txt
- question_manifest.csv verified (37 questions, all paths valid)
- answer_page_map.json verified and accessible
- Grading guide created (rubric/grading_guide.md)
- Reference scores recorded (rubric/reference_scores.json)
- Calibration documented (rubric/reference_calibration.md)
- extraction skill used in Phase 2/3/4 extraction tasks with Gemini (gemini-3.1-flash-lite-preview) routing
- Grading output validated (scripts/validate_grading_output.py passes for all 20 students)
- Final scores compiled (output/final_scores.json)
- WARNING.md

Read split-part PDFs in numerical order. Files named _part1.pdf, _part2.pdf,
…, _partN.pdf MUST be read in sequence. Concatenate the extracted content in order.
Ignore _original.pdf files (these are the unsplit source).
Reference data informs score calibration. Grade reference students to compute
empirical score ranges. Compare AI-graded scores against known total scores from
reference_mapping.json. The reference data provides calibration anchors — use
them to verify the rubric is being applied consistently and that the grading
produces scores that correlate with known ground-truth scores.
No GRADING_YEAR needed — paths are fixed (no year-based structure). All data
paths use data/masked_data/lingliang/ and data/reference/lingliang/ directly.
There is no year nesting.
All report and commentary text must be written in Traditional Chinese (繁體中文). This applies to all narrative, feedback, analysis, and section content in generated DOCX reports. English is only permitted for: variable names, file paths, technical identifiers, chart axis labels, and JSON field names.
Skill-chain requirement (new):
reasoning