Grades HKDSE elective subject exam papers for a specified subject. Use this when asked to grade typed student answer PDFs against an official rubric. Handles rubric extraction from PDFs (including split-part PDFs), reference calibration from prior-year labeled data, per-student grading with per-question mark allocation, total-score compilation, and level division (1–5). Subject-agnostic: works for ICT, Music, Religion, Tourism, Visual Arts, Biology, Economics, BAFS, and similar subjects with typed written responses.
This skill grades HKDSE elective subject exam papers by extracting rubrics from PDFs, calibrating against prior-year reference data with known levels, then grading each student's typed/printed answer PDF on a per-question basis. It produces per-student JSON results and compiles them into final scores with level assignments (1–5).
Subjects supported: ICT, Music, Ethics and Religious Studies, Tourism and Hospitality Studies, Visual Arts, Biology, Economics, BAFS — and any similar subjects with typed written responses.
Key differences from Chinese Writing grading:
Sub-agents are mandatory. Each student MUST be graded by a dedicated sub-agent. No batch grading of multiple students in a single sub-agent call.
Model restrictions:
Do NOT pass a model parameter override to sub-agents — let them inherit the main agent's model (kimi-k2.5) exclusively. NO OTHER VLM MODELS ARE ALLOWED.
Gemini VLM for PDF extraction: Use gemini-3.1-pro via the Google Vertex AI
endpoint as the primary VLM for PDF-to-text extraction. Authenticate using the
service account JSON (project-f154aafa-a809-44c8-89f-70e8abc5e53a.json) in the
workspace root. A ready-to-use client is provided at scripts/vertex_client.py:
from scripts.vertex_client import get_openai_client
client = get_openai_client()
The module handles service-account authentication, token refresh, and
base-URL construction automatically. Use model name google/gemini-3.1-pro.
If the Gemini endpoint is unavailable, rate-limited, or returns errors, fall back
to Kimi (KIMI_BASE_URL, KIMI_API_KEY, KIMI_MODEL from env.txt).
ALIYUN as secondary fallback: If both Gemini and Kimi are unavailable or
rate-limited, switch to the ALIYUN endpoint using ALIYUN_BASE_URL,
ALIYUN_API_KEY, and ALIYUN_MODEL from env.txt.
You may also use ALIYUN in parallel with Gemini/Kimi when running multiple
extraction sub-agents simultaneously to avoid rate limits.
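The endpoint preference order above can be sketched as a small pure helper. `pick_endpoint` and its return shape are illustrative, not part of the provided scripts; only the env variable names come from env.txt.

```python
import os

# Preference order from this skill: Gemini first, then Kimi, then ALIYUN.
FALLBACK_ORDER = ["gemini", "kimi", "aliyun"]

def pick_endpoint(unavailable: set) -> dict:
    """Return the first usable endpoint config, skipping names in `unavailable`
    (endpoints that errored or hit rate limits). Env variable names match
    env.txt; the Gemini entry is served via scripts/vertex_client.py rather
    than raw env vars.
    """
    configs = {
        "gemini": {"model": "google/gemini-3.1-pro", "via": "vertex_client"},
        "kimi": {
            "model": os.getenv("KIMI_MODEL", ""),
            "base_url": os.getenv("KIMI_BASE_URL", ""),
            "api_key": os.getenv("KIMI_API_KEY", ""),
        },
        "aliyun": {
            "model": os.getenv("ALIYUN_MODEL", ""),
            "base_url": os.getenv("ALIYUN_BASE_URL", ""),
            "api_key": os.getenv("ALIYUN_API_KEY", ""),
        },
    }
    for name in FALLBACK_ORDER:
        if name not in unavailable:
            return {"name": name, **configs[name]}
    raise RuntimeError("All VLM endpoints are unavailable")
```

For parallel extraction, separate sub-agents could simply be handed different entries from `FALLBACK_ORDER` to spread load.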
PDF rendering DPI: When converting PDF pages to images (e.g., for VLM extraction), default to 150 DPI to ensure text is legible. If processing speed is a concern, lowering DPI is acceptable as a trade-off, but do not go below 100 DPI.
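A minimal sketch of the DPI rule, assuming the PyMuPDF convention that PDF user space is 72 points per inch (so the render matrix zoom is dpi / 72):

```python
def render_zoom(requested_dpi: int = 150) -> tuple:
    """Clamp the requested DPI to this skill's floor of 100 and return
    (dpi, zoom). PDF user space is 72 points per inch, so the PyMuPDF
    render matrix is fitz.Matrix(zoom, zoom) with zoom = dpi / 72.
    """
    dpi = max(100, requested_dpi)  # never render below 100 DPI
    return dpi, dpi / 72.0

# Usage with PyMuPDF (assumed installed):
#   import fitz
#   dpi, zoom = render_zoom(150)
#   pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
```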
No LLM/VLM API calls for generating feedback or analysis text. See WARNING.md.
VLM is permitted only for tasks requiring genuine visual inspection — text
extraction from scanned PDFs where PyMuPDF fails, and grading inherently visual
student work (e.g., art creation, handwriting quality, diagrams). VLM must NOT be
used for any dimension assessable from extracted text.
Read split-part PDFs in numerical order. Files named _part1.pdf, _part2.pdf,
…, _partN.pdf MUST be read in sequence. Concatenate the extracted content in order.
Ignore _original.pdf files (these are the unsplit source).
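A sketch of numerically ordered part discovery; `ordered_parts` is a hypothetical helper showing why numeric sorting matters (plain string sorting would place _part10 before _part2):

```python
import re
from pathlib import Path

def ordered_parts(directory: str, stem: str) -> list:
    """Return {stem}_part1.pdf ... {stem}_partN.pdf in numerical order,
    ignoring {stem}_original.pdf. The part number is sorted as an integer,
    not as a string, so part10 correctly follows part2.
    """
    pattern = re.compile(rf"{re.escape(stem)}_part(\d+)\.pdf$")
    parts = []
    for path in Path(directory).iterdir():
        m = pattern.match(path.name)
        if m:
            parts.append((int(m.group(1)), path))
    return [p for _, p in sorted(parts)]
```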
Reference data informs level assignment. Grade reference students to compute empirical score ranges per level. Agents may use these ranges, along with rubric criteria and official level descriptors, to define level boundaries. There are no hard restrictions on the level division method.
Always set the year before running any operation. Either pass --year YEAR to
scripts or export GRADING_YEAR=YYYY at the start.
All report and commentary text must be written in Traditional Chinese (繁體中文). This applies to all narrative, feedback, analysis, and section content in generated DOCX reports. English is only permitted for: variable names, file paths, technical identifiers, chart axis labels, and JSON field names. Exception: if the subject being assessed is an English-language subject, English is used throughout.
{subject}-grading-workspaces/grading_workspace/
├── .github/skills/hkdse-{subject}-grading/SKILL.md
├── START.md
├── WARNING.md
├── env.txt # Local credentials (never commit)
├── env.txt.example # Template
├── pyproject.toml
├── uv.lock
├── start.sh
├── data/ # Symlink or mount to top-level data/
│ ├── reference/{subject}/{ref_year}/
│ │ ├── student_answers/level{L}_student{M}.pdf
│ │ ├── rubrics/ OR rubric_and_question/
│ │ ├── question/ (if separate)
│ │ └── reference_mapping.json
│ ├── masked_data/{subject}/{grade_year}/
│ │ ├── student_answers/student{N}.pdf
│ │ ├── rubrics/ OR rubric_and_question/
│ │ └── question/ (if separate)
│ └── groundtruth/{subject}/{grade_year}/groundtruth_mapping.json
├── rubric/{grade_year}/ # Generated rubric artifacts
│ ├── grading_guide.md
│ ├── reference_calibration.md
│ ├── reference_scores.json
│ ├── level_division.json
│ └── calibration/ # Intermediate calibration artifacts
│ └── (draft rubrics, reference grading outputs, score calculations)
├── rubric/reference_data_analysis/ # Insights from reference data (Phase 3)
│ ├── per_level_analysis.md # Observed patterns per level 1–5
│ ├── rubric_gaps.md # Gaps/ambiguities in official rubric
│ └── rubric_refinements.md # Supplementary rubric guidance
├── scripts/ # Python helper scripts
│ ├── generate_class_report.py
│ ├── generate_student_reports.py
│ ├── validate_extraction.py
│ ├── validate_grading_output.py
│ └── validate_reports.py
├── extracted/{grade_year}/ # Extracted student text
│ └── students/student{N}.txt
└── output/{grade_year}/ # Grading output
├── student{N}.json
└── final_scores.json
source env.txt
uv sync
export GRADING_YEAR={grade_year} # Set from env.txt or manually
Confirm the following exist:
data/masked_data/{subject}/{grade_year}/student_answers/
Rubric PDFs (rubrics/ and rubric_and_question/ directories)
data/reference/{subject}/{ref_year}/ with reference_mapping.json
Read BATCH_SIZE from env.txt (default: 5) for parallel sub-agent control.
Read ALL rubric PDFs from the masked data directory. Handle these patterns:
rubrics/rubrics.pdf — extract directly
rubrics/rubrics_part1.pdf … rubrics_partN.pdf — read ALL parts in numerical order, concatenate content
rubric_and_question/paper_part1.pdf … paper_partN.pdf — read ALL parts in order; both rubric criteria and question text are interleaved in these files
If rubrics/level_descriptors.pdf exists, ALWAYS read it — it contains official level-band boundaries critical for level assignment
Use PyMuPDF (fitz) for text extraction. Fall back to VLM only if text extraction yields empty or garbled content.
Note on rubric extraction granularity: For rubric/criteria PDFs (which are typically typeset/printed), extracting multiple pages at a time is acceptable. However, for extremely long rubric files (e.g., 17+ split parts), consider extracting in manageable batches of a few pages each to avoid truncation or quality degradation.
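The "empty or garbled" trigger for falling back to VLM can be approximated with a simple heuristic; the thresholds below are illustrative defaults, not rules from this skill:

```python
def needs_vlm_fallback(text: str, min_chars: int = 30,
                       min_printable_ratio: float = 0.85) -> bool:
    """Heuristic check on PyMuPDF output: fall back to VLM extraction when
    the text layer is effectively empty or dominated by replacement and
    control characters. Thresholds are illustrative, not mandated.
    """
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True  # empty or near-empty text layer
    bad = sum(1 for ch in stripped
              if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\n\t\r"))
    printable_ratio = 1 - bad / len(stripped)
    return printable_ratio < min_printable_ratio
```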
If the question paper is separate from rubrics:
question/paper.pdf — extract directly
question/paper_part1.pdf … paper_partN.pdf — read in order
Other PDFs under question/ — read ALL of them
Create rubric/{grade_year}/grading_guide.md with:
Read data/reference/{subject}/{ref_year}/reference_mapping.json to get the mapping
of student IDs to known levels (1–5).
Format:
{
"mappings": [
{"student_id": 1, "filename": "level3_student1.pdf", "level": 3, ...},
...
]
}
Note: Reference files keep their original level-based names (level{L}_student{M}.pdf).
The filename field in the mapping matches the actual file on disk. The level is also
encoded in the filename, making it easy for agents to identify which level each student
belongs to without parsing the JSON.
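Since the level is encoded in the filename, it can be parsed directly as a cross-check against the JSON; `parse_reference_filename` is a hypothetical helper:

```python
import re

def parse_reference_filename(filename: str) -> tuple:
    """Extract (level, student_number) from names like level3_student1.pdf.
    Useful for verifying the `level` field in reference_mapping.json agrees
    with the file on disk.
    """
    m = re.fullmatch(r"level([1-5])_student(\d+)\.pdf", filename)
    if m is None:
        raise ValueError(f"Unexpected reference filename: {filename}")
    return int(m.group(1)), int(m.group(2))
```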
Extract text from ALL reference student answer PDFs in
data/reference/{subject}/{ref_year}/student_answers/.
Read rubric PDFs from the REFERENCE year directory (structure may differ from the grading year). Apply the same extraction rules as Phase 2.
Grade ALL reference students using the same rubric and grading guide (from Phase 2).
For each reference student, apply the per-question mark allocation to produce a total
raw score and percentage — exactly as you would for a masked student. Record these
scores alongside their known levels from reference_mapping.json.
This step is critical: the resulting scores provide the empirical score-to-level mapping that replaces hard-coded boundaries. Launch sub-agents for reference students following the same rules as Phase 5 (one sub-agent per student, same model restriction).
Save reference grading results to rubric/{grade_year}/reference_scores.json:
{
"reference_year": "{ref_year}",
"scores": [
{"student_id": 1, "level": 3, "total_raw_score": 42, "total_max_score": 80, "percentage": 52.5},
{"student_id": 2, "level": 5, "total_raw_score": 72, "total_max_score": 80, "percentage": 90.0}
],
"level_score_ranges": {
"1": {"min_pct": 12.5, "max_pct": 22.0, "mean_pct": 17.3, "count": 2},
"2": {"min_pct": 28.0, "max_pct": 38.5, "mean_pct": 33.3, "count": 2},
"3": {"min_pct": 45.0, "max_pct": 55.0, "mean_pct": 50.0, "count": 2},
"4": {"min_pct": 62.0, "max_pct": 74.0, "mean_pct": 68.0, "count": 2},
"5": {"min_pct": 82.0, "max_pct": 95.0, "mean_pct": 88.5, "count": 2}
}
}
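The level_score_ranges block can be derived mechanically from the per-student scores; a sketch (rounding means to one decimal matches the example above but is otherwise an assumption):

```python
def level_score_ranges(scores: list) -> dict:
    """Aggregate graded reference students into per-level percentage ranges,
    matching the level_score_ranges shape in reference_scores.json.
    Each entry in `scores` needs "level" and "percentage" keys.
    """
    by_level = {}
    for s in scores:
        by_level.setdefault(s["level"], []).append(s["percentage"])
    return {
        str(level): {
            "min_pct": min(pcts),
            "max_pct": max(pcts),
            "mean_pct": round(sum(pcts) / len(pcts), 1),
            "count": len(pcts),
        }
        for level, pcts in sorted(by_level.items())
    }
```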
The rubric must not be finalised from the official PDF alone. The agent must learn from the reference student data to derive an effective grading rubric — one that actually discriminates between student levels as observed in practice. This is an iterative loop that continues until the rubric reliably separates levels.
This is especially important for Visual Arts and Chinese / English Writing, where the official rubric provides only broad bands and subjective judgment is required. For these subjects, the reference data analysis must be particularly thorough: document what distinguishes each level in observable, concrete terms.
All analytical outputs from this step are saved to rubric/reference_data_analysis/.
Iterative loop:
rubric/reference_data_analysis/per_level_analysis.md documenting
observed patterns for each level 1–5:
rubric/reference_data_analysis/rubric_gaps.md — gaps or
ambiguities found in the official rubric when applied to reference students
(e.g., criteria that fail to distinguish adjacent levels, missing guidance for
common student behaviours, mark schemes that reward or penalise inconsistently)
b. Derive refinements in rubric/reference_data_analysis/rubric_refinements.md
— supplementary rubric interpretations and practical guidance that resolve the
gaps above (e.g., clarifying what "adequate discussion" means at each level,
adding concrete examples, specifying how to handle partial credit)
c. Revise rubric/{grade_year}/grading_guide.md to incorporate these
refinements — the grading guide is not simply extracted from the official rubric
PDF; it must incorporate insights from the reference data analysis
d. Re-grade affected reference students with the improved rubric
e. Repeat from step 2 until the discrimination check passes
The final rubric/{grade_year}/grading_guide.md must incorporate insights from rubric/reference_data_analysis/ — it is not simply a transcription of the official rubric PDF.
Discrimination check criteria — verify ALL of the following:
Only proceed to Phase 4 (actual student grading) once the discrimination check
passes. Document the outcome in rubric/{grade_year}/reference_calibration.md.
Temp file organization: All intermediate artifacts from calibration — including
reference student grading outputs, draft rubrics, and intermediate score calculations
— must be saved under rubric/{grade_year}/calibration/ (a dedicated subdirectory).
Final artifacts (grading_guide.md, reference_calibration.md, reference_scores.json,
level_division.json) remain directly under rubric/{grade_year}/ as before.
Using the reference scores from Step 3.4, compute level boundaries to guide level assignment. One common approach is finding midpoints between adjacent levels' score ranges, but agents may use any reasonable method.
These boundaries are guidelines derived from reference data, not hard restrictions. They serve as a starting point for level assignment in Phase 7, where agents may refine them using additional information from the rubric and level descriptors.
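The midpoint approach mentioned above can be sketched as follows; this is one reasonable method, not a mandated formula, and the output follows the boundaries shape of level_division.json:

```python
def midpoint_boundaries(ranges: dict) -> dict:
    """Derive level boundaries as midpoints between adjacent levels'
    observed score ranges. `ranges` follows the level_score_ranges shape
    (string keys "1".."5", each with min_pct/max_pct).
    """
    boundaries = {}
    lower = 0.0
    for level in range(1, 6):
        if level < 5:
            # Boundary sits halfway between this level's max and the next level's min.
            upper = (ranges[str(level)]["max_pct"] + ranges[str(level + 1)]["min_pct"]) / 2
        else:
            upper = 100.0
        boundaries[f"level_{level}"] = {"min_percentage": lower, "max_percentage": upper}
        lower = upper
    return boundaries
```

Agents would then refine these raw midpoints using the rubric and level descriptors, as described above.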
Create rubric/{grade_year}/reference_calibration.md with TWO sections:
Section 1 — Qualitative Calibration: For each level (1–5), summarise what rubric criteria reference students at that level demonstrated:
Focus on the main marking dimensions for the subject. Include specific examples of how reference students at each level addressed key questions.
Section 2 — Quantitative Score Ranges: Include the empirical score-to-level mapping from Step 3.4:
This quantitative mapping provides useful guidance for level assignment in Phases 5 and 7. Sub-agents and level division may use these ranges alongside rubric information and level descriptors to determine levels from scores.
CRITICAL: Student answer extraction MUST be done 1 page at a time to ensure extraction quality. Do NOT batch multiple pages into a single extraction call.
For each student in data/masked_data/{subject}/{grade_year}/student_answers/:
Save the concatenated page extractions to extracted/{grade_year}/students/student{N}.txt
Note: This 1-page-at-a-time rule applies to student answer PDFs. For rubric/criteria PDFs (Phase 2), multi-page extraction is acceptable since those are typeset documents with cleaner formatting.
Confirm all student text files exist and have non-trivial content. Flag any students whose extraction may be incomplete.
Write a script scripts/validate_extraction.py that:
Checks that all extracted/{grade_year}/students/student{N}.txt files exist and have non-trivial content
Run the script and confirm all students pass before proceeding to Phase 5. If any fail, re-run extraction for those students.
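A minimal sketch of what validate_extraction.py might check; the 50-character threshold is an illustrative stand-in for "non-trivial content":

```python
from pathlib import Path

def validate_extraction(extracted_dir: str, student_ids: list,
                        min_chars: int = 50) -> list:
    """Return the IDs of students whose extracted text file is missing or
    suspiciously short. An empty return list means all students pass.
    """
    failing = []
    for n in student_ids:
        path = Path(extracted_dir) / f"student{n}.txt"
        if not path.exists() or len(path.read_text(encoding="utf-8").strip()) < min_chars:
            failing.append(n)
    return failing
```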
For each student, launch a dedicated sub-agent that:
Reads rubric/{grade_year}/grading_guide.md
Reads rubric/{grade_year}/reference_calibration.md (both qualitative and
quantitative sections — including score-to-level ranges)
Anti-bias note: The reference_calibration.md score ranges are empirical data from one prior year. Do NOT treat them as fixed cutoffs. The preliminary_level should reflect the quality of the student's work against rubric criteria. If the student's score falls slightly outside a reference range but clearly matches a level's qualitative description, assign that level and note the reasoning.
Reads the student's extracted text (extracted/{grade_year}/students/student{N}.txt)
Grades EVERY question/sub-question against the rubric — focus on accurate per-question mark allocation first, without constraining scores to fit a predetermined level
Computes total score and percentage from per-question marks
Derives preliminary_level using the reference score ranges and any other
available level information (rubric, level descriptors, reference calibration)
Outputs a JSON file to output/{grade_year}/student{N}.json
IMPORTANT: Sub-agents must NOT use hard score restrictions to constrain their grading. The correct approach is: grade each question purely on rubric merit → sum the scores → map the total to a level using reference score ranges. Do NOT adjust per-question marks to force a particular level outcome.
Control parallelism with BATCH_SIZE — launch up to BATCH_SIZE sub-agents at a time.
Each output/{grade_year}/student{N}.json MUST follow this schema:
{
"student_id": N,
"questions": [
{
"question_id": "Q1a",
"max_marks": 4,
"awarded_marks": 3.0,
"reasoning": "Brief explanation of mark allocation",
"evidence": "Relevant quote or reference from student answer"
}
],
"total_raw_score": 45,
"total_max_score": 60,
"percentage": 75.0,
"preliminary_level": 4,
"level_reasoning": "Based on reference calibration, this student's performance aligns with Level 4 characteristics: ..."
}
Fields:
student_id: Integer student number
questions: Array of per-question grading objects
  question_id: String identifier (e.g., "Q1a", "Q2bii")
  max_marks: Maximum marks for this question
  awarded_marks: Marks awarded (float, can be 0.5 increments if rubric allows)
  reasoning: Why these marks were awarded
  evidence: Supporting evidence from the student's answer
total_raw_score: Sum of all awarded_marks
total_max_score: Sum of all max_marks
percentage: (total_raw_score / total_max_score) × 100
preliminary_level: Level (1–5) derived using the reference score ranges and available level information. This is a best-fit estimate
level_reasoning: Explanation for the level assignment, referencing the score range it falls into (e.g., "Score 52.5% falls within Level 3 reference range 45.0%–55.0%")
After each sub-agent completes:
Write a script scripts/validate_grading_output.py that:
Validates every output/{grade_year}/student{N}.json file
Checks required fields (student_id, questions, total_raw_score, total_max_score, percentage, preliminary_level), value ranges (preliminary_level ∈ [1,5], percentage ∈ [0,100]), and that total_raw_score equals the sum of all awarded_marks
Run the script. Re-run the sub-agent for any failing students. Re-run the script until all pass.
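The per-record checks could look like this; `validate_student_result` is a hypothetical core for validate_grading_output.py, using a small float tolerance for the awarded-marks sum:

```python
def validate_student_result(result: dict) -> list:
    """Return a list of problems found in one student{N}.json payload;
    an empty list means the record passes.
    """
    required = ("student_id", "questions", "total_raw_score",
                "total_max_score", "percentage", "preliminary_level")
    missing = [f for f in required if f not in result]
    if missing:
        return [f"missing fields: {missing}"]
    problems = []
    if not 1 <= result["preliminary_level"] <= 5:
        problems.append("preliminary_level out of range [1, 5]")
    if not 0 <= result["percentage"] <= 100:
        problems.append("percentage out of range [0, 100]")
    # Float tolerance avoids spurious failures from 0.5-mark increments.
    awarded = sum(q["awarded_marks"] for q in result["questions"])
    if abs(awarded - result["total_raw_score"]) > 1e-6:
        problems.append("total_raw_score != sum of awarded_marks")
    return problems
```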
Read all output/{grade_year}/student{N}.json files and compile into
output/{grade_year}/final_scores.json:
{
"subject": "{subject}",
"year": "{grade_year}",
"total_students": N,
"max_possible_score": 60,
"students": [
{
"student_id": 1,
"total_raw_score": 45,
"total_max_score": 60,
"percentage": 75.0,
"preliminary_level": 4
}
],
"statistics": {
"mean_score": 42.5,
"median_score": 43.0,
"std_dev": 8.2,
"min_score": 20,
"max_score": 58,
"mean_percentage": 70.8
}
}
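The statistics block can be computed with the standard library; whether std_dev is sample or population is not specified by this skill, so the sample version (statistics.stdev) is an assumption here:

```python
import statistics

def score_statistics(raw_scores: list, max_score: float) -> dict:
    """Compute the statistics block of final_scores.json from per-student
    raw scores. Rounding to one decimal matches the example schema.
    """
    return {
        "mean_score": round(statistics.mean(raw_scores), 1),
        "median_score": round(statistics.median(raw_scores), 1),
        "std_dev": round(statistics.stdev(raw_scores), 1) if len(raw_scores) > 1 else 0.0,
        "min_score": min(raw_scores),
        "max_score": max(raw_scores),
        "mean_percentage": round(statistics.mean(raw_scores) / max_score * 100, 1),
    }
```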
Agents may use the extracted reference data (from Phase 3) to inform level division. The following sources are available — agents should consider all relevant information and exercise judgment:
Reference scores (rubric/{grade_year}/reference_scores.json) — the empirical score ranges from grading reference students in Phase 3. These provide an empirical basis for mapping scores to levels.
Official level descriptors (level_descriptors.pdf if available) — these describe the qualitative characteristics expected at each level
Reference calibration (rubric/{grade_year}/reference_calibration.md) — contains both qualitative descriptions and quantitative score ranges per level
There are no hard restrictions on how level boundaries must be computed. Agents should use the available reference data, rubric information, and level descriptors to define reasonable level boundaries that reflect the subject's scoring patterns.
⚠️ Reference Year Bias Prevention:
The reference year's score ranges are a starting point for calibration only, not fixed thresholds to be carried forward. Before finalising level boundaries, the agent must:
level_division.json must include an "adjustment_notes" field. It must either explain any deviation from raw reference boundaries and the reasoning behind the adjustment, or explicitly state "No adjustment; current cohort distribution aligns with reference year ranges."
For each student, map their total score to a level (1–5) using the reference-derived
boundaries from Step 7.1. Save to rubric/{grade_year}/level_division.json:
{
"method": "reference_score_based",
"reference_year": "{ref_year}",
"boundaries": {
"level_1": {"min_percentage": 0, "max_percentage": 27.5},
"level_2": {"min_percentage": 27.5, "max_percentage": 41.75},
"level_3": {"min_percentage": 41.75, "max_percentage": 60.0},
"level_4": {"min_percentage": 60.0, "max_percentage": 78.0},
"level_5": {"min_percentage": 78.0, "max_percentage": 100}
},
"reference_score_ranges": {
"level_1": {"min_pct": 12.5, "max_pct": 22.0, "mean_pct": 17.3},
"level_2": {"min_pct": 28.0, "max_pct": 38.5, "mean_pct": 33.3},
"level_3": {"min_pct": 45.0, "max_pct": 55.0, "mean_pct": 50.0},
"level_4": {"min_pct": 62.0, "max_pct": 74.0, "mean_pct": 68.0},
"level_5": {"min_pct": 82.0, "max_pct": 95.0, "mean_pct": 88.5}
},
"adjustment_notes": "No adjustment; current cohort distribution aligns with reference year ranges.",
"students": [
{
"student_id": 1,
"total_raw_score": 45,
"percentage": 75.0,
"preliminary_level": 4,
"final_level": 4,
"level_reasoning": "Score 75.0% falls within Level 4 reference range (62.0%–74.0%), closest to Level 4 mean (68.0%)"
}
]
}
Note: The boundary values above are examples. Agents should derive appropriate boundaries from the available reference data and rubric information.
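Mapping a percentage onto such boundaries can be sketched as a lookup over half-open intervals; `assign_level` is illustrative and treats exactly 100% as Level 5:

```python
def assign_level(percentage: float, boundaries: dict) -> int:
    """Map a percentage to a level 1-5 using boundaries shaped like
    level_division.json ("level_1".."level_5" with min/max_percentage).
    Intervals are half-open [min, max), except Level 5 also includes
    its maximum so a perfect score is covered.
    """
    for level in range(1, 6):
        b = boundaries[f"level_{level}"]
        if b["min_percentage"] <= percentage < b["max_percentage"]:
            return level
        if level == 5 and percentage == b["max_percentage"]:
            return 5
    raise ValueError(f"percentage {percentage} not covered by boundaries")
```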
Compare the level distribution of graded students against the reference year's distribution. Flag any significant discrepancies (e.g., if the reference year had students at all 5 levels but the graded year has none at Level 1).
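A coarse version of this comparison, flagging only presence/absence mismatches per level (`distribution_gaps` is a hypothetical helper):

```python
from collections import Counter

def distribution_gaps(reference_levels: list, graded_levels: list) -> list:
    """Flag levels present in the reference year but absent in the graded
    cohort, and vice versa. A coarse discrepancy check, not a hard gate.
    """
    ref, cur = Counter(reference_levels), Counter(graded_levels)
    flags = []
    for level in range(1, 6):
        if ref[level] > 0 and cur[level] == 0:
            flags.append(f"Level {level}: present in reference year but absent in graded cohort")
        if cur[level] > 0 and ref[level] == 0:
            flags.append(f"Level {level}: present in graded cohort but absent in reference year")
    return flags
```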
data/masked_data/ict/2025/
  rubrics/rubrics.pdf (single file)
  question/paper_part1.pdf, question/paper_part2.pdf
data/reference/ict/2024/
  rubrics/rubrics_part1.pdf … rubrics_part17.pdf (17 split parts)
  question/paper.pdf
data/masked_data/music/2024/
  rubric_and_question/paper_part1.pdf … paper_part21.pdf
  No rubrics/ or question/ directories
data/reference/music/2025/
  rubric_and_question/paper_part1.pdf … paper_part24.pdf
data/masked_data/religion/2025/
  rubrics/rubrics_part1.pdf … rubrics_part7.pdf (7 split parts)
  question/paper.pdf
data/reference/religion/2024/
  rubrics/rubrics_part1.pdf … rubrics_part7.pdf (7 split parts)
  question/paper.pdf
data/masked_data/tourism/2025/
  rubrics/rubrics.pdf (main mark scheme) AND rubrics/level_descriptors.pdf (official level band descriptors)
  ALWAYS read level_descriptors.pdf — it contains the official level boundaries
  question/paper.pdf
data/reference/tourism/2024/
  rubrics/rubrics.pdf AND rubrics/level_descriptors.pdf
  question/paper.pdf
data/masked_data/visual-arts/2020/
  rubrics/rubrics.pdf (single file)
  question/ directory — files may have Chinese filenames (e.g., 2020_視藝_P1QAB.pdf, 2020_視藝_P2QAB.pdf); read ALL PDFs in this directory
data/reference/visual-arts/2025/
  rubrics/rubrics_part1.pdf … rubrics_part5.pdf (5 split parts)
  question/paper_part1.pdf … paper_part5.pdf (5 split parts)
data/masked_data/biology/2025/
  rubrics/rubrics.pdf (single file)
  question/paper.pdf
data/reference/biology/2024/
  rubrics/rubrics.pdf
  question/paper.pdf
  reference_mapping.json available
data/masked_data/economics/2024/
  rubrics/marking_scheme.pdf and rubrics/general_rubrics.pdf (two files)
  question/paper1.pdf and question/paper2.pdf (two papers)
data/reference/economics/2025/
  rubrics/general_rubrics.pdf
  question/paper1.pdf and question/paper2.pdf (two papers)
  reference_mapping.json available
After level division, review the overall score distribution:
Randomly select 2–3 students and manually verify their grading:
Compare students at the same level:
Confirm all output files are complete and valid:
All student{N}.json files exist with valid JSON
final_scores.json contains all students with correct statistics
rubric/{grade_year}/level_division.json has valid boundaries and all student assignments
GRADING_YEAR env var set
Dependencies installed (uv sync)
Reference scores recorded (reference_scores.json)
Final scores compiled (final_scores.json)
Level division saved (rubric/{grade_year}/level_division.json)