Grades HKDSE Information and Communication Technology exam papers for year 2025. Handles rubric extraction, reference calibration from 2024 labeled data (10 students), per-student grading of 10 students with per-question mark allocation, score compilation, and level division (1–5).
Data is pre-copied into this workspace under data/.
Masked data: data/masked_data/ict/2025/
Reference data: data/reference/ict/2024/
No symlinks or mounts needed.
Masked data path: data/masked_data/ict/2025/
- student_answers/student1.pdf … student10.pdf
- rubrics/rubrics.pdf (single file)
- question/paper_part1.pdf, question/paper_part2.pdf (read both in order)

Reference data path: data/reference/ict/2024/
- rubrics/rubrics_part1.pdf … rubrics_part17.pdf (17 split parts — read in order)
- question/paper.pdf
- reference_mapping.json available

Grading notes:
The full grading workflow follows the unified skill defined in
.github/skills/hkdse-subject-grading/SKILL.md. The complete content of that skill
is reproduced below for reference.
This skill grades HKDSE elective subject exam papers by extracting rubrics from PDFs, calibrating against prior-year reference data with known levels, then grading each student's typed/printed answer PDF on a per-question basis. It produces per-student JSON results and compiles them into final scores with level assignments (1–5).
Subjects supported: ICT, Music, Ethics and Religious Studies, Tourism and Hospitality Studies, Visual Arts — and any similar subjects with typed written responses.
Key differences from Chinese Writing grading:
Sub-agents are mandatory. Each student MUST be graded by a dedicated sub-agent. No batch grading of multiple students in a single sub-agent call.
Model restrictions:
- Do NOT pass a model parameter override to sub-agents — let them inherit the main agent's model.
- VLM calls must use gemini-3.1-flash-lite-preview via the Gemini route exclusively. NO OTHER VLM MODELS OR ROUTES ARE ALLOWED.
- Gemini only (for both added chains): the extraction skill and model-service skill must both use Gemini for API calls. Read GEMINI_BASE_URL and VERTEX_API_KEY from env.txt. Model: gemini-3.1-flash-lite-preview.
- model-service skill: run a config-based route sanity check that confirms route=gemini is callable and non-gemini routes are ignored.

PDF rendering DPI: When converting PDF pages to images (e.g., for VLM extraction), default to 150 DPI to ensure text is legible. If processing speed is a concern, lowering DPI is acceptable as a trade-off, but do not go below 100 DPI.
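As a sketch of the DPI rule, assuming PyMuPDF is available for the rendering path (PyMuPDF's base resolution is 72 DPI, so zoom = dpi / 72; function names are illustrative):

```python
# Sketch only: rasterise one PDF page at a chosen DPI with PyMuPDF.
# The 100 DPI floor matches the rule above.
MIN_DPI = 100

def dpi_to_zoom(dpi: int) -> float:
    """PyMuPDF renders at a 72 DPI base resolution, so zoom = dpi / 72."""
    if dpi < MIN_DPI:
        raise ValueError(f"DPI {dpi} is below the {MIN_DPI} floor")
    return dpi / 72.0

def render_page(pdf_path: str, page_index: int, out_png: str, dpi: int = 150) -> None:
    import fitz  # PyMuPDF; imported lazily so dpi_to_zoom stays testable alone
    with fitz.open(pdf_path) as doc:
        zoom = dpi_to_zoom(dpi)
        pix = doc[page_index].get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        pix.save(out_png)
```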
Decide image preservation from extraction results, not pre-run config. When a page's
meaning may depend on layout or visuals — e.g., diagrams, geometry-style constructions,
charts, screenshots, tables with spatial meaning, arrows/labels, or annotated drawings —
the extraction API output must explicitly signal this (for example via
needs_visual_reference + visual_reference_reason). The agent should inspect those
returned hints and only then decide whether to export the whole page image as an
extraction artifact. Do not crop at this stage.
No LLM/VLM API calls for generating feedback or analysis text. See WARNING.md.
VLM is permitted only for tasks requiring genuine visual inspection — text
extraction from scanned PDFs where PyMuPDF fails, and grading inherently visual
student work (e.g., art creation, handwriting quality, diagrams). VLM must NOT be
used for any dimension assessable from extracted text.
Read split-part PDFs in numerical order. Files named _part1.pdf, _part2.pdf,
…, _partN.pdf MUST be read in sequence. Concatenate the extracted content in order.
Ignore _original.pdf files (these are the unsplit source).
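The ordering rule matters because lexicographic sorting puts _part10.pdf before _part2.pdf; a minimal sketch (helper name is illustrative):

```python
# Sketch: collect _partN.pdf files in numerical order, skipping _original.pdf.
import re

def ordered_parts(filenames):
    """Return split-part PDFs sorted by their numeric part index."""
    parts = []
    for name in filenames:
        if name.endswith("_original.pdf"):
            continue  # unsplit source; ignore per the rule above
        m = re.search(r"_part(\d+)\.pdf$", name)
        if m:
            parts.append((int(m.group(1)), name))
    return [name for _, name in sorted(parts)]
```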
Reference data informs level assignment. Grade reference students to compute empirical score ranges per level. Agents may use these ranges, along with rubric criteria and official level descriptors, to define level boundaries. There are no hard restrictions on the level division method.
Always set the year before running any operation. Either pass --year YEAR to
scripts or export GRADING_YEAR=YYYY at the start.
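A minimal sketch of year resolution, assuming only the --year flag and GRADING_YEAR variable named above (the helper name is illustrative):

```python
# Sketch: --year takes precedence; otherwise fall back to GRADING_YEAR.
import argparse
import os

def resolve_year(argv=None):
    """Resolve the grading year or abort if neither source is set."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--year")
    args, _ = parser.parse_known_args(argv)
    year = args.year or os.environ.get("GRADING_YEAR")
    if not year:
        raise SystemExit("Set --year or export GRADING_YEAR before running")
    return year
```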
All report and commentary text must be written in Traditional Chinese (繁體中文). This applies to all narrative, feedback, analysis, and section content in generated DOCX reports. English is only permitted for: variable names, file paths, technical identifiers, chart axis labels, and JSON field names. Exception: if the subject being assessed is an English-language subject, English is used throughout.
Skill-chain requirement (new):
- extraction skill.
- model-service skill (YAML-only, no .py edits).
- Configure Gemini (gemini-3.1-flash-lite-preview) and read URL/API key values from env.txt; missing values mean that provider/API is unsupported in the current environment.
- Use this grading skill as the orchestrator and explicitly chain the migrated skills:
  - document/PDF extraction (extraction skill)
  - model routing (model-service skill): Gemini (gemini-3.1-flash-lite-preview)
- extraction and model-service are config-driven execution skills; do not modify their underlying Python scripts inside this grading workflow.

Canonical ICT pilot location: tasks/ict-grading/workspace/
Legacy compatibility path: ict-grading-workspaces/grading_workspace/ (symlink to the canonical workspace)
tasks/ict-grading/workspace/
├── .github/skills/hkdse-{subject}-grading/SKILL.md
├── START.md
├── WARNING.md
├── env.txt # Local credentials (never commit)
├── env.txt.example # Template
├── pyproject.toml # Symlink to ../../../envs/ict-grading/pyproject.toml
├── uv.lock # Symlink to ../../../envs/ict-grading/uv.lock
├── start.sh
├── data/ # Local data (pre-copied into workspace)
│ ├── reference/ict/2024/
│ │ ├── student_answers/level{L}_student{M}.pdf (original level-based names)
│ │ ├── rubrics/rubrics_part1.pdf … rubrics_part17.pdf
│ │ ├── question/paper.pdf
│ │ └── reference_mapping.json
│ └── masked_data/ict/2025/
│ ├── student_answers/student1.pdf … student10.pdf
│ ├── rubrics/rubrics.pdf
│ └── question/paper_part1.pdf, paper_part2.pdf
├── rubric/{grade_year}/ # Grading-year artifacts
│ ├── grading_guide.md
│ ├── reference_calibration.md # Summary derived from reference year; used by current-year grading
│ ├── page_images/ # Optional full-page PNG artifacts for rubric/question pages
│ └── level_division.json
├── rubric/{ref_year}/ # Reference-year calibration artifacts (e.g., rubric/2024/)
│ ├── reference_scores.json # Per-student scores for the reference cohort
│ └── calibration/ # Intermediate calibration artifacts
│ └── (per-student reference grading outputs, draft rubrics, score calculations)
├── rubric/reference_data_analysis/ # Insights from reference data (Phase 3)
│ ├── per_level_analysis.md # Observed patterns per level 1–5
│ ├── rubric_gaps.md # Gaps/ambiguities in official rubric
│ └── rubric_refinements.md # Supplementary rubric guidance
├── scripts/ # Optional workspace link to ../../../skills/hkdse-ict-report/scripts
│ ├── generate_class_report.py
│ ├── generate_student_reports.py
│ ├── validate_extraction.py
│ ├── validate_grading_output.py
│ └── validate_reports.py
├── extracted/{grade_year}/ # Extracted student artifacts
│ └── students/
│ ├── student{N}.txt
│ └── page_images/student{N}/page_{P}.png # Optional visual-page artifacts
└── output/{grade_year}/ # Grading output
├── student{N}.json
└── final_scores.json
source env.txt
uv sync
export GRADING_YEAR={grade_year} # Set from env.txt or manually
Confirm the following exist:
- data/masked_data/ict/2025/student_answers/student1.pdf … student10.pdf
- data/masked_data/ict/2025/rubrics/rubrics.pdf
- data/masked_data/ict/2025/question/paper_part1.pdf, paper_part2.pdf
- data/reference/ict/2024/reference_mapping.json
- data/reference/ict/2024/rubrics/rubrics_part1.pdf … rubrics_part17.pdf

Read BATCH_SIZE from env.txt (default: 5) for parallel sub-agent control.
Read ALL rubric PDFs from the masked data directory. Handle these patterns:
- rubrics/rubrics.pdf — extract directly
- rubrics/rubrics_part1.pdf … rubrics_partN.pdf — read ALL parts in numerical order, concatenate content
- rubric_and_question/paper_part1.pdf … paper_partN.pdf — read ALL parts in order; both rubric criteria and question text are interleaved in these files
- If rubrics/level_descriptors.pdf exists, ALWAYS read it — it contains official level-band boundaries critical for level assignment

Use PyMuPDF (fitz) for text extraction. Fall back to VLM only if text extraction yields empty or garbled content.
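One way to implement the fallback decision; the 20-character minimum and 90%-printable thresholds below are illustrative assumptions, not part of the skill:

```python
# Sketch: extract text with PyMuPDF and decide whether a VLM fallback is needed.
def needs_vlm_fallback(page_text: str, min_chars: int = 20) -> bool:
    """Treat empty or mostly non-printable output as garbled."""
    stripped = page_text.strip()
    if len(stripped) < min_chars:
        return True
    printable = sum(ch.isprintable() for ch in stripped)
    return printable / len(stripped) < 0.9

def extract_pdf_text(pdf_path: str):
    import fitz  # PyMuPDF; lazy import keeps the heuristic testable alone
    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]
```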
Chain to extraction skill: perform this phase through the extraction skill's
YAML-driven pipeline (target=rubrics; choose request_mode by document length and
continuity needs). Force Gemini routing (model_routes -> gemini) with env-backed
provider config only (GEMINI_BASE_URL, VERTEX_API_KEY).
If a rubric page contains grading-relevant visual structure that plain text may flatten
or lose — for example diagrams, screenshots, tables whose layout matters, or annotated
figures — make sure the extraction output marks this explicitly (for example with
needs_visual_reference=true and a concise visual_reference_reason). After reading the
API result, the agent may then export the corresponding whole-page PNGs under
rubric/{grade_year}/page_images/rubrics/. Reuse existing PNGs on reruns whenever possible.
Note on rubric extraction granularity: For rubric/criteria PDFs (which are typically typeset/printed), extracting multiple pages at a time is acceptable. However, for extremely long rubric files (e.g., 17+ split parts), consider extracting in manageable batches of a few pages each to avoid truncation or quality degradation.
If the question paper is separate from rubrics:
- question/paper.pdf — extract directly
- question/paper_part1.pdf … paper_partN.pdf — read in order

Chain to extraction skill: perform this phase through the extraction skill
(target=question) and persist structured extraction artifacts before drafting
grading_guide.md. Force Gemini routing for this chain.
Apply the same rule to question pages: if the extraction result indicates diagrams,
UI mockups, charts, geometry-like figures, or other layout-sensitive content, export the
relevant full-page PNGs under rubric/{grade_year}/page_images/question/ after the API
response is reviewed.
Create rubric/{grade_year}/grading_guide.md with:
Read data/reference/{subject}/{ref_year}/reference_mapping.json to get the mapping
of student IDs to known levels (1–5).
Format:
{
"mappings": [
{"student_id": 1, "filename": "level3_student1.pdf", "level": 3, ...},
...
]
}
Note: Reference files keep their original level-based names (level{L}_student{M}.pdf).
The filename field in the mapping matches the actual file on disk. The level is also
encoded in the filename, making it easy for agents to identify which level each student
belongs to without parsing the JSON.
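A sketch of cross-checking the mapping against the filename encoding (helper names are illustrative):

```python
# Sketch: the level is encoded in level{L}_student{M}.pdf; verify it matches
# the "level" field in reference_mapping.json.
import re

def level_from_filename(filename: str) -> int:
    """Recover L from a level{L}_student{M}.pdf filename."""
    m = re.fullmatch(r"level(\d)_student\d+\.pdf", filename)
    if not m:
        raise ValueError(f"unexpected reference filename: {filename}")
    return int(m.group(1))

def verify_mapping(mapping: dict) -> list:
    """Return student_ids whose mapped level disagrees with the filename."""
    return [
        entry["student_id"]
        for entry in mapping["mappings"]
        if level_from_filename(entry["filename"]) != entry["level"]
    ]
```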
Extract text from ALL reference student answer PDFs in
data/reference/{subject}/{ref_year}/student_answers/.
Chain to extraction skill: use extraction pipeline in student-PDF mode
(pdf_dir + optional students) to produce structured extraction outputs that
feed reference calibration. Force Gemini routing for this chain.
Read rubric PDFs from the REFERENCE year directory (structure may differ from the grading year). Apply the same extraction rules as Phase 2.
Grade ALL reference students using the same rubric and grading guide (from Phase 2).
For each reference student, apply the per-question mark allocation to produce a total
raw score and percentage — exactly as you would for a masked student. Record these
scores alongside their known levels from reference_mapping.json.
This step is critical: the resulting scores provide the empirical score-to-level mapping that replaces hard-coded boundaries. Launch sub-agents for reference students following the same rules as Phase 5 (one sub-agent per student, same model restriction).
Save reference grading results to rubric/{ref_year}/reference_scores.json:
{
"reference_year": "{ref_year}",
"scores": [
{"student_id": 1, "level": 3, "total_raw_score": 42, "total_max_score": 80, "percentage": 52.5},
{"student_id": 2, "level": 5, "total_raw_score": 72, "total_max_score": 80, "percentage": 90.0}
],
"level_score_ranges": {
"1": {"min_pct": 12.5, "max_pct": 22.0, "mean_pct": 17.3, "count": 2},
"2": {"min_pct": 28.0, "max_pct": 38.5, "mean_pct": 33.3, "count": 2},
"3": {"min_pct": 45.0, "max_pct": 55.0, "mean_pct": 50.0, "count": 2},
"4": {"min_pct": 62.0, "max_pct": 74.0, "mean_pct": 68.0, "count": 2},
"5": {"min_pct": 82.0, "max_pct": 95.0, "mean_pct": 88.5, "count": 2}
}
}
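The level_score_ranges block can be derived directly from the scores array; a minimal sketch (rounding the mean to one decimal mirrors the example values above):

```python
# Sketch: aggregate per-level min/max/mean percentage from reference scores.
from collections import defaultdict

def level_score_ranges(scores):
    """Build the level_score_ranges mapping from per-student score records."""
    by_level = defaultdict(list)
    for s in scores:
        by_level[s["level"]].append(s["percentage"])
    return {
        str(level): {
            "min_pct": min(pcts),
            "max_pct": max(pcts),
            "mean_pct": round(sum(pcts) / len(pcts), 1),
            "count": len(pcts),
        }
        for level, pcts in sorted(by_level.items())
    }
```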
The rubric must not be finalised from the official PDF alone. The agent must learn from the reference student data to derive an effective grading rubric — one that actually discriminates between student levels as observed in practice. This is an iterative loop that continues until the rubric reliably separates levels.
This is especially important for Visual Arts and Chinese / English Writing, where the official rubric provides only broad bands and subjective judgment is required. For these subjects, the reference data analysis must be particularly thorough: document what distinguishes each level in observable, concrete terms.
All analytical outputs from this step are saved to rubric/reference_data_analysis/.
Iterative loop:
1. Create rubric/reference_data_analysis/per_level_analysis.md documenting observed patterns for each level 1–5.
2. Refine the rubric against those observations:
a. Document gaps in rubric/reference_data_analysis/rubric_gaps.md — gaps or
ambiguities found in the official rubric when applied to reference students
(e.g., criteria that fail to distinguish adjacent levels, missing guidance for
common student behaviours, mark schemes that reward or penalise inconsistently)
b. Derive refinements in rubric/reference_data_analysis/rubric_refinements.md
— supplementary rubric interpretations and practical guidance that resolve the
gaps above (e.g., clarifying what “adequate discussion” means at each level,
adding concrete examples, specifying how to handle partial credit)
c. Revise rubric/{grade_year}/grading_guide.md to incorporate these
refinements — the grading guide is not simply extracted from the official rubric
PDF; it must incorporate insights from the reference data analysis
d. Re-grade affected reference students with the improved rubric
e. Repeat from step 2 until the discrimination check passes

rubric/{grade_year}/grading_guide.md must incorporate insights from
rubric/reference_data_analysis/ — it is not simply a transcription of the
official rubric PDF.

Discrimination check criteria — verify ALL of the following:
Only proceed to Phase 4 (actual student grading) once the discrimination check
passes. Document the outcome in rubric/{grade_year}/reference_calibration.md.
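The criteria list itself is not reproduced in this copy. As an illustrative sketch only, assuming the criteria include non-overlapping adjacent score ranges and strictly increasing level means:

```python
# Sketch under assumed criteria: adjacent level ranges must not overlap and
# level means must strictly increase. Adapt to the actual criteria list.
def ranges_discriminate(level_ranges: dict) -> bool:
    """Return True if the empirical ranges separate levels cleanly."""
    levels = sorted(level_ranges, key=int)
    for lower, upper in zip(levels, levels[1:]):
        a, b = level_ranges[lower], level_ranges[upper]
        if a["max_pct"] >= b["min_pct"]:    # overlapping adjacent ranges
            return False
        if a["mean_pct"] >= b["mean_pct"]:  # means not strictly increasing
            return False
    return True
```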
Temp file organization: All intermediate artifacts from calibration — including
reference student grading outputs, draft rubrics, and intermediate score calculations
— must be saved under rubric/{ref_year}/calibration/ (a dedicated subdirectory, e.g.
rubric/2024/calibration/). This separates reference-year working files from the
current grading year.
Final reference artifacts (reference_scores.json) are saved under rubric/{ref_year}/.
Final grading-year artifacts (grading_guide.md, reference_calibration.md,
level_division.json) are saved under rubric/{grade_year}/.
Using the reference scores from Step 3.4, compute level boundaries to guide level assignment. One common approach is finding midpoints between adjacent levels' score ranges, but agents may use any reasonable method.
These boundaries are guidelines derived from reference data, not hard restrictions. They serve as a starting point for level assignment in Phase 7, where agents may refine them using additional information from the rubric and level descriptors.
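The midpoint approach mentioned above can be sketched as follows (anchoring the bottom at 0 and the top at 100 is an assumption):

```python
# Sketch: place each cut at the midpoint between adjacent levels' observed ranges.
def midpoint_boundaries(level_ranges: dict) -> dict:
    """Derive guideline boundaries from per-level min/max percentage ranges."""
    levels = sorted(level_ranges, key=int)
    boundaries = {}
    lower = 0.0
    for cur, nxt in zip(levels, levels[1:]):
        cut = (level_ranges[cur]["max_pct"] + level_ranges[nxt]["min_pct"]) / 2
        boundaries[f"level_{cur}"] = {"min_percentage": lower, "max_percentage": cut}
        lower = cut
    boundaries[f"level_{levels[-1]}"] = {"min_percentage": lower, "max_percentage": 100.0}
    return boundaries
```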
Create rubric/{grade_year}/reference_calibration.md with TWO sections:
Section 1 — Qualitative Calibration: For each level (1–5), summarise what rubric criteria reference students at that level demonstrated:
Focus on the main marking dimensions for the subject. Include specific examples of how reference students at each level addressed key questions.
Section 2 — Quantitative Score Ranges: Include the empirical score-to-level mapping from Step 3.4:
This quantitative mapping provides useful guidance for level assignment in Phases 5 and 7. Sub-agents and level division may use these ranges alongside rubric information and level descriptors to determine levels from scores.
CRITICAL: Student answer extraction MUST be done 1 page at a time to ensure extraction quality. Do NOT batch multiple pages into a single extraction call.
For each student in data/masked_data/{subject}/{grade_year}/student_answers/:
- Save extracted text to extracted/{grade_year}/students/student{N}.txt
- Export visual pages to extracted/{grade_year}/students/page_images/student{N}/ — but only after the extraction result signals this need.

Chain to extraction skill: this phase must be implemented via the extraction skill
with request_mode=page_by_page (or page_by_page_with_prev only when boundary context
is necessary) to enforce one-page extraction quality requirements, and route via
Gemini only.
Adjust the extraction prompt/output contract so the API returns visual-preservation hints
(for example needs_visual_reference and visual_reference_reason). After extraction,
inspect those returned rows; if any item on a page is flagged, export that whole page
to PNG with PyMuPDF at 150 DPI. Do not crop.
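A sketch of the flag-then-export flow, assuming each extraction row carries a page number plus the needs_visual_reference / visual_reference_reason fields named above:

```python
# Sketch: after extraction, export flagged pages as whole-page PNGs (no cropping).
def pages_to_export(extraction_rows: list) -> list:
    """Collect page numbers whose extraction output set needs_visual_reference."""
    return sorted({row["page"] for row in extraction_rows if row.get("needs_visual_reference")})

def export_page_png(pdf_path: str, page_number: int, out_png: str) -> None:
    import fitz  # PyMuPDF; whole page at 150 DPI per the rule above
    with fitz.open(pdf_path) as doc:
        zoom = 150 / 72  # PyMuPDF's base resolution is 72 DPI
        pix = doc[page_number - 1].get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        pix.save(out_png)
```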
Note: This 1-page-at-a-time rule applies to student answer PDFs. For rubric/criteria PDFs (Phase 2), multi-page extraction is acceptable since those are typeset documents with cleaner formatting.
Confirm all student text files exist and have non-trivial content. Flag any students whose extraction may be incomplete. If a student has visual-page artifacts, confirm the referenced PNG files also exist.
Write a script scripts/validate_extraction.py that:
- Checks all extracted/{grade_year}/students/student{N}.txt files exist with non-trivial content
- If extracted/{grade_year}/students/page_images/student{N}/ exists, verifies it contains at least one .png file and reports the page-image count

Run the script and confirm all students pass before proceeding to Phase 5. If any fail, re-run extraction for those students.
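A minimal sketch of those checks (the 50-character non-trivial-content threshold is an illustrative assumption):

```python
# Sketch: flag students whose extraction output looks missing or incomplete.
from pathlib import Path

def validate_student_extraction(root: Path, student_ids, min_chars: int = 50) -> list:
    """Return the ids of students that fail the extraction checks."""
    failures = []
    for n in student_ids:
        txt = root / "students" / f"student{n}.txt"
        if not txt.exists() or len(txt.read_text().strip()) < min_chars:
            failures.append(n)  # missing or trivially short text
            continue
        img_dir = root / "students" / "page_images" / f"student{n}"
        if img_dir.exists() and not any(img_dir.glob("*.png")):
            failures.append(n)  # image dir present but holds no PNGs
    return failures
```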
For each student, launch a dedicated sub-agent that:
Reads rubric/{grade_year}/grading_guide.md
Reads rubric/{grade_year}/reference_calibration.md (both qualitative and
quantitative sections — including score-to-level ranges)
Anti-bias note: The reference_calibration.md score ranges are empirical data from one prior year. Do NOT treat them as fixed cutoffs. The preliminary_level should reflect the quality of the student's work against rubric criteria. If the student's score falls slightly outside a reference range but clearly matches a level's qualitative description, assign that level and note the reasoning.
Reads the student's extracted text (extracted/{grade_year}/students/student{N}.txt)
Grades EVERY question/sub-question against the rubric — focus on accurate per-question mark allocation first, without constraining scores to fit a predetermined level
Computes total score and percentage from per-question marks
Derives preliminary_level using the reference score ranges and any other
available level information (rubric, level descriptors, reference calibration)
Outputs a JSON file to output/{grade_year}/student{N}.json
IMPORTANT: Sub-agents must NOT use hard score restrictions to constrain their grading. The correct approach is: grade each question purely on rubric merit → sum the scores → map the total to a level using reference score ranges. Do NOT adjust per-question marks to force a particular level outcome.
Control parallelism with BATCH_SIZE — launch up to BATCH_SIZE sub-agents at a time.
Each output/{grade_year}/student{N}.json MUST follow this schema:
{
"student_id": N,
"questions": [
{
"question_id": "Q1a",
"max_marks": 4,
"awarded_marks": 3.0,
"reasoning": "Step-by-step analysis citing rubric criteria: (1) [Rubric criterion 1] — student answer satisfies this because [specific part of answer]; (2) [Rubric criterion 2] — student answer partially satisfies this: [what was present] vs. [what was missing]; (3) [Rubric criterion 3] — student answer does not satisfy this because [reason]. Therefore 3/4 marks awarded.",
"evidence": "Direct quote from student answer for each criterion: '[verbatim student text satisfying criterion 1]'; '[verbatim student text partially satisfying criterion 2]'. Missing element: '[what the rubric requires but the student did not provide]'."
}
],
"total_raw_score": 45,
"total_max_score": 60,
"percentage": 75.0,
"preliminary_level": 4,
"level_reasoning": "Based on reference calibration, this student's performance aligns with Level 4 characteristics: ..."
}
Fields:
- student_id: Integer student number
- questions: Array of per-question grading objects
  - question_id: String identifier (e.g., "Q1a", "Q2bii")
  - max_marks: Maximum marks for this question
  - awarded_marks: Marks awarded (float, can be 0.5 increments if rubric allows)
  - reasoning: Step-by-step analysis of mark allocation, always grounded in the rubric first. For each marking point in the grading guide: (a) state the rubric criterion being assessed, (b) explain whether and how the student's answer satisfies it, (c) for partial marks, specify exactly what was present and what was absent. General subject-knowledge conventions may supplement the rubric only where the rubric is genuinely silent — and this must be stated explicitly (e.g., "Rubric does not specify, but convention requires…").
  - evidence: Direct verbatim quote(s) from the student's answer that demonstrate satisfaction (or non-satisfaction) of each rubric criterion. Do not paraphrase — quote the student answer exactly. For partial-mark cases, quote both the satisfying portion and identify the missing element.
- total_raw_score: Sum of all awarded_marks
- total_max_score: Sum of all max_marks
- percentage: (total_raw_score / total_max_score) × 100
- preliminary_level: Level (1–5) derived using the reference score ranges and available level information. This is a best-fit estimate
- level_reasoning: Explanation for the level assignment, referencing the score range it falls into (e.g., "Score 52.5% falls within Level 3 reference range 45.0%–55.0%")

Grading methodology — sub-agents MUST follow this sequence:
- The reasoning field must reference the specific rubric criterion and the specific student text (verbatim in evidence) that supports or fails each mark.

After each sub-agent completes:
Write a script scripts/validate_grading_output.py that:
- Checks all output/{grade_year}/student{N}.json files
- Validates required fields (student_id, questions, total_raw_score, total_max_score, percentage, preliminary_level), value ranges (preliminary_level ∈ [1,5], percentage ∈ [0,100]), and that total_raw_score equals the sum of all awarded_marks

Run the script. Re-run the sub-agent for any failing students. Re-run the script until all pass.
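A minimal sketch of the per-file validation (error messages are illustrative; field names follow the schema above):

```python
# Sketch: validate one parsed student{N}.json document against the schema rules.
def validate_student_json(doc: dict) -> list:
    """Return a list of validation errors; empty means the document passes."""
    errors = []
    for field in ("student_id", "questions", "total_raw_score",
                  "total_max_score", "percentage", "preliminary_level"):
        if field not in doc:
            errors.append(f"missing {field}")
    if not errors:
        if not 1 <= doc["preliminary_level"] <= 5:
            errors.append("preliminary_level out of range")
        if not 0 <= doc["percentage"] <= 100:
            errors.append("percentage out of range")
        awarded = sum(q["awarded_marks"] for q in doc["questions"])
        if abs(awarded - doc["total_raw_score"]) > 1e-6:
            errors.append("total_raw_score != sum of awarded_marks")
    return errors
```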
Read all output/{grade_year}/student{N}.json files and compile into
output/{grade_year}/final_scores.json:
{
"subject": "{subject}",
"year": "{grade_year}",
"total_students": N,
"max_possible_score": 60,
"students": [
{
"student_id": 1,
"total_raw_score": 45,
"total_max_score": 60,
"percentage": 75.0,
"preliminary_level": 4
}
],
"statistics": {
"mean_score": 42.5,
"median_score": 43.0,
"std_dev": 8.2,
"min_score": 20,
"max_score": 58,
"mean_percentage": 70.8
}
}
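The statistics block can be computed with the standard library; a sketch (rounding to one decimal is an assumption based on the example values above):

```python
# Sketch: compute the final_scores.json statistics block from student records.
import statistics

def score_statistics(students: list) -> dict:
    """Aggregate cohort statistics from per-student score records."""
    scores = [s["total_raw_score"] for s in students]
    pcts = [s["percentage"] for s in students]
    return {
        "mean_score": round(statistics.mean(scores), 1),
        "median_score": round(statistics.median(scores), 1),
        "std_dev": round(statistics.stdev(scores), 1) if len(scores) > 1 else 0.0,
        "min_score": min(scores),
        "max_score": max(scores),
        "mean_percentage": round(statistics.mean(pcts), 1),
    }
```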
Agents may use the extracted reference data (from Phase 3) to inform level division. The following sources are available — agents should consider all relevant information and exercise judgment:
- Reference score ranges (rubric/{ref_year}/reference_scores.json) — the empirical score ranges from grading reference students in Phase 3. These provide an empirical basis for mapping scores to levels.
- Official level descriptors (level_descriptors.pdf if available) — these describe the qualitative characteristics expected at each level
- Reference calibration (rubric/{grade_year}/reference_calibration.md) — contains both qualitative descriptions and quantitative score ranges per level

There are no hard restrictions on how level boundaries must be computed. Agents should use the available reference data, rubric information, and level descriptors to define reasonable level boundaries that reflect the subject's scoring patterns.
⚠️ Reference Year Bias Prevention:
The reference year's score ranges are a starting point for calibration only, not fixed thresholds to be carried forward. Before finalising level boundaries, the agent must:
- level_division.json must include an "adjustment_notes" field. It must either explain any deviation from raw reference boundaries and the reasoning behind the adjustment, or explicitly state "No adjustment; current cohort distribution aligns with reference year ranges."

For each student, map their total score to a level (1–5) using the reference-derived
boundaries from Step 7.1. Save to rubric/{grade_year}/level_division.json:
{
"method": "reference_score_based",
"reference_year": "{ref_year}",
"boundaries": {
"level_1": {"min_percentage": 0, "max_percentage": 27.5},
"level_2": {"min_percentage": 27.5, "max_percentage": 41.75},
"level_3": {"min_percentage": 41.75, "max_percentage": 60.0},
"level_4": {"min_percentage": 60.0, "max_percentage": 78.0},
"level_5": {"min_percentage": 78.0, "max_percentage": 100}
},
"reference_score_ranges": {
"level_1": {"min_pct": 12.5, "max_pct": 22.0, "mean_pct": 17.3},
"level_2": {"min_pct": 28.0, "max_pct": 38.5, "mean_pct": 33.3},
"level_3": {"min_pct": 45.0, "max_pct": 55.0, "mean_pct": 50.0},
"level_4": {"min_pct": 62.0, "max_pct": 74.0, "mean_pct": 68.0},
"level_5": {"min_pct": 82.0, "max_pct": 95.0, "mean_pct": 88.5}
},
"adjustment_notes": "No adjustment; current cohort distribution aligns with reference year ranges.",
"students": [
{
"student_id": 1,
"total_raw_score": 45,
"percentage": 75.0,
"preliminary_level": 4,
"final_level": 4,
"level_reasoning": "Score 75.0% falls within Level 4 reference range (62.0%–74.0%), closest to Level 4 mean (68.0%)"
}
]
}
Note: The boundaries values above are examples. Agents should derive
appropriate boundaries from the available reference data and rubric information.
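A sketch of mapping a percentage onto such boundaries (at a shared boundary value the lower level wins here; that tie-break is an assumption):

```python
# Sketch: map a percentage to a level using the boundaries structure above.
def assign_level(percentage: float, boundaries: dict) -> int:
    """Return the first level whose band contains the percentage."""
    for level in ("level_1", "level_2", "level_3", "level_4", "level_5"):
        band = boundaries[level]
        if band["min_percentage"] <= percentage <= band["max_percentage"]:
            return int(level.split("_")[1])  # lower level wins at a shared edge
    raise ValueError(f"percentage {percentage} outside all bands")
```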
Compare the level distribution of graded students against the reference year's distribution. Flag any significant discrepancies (e.g., if the reference year had students at all 5 levels but the graded year has none at Level 1).
data/masked_data/ict/2025/
- rubrics/rubrics.pdf (single file)
- question/paper_part1.pdf, question/paper_part2.pdf

data/reference/ict/2024/
- rubrics/rubrics_part1.pdf … rubrics_part17.pdf (17 split parts)
- question/paper.pdf

data/masked_data/music/2025/
- rubric_and_question/paper_part1.pdf … paper_part24.pdf
- No separate rubrics/ or question/ directories

data/reference/music/2024/
- rubric_and_question/paper_part1.pdf … paper_part21.pdf

data/masked_data/religion/2025/
- rubrics/rubrics_part1.pdf … rubrics_part7.pdf (7 split parts)
- question/paper.pdf

data/reference/religion/2024/
- rubrics/rubrics_part1.pdf … rubrics_part7.pdf (7 split parts)
- question/paper.pdf

data/masked_data/tourism/2025/
- rubrics/rubrics.pdf (main mark scheme) AND rubrics/level_descriptors.pdf (official level band descriptors)
- ALWAYS read level_descriptors.pdf — it contains the official level boundaries
- question/paper.pdf

data/reference/tourism/2024/
- rubrics/rubrics.pdf AND rubrics/level_descriptors.pdf
- question/paper.pdf

data/masked_data/visual-arts/2020/
- rubrics/rubrics.pdf (single file)
- question/ directory — files may have Chinese filenames (e.g., 2020_視藝_P1QAB.pdf, 2020_視藝_P2QAB.pdf); read ALL PDFs in this directory

data/reference/visual-arts/2025/
- rubrics/rubrics_part1.pdf … rubrics_part5.pdf (5 split parts)
- question/paper_part1.pdf … paper_part5.pdf (5 split parts)
- Uses the VLM route (gemini-3.1-flash-lite-preview, 1 page per call)

⚠️ DO NOT EXECUTE — This phase is disabled for now. Skip Phase 8 entirely. Do not run any steps in this phase.
After level division, review the overall score distribution:
Randomly select 2–3 students and manually verify their grading:
Compare students at the same level:
Confirm all output files are complete and valid:
student{N}.json files exist with valid JSONfinal_scores.json contains all students with correct statisticsrubric/{grade_year}/level_division.json has valid boundaries and all student assignmentsGRADING_YEAR env varuv sync)model-service skill run at least once to confirm Gemini (gemini-3.1-flash-lite-preview) route availability from env.txtreference_scores.json)extraction skill used in Phase 2/3/4 extraction tasks with Gemini (gemini-3.1-flash-lite-preview) routingfinal_scores.json)rubric/{grade_year}/level_division.json)