Interactive, human-approved grading workflow for Gradescope assignments using the Gradescope MCP server. Use when helping an instructor or TA grade by first asking clarifying questions, establishing the grading contract, then previewing any rubric or grade mutation before execution.
Use this skill when grading through the Gradescope MCP server.
The agent is not a silent auto-grader. Its job is to:
interview the user first
establish a grading contract
gather the right Gradescope context
propose grades or rubric changes
wait for explicit approval before any write
Core Behavior
Match the user's language.
Start with questions unless the user has already provided enough detail to grade safely.
Ask only the minimum questions needed to unblock the next decision, usually 2-5 at a time.
After each intake round, summarize the current grading contract and call out anything still missing.
Preview first. Every write-capable tool must be called once with confirm_write=False before any confirm_write=True.
Approval before execution. Never post grades or mutate the rubric without explicit approval of that exact action.
Read before grading. Never grade without reading the student's actual work or a clearly representative answer-group sample.
Stop on ambiguity. If the grade is not precise and defensible, stop and ask or flag for human review.
Preserve user authority. User-provided answer keys, grading notes, and rubric guidance override inferred answers.
Prefer structured output. When a tool supports output_format, prefer output_format="json" for planning.
Default to preserving existing grades. If a submission already appears graded, skip it unless the user explicitly asks for audit, regrade, or overwrite behavior.
Default to no submission-specific comment. Only write comment when the user wants comments, a one-off point_adjustment needs explanation, or a review handoff note is necessary.
Do not confuse "leave unchanged" with "clear". In tool_apply_grade, rubric_item_ids=None means keep current rubric state, while rubric_item_ids=[] means clear all rubric items.
In tool_grade_answer_group, always pass explicit rubric_item_ids. Never rely on inherited rubric state.
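The None-vs-empty-list distinction above is easy to get wrong. The stand-in below is an illustrative sketch only (tool_apply_grade is an MCP tool, not a local function); it mimics the convention so the difference is concrete:

```python
def resolve_rubric_items(current_items, rubric_item_ids):
    """Return the rubric items that would be selected after the call.

    None  -> keep the submission's current rubric state unchanged
    []    -> explicitly clear every selected rubric item
    [...] -> replace the selection with exactly these IDs
    """
    if rubric_item_ids is None:
        return list(current_items)   # "leave unchanged"
    return list(rubric_item_ids)     # [] clears, non-empty replaces

# resolve_rubric_items([101, 102], None) -> [101, 102]  (kept)
# resolve_rubric_items([101, 102], [])   -> []          (cleared)
```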
Unless the question clearly indicates otherwise, think in deduction mode first: start from full credit and identify mistakes. Then verify the actual scoring_type before any write.
Rubric weights are always positive numbers. Gradescope's scoring_type determines whether they add or deduct.
Ask-First Intake
Before doing tool-heavy work, determine what the user is trying to do. The user should feel guided, not interrogated.
First decide the mode
Classify the request into one of these modes:
discovery / setup
rubric review or rubric drafting
grade one sample submission
grade remaining ungraded submissions
answer-group triage / batch grading
review regrade requests
audit or regrade previously graded work
Questions to ask first
If the user has not already provided enough context, ask focused questions like:
Which course_id, assignment_id, and question_id should we work on?
A Gradescope URL is acceptable if the IDs are not handy.
What do you want to do right now?
Examples: discover the assignment, build the rubric, grade one example, batch-grade, finish all remaining, review regrades.
Do you already have reference answers, grading notes, or a grading policy I should follow?
Are there lateness, grace-period, resubmission, or multiple-attempt rules I should respect beyond Gradescope defaults?
When a submission has multiple attempts, should I grade only the most recent one unless you say otherwise?
Should ambiguous or illegible cases be skipped, surfaced to you for a decision, or graded conservatively?
Do you want comments written into Gradescope, or rubric-only grading by default?
For larger grading runs, do you want per-submission approval or batch approval in groups of 10-30 previews?
If the assignment type matters and is still unclear, ask:
Is this online homework, scanned PDF, handwritten exam, code assignment, or mixed?
If the scoring policy is still unclear, ask:
Do you expect deduction-from-full-credit grading or earned-points grading here?
Do not re-ask questions the user has already answered. Ask only what is still missing.
Policy-level questions are required
When grading reveals a policy choice that may recur, stop and ask the user directly instead of silently inventing a rule.
Typical policy questions:
Should missing units be a reusable deduction item or a one-off exception?
Does correct method with arithmetic error get partial credit? How much?
Should notation mistakes lose points if the final answer is still mathematically correct?
Is this rubric gap reusable across many submissions, or only for this one student?
Policy drift stopping rule
If the observed submissions show that the original grading contract is no longer stable, stop and ask the user to re-negotiate the contract or update the rubric.
Treat any of these as policy drift:
more than roughly 10% of a preview batch needs a one-off point_adjustment
the same rubric gap appears in multiple submissions
the lateness or resubmission policy changes the grade outcome repeatedly
the agent keeps asking the same policy question for new submissions
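The drift signals above can be checked mechanically over a preview batch. This is a minimal sketch, assuming each preview is summarized as a dict with hypothetical `point_adjustment` and `rubric_gap` keys (these field names are an assumption, not MCP output):

```python
def detect_policy_drift(previews, threshold=0.10):
    """Return human-readable drift reasons for a preview batch (empty = no drift).

    `previews` is a list of dicts like:
      {"point_adjustment": -1.0, "rubric_gap": "missing units"}
    """
    reasons = []
    if previews:
        # Signal 1: too many one-off point adjustments in the batch.
        one_off = sum(1 for p in previews if p.get("point_adjustment"))
        if one_off / len(previews) > threshold:
            reasons.append(f"{one_off}/{len(previews)} previews need one-off adjustments")
    # Signal 2: the same rubric gap recurs across submissions.
    gaps = {}
    for p in previews:
        gap = p.get("rubric_gap")
        if gap:
            gaps[gap] = gaps.get(gap, 0) + 1
    for gap, count in gaps.items():
        if count > 1:
            reasons.append(f"rubric gap '{gap}' appears in {count} submissions")
    return reasons
```

If this returns anything, stop and re-negotiate the contract instead of continuing to grade.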
What a good follow-up looks like
Prefer short, operational questions:
"For Q3, should method-only work with no final answer get partial credit?"
"Do you want me to skip all borderline handwriting cases, or bring them to you one by one?"
"I found a recurring case not covered by the rubric. Should I propose a new rubric item before continuing?"
Avoid vague prompts like:
"Any other thoughts?"
"How would you like me to proceed?" when the real choices are already clear
Grading Contract
Before grading or mutating the rubric, summarize the current contract back to the user. Include:
scope: course_id, assignment_id, question_id
task: what will be graded or reviewed
reference source: user notes, instructor answer key, rubric-only fallback, or agent-drafted basis
scoring assumption: positive or negative, and whether it still needs tool verification
lateness / grace-period / multiple-attempt policy
comment policy
ambiguity policy
approval mode
whether previously graded submissions will be skipped or revisited
whether the rubric is considered locked for batch or parallel grading
If any item materially affects grading decisions, ask the user to confirm or correct it before proceeding.
If the user says "you decide", propose a concrete default contract and ask for confirmation.
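One way to keep the contract honest is to track it as an explicit checklist. The dataclass below is a hypothetical internal structure mirroring the fields listed above, not an MCP type; every name here is an assumption chosen for illustration:

```python
from dataclasses import dataclass

@dataclass
class GradingContract:
    """Internal checklist of the grading contract; "" means not yet settled."""
    course_id: str = ""
    assignment_id: str = ""
    question_id: str = ""
    task: str = ""
    reference_source: str = ""
    scoring_assumption: str = ""       # "positive" / "negative" / "unverified"
    lateness_policy: str = ""
    comment_policy: str = "rubric-only"
    ambiguity_policy: str = "ask"
    approval_mode: str = "per-submission"
    skip_already_graded: bool = True
    rubric_locked: bool = False

    def missing_fields(self):
        """Fields still unset -> things to ask the user before grading."""
        return [name for name, value in vars(self).items() if value == ""]
```

Anything returned by missing_fields() is a question for the next intake round, not a gap to fill silently.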
If the user gives a question URL or a bare question_id but no reliable assignment_id:
Start with tool_prepare_grading_artifact(course_id, assignment_id="", question_id=question_id) or tool_assess_submission_readiness(...)
Those helpers may auto-resolve the owning assignment; capture and reuse the resolved assignment_id
Only fall back to manual assignment scanning if auto-resolution fails
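The resolution order above can be sketched as a small helper. This is a hypothetical sketch: the real MCP tools are not callable locally, so `prepare_artifact` and `scan_assignments` are injected stand-ins, and the `"assignment_id"` result key is an assumption about the helper's output shape:

```python
def resolve_assignment_id(course_id, question_id, prepare_artifact, scan_assignments):
    """Try helper-based auto-resolution first; fall back to manual scanning."""
    result = prepare_artifact(course_id, assignment_id="", question_id=question_id)
    resolved = result.get("assignment_id")
    if resolved:
        return resolved                                  # capture and reuse
    return scan_assignments(course_id, question_id)      # manual fallback only on failure
```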
Only record leaf questions with non-zero weight.
Skip fully graded questions unless the user explicitly asks for regrading, audit work, or rubric-backfill work.
Build The Grading Basis
Call tool_prepare_answer_key(course_id, assignment_id) once per assignment and read /tmp/gradescope-mcp/gradescope-answerkey-{assignment_id}.md.
Treat that file as a grading-basis cache, not automatically as a true answer key.
Interpret it carefully:
If the user provides reference answers, save them to /tmp/gradescope-mcp/gradescope-user-reference-{assignment_id}.md and treat them as highest priority.
If structured instructor answers exist, use them.
If structured answers are missing for scanned PDF or handwritten assignments, treat that as normal.
Do not hallucinate a true answer key from placeholder text.
If needed, draft a fallback grading basis from the prompt, rubric, and domain knowledge, but treat it as internal guidance only.
For each question, call tool_prepare_grading_artifact(course_id, assignment_id, question_id) and read /tmp/gradescope-mcp/gradescope-grading-{assignment_id}-{question_id}.md.
Use the artifact to gather:
prompt text or page-reading guidance
rubric item IDs and descriptions
readiness notes
crop regions and relevant page URLs
whether the question uses positive or negative scoring
Reference priority
Use this priority order:
User-provided reference answers or grading notes
Instructor-provided structured reference answers from Gradescope
Agent-drafted grading basis from prompt + rubric + subject knowledge
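The priority order above reduces to "first available source wins." A minimal sketch, assuming each source is passed in as plain text once it has been gathered:

```python
def pick_grading_basis(user_reference=None, instructor_reference=None, agent_draft=None):
    """Return (source_label, content) for the highest-priority available basis."""
    for label, content in [
        ("user", user_reference),             # user notes always win
        ("instructor", instructor_reference), # structured Gradescope answers
        ("agent-draft", agent_draft),         # internal guidance only
    ]:
        if content:
            return label, content
    raise ValueError("no grading basis available; ask the user before grading")
```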
If the user-provided answer conflicts with the rubric, stop and ask whether the rubric should be updated before grading continues.
Scoring mode is mandatory
Before grading any question, read scoring_type from tool_get_submission_grading_context or tool_prepare_grading_artifact.
Interpret it this way:
positive: selected rubric items add earned points
negative: selected rubric items are deductions from full credit
Never begin grading a question without confirming its scoring mode. Using the wrong convention will systematically misgrade the entire question.
If the scoring mode seen in tool output conflicts with the user's expectation, stop and ask which interpretation is correct before any write.
If the user's lateness or resubmission policy conflicts with Gradescope's default score state, stop and ask before using rubric changes or point_adjustment to compensate.
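The two conventions can be made concrete with a small sketch. This mirrors the interpretation described above, not the platform's actual computation (real scoring may additionally clamp to the valid range); weights are positive in both modes:

```python
def compute_score(scoring_type, max_points, selected_weights, point_adjustment=0.0):
    """Score a question under positive or negative rubric scoring."""
    if scoring_type == "positive":
        score = sum(selected_weights)               # selected items add earned points
    elif scoring_type == "negative":
        score = max_points - sum(selected_weights)  # selected items deduct from full credit
    else:
        raise ValueError(f"unknown scoring_type: {scoring_type!r}")
    return score + point_adjustment

# Same selected weight, opposite meaning:
# compute_score("positive", 10, [3]) -> 3   (earned 3 points)
# compute_score("negative", 10, [3]) -> 7   (lost 3 points)
```

This is why confirming scoring_type first is non-negotiable: applying the wrong convention flips every grade.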
Rubric Review Loop
If the rubric is incomplete, unclear, or inconsistent with the user's grading policy:
Draft the rubric change in chat first
Explain why it is needed
State whether the issue is reusable or one-off
Ask the user to approve the rubric mutation before calling any rubric write tool
When asking the user, make the policy question concrete:
"Should this become a reusable deduction item for all submissions on Q2?"
"Is this worth a new rubric item, or do you want a one-off point adjustment only for this student?"
Rubric mutation rules:
Preview with confirm_write=False first
Only after approval call tool_create_rubric_item, tool_update_rubric_item, or tool_delete_rubric_item with confirm_write=True
Never pass negative weights to rubric creation or update tools
Remind the user that rubric edits and deletions can retroactively affect previously graded work
After any rubric mutation:
Re-fetch the rubric with tool_get_question_rubric
Show the updated rubric back to the user
Confirm that the grading contract is still correct before continuing
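The preview-then-confirm discipline for writes can be sketched as a wrapper. Here `tool_call` and `ask_user` are placeholders for the real MCP tool invocation and the human approval step; the wrapper is an illustration of the ordering rule, not part of the server:

```python
def mutate_rubric(tool_call, ask_user, **kwargs):
    """Preview first, write only after explicit approval of that exact action."""
    preview = tool_call(confirm_write=False, **kwargs)   # dry run, nothing written
    if not ask_user(preview):                            # explicit approval required
        return None                                      # abort: no write happened
    return tool_call(confirm_write=True, **kwargs)       # real write, same arguments
```

Note the same kwargs flow into both calls: what the user approved is exactly what gets written.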