技能檔案

Checkpoints

Name: Checkpoints
Author: djwmobley

Human-in-the-loop checkpoint taxonomy — MUST/SHOULD/MAY classification for all pipeline decision points

djwmobley0 星標2026年4月18日

職業: 項目管理專家
分類: 機器學習

技能內容

Human-in-the-Loop Checkpoints

Overview

Every point in the pipeline where a human decision is required or recommended is a checkpoint. Checkpoints are classified into three tiers that determine how they are presented and whether they can be skipped.

Core principle: Safety-critical decisions are never optional. Recommended checks are prompted with a default. Optional checks are available but off by default.

Taxonomy

Tier	Behavior	Prompt Format	Can Be Skipped?
MUST	Hard stop. Pipeline blocks until condition is met or user explicitly confirms.	Block statement + rationalization prevention table + "Cannot proceed until [condition]."	Never. Not configurable.
SHOULD	Prompted with default-yes. User can decline with explanation of what they are skipping.

相關技能

Checkpoints | Skills Pool

<!-- checkpoint:MUST [id] -->

[Condition statement. What must be true before proceeding.]

| Rationalization | Reality |
|---|---|
| "[common excuse 1]" | [why it's wrong] |
| "[common excuse 2]" | [why it's wrong] |
| "[common excuse 3]" | [why it's wrong] |

Stop. Don't proceed to [next step].

<!-- checkpoint:SHOULD [id] -->

[Action description]? (Y/n)

[What this does. What you risk by skipping.]

[Checkpoint name]: skipped by user

<!-- checkpoint:MAY [id] -->

[Action description]? (y/N)

[What this does.]

ID	Command	Description	Tier	Rationale
`orientation`	all phase skills	Assert cwd, branch, HEAD, worktree, dirty flag before Step 0	MUST	Bash tool persists cwd across calls; invisible drift corrupts history
`finish-tests-pass`	finish	Tests must pass before merge options	MUST	Merging with failing tests breaks the main branch
`finish-merge-verify`	finish	Re-run tests after merge	MUST	Merge can introduce conflicts that break tests
`finish-discard-confirm`	finish	Type "discard" to delete branch	MUST	Permanent data loss — branch and all commits deleted
`review-adversarial`	review	Must produce findings or clean certificate	MUST	Empty reviews are rubber-stamps that miss real bugs
`plan-coverage`	plan	Every spec requirement traces to a task	MUST	Untraced requirements get silently dropped from implementation
`build-qa-large`	build	QA verification for LARGE+ changes	SHOULD	Large changes have higher regression risk; QA catches integration failures
`debate-large`	debate	Design debate for LARGE+ specs	SHOULD	Complex specs have hidden assumptions that only adversarial challenge surfaces
`build-resume`	build	Confirm resume of interrupted build	SHOULD	User may want to start fresh rather than resume stale state
`plan-no-debate`	plan	Warn when LARGE+ spec has no debate	SHOULD	Plans without debate have historically required full rewrites
`remediate-proceed`	remediate	Confirm batch plan before executing fixes	SHOULD	User should review which findings will be auto-fixed before code changes
`build-completion`	build	Post-build option selection	SHOULD	User chooses next workflow step (review, commit, leave)
`finish-completion`	finish	Post-finish option selection	SHOULD	User chooses merge strategy (merge+push, PR, keep, discard)
`debate-medium`	debate	Design debate for MEDIUM specs	MAY	Low risk — most MEDIUM plans succeed without debate

Classify the tier:
- Does skipping this risk data loss, broken deploys, or security bypass? → MUST
- Is this a recommended check that most users should accept? → SHOULD
- Is this an optional enhancement most users will skip? → MAY
Pick an ID following the {command}-{action-noun} convention
Add the annotation to the command file at the checkpoint location:
```

```
Add a row to the registry table above with ID, command, description, tier, and rationale
Format the prompt according to the rendering rules for the chosen tier
Add skip logging if the tier is SHOULD

Thought	Reality
"This checkpoint is just for documentation"	If it requires a human decision, it is a checkpoint. Classify and register it.
"This is obviously a SHOULD, not a MUST"	If skipping it can lose data or break deploys, it is a MUST. The user's convenience does not override safety.
"Nobody will ever skip this"	If nobody skips it, making it MUST costs nothing. If someone does skip it, the classification matters.
"Adding to the registry is overhead"	One table row. The overhead of an unregistered checkpoint is discovering it by accident during an incident.
"I'll register it later"	Register it now. Unregistered checkpoints are invisible checkpoints.

Checkpoints

Human-in-the-Loop Checkpoints

Overview

Taxonomy

Checkpoints

Human-in-the-Loop Checkpoints

Overview

Taxonomy

MUST Rendering Rules

SHOULD Rendering Rules

MAY Rendering Rules

Checkpoint Registry

Skip Logging

ID Naming Convention

Adding New Checkpoints

Red Flags / Rationalization Prevention

Continuous Learning V2

Continuous Learning V2

Continuous Learning V2

Continuous Learning

Continuous Learning

Pytorch Patterns