Use when implementing a feature, fix, refactor, or content change that should go through a complete delivery loop: implement, test, vibe-check, AI review, and measurement before keeping or discarding the change. Trigger when the user asks to build something, fix a bug, write or revise content, improve quality, or when you need to decide whether a change is actually ready to ship. Chinese triggers (中文触发): 开发 (develop), 实现 (implement), 修复 (fix), 写代码 (write code), 改一下 (tweak this), 做这个功能 (build this feature), 加个功能 (add a feature), 修个 bug (fix a bug), 帮我写 (write this for me), 帮我改 (revise this for me), 重构 (refactor).
Run a complete development loop instead of stopping at "tests pass". The loop is:
implement -> test -> vibe-check -> AI review -> measure -> keep or discard
Use it for code, docs, papers, content, and other artifacts where correctness alone is not enough. The goal is to catch regressions early, verify real-world quality, and make keep/discard decisions using explicit observations rather than gut feel alone.
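As a rough illustration of the loop's shape, the sketch below runs the stages in order and stops at the first one that fails. The stage names come from the loop above; `run_loop`, `LoopResult`, and the boolean check callables are hypothetical names introduced only for this example, and treating any failed stage as an immediate discard is a simplifying assumption (in practice you would often fix the problem and rerun the stage).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Stage names follow the loop described above; the final keep/discard
# decision is derived from the stage results.
STAGES = ["implement", "test", "vibe-check", "AI review", "measure"]


@dataclass
class LoopResult:
    decision: str                                  # "keep" or "discard"
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None


def run_loop(checks: Dict[str, Callable[[], bool]]) -> LoopResult:
    """Run each stage in order; stop at the first stage whose check fails.

    `checks` maps a stage name to a hypothetical callable that returns
    True when the stage passes.
    """
    result = LoopResult(decision="keep")
    for stage in STAGES:
        if not checks[stage]():
            # Assumption for this sketch: a failed stage means discard.
            result.decision = "discard"
            result.failed_stage = stage
            return result
        result.completed.append(stage)
    return result
```

The point of the sketch is only that "tests pass" is one gate among five: a change can clear `test` and still be discarded at `vibe-check`, `AI review`, or `measure`.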
Choose the mode that matches the task:
Both modes use the same stages. The difference is who gates progression.
Tests answer: "Does it do what I intended?"
Experience the change like a user or consumer would. This catches unspecified but obvious problems.
Examples:
Guidance:
Do an objective alignment pass, not just a style review.
Check:
Run the project's observations and compare against a baseline or the previous successful iteration.
Typical observation types:
Possible outcomes:
If you discard, revert to the pre-change snapshot and try a different approach instead of patching blindly.
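One way to make the comparison against a baseline explicit is sketched below. It is a minimal example under stated assumptions: observations are numeric, higher is better for every metric, and the outcome names are borrowed from the status values used for the progress log. The function name `decide` and the `tolerance` parameter are hypothetical.

```python
from typing import Dict


def decide(baseline: Dict[str, float],
           current: Dict[str, float],
           tolerance: float = 0.0) -> str:
    """Return "keep", "discard", or "investigate".

    Assumes every metric is numeric and that higher values are better.
    """
    # A metric that was measured before but is missing now is a red flag.
    if any(name not in current for name in baseline):
        return "investigate"

    regressed = [n for n in baseline if current[n] < baseline[n] - tolerance]
    improved = [n for n in baseline if current[n] > baseline[n] + tolerance]

    if regressed and improved:
        return "investigate"   # mixed signal: needs a closer look
    if regressed:
        return "discard"       # only regressions: revert to the snapshot
    return "keep"              # no regression beyond tolerance


# Example with hypothetical metric names and values.
print(decide({"pass_rate": 0.90}, {"pass_rate": 0.95}))
```

Writing the rule down, even this crudely, keeps the keep/discard decision tied to observations instead of impressions.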
Adjust how heavy the loop is based on the change:
When the project uses iteration tracking, keep a simple progress log such as progress.tsv with:
(baseline, keep, discard, investigate)
Use a human-readable location in the repo root unless the project specifies another path.
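A progress log like this can be maintained with a few lines of code. The sketch below appends one row per iteration to a tab-separated file. Only the file name progress.tsv and the four status values come from this document; the column names `iteration`, `status`, and `note` and the helper `append_progress` are hypothetical placeholders, so substitute whatever columns your project actually tracks.

```python
import csv
from pathlib import Path

# Hypothetical column layout, chosen only for illustration.
COLUMNS = ["iteration", "status", "note"]
# Status values as listed above.
VALID_STATUSES = {"baseline", "keep", "discard", "investigate"}


def append_progress(path, iteration, status, note=""):
    """Append one row to the TSV log, writing the header on first use."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if write_header:
            writer.writerow(COLUMNS)
        writer.writerow([iteration, status, note])
```

For example, `append_progress("progress.tsv", 0, "baseline", "initial measurements")` would create the file in the current directory with a header row followed by the baseline entry.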
If your environment includes telemetry tooling such as skill-evolve, log the keep/discard result as a best-effort step after measurement. Do not let telemetry block the actual work.
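The "best-effort" requirement can be sketched as a wrapper that swallows telemetry failures. This example deliberately does not call skill-evolve or assume anything about its interface: `send` stands in for whatever logging call your environment provides, and `log_best_effort` is a hypothetical helper name.

```python
import logging
from typing import Callable

logger = logging.getLogger("delivery_loop")


def log_best_effort(send: Callable[[str], None], decision: str) -> bool:
    """Try to report the decision; never raise if telemetry fails.

    `send` is a placeholder for the real telemetry call, whose actual
    interface is not specified here.
    """
    try:
        send(decision)
        return True
    except Exception as exc:  # broad on purpose: telemetry must not block work
        logger.warning("telemetry skipped: %s", exc)
        return False
```

The return value lets the caller note that logging was skipped without changing the keep/discard outcome itself.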
north-star: defines the goals or indicators that measurement should check
long-horizon: governs how to make autonomous decisions during extended loops
notify: can report flagged decisions or iteration outcomes without blocking the work
This Codex version keeps the procedural guidance from the Claude skill but removes Claude-specific frontmatter fields such as category. Any project-specific commands should live in the repo's own instructions or adjacent references, not hard-coded into this skill.