Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing. Use when AI agents perform most implementation work and humans enforce quality and risk controls.
Example workflow:
1. Write a test that captures the desired behavior (the eval)
2. Run the test → capture the baseline failures
3. Implement the feature
4. Re-run the test → verify the improvement
5. Check for regressions in the other tests
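The workflow above can be sketched as a small comparison between a baseline run and a post-implementation run. This is a minimal illustration, not a prescribed harness: the names `run_suite` and `compare_runs` and the dict-of-callables suite format are assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical suite format: test name -> zero-argument check returning True on pass.
EvalSuite = Dict[str, Callable[[], bool]]


def run_suite(suite: EvalSuite) -> Dict[str, bool]:
    """Run every check and record pass/fail; an exception counts as a failure."""
    results: Dict[str, bool] = {}
    for name, check in suite.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def compare_runs(baseline: Dict[str, bool], after: Dict[str, bool]) -> Dict[str, List[str]]:
    """Classify each baseline test as fixed, regressed, or still failing."""
    report: Dict[str, List[str]] = {"fixed": [], "regressed": [], "still_failing": []}
    for name, passed_before in baseline.items():
        passed_after = after.get(name, False)
        if not passed_before and passed_after:
            report["fixed"].append(name)
        elif passed_before and not passed_after:
            report["regressed"].append(name)
        elif not passed_before and not passed_after:
            report["still_failing"].append(name)
    return report
```

In use, `run_suite` would be called once before the implementation step (steps 1 and 2) and once after it (step 4); a non-empty `regressed` list is the signal for step 5.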
Apply the 15-minute unit rule: break each task into units of roughly 15 minutes, each tagged with its main risk.
Good decomposition:
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
Bad decomposition:
Task: Add user authentication (2 hours, multiple risks)
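The two decompositions above can be checked mechanically. The sketch below assumes that a unit is acceptable when it fits in 15 minutes and names at most one risk; the one-risk limit is read off the examples rather than stated as a rule, and `Unit` and `oversized_units` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

MAX_UNIT_MINUTES = 15  # the 15-minute unit rule


@dataclass
class Unit:
    name: str
    minutes: int
    risks: List[str]


def oversized_units(units: List[Unit]) -> List[str]:
    """Return names of units that exceed the time budget or carry more than one risk."""
    return [
        u.name
        for u in units
        if u.minutes > MAX_UNIT_MINUTES or len(u.risks) > 1
    ]


good = [
    Unit("Add password hashing", 15, ["security"]),
    Unit("Create login endpoint", 15, ["API contract"]),
    Unit("Add session management", 15, ["state"]),
    Unit("Protect routes with middleware", 15, ["auth logic"]),
]
bad = [Unit("Add user authentication", 120, ["security", "API contract", "state"])]
```

Here `oversized_units(good)` is empty, while `oversized_units(bad)` flags the single two-hour unit as needing a further split.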
Choose the model tier based on task complexity:
Haiku: Classification, boilerplate transforms, narrow edits
Sonnet: Implementation and refactors
Opus: Architecture, root-cause analysis, multi-file invariants
Cost discipline: Escalate to a higher model tier only when the lower tier fails with a clear reasoning gap.
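One way to encode the tier table and the escalation rule is sketched below. The task-kind keys, the default tier for unknown kinds, and the `attempt` callback that reports `(succeeded, reasoning_gap)` are all assumptions made for illustration; no real model API is called.

```python
from typing import Callable, List, Optional, Tuple

TIERS: List[str] = ["haiku", "sonnet", "opus"]  # cheapest first

# Hypothetical task kinds mapped to the starting tier from the table above.
START_TIER = {
    "classification": "haiku",
    "boilerplate": "haiku",
    "narrow_edit": "haiku",
    "implementation": "sonnet",
    "refactor": "sonnet",
    "architecture": "opus",
    "root_cause": "opus",
    "multi_file_invariant": "opus",
}


def pick_tier(task_kind: str) -> str:
    """Pick the starting tier; unknown kinds default to the middle tier (an assumption)."""
    return START_TIER.get(task_kind, "sonnet")


def run_with_escalation(
    task_kind: str,
    attempt: Callable[[str], Tuple[bool, bool]],
) -> Optional[str]:
    """Try tiers from the starting tier upward.

    attempt(tier) returns (succeeded, reasoning_gap). The next tier is tried only
    when the failure shows a reasoning gap; any other failure stops the loop.
    Returns the tier that succeeded, or None.
    """
    index = TIERS.index(pick_tier(task_kind))
    while index < len(TIERS):
        tier = TIERS[index]
        succeeded, reasoning_gap = attempt(tier)
        if succeeded:
            return tier
        if not reasoning_gap:
            return None  # not a reasoning gap: fix the inputs instead of escalating
        index += 1
    return None
```

Returning `None` on a failure without a reasoning gap keeps costs down: a missing file or a flaky test is not fixed by a larger model.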
Continue session for closely-coupled units
Start fresh session after major phase transitions
Compact after milestone completion, not during active debugging
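The three session rules above can be written as a single decision function. This is only a sketch: the function name, the flag names, the returned labels, and the order in which the rules are checked are assumptions.

```python
def session_action(phase_transition: bool, milestone_done: bool, actively_debugging: bool) -> str:
    """Decide what to do with the current agent session.

    Returns "start_fresh", "compact", or "continue".
    """
    if phase_transition:
        # A major phase transition gets a fresh session.
        return "start_fresh"
    if milestone_done and not actively_debugging:
        # Compact only once a milestone is complete, never in the middle of debugging.
        return "compact"
    # Closely coupled units keep sharing the same session.
    return "continue"
```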
Prioritize:
Do not spend review cycles on style-only disagreements when automated formatting and linting already enforce style.
Review checklist:
Track per task: model, token usage, retries, time, and outcome.
Example tracking:
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
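The tracking example above maps naturally onto a small record type. The sketch below is one possible shape; the class name `TaskRecord` and its field names are hypothetical, and the token counts are the approximate figures from the example.

```python
from dataclasses import dataclass, asdict


@dataclass
class TaskRecord:
    task: str
    model: str
    input_tokens: int   # approximate
    output_tokens: int  # approximate
    retries: int
    minutes: float
    outcome: str
    note: str = ""

    @property
    def total_tokens(self) -> int:
        """Input plus output tokens for this task."""
        return self.input_tokens + self.output_tokens


record = TaskRecord(
    task="Implement user login",
    model="Sonnet",
    input_tokens=5000,
    output_tokens=2000,
    retries=1,
    minutes=8,
    outcome="Success",
    note="initial implementation had auth bug",
)
```

Calling `asdict(record)` gives a plain dictionary that can be appended to a log or exported for per-model cost comparisons.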