Self-driving plan-audit-fix loop with adaptive learning. Three-model system: Claude (author) + GPT-5.4 (auditor) + Gemini 3.1 Pro (final arbiter). Features: R2+ suppression via adjudication ledger, map-reduce for large codebases, repo-aware prompt tuning, cloud learning store (Supabase), Thompson Sampling prompt selection. Triggers on: "audit loop", "plan and audit", "run the audit loop", "auto-audit", "plan-audit-fix loop", "iterate on the plan", "GPT audit", "audit the plan", "check the implementation", "verify the plan", "review against the plan", "audit docs/plans/", "audit this", "audit my code". Usage: /audit-loop <task-description> — Full cycle: plan + audit loop Usage: /audit-loop plan <plan-file> — Audit an existing plan iteratively Usage: /audit-loop code <plan-file> — Audit code against plan iteratively Usage: /audit-loop full <task-description> — Plan + implement + audit code Usage: /audit-loop <plan-file> — Same as code (shorthand)
Orchestrate an automated plan-audit-fix quality loop with adaptive learning.
Input: $ARGUMENTS — task description or plan|code|full <path>.
| Input | Mode |
|---|---|
| plan docs/plans/X.md | PLAN_AUDIT — audit plan iteratively |
| code docs/plans/X.md | CODE_AUDIT — audit code against plan |
| full <description> | FULL_CYCLE — plan → audit → implement → audit code |
| <description> | PLAN_CYCLE — plan → audit → fix → repeat |
Validate: plan file exists (if applicable).
Do NOT pre-check API keys — the scripts load .env automatically via dotenv/config.
Checking process.env.OPENAI_API_KEY before running will always return empty because
the key lives in the repo's .env, not the shell environment. Let the script fail with
its own error if the key is truly missing.
Initialize session ID: SID=audit-$(date +%s)
Show kickoff card:
═══════════════════════════════════════
AUDIT LOOP — [MODE] — Starting
Plan: <path> | Max 6 rounds | SID: $SID
═══════════════════════════════════════
If the mode requires generating a plan (PLAN_CYCLE or FULL_CYCLE), generate it with /plan-backend or /plan-frontend and save to docs/plans/<name>.md. Skip this step otherwise.
CRITICAL: Always confirm you are in the target repository before running any audit command. Running from the wrong directory causes the diff scope to resolve against the wrong codebase — producing phantom findings about files you never touched and missing the files you did.
pwd # Must be the repo being audited, not claude-audit-loop or any other repo
If the plan file is in a different repo, cd to that repo first. The audit scripts use the cwd to resolve git diff, file paths, and the CLAUDE.md context.
CRITICAL: Code audits see whatever files you give them. GPT-5.4 doesn't know what's "new" vs pre-existing — it flags everything. To get signal, scope deliberately:
| Scope mode | When to use | Behavior |
|---|---|---|
| --scope diff (DEFAULT) | "audit my recent work", "/audit-loop my PR", after implementing a phase/feature | Auto-scopes to git diff HEAD~1..HEAD + unstaged + untracked files. Findings focus on YOUR changes. |
| --scope plan | Plan describes a large refactor touching many files; user wants broad view | All files referenced in the plan (legacy behavior). |
| --scope full | "audit the entire codebase", user explicitly asks for a codebase-wide review | Full repo audit — slowest, catches cross-cutting issues. |
Default is --scope diff. If the user says "audit my work" / "audit this feature" / "audit this phase" → default applies.
Only switch to --scope plan or --scope full when the user EXPLICITLY asks for broader scope, or when git diff is empty (no recent changes).
Why this matters: Without scoping, auditing Phase A's ~150-line diff produces 28 findings about files Phase A barely touched. With --scope diff, you get findings about the actual changes. This is the #1 source of false-positive noise.
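The default-scope rule reduces to a small decision function. A sketch with hypothetical names — pickScope is not part of the shipped scripts:

```javascript
// Sketch of the scope-selection rule above. Explicit user intent wins;
// an empty diff is the only other reason to widen scope.
function pickScope({ explicitRequest = null, diffIsEmpty = false } = {}) {
  // "audit the entire codebase" → full; "all plan files" → plan
  if (explicitRequest === 'full' || explicitRequest === 'plan') return explicitRequest;
  // No recent changes → --scope diff has nothing to see; widen to plan
  if (diffIsEmpty) return 'plan';
  // "audit my work" / "audit this feature" / "audit this phase" → default
  return 'diff';
}

console.log(pickScope({}));                          // → "diff"
console.log(pickScope({ diffIsEmpty: true }));       // → "plan"
console.log(pickScope({ explicitRequest: 'full' })); // → "full"
```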
Run R1 in the foreground. The first ~10 seconds of stderr reveal whether the diff scope resolved correctly (file count, changed lines). Catching a wrong-directory or empty-diff problem early saves 5+ minutes versus discovering it after the run completes.
# Default: audit only recent changes (preferred)
# Run FOREGROUND — do not use run_in_background for R1
node scripts/openai-audit.mjs code <plan-file> \
--out /tmp/$SID-r1-result.json \
2>/tmp/$SID-r1-stderr.log
# Broader scope examples (only when user asks):
# --scope plan → all plan-referenced files
# --scope full → entire repo
# --base main → diff against main instead of HEAD~1
Before GPT runs, the script executes language-appropriate static analysis tools
(ESLint for JS/TS, tsc --noEmit for TS, ruff for Python, falling back to
flake8). Tool findings carry a classification envelope with
sourceKind: 'LINTER' | 'TYPE_CHECKER' and are appended to findings[] with
T-prefixed IDs (T1, T2, ...).
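A sketch of how tool findings land in findings[] — the sourceKind envelope and T-prefixed IDs come from the description above; the merge helper itself is hypothetical:

```javascript
// Append linter/type-checker findings to GPT findings with T-prefixed IDs
// and a classification envelope (sketch of the merge step, not the real code).
function appendToolFindings(findings, toolResults) {
  toolResults.forEach((t, i) => {
    findings.push({
      id: `T${i + 1}`,          // T1, T2, ...
      sourceKind: t.sourceKind, // 'LINTER' | 'TYPE_CHECKER'
      severity: t.severity,
      detail: t.detail,
    });
  });
  return findings;
}

const merged = appendToolFindings(
  [{ id: 'H1', detail: 'missing auth' }],
  [{ sourceKind: 'LINTER', severity: 'LOW', detail: 'unused var' }],
);
console.log(merged.map(f => f.id).join(',')); // → "H1,T1"
```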
| Flag | Default | Effect |
|---|---|---|
| --no-tools | off | Skip Phase 0 entirely. Use for untrusted repos — ESLint configs can require() arbitrary code. |
| --strict-lint | off (advisory) | Count tool findings in verdict math. Without this flag, tool findings are surfaced but don't affect PASS/NEEDS_FIXES/SIGNIFICANT_ISSUES. |
Trust boundary: running repo-configured linters executes code the repo
owner controls, equivalent to running npm test. Every invocation is logged
to stderr. See scripts/lib/linter.mjs for full
security notes.
Advisory-by-default rationale: tool availability varies across machines
(no npx eslint on a Python-only box). Counting tool findings in the verdict
would make it non-reproducible. Opt in with --strict-lint when your CI
environment has all the tools.
The result JSON includes _toolCapability: { toolsAvailable, toolsFailed, strictLint, disabled }
so orchestrators can see which tools ran.
# Generate diff from fixes
git diff HEAD~1 -- . > /tmp/$SID-diff.patch
# Build changed + files lists from Step 4 fix list
CHANGED="scripts/shared.mjs,scripts/openai-audit.mjs"
FILES="$CHANGED,scripts/gemini-review.mjs" # changed + dependents
# Determine passes
PASSES="sustainability" # always include
# Add backend if any backend file changed, frontend if frontend changed, etc.
node scripts/openai-audit.mjs code <plan-file> \
--round 2 \
--ledger /tmp/$SID-ledger.json \
--diff /tmp/$SID-diff.patch \
--changed $CHANGED \
--files $FILES \
--passes $PASSES \
--out /tmp/$SID-r2-result.json \
2>/tmp/$SID-r2-stderr.log
Pass --session-cache /tmp/$SID-ctx.json on every round to skip ~10s of LLM brief-generation. The first round writes it; subsequent rounds read it. Cache self-invalidates if package.json or CLAUDE.md changes.
# R1 — writes the cache
node scripts/openai-audit.mjs code <plan-file> \
--session-cache /tmp/$SID-ctx.json \
--out /tmp/$SID-r1-result.json
# R2 — reads the cache (brief generation skipped)
node scripts/openai-audit.mjs code <plan-file> \
--round 2 --ledger /tmp/$SID-ledger.json \
--session-cache /tmp/$SID-ctx.json \
--out /tmp/$SID-r2-result.json
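The cache's self-invalidation can be sketched as a content fingerprint over package.json and CLAUDE.md (illustrative names; see the script for the real mechanism):

```javascript
// Sketch: the session cache stores a fingerprint of package.json + CLAUDE.md
// and is discarded when either changes. `fingerprint`/`cacheIsValid` are
// hypothetical helper names, not the script's API.
import { createHash } from 'node:crypto';

function fingerprint(fileBodies) {
  const h = createHash('sha256');
  for (const body of fileBodies) h.update(body);
  return h.digest('hex').slice(0, 12);
}

function cacheIsValid(cache, fileBodies) {
  return cache.fingerprint === fingerprint(fileBodies);
}

const files = ['{"name":"demo"}', '# CLAUDE.md'];
const cache = { fingerprint: fingerprint(files), brief: 'cached brief' };
console.log(cacheIsValid(cache, files));                                     // unchanged → reuse brief
console.log(cacheIsValid(cache, ['{"name":"demo","v":2}', '# CLAUDE.md'])); // changed → regenerate
```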
| Flag | Source | Purpose |
|---|---|---|
| --round <n> | Orchestrator | Triggers R2+ mode (rulings, suppression, annotations) |
| --ledger <path> | Step 3.5 output | Adjudication ledger for rulings injection + suppression |
| --diff <path> | git diff output | Line-level change annotations in code context |
| --changed <list> | Step 4 fix list | Authoritative source for what was modified (reopen detection) |
| --files <list> | changed + dependents | Audit scope — what GPT sees in context |
| --passes <list> | Smart selection | Which passes to run |
| --session-cache <path> | SID-derived temp path | Cross-round brief + profile cache (skip LLM on R2+) |
| Pass | When to skip on R2+ |
|---|---|
| structure | Skip ONLY if zero file additions/deletions/renames in the diff. Re-run if fixes created or deleted files. |
| wiring | Skip unless a route or API file was changed |
| backend | Run if any backend file changed |
| frontend | Run if any frontend file changed |
| sustainability | Always run (cross-cutting) |
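The table reduces to a small selector. A sketch — the path heuristics for classifying backend/frontend/route files are illustrative assumptions, not the script's actual rules:

```javascript
// Derive --passes from the changed-file list per the table above (sketch).
function selectPasses(changedFiles, { filesAddedOrDeleted = false } = {}) {
  const passes = new Set(['sustainability']);       // always run (cross-cutting)
  if (filesAddedOrDeleted) passes.add('structure'); // only when the diff adds/deletes/renames
  for (const f of changedFiles) {
    if (/routes?\//i.test(f)) passes.add('wiring'); // route/API file heuristic (assumption)
    if (/^(server|scripts|api)\//.test(f)) passes.add('backend');
    if (/^(src|app|components)\/.*\.(tsx|jsx|css)$/.test(f)) passes.add('frontend');
  }
  return [...passes];
}

console.log(selectPasses(['scripts/shared.mjs']).join(',')); // → "sustainability,backend"
```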
When --round >= 2, the script automatically injects prior rulings, suppresses adjudicated topics, and annotates the changed lines in code context with // ── CHANGED ── markers.
Read stderr for pass timings and suppression stats:
cat /tmp/$SID-r1-stderr.log
Read result JSON:
cat /tmp/$SID-r1-result.json
If verdict is INCOMPLETE (passes timed out), offer: re-run with higher timeout, or continue with partial results.
═══════════════════════════════════════
ROUND 1 AUDIT — SIGNIFICANT_ISSUES
H:6 M:10 L:5 | Deduped: 3 | Cost: ~$0.45
Top: [H1] Missing auth on /api/...
═══════════════════════════════════════
You are a peer, not a subordinate. For each finding, record three orthogonal judgments:
| Dimension | Values | Meaning |
|---|---|---|
| validity | valid \| invalid \| uncertain | Is the concern real? |
| scope | in-scope \| out-of-scope | Does it cite code this audit targeted? |
| action | fix-now \| defer \| dismiss \| rebut | What happens next? |
- validity=invalid → action MUST be dismiss or rebut (can't defer a wrong finding)
- validity=uncertain → action MUST be rebut (send to GPT deliberation)
- validity=valid + scope=in-scope + HIGH/MEDIUM → action MUST be fix-now (unless accepted-permanent debt with approver)
- validity=valid + scope=out-of-scope → action = defer is eligible (pre-existing debt)
- validity=valid + scope=in-scope + LOW → operator choice
- validity=valid findings can be deferred to the debt ledger

Scope hint: look at the finding's cited files vs --changed/--scope diff. A finding that points at code your PR didn't touch is out-of-scope by definition.
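These constraints are mechanically checkable. A sketch — validTriage is a hypothetical helper, and the accepted-permanent-debt exception is omitted for brevity:

```javascript
// Validate a (validity, scope, action, severity) triage tuple against the
// rules above (sketch).
function validTriage({ validity, scope, action, severity }) {
  if (validity === 'invalid') return action === 'dismiss' || action === 'rebut';
  if (validity === 'uncertain') return action === 'rebut';
  // validity === 'valid' from here on
  if (scope === 'in-scope' && (severity === 'HIGH' || severity === 'MEDIUM')) {
    return action === 'fix-now'; // accepted-permanent exception omitted in sketch
  }
  if (scope === 'out-of-scope') return action === 'defer' || action === 'fix-now';
  return true; // valid + in-scope + LOW → operator choice
}

console.log(validTriage({ validity: 'invalid', scope: 'in-scope', action: 'defer' }));                   // → false
console.log(validTriage({ validity: 'valid', scope: 'in-scope', action: 'fix-now', severity: 'HIGH' })); // → true
```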
Each finding has is_mechanical: true/false set by GPT:
- Mechanical (is_mechanical: true): deterministic fix. Fix immediately, no deliberation needed.
- Architectural (is_mechanical: false): judgment call. Needs deliberation, resets stability if new.

| Finding Severity | Deliberation |
|---|---|
| HIGH rebut | ALWAYS send to GPT deliberation |
| MEDIUM rebut | ALWAYS send to GPT deliberation |
| LOW rebut | Claude decides locally |
Only send rebuttal if there are rebut HIGH or MEDIUM findings:
node scripts/openai-audit.mjs rebuttal <plan-file> <rebuttal-file> \
--out /tmp/$SID-resolution.json 2>/tmp/$SID-rebuttal-stderr.log
Quality threshold: HIGH == 0 && MEDIUM <= 2 && quickFix == 0
Stability uses _hash for exact cross-round matching:
| Condition | Action |
|---|---|
| Threshold NOT met | Fix → re-audit |
| Threshold met, new architectural | Fix → re-audit (stability resets) |
| Threshold met, mechanical only | Fix → re-audit (stability NOT reset) |
| Threshold met, 0 new, 2/2 stable | CONVERGED → Step 6, then REQUIRED Step 7 |
| Round 6, not stable | Present to user, then REQUIRED Step 7 |
Max 6 rounds for CODE audits.
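The threshold and stability table reduce to a small gate. A sketch — the counts come from the round's result JSON:

```javascript
// Quality threshold and stability gate from the tables above (sketch).
function thresholdMet({ high, medium, quickFix }) {
  return high === 0 && medium <= 2 && quickFix === 0;
}

function nextAction(counts, { newArchitectural, stableRounds }) {
  if (!thresholdMet(counts)) return 'fix-and-reaudit';
  if (newArchitectural > 0) return 'fix-and-reaudit (stability resets)';
  if (stableRounds >= 2) return 'CONVERGED → Step 6 → Step 7';
  return 'fix-and-reaudit (stability kept)'; // mechanical-only fixes
}

console.log(nextAction(
  { high: 0, medium: 2, quickFix: 0 },
  { newArchitectural: 0, stableRounds: 2 },
)); // → "CONVERGED → Step 6 → Step 7"
```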
Before running R2, check the diff size:
git diff HEAD~1 --stat | tail -1 # e.g. "6 files changed, 134 insertions(+), 28 deletions(-)"
If the diff is small (< 150 lines changed AND ≤ 3 files touched), skip R2 entirely and go straight to Step 6 → Step 7. Gemini catches the same class of issues as R2 verification for this scope, in less time with no timeout risk.
R2 earns its keep for substantial fix rounds (> 150 lines or > 3 files changed, or when R1 found multiple HIGH issues requiring architectural changes). Use judgment when near the threshold.
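The gate can be sketched by parsing the --stat summary line (the regexes assume git's standard English summary format):

```javascript
// Decide whether to skip R2, from the `git diff HEAD~1 --stat | tail -1` line.
// Example input: "6 files changed, 134 insertions(+), 28 deletions(-)"
function shouldSkipR2(statLine) {
  const files = Number((statLine.match(/(\d+) files? changed/) || [])[1] || 0);
  const ins = Number((statLine.match(/(\d+) insertions?\(\+\)/) || [])[1] || 0);
  const del = Number((statLine.match(/(\d+) deletions?\(-\)/) || [])[1] || 0);
  return ins + del < 150 && files <= 3; // small diff → straight to Step 6 → Step 7
}

console.log(shouldSkipR2('3 files changed, 80 insertions(+), 10 deletions(-)'));  // → true
console.log(shouldSkipR2('6 files changed, 134 insertions(+), 28 deletions(-)')); // → false
```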
Default for plan audits: GPT R1 → fix → Step 7 (Gemini). GPT R2 on plan audits almost always times out (the plan pass is a single wall of tokens with no map-reduce split) and adds rigor pressure rather than finding new correctness gaps.
Only run GPT R2 on a plan audit when:
Otherwise proceed directly to Step 7 after fixing R1 findings.
Plan audits have infinite refinement surface. Unlike code (which has objective correctness), a plan can always be made "more rigorous". GPT-5.4 is trained to keep finding issues — after round 2-3, findings shift from "real design bugs" to "push for more rigor" (parser-based analysis instead of regex, full v2 features now, cross-source dedup, etc.).
Max 3 rounds for plan audits unless HIGH count is ACTIVELY DECREASING:
| Condition | Action |
|---|---|
| R1 → R2 HIGH count drops significantly (>30%) | Continue to R3 |
| R2 → R3 HIGH count drops significantly | Continue to R4 (rare) |
| HIGH count plateaus or INCREASES across rounds | STOP — remaining findings are scope pressure, not correctness gaps |
| R2+ findings push for v2 features, parser dependencies, framework expansion | STOP — challenge as out-of-scope, document as "known limitations" in plan |
When you stop, record remaining concerns in a ## N. Out of Scope (Future) section in the plan.

Why this matters: Each audit round costs ~$0.15 and ~3 minutes. A 4-round plan audit that doesn't decrease HIGH count is $0.30 and 6 minutes wasted, plus it pressures Claude to accept scope creep during deliberation. Stop earlier, ship earlier, iterate in code.
Exception: If you're genuinely uncertain whether a finding is a bug or scope creep, one more round is worth the cost. Use judgment.
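The stop rule from the table, as a sketch — only an actively decreasing HIGH count (>30% drop) earns another plan-audit round:

```javascript
// After round N (N >= 2) of a plan audit, decide whether another GPT round
// is justified (sketch of the table's stop rule).
function anotherPlanRound({ prevHigh, currHigh }) {
  if (prevHigh === 0 || currHigh >= prevHigh) return false; // plateau or increase → stop
  return (prevHigh - currHigh) / prevHigh > 0.3;            // actively decreasing → continue
}

console.log(anotherPlanRound({ prevHigh: 6, currHigh: 3 })); // 50% drop → true
console.log(anotherPlanRound({ prevHigh: 5, currHigh: 4 })); // 20% drop → false
console.log(anotherPlanRound({ prevHigh: 3, currHigh: 3 })); // plateau  → false
```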
CRITICAL: Step 7 (Gemini/Claude Opus final review) is MANDATORY after the last audit round, regardless of convergence. Gemini provides an independent perspective that GPT-5.4 cannot. The only exception is when neither GEMINI_API_KEY nor ANTHROPIC_API_KEY is available.
After each deliberation round, write the ledger for R2+ suppression.
For EACH finding, record its adjudication outcome:
node -e "
import { writeLedgerEntry, generateTopicId, populateFindingMetadata } from './scripts/shared.mjs';
// Example: dismissed finding
const finding = { section: 'scripts/shared.mjs', category: 'SOLID-SRP Violation', principle: 'SRP', _pass: 'backend' };
populateFindingMetadata(finding, 'backend');
writeLedgerEntry('/tmp/$SID-ledger.json', {
topicId: generateTopicId(finding),
semanticHash: 'abcd1234',
adjudicationOutcome: 'dismissed', // 'dismissed' | 'accepted' | 'severity_adjusted'
remediationState: 'pending', // 'pending' | 'planned' | 'fixed' | 'verified'
severity: 'MEDIUM',
originalSeverity: 'MEDIUM',
category: finding.category,
section: finding.section,
detailSnapshot: 'shared.mjs mixes concerns...',
affectedFiles: ['scripts/shared.mjs'],
affectedPrinciples: ['SRP'],
ruling: 'overrule',
rulingRationale: '300-line file, 2 consumers, acceptable',
resolvedRound: 1,
pass: 'backend'
});
" --input-type=module
Status values:
- adjudicationOutcome: dismissed (GPT overruled), accepted (will fix), severity_adjusted (compromise)
- remediationState: pending → planned → fixed → verified (or regressed)

CRITICAL: Write the ledger BEFORE proceeding to Step 4. The ledger is the source of truth for R2+.
BLOCKING GATE — do not proceed to Step 4 until this step is complete.
Count your defer triage decisions. If count > 0, you MUST run debt capture
before fixing. Skipping this step means GPT will re-raise the same findings
in every future round, wasting tokens and diluting signal.
Eligible candidates (from Step 3 triage): findings with action = defer — i.e. validity = valid AND either scope = out-of-scope, or scope = in-scope with a documented non-out-of-scope deferral reason (see the deferredReason table below).
After Step 3.5, run the auto-capture script. It reads every ruling: defer
entry from the adjudication ledger and writes them all in one pass:
node scripts/debt-auto-capture.mjs --ledger /tmp/$SID-ledger.json
For non-default deferral reasons, add the appropriate flag:
# blocked by an upstream issue
node scripts/debt-auto-capture.mjs --ledger /tmp/$SID-ledger.json \
--reason blocked-by --blocked-by "owner/repo#123"
# planned for a follow-up PR
node scripts/debt-auto-capture.mjs --ledger /tmp/$SID-ledger.json \
--reason deferred-followup --followup-pr "owner/repo#456"
# see what would be captured without writing
node scripts/debt-auto-capture.mjs --ledger /tmp/$SID-ledger.json --dry-run
The script uses rulingRationale from the adjudication ledger as the
deferredRationale — no manual field construction needed.
| deferredReason | Valid scope | Additional required fields |
|---|---|---|
| out-of-scope | out-of-scope | (none beyond rationale) |
| blocked-by | any | blockedBy (issue/PR/topicId ref) |
| deferred-followup | any | followupPr (e.g. owner/repo#123) |
| accepted-permanent | any | approver + approvedAt |
| policy-exception | any | policyRef + approver |
For cases where entries need different reasons or metadata:
node -e "
import { writeDebtEntries } from './scripts/lib/debt-ledger.mjs';
import { buildDebtEntry } from './scripts/lib/debt-capture.mjs';
const finding = { /* enriched finding with _hash, _primaryFile, _pass, affectedFiles, classification */ };
const { entry, sensitivity, redactions } = buildDebtEntry(finding, {
deferredReason: 'out-of-scope',
deferredRationale: 'pre-existing god-module concern, not in this phase scope — tracked for refactor pass',
deferredRun: '$SID',
});
const result = await writeDebtEntries([entry]);
console.log(JSON.stringify({ inserted: result.inserted, updated: result.updated, rejected: result.rejected.length, sensitive: sensitivity.sensitive, redactions: redactions.length }));
" --input-type=module
Automatic protections:
- deferredRationale must be >= 20 chars (schema-enforced — no rubber-stamp defers)
- Sensitive content in detail/category/section/rationale is auto-redacted to [REDACTED:pattern-name] and the entry is marked sensitive: true
- Events are written to .audit/local/debt-events.jsonl (or Supabase if cloud active)

═══════════════════════════════════════
DEBT CAPTURE — Auto (Step 3.6)
Deferred: 7 entries (reason: out-of-scope)
Inserted: 5 | Updated: 2
Sensitive (redacted): 1
Total ledger: 23 entries
Cloud sync: ok
═══════════════════════════════════════
Pre-Step-4 assertion: Confirm the status card shows at least Inserted + Updated == defer count. If the card shows rejections equal to defer count, stop and investigate before fixing.
CRITICAL: Wait for rebuttal BEFORE fixing.
If there were defer decisions, run debt-auto-capture.mjs and confirm the status card before proceeding.
ALL HIGH must be fixed. MEDIUM until ≤2 remain. LOW if mechanical.
Track which files you modify — you'll need this for --changed in Step 5.
Show what changed:
═══════════════════════════════════════
FIXING — 17 findings
Auto-fixed: 3 (mechanical)
Fixed per recommendation: 8
Compromises: 2
Skipped (LOW): 4
Files modified: shared.mjs, openai-audit.mjs
═══════════════════════════════════════
List each fix: [ID] description → file:lines
After fixing, update ledger entries to remediationState: 'fixed' for fixed items.
After fixes, re-audit using R2+ mode (back to Step 2).
- Rebuild --changed and --files from the files you modified
- Regenerate the diff: git diff HEAD~1 -- . > /tmp/$SID-diff.patch
- Select --passes from file types (see Smart Pass Selection)
- Pass --round <N> --ledger --diff --changed --files

Track finding churn using _hash fields:
The script automatically logs suppression stats to stderr:
═══════════════════════════════════════
R2 POST-PROCESSING
Kept: 2 | Suppressed: 11 | Reopened: 1
Suppressed: a1b2c3 (0.82), 9f4d1e (0.78)...
═══════════════════════════════════════
Review suppressed topics to validate no legitimate findings were over-suppressed.
═══════════════════════════════════════
ROUND 2 → ROUND 3 (R2+ mode)
H:0 M:2 L:1 | New: 0 | Suppressed: 11
Stable: 1/2
═══════════════════════════════════════
After the verification audit runs, if the result's _debtMemory.debtReopened > 0
AND those reopened debt topics have NO matching finding in the current round's
output, those entries are candidates for resolution (the underlying issue
appears fixed).
Resolution requires positive evidence (fix R2-M3): the entry's files must
be in --changed AND in the audit scope. Absence of a match from an
out-of-scope audit is NOT proof of resolution.
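The positive-evidence rule, as a sketch (field names follow the debt-ledger examples above; isResolutionCandidate is a hypothetical helper):

```javascript
// A reopened debt entry is a resolution candidate only when the audit that
// failed to re-raise it actually looked at its files (fix R2-M3 rule, sketch).
function isResolutionCandidate(entry, { changed, auditScope, raisedTopics }) {
  if (raisedTopics.has(entry.topicId)) return false; // still being raised → not resolved
  // Positive evidence: every affected file was changed AND in the audit scope.
  return entry.affectedFiles.every(
    f => changed.includes(f) && auditScope.includes(f),
  ); // absence of a match from an out-of-scope audit proves nothing
}

const entry = { topicId: 'abc12345', affectedFiles: ['scripts/openai-audit.mjs'] };
console.log(isResolutionCandidate(entry, {
  changed: ['scripts/openai-audit.mjs'],
  auditScope: ['scripts/openai-audit.mjs', 'scripts/shared.mjs'],
  raisedTopics: new Set(),
})); // → true
```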
For each candidate, prompt the operator:
═══════════════════════════════════════
DEBT RESOLVED? — abc12345
Category: [SYSTEMIC] God Module / Excessive File Size
Files: scripts/openai-audit.mjs
Reopened this round but no matching finding raised.
Resolve? [y/N]
═══════════════════════════════════════
If confirmed, run:
node scripts/debt-resolve.mjs abc12345 \
--rationale "fixed in commit <hash> — <brief description>" \
--run-id $SID
This removes the entry from .audit/tech-debt.json (and from cloud mirror when
configured), and logs a resolved event to the event source. The audit trail
stays in the event log.
Exit codes: 0 = resolved, 1 = op error, 2 = entry not found / lock contention.
═══════════════════════════════════════
CONVERGED — Round 4
Final: H:0 M:2 L:1
Rounds: 4 | Time: 14m | Cost: ~$1.20
Files changed: 6
Remaining (accepted): [M3], [M7]
═══════════════════════════════════════
Save convergence snapshot to docs/plans/<name>-audit-summary.md.
Do not close the loop in Step 6. Completion requires Step 7 final review (or explicit "final gate unavailable" note when both provider keys are absent).
After the final GPT-5.4 audit round (whether converged or not), run Gemini 3.1 Pro as an independent third reviewer. This step is MANDATORY — Gemini provides cross-model perspective that catches blind spots in Claude-GPT deliberation.
If GEMINI_API_KEY is not set, run Claude Opus fallback (ANTHROPIC_API_KEY).
Only skip Step 7 if neither key is available.
When Step 7 is skipped, output FINAL_GATE_SKIPPED and do not claim full final-gate validation.
Assemble /tmp/$SID-transcript.json with the full audit trail:
CRITICAL — include code_files in the transcript envelope. The gemini-review script reads transcript.code_files to load actual source for independent review. Without it, Gemini only sees the plan + GPT findings and cannot independently verify anything.
{
"code_files": ["src/foo.ts", "src/bar.ts"],
"rounds": [...],
...
}
Use the same file list you passed to --files on your last GPT round (changed files + their direct importers). For plan audits this can be omitted or empty.
node scripts/gemini-review.mjs review <plan-file> /tmp/$SID-transcript.json \
--mode plan \
--out /tmp/$SID-gemini-result.json 2>/tmp/$SID-gemini-stderr.log
The script auto-selects provider in this order:
1. Gemini 3.1 Pro (if GEMINI_API_KEY is set)
2. Claude Opus fallback (if ANTHROPIC_API_KEY is set)

| Verdict | Action |
|---|---|
| APPROVE | Done → final report |
| CONCERNS | Step 7.1: Deliberate → Fix → Gemini re-verify |
| CONCERNS_REMAINING | Step 7.1: Deliberate on unresolved items → author decides disputed ones → Gemini re-verify |
| REJECT | Present to user — needs human judgment (unambiguous missed bugs or bias, not just disputed findings) |
Max 2 final-review rounds.
When Gemini returns CONCERNS or CONCERNS_REMAINING, Claude deliberates on each new_findings and wrongly_dismissed item — same peer relationship as GPT deliberation:
node scripts/gemini-review.mjs review <plan-file> /tmp/$SID-transcript-v2.json \
--mode plan \
--out /tmp/$SID-gemini-result-v2.json 2>/tmp/$SID-gemini-stderr-v2.log
CRITICAL: Do NOT use GPT to verify Gemini's findings — GPT already missed them. Gemini must verify its own concerns were addressed. This closes the loop properly.
Wrongly-dismissed escalation cap: If you challenged a wrongly_dismissed item with cited evidence in the prior round, Gemini must provide new counter-evidence in evidence_basis to re-raise it — not just re-assert. If Gemini re-raises without new evidence (empty evidence_basis), treat it as a reassertion and dismiss it. The loop cannot resolve by repetition; it resolves by evidence.
If Gemini returns APPROVE on re-review → done. If CONCERNS again after 2 rounds → present to user.
After plan converges: implement, then run Steps 2-6 with CODE_AUDIT mode.
Per-round cost estimate: cost ≈ (input × 2.5 + output × 10) / 1M.
Rounds after R1 may be run with run_in_background: true. When a background notification arrives late, check if the output file already has content before re-processing: test -s /tmp/$SID-rN-result.json — if it does, the result was already consumed and the notification can be dismissed.

| Environment | Skill Location | Notes |
|---|---|---|
| Claude Code | .claude/skills/audit-loop/ | Native bash |
| VS Code Copilot | .github/skills/audit-loop/ | Terminal tool |
| Cursor / Windsurf | .github/skills/audit-loop/ | Terminal tool |
| Any AI + terminal | Direct script | node scripts/openai-audit.mjs |
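The per-round cost estimate above, as a one-liner — a rough estimate using the formula's per-million-token prices, not a billing tool:

```javascript
// cost ≈ (input × 2.5 + output × 10) / 1M, per the estimate above.
const estimateCost = (inputTokens, outputTokens) =>
  (inputTokens * 2.5 + outputTokens * 10) / 1_000_000;

// e.g. 120k input + 15k output tokens:
console.log(estimateCost(120_000, 15_000).toFixed(2)); // → "0.45"
```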
The script auto-detects project context from CLAUDE.md, Agents.md, or .github/copilot-instructions.md.