A disciplined agentic skill for autonomous task execution. Use this skill whenever an agent needs to: capture app screenshots using Playwright (installing it automatically if absent), deeply understand the full context and structure of any prompt before taking any action, track its own progress in a progress markdown file, operate in strict step-by-step fashion, and consult a mistakes log to learn from prior errors. Trigger this skill whenever the agent is doing any multi-step autonomous task, visual capture/screenshot workflow, UI automation, or self-improving agentic loop. All state files live in the `.codex/` folder.
A self-aware, self-correcting, step-by-step agentic execution framework. The agent captures app
screenshots via Playwright, tracks its own progress, and actively learns from its past mistakes —
all managed inside the .codex/ folder.
Before doing anything, the agent commits to these principles:
- Log every action in .codex/progress.md immediately after it happens.
- Read .codex/mistakes.md and apply its lessons proactively.

All agent state lives under .codex/ in the project root:
.codex/
├── SKILL.md ← This file (the agent's operating manual)
├── progress.md ← Running log of what has been done and what is next
├── mistakes.md ← Catalogue of past errors and how to avoid them
└── screenshots/ ← Playwright-captured images, named by step and timestamp
On first run, the agent creates this folder and all files if they do not exist.
mkdir -p .codex/screenshots
touch .codex/progress.md .codex/mistakes.md
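The same bootstrap can be sketched in Node.js, which is handy when the agent is already running inside a script. This is a minimal sketch, not part of the skill itself: `ensureCodexLayout` is a hypothetical helper name, and it assumes the project root is passed in explicitly.

```javascript
// Sketch: idempotent creation of the .codex/ layout.
// ensureCodexLayout is a hypothetical name, not something the skill defines.
const fs = require('fs');
const path = require('path');

function ensureCodexLayout(projectRoot) {
  const codexDir = path.join(projectRoot, '.codex');
  // Equivalent of `mkdir -p`: does not fail if the directories already exist.
  fs.mkdirSync(path.join(codexDir, 'screenshots'), { recursive: true });

  const created = [];
  for (const name of ['progress.md', 'mistakes.md']) {
    const file = path.join(codexDir, name);
    if (!fs.existsSync(file)) {
      // Opening in append mode creates the file without truncating, like `touch`.
      fs.closeSync(fs.openSync(file, 'a'));
      created.push(name);
    }
  }
  // Names of the state files that were newly created on this run.
  return created;
}
```

Calling it a second time returns an empty array, so existing logs are never overwritten.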
Rule: The agent must never write code, run commands, or make decisions until Phase 0 is complete.
Read the entire user prompt from beginning to end. Do not skim.
Extract and explicitly state:
If any ambiguity cannot be resolved from context alone, log it in progress.md under a
## Unresolved section and ask the user before proceeding.
Write a numbered list of steps (the full execution plan) into progress.md before starting
execution. Each step must have, at minimum, an action and a success criterion, written as `Step N: <action> → <success criterion>`.
Read .codex/mistakes.md in full. For each past mistake, check whether the current task is at
risk of repeating it. If so, annotate the relevant step in the plan with a ⚠️ Watch: note.
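The review of mistakes.md can be partly mechanised. The sketch below assumes entries follow the `## Mistake:` / `**Prevention rule**` layout used elsewhere in this manual. `parsePreventionRules` and `withWatchNote` are hypothetical helper names. Deciding whether a step is actually at risk stays with the agent; the code only extracts the rules and formats the note.

```javascript
// Sketch: extract the title and prevention rule from each mistakes.md entry.
// Both function names are hypothetical helpers, not part of the skill.
function parsePreventionRules(markdown) {
  const entries = [];
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^## Mistake:\s*(.+)$/);
    if (heading) {
      current = { title: heading[1].trim(), rule: null };
      entries.push(current);
      continue;
    }
    const rule = line.match(/^- \*\*Prevention rule\*\*:\s*(.+)$/);
    if (rule && current) {
      current.rule = rule[1].trim();
    }
  }
  return entries;
}

// Append a watch note to a plan step that the agent judged to be at risk.
function withWatchNote(stepLine, entry) {
  return `${stepLine} ⚠️ Watch: ${entry.rule || entry.title}`;
}
```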
Before running any screenshot capture, verify Playwright is available:
npx --no-install playwright --version 2>/dev/null || echo "NOT_FOUND"
If not found, install it:
npm install playwright
npx playwright install chromium
Log the install result to progress.md.
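From inside a Node.js script, the package check can also be done without spawning a shell. The sketch below uses Node's built-in `require.resolve`. `isModuleInstalled` is a hypothetical helper name. It only confirms that the npm package resolves from the given directory, so it says nothing about whether the Chromium binaries have been downloaded.

```javascript
// Sketch: check whether an npm package resolves from a given directory.
// isModuleInstalled is a hypothetical helper; it does not verify browser binaries.
function isModuleInstalled(name, fromDir = process.cwd()) {
  try {
    require.resolve(name, { paths: [fromDir] });
    return true;
  } catch (err) {
    // Only "not found" means absent; anything else is a real error.
    if (err.code === 'MODULE_NOT_FOUND') return false;
    throw err;
  }
}

// Example: decide whether the install commands above need to run.
// if (!isModuleInstalled('playwright')) { /* run npm install playwright */ }
```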
Confirm the target app or URL the agent needs to screenshot is accessible:
curl -o /dev/null -s -w "%{http_code}" <TARGET_URL>
If the app is a local server, check the process or start it before proceeding. Log status.
The agent executes the plan from progress.md one step at a time.
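For every step, record its status in progress.md. Mark the step as started before acting, then as completed or failed afterwards: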
- [ ] 🔄 Step N: <description> — started <timestamp>
- [x] ✅ Step N: <description> — completed <timestamp>
or
- [ ] ❌ Step N: <description> — FAILED <timestamp> — see mistakes.md
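Generating these lines from code keeps the format consistent across runs. The following is a minimal sketch; `formatStepLine` and the `STATUS` table are hypothetical names, and the output simply mirrors the three line shapes shown above.

```javascript
// Sketch: build a progress.md status line in the format shown above.
// STATUS and formatStepLine are hypothetical names.
const STATUS = {
  started: { box: ' ', icon: '🔄', suffix: (ts) => `started ${ts}` },
  completed: { box: 'x', icon: '✅', suffix: (ts) => `completed ${ts}` },
  failed: { box: ' ', icon: '❌', suffix: (ts) => `FAILED ${ts} — see mistakes.md` },
};

function formatStepLine(n, description, status, timestamp = new Date().toISOString()) {
  const s = STATUS[status];
  if (!s) throw new Error(`Unknown status: ${status}`);
  // The dash separator is part of the log format used in progress.md.
  return `- [${s.box}] ${s.icon} Step ${n}: ${description} — ${s.suffix(timestamp)}`;
}
```

Passing the timestamp explicitly makes the output reproducible; omitting it stamps the current time.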
Screenshots are taken at every major state transition of the app:
// playwright-screenshot.js
// Captures a full-page screenshot of TARGET_URL into .codex/screenshots/.
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(process.env.TARGET_URL || 'http://localhost:3000');
    // Wait until network activity settles so late-loading content is captured.
    await page.waitForLoadState('networkidle');

    // File name: <step label>_<filesystem-safe ISO timestamp>.png
    const label = process.env.STEP_LABEL || 'step';
    const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
    const filename = `.codex/screenshots/${label}_${timestamp}.png`;

    await page.screenshot({ path: filename, fullPage: true });
    console.log(`Screenshot saved: ${filename}`);
  } finally {
    // Close the browser even if navigation or capture throws.
    await browser.close();
  }
})();
Run it as:
STEP_LABEL="step-03-login" TARGET_URL="http://localhost:3000/login" node .codex/playwright-screenshot.js
Log the screenshot path in progress.md under the relevant step.
If a later step seems simpler or more urgent, the agent resists the impulse to jump ahead. Dependencies between steps are real even when not explicit. Sequence matters.
Whenever a step fails, produces unexpected output, or requires a correction:
Append to .codex/mistakes.md:
## Mistake: <short title>
- **Date**: <ISO timestamp>
- **Step**: Step N — <step description>
- **What happened**: <factual description of what went wrong>
- **Root cause**: <why it happened — be specific>
- **Impact**: <what had to be redone or what was lost>
- **Fix applied**: <what the agent did to correct it>
- **Prevention rule**: <a concrete rule to avoid this in the future>
Correct the failed step. Re-run it. Re-take any affected screenshots. Update progress.md.
If the mistake revealed a flaw in the plan (missing step, wrong order, wrong assumption),
update the plan in progress.md and note the revision:
> ⚠️ Plan revised at <timestamp> due to: <reason>
progress.md is the single source of truth for where the agent is at any point in time.
# Task Progress
## Task
<One-line summary of the overall goal>
## Started
<ISO timestamp>
## Execution Plan
- [ ] Step 1: <action> → <success criterion>
- [ ] Step 2: <action> → <success criterion>
- [ ] Step 3: ...
## Log
### <ISO timestamp> — Phase 0 complete
Prompt fully parsed. N ambiguities found. N resolved. Execution plan written.
mistakes.md reviewed — N past mistakes noted. Watch notes applied to steps: N, N.
### <ISO timestamp> — Step 1 started
<Brief note>
### <ISO timestamp> — Step 1 complete
Screenshot: `.codex/screenshots/step-01_<ts>.png`
### <ISO timestamp> — Step 2 FAILED
See mistakes.md: "Mistake: <title>"
### <ISO timestamp> — Step 2 retried and complete
## Status
IN PROGRESS | COMPLETE | BLOCKED
## Unresolved
- <list any ambiguities waiting on user input>
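Because progress.md is the source of truth, the next step to run can be read straight from it. The sketch below assumes the plan lives under a `## Execution Plan` heading and uses the checkbox lines from the template above. `parsePlan` and `nextPendingStep` are hypothetical helper names.

```javascript
// Sketch: read the execution plan out of progress.md.
// parsePlan and nextPendingStep are hypothetical helpers.
function parsePlan(markdown) {
  const steps = [];
  let inPlan = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (/^## /.test(line)) {
      // Only checkbox lines inside the Execution Plan section count.
      inPlan = line.trim() === '## Execution Plan';
      continue;
    }
    if (!inPlan) continue;
    // Optional status icon (for example ✅) may sit between the box and "Step".
    const m = line.match(/^- \[([ xX])\]\s+(?:\S+\s+)?Step (\d+):\s*(.*)$/);
    if (m) {
      steps.push({ number: Number(m[2]), done: m[1] !== ' ', text: m[3] });
    }
  }
  return steps;
}

// First step that is not yet checked off, or null when the plan is finished.
function nextPendingStep(markdown) {
  return parsePlan(markdown).find((step) => !step.done) || null;
}
```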
mistakes.md is the agent's long-term memory of what not to do.
# Mistakes Log
> This file is read at the start of every task and before every major decision.
> Each entry is a permanent lesson. Do not delete entries.
## Mistake: <title>
- **Date**: ...
- **Step**: ...
- **What happened**: ...
- **Root cause**: ...
- **Impact**: ...
- **Fix applied**: ...
- **Prevention rule**: ...
---
Before declaring a task complete, the agent verifies:
- All steps in progress.md are marked ✅
- progress.md status is updated to COMPLETE with a final timestamp
- mistakes.md has entries for every failure that occurred during the task
- No Unresolved items remain in progress.md

These are recurring failure modes. The agent checks this list before every decision:
| Anti-pattern | Correct behaviour |
|---|---|
| Acting before reading the full prompt | Always complete Phase 0 first |
| Assuming Playwright is installed | Always check; install if missing |
| Skipping progress logging | Log before AND after every step |
| Treating mistakes.md as write-only | Read it at start of task and after every failure |
| Taking one screenshot and calling it done | Screenshot every meaningful state change |
| Proceeding past ambiguity | Log it, ask the user, wait for resolution |
| Editing an earlier step silently | Note plan revisions explicitly in progress.md |
| Deleting old mistake entries | Never delete; only append |
1. mkdir -p .codex/screenshots && touch .codex/progress.md .codex/mistakes.md
2. Read .codex/mistakes.md in full
3. Read the full user prompt — do not skip a word
4. Decompose, plan, resolve ambiguities → write plan to progress.md
5. Check Playwright: `npx --no-install playwright --version` — install if missing
6. Execute plan one step at a time, logging every action
7. Screenshot the app at every key state
8. On any failure: log to mistakes.md, fix, retry, continue
9. Run the completion checklist
10. Report done with a summary of artifacts produced
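The completion checklist in step 9 can be approximated by a script that inspects progress.md. This is a rough sketch: `sectionBody` and `completionProblems` are hypothetical names, and it assumes the `## Execution Plan`, `## Status`, and `## Unresolved` headings from the progress.md format. It does not verify that mistakes.md has an entry for every failure, so that check stays manual.

```javascript
// Sketch: list reasons a task cannot yet be declared complete.
// sectionBody and completionProblems are hypothetical helpers.
function sectionBody(markdown, heading) {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => l.trim() === `## ${heading}`);
  if (start === -1) return [];
  const body = [];
  for (const line of lines.slice(start + 1)) {
    if (/^## /.test(line)) break; // next top-level section ends this one
    if (line.trim() !== '') body.push(line.trim());
  }
  return body;
}

function completionProblems(progressMd) {
  const problems = [];
  const plan = sectionBody(progressMd, 'Execution Plan');
  if (plan.length === 0) problems.push('no execution plan found');
  const open = plan.filter((l) => /^- \[ \]/.test(l));
  if (open.length > 0) problems.push(`${open.length} step(s) not marked complete`);

  const status = sectionBody(progressMd, 'Status').join(' ');
  if (!/^COMPLETE\b/.test(status)) problems.push('status is not COMPLETE');

  const unresolved = sectionBody(progressMd, 'Unresolved').filter((l) => l.startsWith('- '));
  if (unresolved.length > 0) problems.push(`${unresolved.length} unresolved item(s) remain`);
  return problems;
}
```

An empty result means the file-level checks pass; any returned string names a checklist item that still needs attention.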