Launch Plamen Web3 security audit pipeline
```
/plamen [light|core|thorough] [path/to/project]
```
When invoked, follow this orchestration sequence.
Parse $ARGUMENTS:
- If a mode keyword (`light`, `core`, or `thorough`) is present, set MODE accordingly (default: core).
- If a path is present, set PROJECT_ROOT to that path. Otherwise use cwd.
- If `docs:` is followed by a path, set DOCS_PATH.
- If `scope:` is followed by a path, set SCOPE_FILE.
- If `notes:` is followed by text, set SCOPE_NOTES.
- If MODE was not specified in arguments, use the default (core).

If a path was not specified, use cwd and confirm: "Target: {cwd} -- correct? [y/n]". If the user answers "n", ask for the correct path before proceeding.
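Assuming $ARGUMENTS arrives as a single whitespace-separated string, the parse rules above can be sketched in Bash (the sample ARGUMENTS value and the tokenizing loop are illustrative, not part of the pipeline):

```bash
# Illustrative only: parse a sample $ARGUMENTS per the rules above.
ARGUMENTS="thorough ./contracts docs:./docs scope:audit-scope.txt notes:focus on vault"

MODE="core"             # default when no mode keyword is present
PROJECT_ROOT="$(pwd)"   # default when no path is present

set -- $ARGUMENTS       # word-split the argument string
while [ $# -gt 0 ]; do
  case "$1" in
    light|core|thorough) MODE="$1" ;;
    docs:*)  DOCS_PATH="${1#docs:}" ;;
    scope:*) SCOPE_FILE="${1#scope:}" ;;
    notes:*) SCOPE_NOTES="${1#notes:} $(shift; echo "$*")"; break ;;  # notes: consumes the rest
    *)       PROJECT_ROOT="$1" ;;
  esac
  shift
done

echo "MODE=$MODE PROJECT_ROOT=$PROJECT_ROOT"
# → MODE=thorough PROJECT_ROOT=./contracts
```

With the sample above, this yields DOCS_PATH=./docs, SCOPE_FILE=audit-scope.txt, and SCOPE_NOTES="focus on vault".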
Detect the project's smart contract language by scanning PROJECT_ROOT:
| Detection | Language |
|---|---|
| `foundry.toml` or `.sol` files | evm |
| `Anchor.toml` or `programs/` with `.rs` | solana |
| `Move.toml` with `[addresses]` + aptos deps | aptos |
| `Move.toml` with sui deps | sui |
| `Cargo.toml` with `soroban-sdk` | soroban |
Set LANGUAGE to the detected value. This resolves all {LANGUAGE} placeholders
in file paths throughout the pipeline.
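A Bash sketch of the detection table (first match wins; the exact grep heuristics for telling aptos, sui, and soroban dependencies apart are assumptions, since the table only names the files):

```bash
# Returns the detected language for a project root, per the table above.
detect_language() {
  root="$1"
  if [ -f "$root/foundry.toml" ] || ls "$root"/*.sol >/dev/null 2>&1; then
    echo evm
  elif [ -f "$root/Anchor.toml" ] || [ -d "$root/programs" ]; then
    echo solana
  elif [ -f "$root/Move.toml" ] && grep -qi aptos "$root/Move.toml"; then
    echo aptos
  elif [ -f "$root/Move.toml" ] && grep -qi sui "$root/Move.toml"; then
    echo sui
  elif [ -f "$root/Cargo.toml" ] && grep -q soroban-sdk "$root/Cargo.toml"; then
    echo soroban
  else
    echo unknown
  fi
}
```

Then `LANGUAGE="$(detect_language "$PROJECT_ROOT")"`.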
Detect the host shell. This determines how you run commands throughout the audit:
```powershell
# PowerShell (Windows)
$IS_WINDOWS = $true
$PY = "python"
```

```bash
# Bash (macOS/Linux)
IS_WINDOWS=false
PY="python3"
```
Use the AGENTS.md Platform Awareness table for all shell commands in this skill. Never run `grep`, `rg`, `find`, `wc`, `cat`, or `fc` raw on Windows; translate them to their PowerShell equivalents.
```powershell
# PowerShell
New-Item -ItemType Directory -Force "{PROJECT_ROOT}/.scratchpad" | Out-Null
```

```bash
# Bash
mkdir -p "{PROJECT_ROOT}/.scratchpad"
```
Set SCRATCHPAD = {PROJECT_ROOT}/.scratchpad.
```powershell
# Check if target is a git repo -- skip git steps if not
git rev-parse --is-inside-work-tree 2>$null
$IS_GIT = ($LASTEXITCODE -eq 0)
```
If $IS_GIT is false, skip ALL git commands (log, rev-list, blame, diff) throughout the audit.
Use file-system analysis only.
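The repo check above is PowerShell; a Bash equivalent (the variable name mirrors $IS_GIT) might be:

```bash
# Bash: detect whether the current directory is inside a git work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  IS_GIT=true
else
  IS_GIT=false
fi
```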
```powershell
# PowerShell
& $PY ~/.codex/plamen/hooks/phase_gate.py --init "$SCRATCHPAD" "$MODE" "$PROJECT_ROOT"
```

```bash
# Bash
$PY ~/.codex/plamen/hooks/phase_gate.py --init "$SCRATCHPAD" "$MODE" "$PROJECT_ROOT"
```
Read ~/.codex/plamen/hooks/phase_manifest.json for the phase ordering and
artifact requirements. Execute phases in order, checking gates between phases.
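The manifest schema is not reproduced in this file; assuming a shape like `{"phases": [{"name": ..., "artifacts": [...]}]}`, the phase ordering could be listed with a snippet like this (the sample file and schema are assumptions):

```bash
# Hypothetical schema; the real phase_manifest.json may differ.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
{"phases": [
  {"name": "recon",   "artifacts": ["recon_summary.md", "meta_buffer.md"]},
  {"name": "breadth", "artifacts": ["findings_inventory.md"]}
]}
EOF

# Print each phase and its required artifacts, in manifest order.
python3 - "$sample" <<'EOF'
import json, sys
manifest = json.load(open(sys.argv[1]))
for phase in manifest["phases"]:
    print(phase["name"], "->", ", ".join(phase["artifacts"]))
EOF
```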
Do NOT spawn a single monolithic recon agent. Split recon into 4 parallel agents for timeout isolation; a single agent is a confirmed failure mode on large projects.
Read the full recon prompt structure from:
~/.codex/plamen/prompts/{LANGUAGE}/phase1-recon-prompt.md
Agent 1A: RAG Meta-Buffer (FIRE-AND-FORGET)
- spawn_agent and do NOT wait for completion
- Output: meta_buffer.md
- If the agent never completes, create meta_buffer.md with `## RAG: UNAVAILABLE - agent timed out`

Agent 1B: Docs + External + Fork (foreground)
- Role: ~/.codex/agents/recon.toml (with task subset)
- Outputs: design_context.md, external_production_behavior.md

Agent 2: Build + Static + Tests (foreground)
- Role: ~/.codex/agents/recon.toml (with task subset)
- Outputs: build_status.md, function_list.md, call_graph.md, state_variables.md, modifiers.md, event_definitions.md, external_interfaces.md, static_analysis.md, test_results.md

Agent 3: Patterns + Surface + Templates (foreground)
- Role: ~/.codex/agents/recon.toml (with task subset)
- Outputs: contract_inventory.md, attack_surface.md, detected_patterns.md, setter_list.md, emit_list.md, constraint_variables.md, template_recommendations.md

Wait for Agents 1B, 2, 3 to complete. Then check Agent 1A status:
- If it finished, use its meta_buffer.md output.
- If not, write the fallback meta_buffer.md and proceed.

Then write recon_summary.md (orchestrator, not an agent).
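If the agents were ordinary shell jobs, the fire-and-forget vs. foreground split above would look roughly like this (the sleeps stand in for agent work and the timings are illustrative):

```bash
TMP="$(mktemp -d)"

# Agent 1A (fire-and-forget): start it and do NOT wait.
( sleep 1; echo "RAG ready" > "$TMP/meta_buffer.md" ) &
rag_pid=$!

# Agents 1B, 2, 3 (foreground): start all three, then wait only for these PIDs.
pids=""
for agent in 1B 2 3; do
  ( sleep 0.1 ) &
  pids="$pids $!"
done
wait $pids

# Check 1A: kill -0 probes liveness. Still running means timed out for our purposes.
if kill -0 "$rag_pid" 2>/dev/null; then
  echo "## RAG: UNAVAILABLE - agent timed out" > "$TMP/meta_buffer.md"
  kill "$rag_pid" 2>/dev/null
fi
```

The key point is that `wait $pids` names only the foreground agents, so the orchestrator never blocks on 1A.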
Verify all required artifacts exist per phase_manifest.json.

Read {SCRATCHPAD}/template_recommendations.md for agent count and scope split.

Spawn breadth agents in batches of max 6 (from ~/.codex/agents/breadth.toml):
- Count: {N}, with scope assignments from template_recommendations.md
- Verify the analysis_*.md files exist

Spawn the inventory agent (from ~/.codex/agents/inventory.toml):
- Input: analysis_*.md files
- Output: findings_inventory.md

If MODE is thorough:
- Read ~/.codex/plamen/rules/phase3b-rescan-prompt.md for re-scan methodology
- Spawn rescan agents (from ~/.codex/agents/rescan.toml) with exclusion list
- Spawn per-contract agents (from ~/.codex/agents/per-contract.toml), one per contract cluster

If MODE is core or thorough:
- Spawn the semantic-invariant agent (from ~/.codex/agents/semantic-invariant.toml)
- Output: semantic_invariants.md

Spawn in 2 batches to respect the 8-thread limit:
Batch 1 (4 agents): Spawn depth agents from their respective TOML roles:
- depth-token-flow.toml
- depth-state-trace.toml
- depth-edge-case.toml
- depth-external.toml

Wait for all 4 to complete.

Batch 2 (up to 6 agents): Spawn scanners + niche agents:
- scanner.toml
- niche-agent.toml

Wait for all to complete.

For Thorough mode:
- scoring.toml agent
- rag-sweep.toml agent
Read ~/.codex/plamen/rules/phase4-confidence-scoring.md for the full process.

Spawn chain-analyzer agents sequentially:
Read ~/.codex/plamen/rules/phase4c-chain-prompt.md for prompts.
Spawn verifier agents in batches of 6 for each hypothesis batch:
- Read ~/.codex/plamen/rules/phase5-poc-execution.md for PoC rules

Spawn report agents sequentially per ~/.codex/plamen/rules/phase6-report-prompts.md:
1. report-index.toml agent (1 agent -- assigns clean report IDs and tier assignments). Wait for completion.
2. report-tier-writer.toml agents (3 agents: Critical+High, Medium, Low+Info). Wait for all 3.
3. report-assembler.toml agent (1 agent -- combines everything into AUDIT_REPORT.md). Wait for completion.

Between each phase, verify required artifacts exist:
```bash
python3 ~/.codex/plamen/hooks/phase_gate.py --stop
```
If artifacts are missing, the gate will block. Complete the current phase before proceeding.
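The gate's file check can be approximated in Bash (the artifact names here are examples drawn from the phase outputs above; the authoritative list lives in phase_manifest.json):

```bash
# Approximate the gate: report any required artifact that is missing.
SCRATCHPAD="${SCRATCHPAD:-.scratchpad}"
required="recon_summary.md findings_inventory.md"   # example list, not the real manifest
missing=0
for artifact in $required; do
  if [ ! -f "$SCRATCHPAD/$artifact" ]; then
    echo "GATE BLOCKED: missing $SCRATCHPAD/$artifact"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "GATE PASSED"
fi
```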
Not all Claude pipeline features have full Codex parity yet. This table shows what is supported, what is experimental, and what is not yet implemented.
| Phase | Light | Core | Thorough | Notes |
|---|---|---|---|---|
| Recon (4-agent split) | Supported | Supported | Supported | |
| Breadth | Supported | Supported | Supported | |
| Inventory | Supported | Supported | Supported | |
| Re-scan (3b) | N/A | N/A | Experimental | Convergence not validated on Codex |
| Per-contract (3c) | N/A | N/A | Experimental | Clustering logic untested |
| Semantic Invariants | N/A | Supported | Supported | |
| Depth Loop iter 1 | Supported | Supported | Supported | |
| Depth Loop iter 2-3 | N/A | N/A | Experimental | DA role + anti-dilution untested |
| Niche Agents | N/A | Supported | Supported | |
| Confidence Scoring | N/A | Supported | Experimental | 4-axis scoring untested |
| RAG Sweep | N/A | Supported | Supported | Fallback chain may differ |
| Chain Analysis | Supported | Supported | Supported | |
| Verification + PoC | Supported | Supported | Experimental | No fuzz variant support |
| Skeptic-Judge | N/A | N/A | Not implemented | Requires Claude pipeline feature |
| Invariant Fuzz | N/A | N/A | Not implemented | Foundry-specific, needs adaptation |
| Medusa Fuzz | N/A | N/A | Not implemented | Parallel campaign, needs adaptation |
| Design Stress Test | N/A | N/A | Experimental | 1 agent slot, untested |
| Finding Perturbation | N/A | N/A | Not implemented | |
| Report (multi-agent) | Supported | Supported | Supported |

| Step | Light | Core | Thorough |
|---|---|---|---|
| Re-scan (3b/3c) | Skip | Skip | Full |
| Semantic invariants | Skip | Yes | Yes |
| Depth iterations | 1 | 1 | Up to 3 |
| Confidence scoring | Skip | 2-axis | 4-axis |
| Niche agents | Skip | Flag-triggered | Flag-triggered |
| RAG sweep | Skip | 1 agent | 1 agent |
| Verification scope | Chains + Medium+ | Chains + Medium+ | ALL severities |