Autonomous Goal-directed Iteration. Applies Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat. Supports bounded iteration via an Iterations: N inline config.
Inspired by Karpathy's autoresearch. Applies constraint-driven autonomous iteration to ANY work — not just ML research.
Core idea: You are an autonomous agent. Modify → Verify → Keep/Discard → Repeat.
CRITICAL — READ THIS FIRST BEFORE ANY ACTION:
For ALL commands ($autoresearch, $autoresearch plan, $autoresearch debug, $autoresearch fix, $autoresearch security, $autoresearch ship, $autoresearch scenario, $autoresearch predict, $autoresearch learn, $autoresearch reason):
| Command | Required Context | If Missing → Ask |
|---|---|---|
$autoresearch | Goal, Scope, Metric, Direction, Verify | Batch 1 (4 questions) + Batch 2 (3 questions) from Setup Phase below |
$autoresearch plan | Goal | Ask via direct prompting per references/plan-workflow.md |
$autoresearch debug | Issue/Symptom, Scope | 4 batched questions per references/debug-workflow.md |
$autoresearch fix | Target, Scope | 4 batched questions per references/fix-workflow.md |
$autoresearch security | Scope, Depth | 3 batched questions per references/security-workflow.md |
$autoresearch ship | What/Type, Mode | 3 batched questions per references/ship-workflow.md |
$autoresearch scenario | Scenario, Domain | 4-8 adaptive questions per references/scenario-workflow.md |
$autoresearch predict | Scope, Goal | 3-4 batched questions per references/predict-workflow.md |
$autoresearch learn | Mode, Scope | 4 batched questions per references/learn-workflow.md |
$autoresearch reason | Task, Domain | 3-5 adaptive questions per references/reason-workflow.md |
YOU MUST NOT start any loop, phase, or execution without completing interactive setup when context is missing. This is a BLOCKING prerequisite.
| Subcommand | Purpose |
|---|---|
$autoresearch | Run the autonomous loop (default) |
$autoresearch plan | Interactive wizard to build Scope, Metric, Direction & Verify from a Goal |
$autoresearch security | Autonomous security audit: STRIDE threat model + OWASP Top 10 + red-team (4 adversarial personas) |
$autoresearch ship | Universal shipping workflow: ship code, content, marketing, sales, research, or anything |
$autoresearch debug | Autonomous bug-hunting loop: scientific method + iterative investigation until codebase is clean |
$autoresearch fix | Autonomous fix loop: iteratively repair errors (tests, types, lint, build) until zero remain |
$autoresearch scenario | Scenario-driven use case generator: explore situations, edge cases, and derivative scenarios |
$autoresearch predict | Multi-persona swarm prediction: pre-analyze code from multiple expert perspectives before acting |
$autoresearch learn | Autonomous codebase documentation engine: scout, learn, generate/update docs with validation-fix loop |
$autoresearch reason | Adversarial refinement for subjective domains: isolated multi-agent generate→critique→synthesize→blind judge loop until convergence |
Runs a comprehensive security audit using the autoresearch loop pattern. Generates a full STRIDE threat model, maps attack surfaces, then iteratively tests each vulnerability vector — logging findings with severity, OWASP category, and code evidence.
Load: references/security-workflow.md for full protocol.
What it does:
Key behaviors:
- Composite metric: (owasp_tested/10)*50 + (stride_tested/6)*30 + min(findings, 20) — higher is better (worked example after the flags table below)
- Creates a security/{YYMMDD}-{HHMM}-{audit-slug}/ folder with structured reports: overview.md, threat-model.md, attack-surface-map.md, findings.md, owasp-coverage.md, dependency-audit.md, recommendations.md, security-audit-results.tsv

Flags:
| Flag | Purpose |
|---|---|
--diff | Delta mode — only audit files changed since last audit |
--fix | After audit, auto-fix confirmed Critical/High findings using autoresearch loop |
--fail-on {severity} | Exit non-zero if findings meet threshold (for CI/CD gating) |
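Worked example of the composite metric (hypothetical values): with 8 of 10 OWASP categories tested, 5 of 6 STRIDE categories tested, and 12 findings logged, the score is (8/10)*50 + (5/6)*30 + min(12, 20) = 40 + 25 + 12 = 77.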
Usage:
# Unlimited — keep finding vulnerabilities until interrupted
$autoresearch security
# Bounded — exactly 10 security sweep iterations
$autoresearch security
Iterations: 10
# With focused scope
$autoresearch security
Scope: src/api/**/*.ts, src/middleware/**/*.ts
Focus: authentication and authorization flows
# Delta mode — only audit changed files since last audit
$autoresearch security --diff
# Auto-fix confirmed Critical/High findings after audit
$autoresearch security --fix
Iterations: 15
# CI/CD gate — fail pipeline if any Critical findings
$autoresearch security --fail-on critical
Iterations: 10
# Combined — delta audit + fix + gate
$autoresearch security --diff --fix --fail-on critical
Iterations: 15
Inspired by:
- /plan red-team — adversarial review with hostile reviewer personas

Ship anything — code, content, marketing, sales, research, or design — through a structured 8-phase workflow that applies autoresearch loop principles to the last mile.
Load: references/ship-workflow.md for full protocol.
What it does:
- Logs each ship action to ship-log.tsv for traceability

Supported shipment types:
| Type | Example Ship Actions |
|---|---|
code-pr | gh pr create with full description |
code-release | Git tag + GitHub release |
deployment | CI/CD trigger, kubectl apply, push to deploy branch |
content | Publish via CMS, commit to content branch |
marketing-email | Send via ESP (SendGrid, Mailchimp) |
marketing-campaign | Activate ads, launch landing page |
sales | Send proposal, share deck |
research | Upload to repository, submit paper |
design | Export assets, share with stakeholders |
Flags:
| Flag | Purpose |
|---|---|
--dry-run | Validate everything but don't actually ship (stop at Phase 5) |
--auto | Auto-approve dry-run gate if no errors |
--force | Skip non-critical checklist items (blockers still enforced) |
--rollback | Undo the last ship action (if reversible) |
--monitor N | Post-ship monitoring for N minutes |
--type <type> | Override auto-detection with explicit shipment type |
--checklist-only | Only generate and evaluate checklist (stop at Phase 3) |
Usage:
# Auto-detect and ship (interactive)
$autoresearch ship
# Ship code PR with auto-approve
$autoresearch ship --auto
# Dry-run a deployment before going live
$autoresearch ship --type deployment --dry-run
# Ship with post-deployment monitoring
$autoresearch ship --monitor 10
# Prepare iteratively then ship
$autoresearch ship
Iterations: 5
# Just check if something is ready to ship
$autoresearch ship --checklist-only
# Ship a blog post
$autoresearch ship
Target: content/blog/my-new-post.md
Type: content
# Ship a sales deck
$autoresearch ship --type sales
Target: decks/q1-proposal.pdf
# Rollback a bad deployment
$autoresearch ship --rollback
Composite metric (for bounded loops):
ship_score = (checklist_passing / checklist_total) * 80
+ (dry_run_passed ? 15 : 0)
+ (no_blockers ? 5 : 0)
Score of 100 = fully ready. Below 80 = not shippable.
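For example, with hypothetical numbers (18 of 20 checklist items passing, dry-run passed, no blockers), ship_score = (18/20)*80 + 15 + 5 = 92, i.e. shippable but not yet fully ready.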
Output directory: Creates ship/{YYMMDD}-{HHMM}-{ship-slug}/ with checklist.md, ship-log.tsv, summary.md.
Autonomous scenario exploration engine that generates, expands, and stress-tests use cases from a seed scenario. Discovers edge cases, failure modes, and derivative scenarios that manual analysis misses.
Load: references/scenario-workflow.md for full protocol.
What it does:
Key behaviors:
- Composite metric: scenarios_generated*10 + edge_cases_found*15 + (dimensions_covered/12)*30 + unique_actors*5
- Creates scenario/{YYMMDD}-{HHMM}-{slug}/ with: scenarios.md, use-cases.md, edge-cases.md, scenario-results.tsv, summary.md

Flags:
| Flag | Purpose |
|---|---|
--domain <type> | Set domain (software, product, business, security, marketing) |
--depth <level> | Exploration depth: shallow (10), standard (25), deep (50+) |
--scope <glob> | Limit to specific files/features |
--format <type> | Output: use-cases, user-stories, test-scenarios, threat-scenarios, mixed |
--focus <area> | Prioritize dimension: edge-cases, failures, security, scale |
Usage:
# Unlimited — keep exploring until interrupted
$autoresearch scenario
# Bounded with context
$autoresearch scenario
Scenario: User attempts checkout with multiple payment methods
Domain: software
Depth: standard
Iterations: 25
# Quick edge case scan
$autoresearch scenario --depth shallow --focus edge-cases
Scenario: File upload feature for profile pictures
# Security-focused
$autoresearch scenario --domain security
Scenario: OAuth2 login flow with third-party providers
Iterations: 30
# Generate test scenarios
$autoresearch scenario --format test-scenarios --domain software
Scenario: REST API pagination with filtering and sorting
Multi-perspective code analysis using swarm intelligence principles. Simulates 3-5 expert personas (Architect, Security Analyst, Performance Engineer, Reliability Engineer, Devil's Advocate) that independently analyze code, debate findings, and reach consensus — all within Claude's native context. Zero external dependencies.
Load: references/predict-workflow.md for full protocol.
What it does:
Key behaviors:
- Composite metric: findings_confirmed*15 + findings_probable*8 + minority_preserved*3 + (personas/total)*20 + (rounds/planned)*10 + anti_herd_passed*5
- Creates a predict/{YYMMDD}-{HHMM}-{slug}/ folder with: overview.md, codebase-analysis.md, dependency-map.md, component-clusters.md, persona-debates.md, hypothesis-queue.md, findings.md, predict-results.tsv, handoff.json

Flags:
| Flag | Purpose |
|---|---|
--chain <targets> | Chain to tools. Single: --chain debug. Multi: --chain scenario,debug,fix (sequential) |
--personas N | Number of personas (default: 5, range: 3-8) |
--rounds N | Debate rounds (default: 2, range: 1-3) |
--depth <level> | Depth preset: shallow (3 personas, 1 round), standard (5, 2), deep (8, 3) |
--adversarial | Use adversarial persona set (Red Team, Blue Team, Insider, Supply Chain, Judge) |
--budget <N> | Max total findings across all personas (default: 40) |
--fail-on <severity> | Exit non-zero if findings at or above severity (for CI/CD) |
--scope <glob> | Limit analysis to specific files |
Usage:
# Standard analysis
$autoresearch predict
Scope: src/**/*.ts
Goal: Find reliability issues
# Quick security scan
$autoresearch predict --depth shallow --chain security
Scope: src/api/**
# Deep analysis with adversarial debate
$autoresearch predict --depth deep --adversarial
Goal: Pre-deployment quality audit
# CI/CD gate
$autoresearch predict --fail-on critical --budget 20
Scope: src/**
Iterations: 1
# Chain to debug for hypothesis-driven investigation
$autoresearch predict --chain debug
Scope: src/auth/**
Goal: Investigate intermittent 500 errors
# Multi-chain: predict → scenario → debug → fix (sequential pipeline)
$autoresearch predict --chain scenario,debug,fix
Scope: src/**
Goal: Full quality pipeline for new feature
Scouts codebase structure, learns patterns and architecture, generates/updates comprehensive documentation — then validates and iteratively improves until docs match codebase reality.
Load: references/learn-workflow.md for full protocol.
What it does:
- Reviews existing docs (docs/*.md), gap analysis, conditional doc selection

4 Modes:
| Mode | Purpose | Autoresearch Loop? |
|---|---|---|
init | Learn codebase from scratch, generate all docs | Yes — validate-fix cycle |
update | Learn what changed, refresh existing docs | Yes — validate-fix cycle |
check | Read-only health/staleness assessment | No — diagnostic only |
summarize | Quick codebase summary with file inventory | Minimal — size check only |
Key behaviors:
- Works from docs/*.md, no hardcoded file lists
- Composite metric: learn_score = validation%×0.5 + coverage%×0.3 + size_compliance%×0.2
- Creates learn/{YYMMDD}-{HHMM}-{slug}/ with: learn-results.tsv, summary.md, validation-report.md, scout-context.md

Flags:
| Flag | Purpose |
|---|---|
--mode <mode> | Operation: init, update, check, summarize (default: auto-detect) |
--scope <glob> | Limit codebase learning to specific dirs |
--depth <level> | Doc comprehensiveness: quick, standard, deep |
--scan | Force fresh scout in summarize mode |
--topics <list> | Focus summarize on specific topics |
--file <name> | Selective update — target single doc |
--no-fix | Skip validation-fix loop |
--format <fmt> | Output format: markdown (default). Planned: confluence, rst, html |
Usage:
# Auto-detect mode and learn
$autoresearch learn
# Initialize docs for new project
$autoresearch learn --mode init --depth deep
# Update docs after changes
$autoresearch learn --mode update
Iterations: 3
# Read-only health check
$autoresearch learn --mode check
# Quick summary
$autoresearch learn --mode summarize --scan
# Selective update of one doc
$autoresearch learn --mode update --file system-architecture.md
# Scoped learning
$autoresearch learn --scope src/api/**
Iterations: 5
Isolated multi-agent adversarial refinement loop. Generates, critiques, synthesizes, and blind-judges outputs through repeated rounds until convergence. Extends autoresearch to subjective domains where no objective metric (like val_bpb in ML training) exists — the blind judge panel IS the fitness function.
Load: references/reason-workflow.md for full protocol.
What it does:
- Supports --chain to downstream autoresearch tools

Key behaviors:
- --chain for piping converged output to any autoresearch subcommand
- Composite metric: reason_score = quality_delta*30 + rounds_survived*5 + judge_consensus*20 + critic_fatals_addressed*15 + convergence*10 + no_oscillation*5
- Creates reason/{YYMMDD}-{HHMM}-{slug}/ with: overview.md, lineage.md, candidates.md, judge-transcripts.md, reason-results.tsv, reason-lineage.jsonl, handoff.json

Flags:
| Flag | Purpose |
|---|---|
--iterations N | Bounded mode — run exactly N rounds |
--judges N | Judge count (3-7, odd preferred, default: 3) |
--convergence N | Consecutive wins to converge (2-5, default: 3) |
--mode <mode> | convergent (default), creative (no auto-stop), debate (no synthesis) |
--domain <type> | Shape judge personas: software, product, business, security, research, content |
--chain <targets> | Chain to tools. Single: --chain debug. Multi: --chain scenario,debug,fix (sequential) |
--judge-personas <list> | Override default judge personas |
--no-synthesis | Skip synthesis step (A vs B only, alias for --mode debate) |
Usage:
# Standard convergent refinement
$autoresearch reason
Task: Should we use event sourcing for our order management system?
Domain: software
# Bounded with custom judges
$autoresearch reason --judges 5 --iterations 10
Task: Write a compelling pitch for our Series A
Domain: business
# Creative mode — explore alternatives, no convergence stop
$autoresearch reason --mode creative --iterations 8
Task: Design the authentication architecture for a multi-tenant SaaS platform
Domain: software
# Chain to downstream tools after convergence
$autoresearch reason --chain scenario,debug,fix
Task: Propose a caching strategy for high-traffic API endpoints
Domain: software
Iterations: 6
# Debate mode — A vs B, no synthesis
$autoresearch reason --mode debate --judges 5
Task: Is microservices the right architecture for our 5-person startup?
Domain: software
# Multi-chain pipeline: reason → plan → fix
$autoresearch reason --chain plan,fix
Task: Design the database schema for our order management system
Domain: software
Iterations: 5
Converts a plain-language goal into a validated, ready-to-execute autoresearch configuration.
Load: references/plan-workflow.md for full protocol.
Quick summary:
Critical gates:
Usage:
$autoresearch plan
Goal: Make the API respond faster
$autoresearch plan Increase test coverage to 95%
$autoresearch plan Reduce bundle size below 200KB
After the wizard completes, the user gets a ready-to-paste $autoresearch invocation — or can launch it directly.
- $autoresearch → run the loop
- $autoresearch plan → run the planning wizard
- $autoresearch security → run the security audit
- $autoresearch ship → run the ship workflow
- $autoresearch debug → run the debug loop
- $autoresearch fix → run the fix loop
- $autoresearch scenario → run the scenario loop
- $autoresearch learn → run the learn workflow
- $autoresearch predict → run the predict workflow
- $autoresearch reason → run the reason loop

By default, autoresearch loops until the metric plateaus (no improvement to the best metric for 15 consecutive measured iterations), then asks the user whether to stop, continue, or change strategy. To run exactly N iterations instead, add Iterations: N to your inline config.
Unlimited (default):
$autoresearch
Goal: Increase test coverage to 90%
Bounded (N iterations):
$autoresearch
Goal: Increase test coverage to 90%
Iterations: 25
After N iterations, Claude stops and prints a final summary: baseline → current best, plus counts of keeps, discards, and crashes. If the goal is achieved before N iterations, Claude prints early completion and stops.
| Scenario | Recommendation |
|---|---|
| Run overnight, review in morning | Unlimited + Plateau-Patience: off |
| Quick 30-min improvement session | Iterations: 10 |
| Targeted fix with known scope | Iterations: 5 |
| Exploratory — see if approach works | Iterations: 15 |
| CI/CD pipeline integration | --iterations N flag (set N based on time budget) |
| Long run with safety net (default) | Unlimited (plateau detection after 15 iterations) |
In unlimited mode, autoresearch tracks whether the best metric is still improving. If 15 consecutive measured iterations pass without a new best, the loop pauses and asks the user to decide: stop, continue, or change strategy. Configure with Plateau-Patience: N (default 15), or disable with Plateau-Patience: off. Bounded mode ignores this setting.
$autoresearch
Goal: Reduce bundle size below 200KB
Verify: npx esbuild src/index.ts --bundle --minify | wc -c
Plateau-Patience: 20
By default, guards are pass/fail (exit code 0 = pass). For guards that measure a number (bundle size, response time, coverage), you can set a regression threshold instead:
$autoresearch
Goal: Increase test coverage to 95%
Verify: npx jest --coverage 2>&1 | grep 'All files' | awk '{print $4}'
Guard: npx esbuild src/index.ts --bundle --minify | wc -c
Guard-Direction: lower is better
Guard-Threshold: 5%
This means: "optimize coverage, but reject any change that grows the bundle size by more than 5% from baseline." The primary metric still drives keep/discard; the guard metric is tracked in the results log for visibility into drift over time.
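For example, if the baseline bundle is 180,000 bytes (a hypothetical figure), a 5% threshold means any iteration whose bundle exceeds 189,000 bytes is discarded, even when it improves coverage.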
| Parameter | Required | Description |
|---|---|---|
Guard | Yes | Command that outputs a number (metric-valued) or exits 0/1 (pass/fail) |
Guard-Direction | Only for metric-valued | higher is better or lower is better |
Guard-Threshold | Only for metric-valued | Max allowed regression as % of baseline (e.g., 5%, 0% for strict) |
Without Guard-Direction and Guard-Threshold, the guard operates in pass/fail mode.
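A minimal pass/fail guard config might look like this (the scope and commands are illustrative, not recommendations):

$autoresearch
Goal: Reduce bundle size below 200KB
Scope: src/**/*.ts
Verify: npx esbuild src/index.ts --bundle --minify | wc -c
Guard: npm test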
If the user provides Goal, Scope, Metric, and Verify inline → extract them and proceed to step 5.
CRITICAL: If ANY critical field is missing (Goal, Scope, Metric, Direction, or Verify), you MUST use direct prompting to collect them interactively. DO NOT proceed to The Loop or any execution phase without completing this setup. This is a BLOCKING prerequisite.
Scan the codebase first for smart defaults, then ask ALL questions in batched direct prompting calls (max 4 per call). This gives users full clarity upfront.
Batch 1 — Core config (4 questions in one call):
Use a SINGLE direct prompting call with these 4 questions:
| # | Header | Question | Options (smart defaults from codebase scan) |
|---|---|---|---|
| 1 | Goal | "What do you want to improve?" | "Test coverage (higher)", "Bundle size (lower)", "Performance (faster)", "Code quality (fewer errors)" |
| 2 | Scope | "Which files can autoresearch modify?" | Suggested globs from project structure (e.g. "src/**/*.ts", "content/**/*.md") |
| 3 | Metric | "What number tells you if it got better? (must be a command output, not subjective)" | Detected options: "coverage % (higher)", "bundle size KB (lower)", "error count (lower)", "test pass count (higher)" |
| 4 | Direction | "Higher or lower is better?" | "Higher is better", "Lower is better" |
Batch 2 — Verify + Guard + Launch (3 questions in one call):
| # | Header | Question | Options |
|---|---|---|---|
| 5 | Verify | "What command produces the metric? (I'll dry-run it to confirm)" | Suggested commands from detected tooling |
| 6 | Guard | "Any command that must ALWAYS pass? (prevents regressions)" | "npm test", "tsc --noEmit", "npm run build", "Skip — no guard" |
| 7 | Launch | "Ready to go?" | "Launch (unlimited)", "Launch with iteration limit", "Edit config", "Cancel" |
After Batch 2: Dry-run the verify command. If it fails, ask user to fix or choose a different command. If it passes, proceed with launch choice.
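For example, if Verify is npx jest --coverage 2>&1 | grep 'All files' | awk '{print $4}', the dry run should print a single number (e.g. 82.5); multi-line, non-numeric, or error output means the command needs adjusting before launch.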
IMPORTANT: You MUST call direct prompting with batched questions — never ask one at a time, and never skip this step. Users should see all config choices together for full context. DO NOT proceed to Setup Steps or The Loop without completing interactive setup.
See references/results-logging.md for the results log format. Read references/autonomous-loop-protocol.md for full protocol details.
LOOP (FOREVER or N times):
1. Review: Read current state + git history + results log
2. Ideate: Pick next change based on goal, past results, what hasn't been tried
3. Modify: Make ONE focused change to in-scope files
4. Commit: Git commit the change (before verification)
5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
6. Guard: If guard is set, run the guard command
7. Decide:
- IMPROVED + guard passed (or no guard) → Keep commit, log "keep", advance
- IMPROVED + guard FAILED → Revert, then try to rework the optimization
(max 2 attempts) so it improves the metric WITHOUT breaking the guard.
Never modify guard/test files — adapt the implementation instead.
If still failing → log "discard (guard failed)" and move on
- SAME/WORSE → Git revert, log "discard"
- CRASHED → Try to fix (max 3 attempts), else log "crash" and move on
8. Log: Record result in results log
9. Repeat: Go to step 1.
- If unbounded: NEVER STOP. NEVER ASK "should I continue?"
- If bounded (N): Stop after N iterations, print final summary
- Commit every change with the experiment: prefix. Use git revert (not git reset --hard) for rollbacks so failed experiments remain visible in history.
- The agent MUST read git log and git diff of kept commits to learn patterns before each iteration.

See references/core-principles.md for the 7 generalizable principles from autoresearch.
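To make step 7 concrete, here is a minimal bash sketch of the keep/discard decision for a numeric, higher-is-better metric. The VERIFY, GUARD, and BEST variables and the simplified guard handling (straight discard instead of the rework attempts described above) are illustrative assumptions, not part of the protocol:

# Assumes VERIFY and GUARD hold the configured commands and BEST the best metric so far.
new=$(eval "$VERIFY")                        # step 5: measure the metric
if [ -n "$GUARD" ]; then
  eval "$GUARD" && guard_ok=1 || guard_ok=0  # step 6: run the guard command
else
  guard_ok=1                                 # no guard configured
fi
if [ "$guard_ok" -eq 1 ] && awk -v n="$new" -v b="$BEST" 'BEGIN{exit !(n > b)}'; then
  BEST="$new"                                # improved and guard passed: keep the commit
  echo "keep ($new)"
else
  git revert --no-edit HEAD                  # otherwise roll back the experiment commit
  echo "discard ($new)"
fi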
| Domain | Metric | Scope | Verify Command | Guard |
|---|---|---|---|---|
| Backend code | Tests pass + coverage % | src/**/*.ts | npm test | — |
| Frontend UI | Lighthouse score | src/components/** | npx lighthouse | npm test |
| ML training | val_bpb / loss | train.py | uv run train.py | — |
| Blog/content | Word count + readability | content/*.md | Custom script | — |
| Performance | Benchmark time (ms) | Target files | npm run bench | npm test |
| Refactoring | Tests pass + LOC reduced | Target module | npm test && wc -l | npm run typecheck |
| Security | OWASP + STRIDE coverage + findings | API/auth/middleware | $autoresearch security | — |
| Shipping | Checklist pass rate (%) | Any artifact | $autoresearch ship | Domain-specific |
| Debugging | Bugs found + coverage | Target files | $autoresearch debug | — |
| Fixing | Error count (lower) | Target files | $autoresearch fix | npm test |
| Scenarios | Use cases + edge cases + dimension coverage (higher) | Target feature/files | $autoresearch scenario | — |
| Prediction | Findings + hypotheses (higher) | Target files | $autoresearch predict | — |
| Documentation | Validation pass rate (higher) | docs/*.md | $autoresearch learn | npm test |
| Subjective refinement | Judge consensus + convergence (higher) | Any subjective content | $autoresearch reason | — |
Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.
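For instance, the blog/content row above might translate into a config like this (the paths and readability script are hypothetical):

$autoresearch
Goal: Improve readability of the getting-started guide
Scope: content/*.md
Metric: Flesch reading-ease score
Direction: higher is better
Verify: python scripts/readability.py content/getting-started.md
Iterations: 10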
After the FIRST successful completion of any autoresearch command in a project (loop finishes, summary printed, or bounded iterations complete), check if the file .autoresearch-starred exists in the project root.
If it does NOT exist:
Ask the user: "If autoresearch helped, would you like to star the repo on GitHub to support the project? I can run gh api -X PUT /user/starred/uditgoenka/autoresearch for you."
- If yes → run the command and create the .autoresearch-starred marker file
- If no → create the .autoresearch-starred marker file (never ask again)

If it exists: skip entirely. Never ask twice.