Run Karpathy-style autoresearch optimization on any content. Generates 50+ variants, scores with a 5-expert simulated panel, evolves winners through multiple rounds, outputs optimized version + full experiment log. Use when optimizing landing pages, email sequences, ad copy, headlines, form pages, CTA text, or any conversion-focused content. Triggers on "optimize this page", "run autoresearch", "score these variants", "A/B test this copy".
Karpathy-style optimization loops for any conversion-focused content. No traffic needed. Simulated expert panel. Minutes, not weeks.
When to use this: Pre-launch content optimization. Generate 50+ variants, score with 5 simulated experts, evolve winners, output the best version + full experiment log.
When NOT to use this: Post-launch real-traffic A/B testing — that requires real analytics, not simulated scoring.
The sequence: Run autoresearch FIRST to hit 85+ simulated score. Then deploy. Then validate with real traffic.
Every run outputs 3 files:
| File | Purpose |
|---|---|
{name}-optimized.{ext} | The winning optimized content |
data/{name}-experiments.json | Full experiment log — all variants + all scores |
data/{name}-optimization-report.md |
| Human-readable summary with winner rationale |
Score every variant against all 5. Batch all variants into a single API call per round.
| # | Persona | Scoring Lens |
|---|---|---|
| 1 | CMO at a mid-market B2B company (50M+ revenue) | "Would this make me stop and engage?" |
| 2 | Skeptical founder | "Do I believe this? Would I trust this company?" |
| 3 | Conversion rate optimizer | "Is this clear, specific, and action-driving?" |
| 4 | Senior copywriter | "Is this compelling, differentiated, and well-crafted?" |
| 5 | Your CEO/founder | "Direct, ROI-obsessed, no BS. Would I put this on my site?" |
Customization: Replace persona #5 with your own CEO/founder voice. Define their priorities and communication style in a
references/founder-voice.mdfile.
Each judge scores 0–100. Final score = average across all 5 judges.
Round 1:
→ Generate 10 variants of the element
→ Batch-score all 10 with the 5-expert panel (1 API call)
→ Rank by average score
→ Keep top 3
Round 2 (Evolution):
→ Analyze what the top 3 did right
→ Generate 10 new variants that push those winning patterns further
→ Batch-score all 10 (1 API call)
→ Keep top 3
Round 3 (If score < threshold):
→ Identify weakest scoring dimension
→ Generate 10 variants optimized for that dimension
→ Batch-score → keep top 1
Multi-element cross-breeding:
→ Take top 1 winner from each element
→ Generate 5 combinations that mix winning elements
→ Score holistically as complete units
→ Output the single best combination
Stop condition: Top variant hits minimum score threshold (default: 80) OR 3 rounds complete.
Elements to optimize: Hero headline, subheadline, CTA text, problem section, social proof
Score dimensions:
first_impression — Does it grab immediately?clarity — Is the offer instantly understood?trust — Does it feel credible?urgency — Is there a reason to act now?would_convert — Would the judge actually click?Elements to optimize: Subject line, opening line, body copy, CTA, PS line
Score dimensions:
would_open — Subject line pass ratewould_read — Does the opening hook?would_click — Is the CTA compelling?would_reply — Does it feel personal enough to respond to?spam_risk — Does it feel spammy? (lower = better; invert for final score)Elements to optimize: Headline, description, CTA
Score dimensions:
scroll_stopping — Does it interrupt the scroll?clarity — Is the value prop clear in 3 seconds?click_worthiness — Does the judge want to click?relevance — Does it match likely audience intent?differentiation — Does it stand out from competitors?Elements to optimize: Headline, subtext, value prop bullets, button text, field order, thank-you copy
Score dimensions:
first_impression — Does it feel worth filling out?trust — Do they believe their info is safe and the offer is real?completion_likelihood — Would the judge start filling it out?lead_quality — Would this attract serious prospects (not tire-kickers)?would_fill_out — Final gut check: would they submit?Read the source content. Identify content type automatically or confirm with user:
Extract all optimizable elements. List them back to user:
Found 5 elements to optimize:
1. Hero headline: "We help B2B companies grow"
2. Subheadline: "Full-service digital marketing..."
3. CTA: "Get Started"
4. Problem statement: [excerpt]
5. Social proof: [excerpt]
Optimizing: all | Variants per round: 10 | Min score: 80
Check for Anthropic API key: $ANTHROPIC_API_KEY environment variable.
export ANTHROPIC_API_KEY="your-api-key-here"
For each element, run the round structure above.
Critical API efficiency rule: ALWAYS batch all variants into a single prompt. Never call the API once per variant. A round with 10 variants = 1 API call.
Model preference (in order):
claude-sonnet-4-5 (preferred — fast + smart)claude-opus-4 (if highest quality needed)After all elements have winners:
# Create output directory
mkdir -p data
# Write optimized content
# Write experiments JSON
# Write optimization report
Experiments JSON structure:
{
"run_id": "autoresearch-{name}-{timestamp}",
"content_type": "landing_page",
"source_file": "path/to/original",
"min_score_threshold": 80,
"rounds": [
{
"round": 1,
"element": "hero_headline",
"variants": [
{
"id": 1,
"text": "...",
"scores": {
"cmo": 72,
"skeptical_founder": 68,
"cro": 75,
"copywriter": 70,
"founder": 65
},
"avg_score": 70
}
],
"top_3": [1, 4, 7],
"winner_score": 82
}
],
"final_winner": {
"hero_headline": "...",
"subheadline": "...",
"cta": "...",
"holistic_score": 87
}
}
Summarize results to user:
| Option | Default | Description |
|---|---|---|
elements | all | Which elements to optimize |
variants_per_round | 10 | How many variants to generate per round |
min_score | 80 | Stop when this score is hit |
rounds | 3 | Max rounds before stopping |
auto_apply | false | Whether to overwrite the source file with winners |
content_type | auto-detect | Force a content type if auto-detect is wrong |