E-SPL: Evolutionary System Prompt Learning
Source: arXiv:2602.14697v3 (February 2026)
Utility: 0.90
Authors: Lunjun Zhang, Ryan Chen, Bradly C. Stadie
E-SPL is a method for jointly improving LLM contexts (system prompts) and weights via RL and evolutionary optimization. It enables agentic systems to self-improve from experience.
Core Innovation: Simultaneous optimization of declarative knowledge (in prompts) and procedural knowledge (in weights).
llm - For prompt mutation and crossover generation
rl_framework - PPO, GRPO, or other RL algorithms for weight updates
evaluator - Task performance measurement
sqlite-kg - Knowledge graph for storing prompt evolution history
User: My reasoning agent performs inconsistently on math problems. How can I improve it?
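Agent: Optimize it with the E-SPL method:
Initialize 5 system prompt variants:
Each RL iteration:
Result: the math reasoning success rate improves from 38% to 45%
User: My agent needs to handle multiple task types. How should I optimize the prompt?
Agent: E-SPL naturally separates knowledge types:
The evolutionary process automatically produces prompt variants targeted at different task types.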
Each RL iteration:
Mutation: LLM self-reflection generates prompt variations
Input: Current prompt + performance feedback
Output: Modified prompt with specific improvements
Crossover: Combine successful prompt components
Input: Two high-performing prompts
Output: Hybrid prompt merging strengths
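The mutation and crossover operators described above can be sketched as two small functions that build an instruction and pass it to a model. This is a minimal sketch, not the paper's implementation: the names `mutate_prompt`, `crossover_prompts`, and `stub_llm`, the instruction wording, and the example feedback string are all hypothetical, and the model is represented by any callable that maps a string to a string.

```python
from typing import Callable

# Hypothetical type: any function that maps an instruction string to model text.
LLM = Callable[[str], str]


def mutate_prompt(llm: LLM, prompt: str, feedback: str) -> str:
    """Mutation: ask the model to revise one prompt given performance feedback."""
    instruction = (
        "Revise the system prompt below so that it addresses the feedback.\n"
        f"SYSTEM PROMPT:\n{prompt}\n"
        f"FEEDBACK:\n{feedback}\n"
        "Return only the revised system prompt."
    )
    return llm(instruction).strip()


def crossover_prompts(llm: LLM, parent_a: str, parent_b: str) -> str:
    """Crossover: ask the model to merge the strengths of two strong prompts."""
    instruction = (
        "Merge the two system prompts below into one prompt that keeps the "
        "strengths of both.\n"
        f"PROMPT A:\n{parent_a}\n"
        f"PROMPT B:\n{parent_b}\n"
        "Return only the merged system prompt."
    )
    return llm(instruction).strip()


def stub_llm(instruction: str) -> str:
    # Placeholder for a real model call; returns a fixed string for the demo.
    return "  You are a careful math tutor. Verify each step.  "


# Illustrative (made-up) feedback text, used only to exercise the function.
child = mutate_prompt(
    stub_llm,
    "You are a math tutor.",
    "Often skips verification of intermediate steps.",
)
```

Passing the model in as a callable keeps the operators independent of any particular LLM client, so a real API call can replace `stub_llm` without changing the operators.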
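# Step 1: initialize the prompt population
prompt_population = [
    "Base system prompt",
    "Reflective prompt variant",
    "Task-specific prompt variant",
    # ... N prompts
]

# Step 2: sample trajectories under each prompt and score them
performance = {}
all_trajectories = []
for prompt in prompt_population:
    trajectories = sample_trajectories(agent, prompt, env)
    performance[prompt] = evaluate(trajectories)
    all_trajectories.extend(trajectories)

# Step 3: standard RL update (PPO, GRPO, etc.) on the collected trajectories
weights = rl_update(weights, all_trajectories)

# Step 4: evolve the prompt population
# Mutation
mutated_prompts = llm_reflect(prompt_population, performance)
# Crossover (takes the two highest-scoring prompts)
top_prompts = sorted(prompt_population, key=lambda p: performance[p], reverse=True)[:2]
crossbred_prompts = crossover(top_prompts)
# Selection
new_population = select(mutated_prompts + crossbred_prompts, performance)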
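To make the evolution step concrete, here is a self-contained toy version that runs without an LLM or an RL framework. It is a sketch under stated assumptions, not the paper's procedure: the RL weight update is left out entirely, children are re-scored before selection, parents stay in the candidate pool (elitist selection), and the scoring, mutation, and crossover functions are deterministic stand-ins. All names (`evolve_prompts`, `toy_score`, `toy_mutate`, `toy_crossover`, `KEYWORDS`) are hypothetical.

```python
from typing import Callable, Dict, List


def evolve_prompts(
    population: List[str],
    score: Callable[[str], float],
    mutate: Callable[[str], str],
    crossover: Callable[[str, str], str],
    keep: int,
) -> List[str]:
    """One evolution step: score, mutate, cross the top two, keep the best."""
    performance: Dict[str, float] = {p: score(p) for p in population}
    ranked = sorted(population, key=lambda p: performance[p], reverse=True)

    # Mutation: one child per parent.
    children = [mutate(p) for p in ranked]
    # Crossover: combine the two highest-scoring parents.
    if len(ranked) >= 2:
        children.append(crossover(ranked[0], ranked[1]))

    # Assumption: parents compete with children, and duplicates are dropped.
    candidates = list(dict.fromkeys(population + children))
    for candidate in candidates:
        if candidate not in performance:
            performance[candidate] = score(candidate)

    # Selection: keep the `keep` highest-scoring candidates.
    return sorted(candidates, key=lambda p: performance[p], reverse=True)[:keep]


# Toy stand-ins: a prompt scores one point per keyword it contains.
KEYWORDS = ["verify", "step by step", "units"]


def toy_score(prompt: str) -> float:
    return float(sum(k in prompt for k in KEYWORDS))


def toy_mutate(prompt: str) -> str:
    # Append the first missing keyword; leave complete prompts unchanged.
    for k in KEYWORDS:
        if k not in prompt:
            return f"{prompt} Remember: {k}."
    return prompt


def toy_crossover(a: str, b: str) -> str:
    return f"{a} {b}"


population = ["Solve the problem.", "Solve it step by step.", "Check units."]
for _ in range(3):
    population = evolve_prompts(
        population, toy_score, toy_mutate, toy_crossover, keep=3
    )
best = population[0]
```

In a real setup, `score` would come from evaluating sampled trajectories, and `mutate` and `crossover` would call an LLM.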
| Task | Baseline | E-SPL | Improvement (percentage points) |
|---|---|---|---|
| AIME → BeyondAIME | 38.8% | 45.1% | +6.3 |
| Reflective Evolution | 40.0% | 45.1% | +5.1 |
| Reasoning tasks | - | ↑ | Consistent gains |
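The improvement figures in the table are absolute differences between the E-SPL and baseline scores. The short calculation below reproduces them and also derives the relative gain, which the source does not report. The helper name `gains` is hypothetical; the numbers are taken from the two numeric rows of the table.

```python
from typing import Dict, Tuple

# (baseline %, E-SPL %) copied from the table above.
results: Dict[str, Tuple[float, float]] = {
    "AIME -> BeyondAIME": (38.8, 45.1),
    "Reflective Evolution": (40.0, 45.1),
}


def gains(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


for task, (base, espl) in results.items():
    points, relative = gains(base, espl)
    print(f"{task}: +{points:.1f} points absolute, {relative:.1f}% relative")
```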
Key Finding: RL and prompt evolution are deeply synergistic.
E-SPL aligns with AGENTS.md self-evolution principles:
declarative-self-improvement - Self-evolution principles
prompt-optimization - Prompt engineering techniques
meta-cognitive-reflection - Reflection-driven improvement