Manages persistent research memory across ideation and experimentation cycles. Maintains two stores: Ideation Memory M_I (feasible/unsuccessful directions) and Experimentation Memory M_E (reusable strategies for data processing, model training, architecture, debugging). Three evolution mechanisms: IDE (after idea-tournament), IVE (after experiment failure — classifies failures as implementation vs fundamental), ESE (after experiment success — extracts reusable strategies). Use when: updating memory after completing idea tournaments or experiment pipelines, classifying why a method failed (implementation vs fundamental failure), starting a new research cycle needing prior knowledge, user mentions 'update memory', 'classify failure', 'what worked before', 'research history', 'evolution'. Do NOT use for running experiments (use experiment-pipeline), debugging experiment code (use experiment-craft), or generating ideas (use idea-tournament).
A persistent learning layer that accumulates research knowledge across ideation and experimentation cycles. Maintains two memory stores and implements three evolution mechanisms that feed learned patterns back into future research.
idea-tournament and needs to update Ideation Memoryexperiment-pipeline and needs to update memoryResearch is iterative. Each cycle — from ideation through experimentation — generates knowledge that should inform the next cycle. Without persistent memory, every new project starts from scratch, repeating mistakes and rediscovering patterns.
Evo-memory solves this by maintaining two structured memory stores and three evolution mechanisms that extract, classify, and inject knowledge across cycles.
Location: /memory/ideation-memory.md
Records what you've learned about research DIRECTIONS — which areas are promising and which are dead ends.
Two sections:
| Section | What It Contains | Example Entry |
|---|---|---|
| Feasible Directions | Directions that showed promise in prior cycles | "Contrastive learning for few-shot classification — confirmed feasible, top-3 in tournament cycle 2" |
| Unsuccessful Directions | Directions that were tried and failed, with failure classification | "Autoregressive generation for real-time video — fundamental failure: latency constraint incompatible with autoregressive decoding" |
Each entry records: Direction name, one-sentence summary, evidence (which cycle, what results), classification (feasible / implementation failure / fundamental failure), date.
How it's used: idea-tournament reads M_I at the start of Phase 1. The paper uses embedding-based retrieval with cosine similarity, selecting the top-k_I most similar items (k_I=2 in experiments). Feasible directions from prior cycles can seed new tree branches. Unsuccessful directions are used during pruning — fundamental failures are pruned; implementation failures may be retried.
See assets/ideation-memory-template.md for the template.
Location: /memory/experiment-memory.md
Records what you've learned about research STRATEGIES — which technical approaches and configurations work in practice.
The paper defines M_E as storing "reusable data processing and model training strategies." ESE jointly summarizes (i) a data processing strategy and (ii) a model training strategy. We extend this with two additional practical sections (architecture and debugging) for comprehensive coverage.
Two core sections (from paper) + two practical extensions:
| Section | Source | What It Contains | Example Entry |
|---|---|---|---|
| Data Processing Strategies | Paper (core) | Preprocessing, augmentation, and data handling patterns | "For noisy sensor data: median filter before normalization reduces training instability by ~40%" |
| Model Training Strategies | Paper (core) | Hyperparameters, training tricks, and training schedules | "Learning rate warmup for 10% of steps prevents early divergence in transformer fine-tuning" |
| Architecture Strategies | Extension | Design choices, module configurations, and structural patterns | "Residual connections are critical for modules inserted deeper than 10 layers in transformers" |
| Debugging Strategies | Extension | Diagnostic patterns that resolved experiment failures | "When loss plateaus after 50% of training: check gradient norm — clipping threshold may be too aggressive" |
Each entry records: Strategy name, context (when to use this), evidence (which cycle, what results), generality (domain-specific or broadly applicable), date.
How it's used: experiment-pipeline reads M_E at the start of each cycle. The paper uses embedding-based retrieval with cosine similarity, selecting the top-k_E most similar items (k_E=1 in experiments). Relevant strategies from prior cycles inform hyperparameter choices, data processing decisions, and debugging approaches, reducing the number of attempts needed.
See assets/experiment-memory-template.md for the template.
Trigger: After idea-tournament completes (Phase 3 direction summary is available).
Purpose: Extract promising research directions from the tournament results and store them in M_I for future cycles.
Paper Prompt: Use the IDE prompt from references/paper-prompts.md as the primary extraction mechanism. Fill in {user_goal} from the original research direction and {top_ranked_ideas} from /direction-summary.md, then reason through the prompt step by step. The output (DIRECTION SUMMARY with Title, Core idea, Why promising, Requirements, Validation plan) feeds directly into the steps below.
Process:
/memory/ideation-memory.mdKey principle: Store directions, not ideas. A direction like "contrastive learning for structured data" can spawn many specific ideas across future cycles. A specific idea like "SimCLR with graph augmentations on molecular datasets" is too narrow to be reusable.
See references/ide-protocol.md for the full process.
Trigger (two conditions, following the paper):
Purpose: Classify WHY the method failed and update M_I accordingly. This is the most critical evolution mechanism because it prevents future cycles from repeating dead-end directions.
Paper Prompt: Use the IVE prompt from references/paper-prompts.md as the primary classification mechanism. Fill in {research_proposal} from /research-proposal.md and {execution_report} from the stage trajectory logs, then reason through the prompt step by step. The prompt classifies the failure as FAILED(NoExecutableWithinBudget), FAILED(WorseThanBaseline), or NOT_FAILED.
After running the paper prompt:
Five diagnostic questions (for WorseThanBaseline cases):
If 3+ answers point to one type, classify as that type. If split, classify as implementation failure (more conservative — allows retry).
Retry escalation rule: If a direction has been classified as "implementation failure" 3 times across different cycles, escalate to a careful re-evaluation — three separate implementation failures may indicate the direction is harder than it appears. Consider reclassifying as fundamental.
See references/ive-protocol.md for the full process and worked examples.
Trigger: After experiment-pipeline succeeds — all 4 stages complete and gates met.
Purpose: Distill reusable strategies from the successful experiment run and store them in M_E for future cycles.
Paper Prompt: Use the ESE prompt from references/paper-prompts.md as the primary extraction mechanism. Fill in {research_proposal} from /research-proposal.md and {trajectories} from all 4 stage trajectory logs, then reason through the prompt step by step. The prompt outputs DATA SUMMARY and MODEL SUMMARY, which map to our Data Processing Strategies and Model Training Strategies sections.
Process:
Generalization guidelines: A strategy is broadly applicable if it addresses a general challenge (training instability, overfitting, slow convergence) rather than a domain-specific characteristic. When in doubt, record the context alongside the strategy and let future users judge applicability.
See references/ese-protocol.md for the full process.
When starting a new research cycle (loading idea-tournament or experiment-pipeline):
/memory/ideation-memory.md and /memory/experiment-memory.mdidea-tournament: Use M_I feasible directions to seed tree branches. Use M_I unsuccessful directions (fundamental failures only) during pruning.experiment-pipeline: Use M_E strategies to inform hyperparameter ranges, training schedules, and debugging approaches.Don't blindly apply old strategies. Context matters. A strategy that worked for image classification may not work for text generation. Always check the recorded context against the current problem.
Retrieval method: The paper uses embedding-based cosine similarity for retrieval. In practice, perform this semantic comparison by reading each entry's Summary/Context and Retrieval Tags, then judging relevance to the current goal. If automated embedding tools are available in your environment, use those instead for larger memory stores.
/memory/ideation-memory.mdFailure Classification: Fundamental: flag for pruning. Example injection: "Prior cycle confirmed 'Autoregressive real-time video generation' is a fundamental failure (O(n) latency). Prune any tree branch matching this pattern."/memory/experiment-memory.mdPeriodically review both memory stores and remove or archive entries that are no longer relevant:
Each memory file maintains a Last Updated field and a cycle counter. When entries are modified (not just appended), note what changed in the evolution report. This creates an audit trail of how your research knowledge evolves.
After each evolution mechanism triggers, generate a report saved to /memory/evolution-reports/cycle_N_type.md:
See assets/evolution-report-template.md for the template.
Prioritize these rules when updating and using memory:
Abstract before storing: Store directions and strategies, not specific experiment details. "Contrastive learning improves few-shot classification" is reusable across many projects; "set lr=0.001 for ResNet-50 on CIFAR-10" is not. The goal is transferable knowledge, not a lab notebook.
Failed directions are more valuable than successful ones: Knowing what NOT to try saves more time than knowing what worked. Success stories are published in papers — everyone can access them. Failure stories are rarely shared, making your failure memory a unique competitive advantage.
Implementation failures are not direction failures: The most common evolution mistake is marking a good direction as failed because the implementation was buggy. IVE exists specifically to make this distinction. When in doubt, classify as implementation failure — it's cheaper to retry a good idea than to permanently discard it.
Memory decays without pruning: A strategy that worked 10 cycles ago on different data may no longer be relevant. Accumulating stale entries adds noise that makes it harder to find useful strategies. Prune actively — a smaller, curated memory is more valuable than a large, noisy one.
Cross-pollination beats deep specialization: Strategies from M_E in one domain often transfer to another. Learning rate warmup helps in NLP AND vision AND speech. Review the full M_E before starting a new experiment pipeline, not just domain-specific entries.
The evolution report is for humans: Write reports that a researcher — not just an AI agent — can understand and act on. Include enough context that someone reading the report 6 months later understands WHY the change was made, not just WHAT changed.
How evo-memory connects to other skills in the pipeline:
| Trigger | Source Skill | Mechanism | Memory Updated |
|---|---|---|---|
| Tournament completed | idea-tournament | IDE | M_I (feasible directions) |
| No executable code within budget, or method underperforms baseline | experiment-pipeline | IVE | M_I (unsuccessful directions) |
| Pipeline succeeded | experiment-pipeline | ESE | M_E (data processing + model training; optionally architecture + debugging) |
| New cycle starts | idea-tournament | Read (top-k_I=2) | M_I read for seeding/pruning |
| New cycle starts | experiment-pipeline | Read (top-k_E=1) | M_E read for strategy guidance |
| Topic | Reference File | When to Use |
|---|---|---|
| IDE process details | ide-protocol.md | After completing idea-tournament |
| IVE process details | ive-protocol.md | After experiment-pipeline failure (no executable code or method underperforms) |
| ESE process details | ese-protocol.md | After experiment-pipeline succeeds |
| Paper's actual prompts | paper-prompts.md | Reference for exact IDE/IVE/ESE prompt design |
| Memory data structures | memory-schema.md | Understanding M_I and M_E formats |
| Ideation memory template | ideation-memory-template.md | Initializing M_I |
| Experiment memory template | experiment-memory-template.md | Initializing M_E |
| Evolution report template | evolution-report-template.md | Documenting memory updates |