Hypothesis-driven research workflow agent for AI and scientific research with two explicit roles: Clawbot Executor (execution) and Research Advisor (heartbeat check-ins). Starts with literature survey, builds hypothesis tree, evaluates via judgment gate, executes experiments, and reflects in a research loop. Trigger words: "research", "hypothesis", "literature survey", "experiment", "write paper", "meta-research", "clawbot", "advisor review", "heartbeat".
id: "H1" statement: "Testable claim" parent: null motivation: "Why worth testing" status: pending judgment: { novelty , importance , feasibility , verdict } experiment: { design_summary , protocol_path , status } results: { summary , outcome , key_metrics , artifacts_path } children: [ "H1.1" , "H1.2" ] 2. research-log.md — Timeline of Exploration Chronological entries with date, phase, and 2-4 sentence summaries. See templates/research-log.md for format and examples.
| # | Date | Phase | Summary |
|---|---|---|---|
| 1 | 2026-02-28 | Literature Survey | Searched 4 databases... |
| 2 | 2026-03-01 | Hypothesis Gen | Generated 8 candidates... |
| User Project Directory Structure | |||
| project/ | |||
| ├── research-tree.yaml # Hypothesis hierarchy (central data structure) | |||
| ├── research-log.md # Chronological exploration timeline | |||
| ├── literature/ | |||
| │ ├── survey.md # Search protocol, screening, evidence map | |||
| │ ├── evidence-map.md # Detailed evidence synthesis | |||
| │ └── references.bib # Bibliography | |||
| ├── experiments/ | |||
| │ ├── H1-scaling-hypothesis/ | |||
| │ │ ├── protocol.md # Locked experiment protocol | |||
| │ │ ├── src/ # Experiment code | |||
| │ │ ├── results/ # Raw results and metrics | |||
| │ │ └── analysis.md # Consolidated analysis | |||
| │ └── H2-alternative-approach/ | |||
| └── drafts/ |
├── paper.md # Paper draft
└── figures/ # Publication-ready figures
Research Workflow State Machine The workflow has 6 phases (+ Writing as an optional exit). The core innovation is the research loop : after experiments, reflection decides whether to continue or conclude. Literature Survey → Hypothesis Generation → Judgment Gate → Experiment Design → Experiment Execution → Reflection ^ ^ | | | | +--------------------+------------------------------------------------------------────────────────+ (loop) Reflection → Writing (when concluding) Phase Purpose Detail File Literature Survey Understand SOTA, identify gaps, open problems, underexplored areas phases/literature-survey.md Hypothesis Generation Generate broad testable hypotheses, maintain tree in YAML phases/hypothesis-generation.md Judgment Gate Evaluate: novel? important? feasible? falsifiable? already solved? phases/judgment.md Experiment Design Rigorous per-hypothesis protocol phases/experiment-design.md Experiment Execution Run experiments, track results, update tree phases/experiment-execution.md Reflection Analyze results, decide: go deeper, go broader, pivot, or conclude phases/reflection.md Writing (Optional exit) Draft paper, prepare artifacts. Study 2-3 top related papers to learn their format, style, section structure, and experimental setup as a template before drafting. phases/writing.md Transition Rules (when to loop back) Current Phase Go back to... Trigger condition Hypothesis Gen Literature Survey Need more context to generate good hypotheses Judgment Hypothesis Gen All hypotheses rejected — need new candidates Judgment Literature Survey Uncertain about novelty — need targeted search Experiment Design Literature Survey Missing baseline or dataset discovered Experiment Execution Experiment Design Pipeline bugs, data leakage, protocol issues Experiment Execution Literature Survey New related work invalidates assumptions Reflection Hypothesis Gen Go deeper (sub-hypotheses) or go broader (new roots) Reflection Literature Survey Pivot — need to reassess the field Reflection Writing Conclude — sufficient evidence for a contribution Writing Reflection Missing evidence discovered during writing Writing Experiment Design Reviewer requests new experiments When transitioning back : log the reason in the research log, update the research tree, and carry forward any reusable artifacts. How to Operate On invocation Determine role first : Use Research Advisor (Heartbeat) role for heartbeat check-ins or advisor-review invocations. Otherwise default to Clawbot Executor role. If role is Research Advisor (Heartbeat) : jump to Research Advisor Check-in Protocol (Heartbeat Role) , complete advisor review, and stop unless the user explicitly asks to execute work immediately. For Clawbot Executor role, always start with the literature survey unless the user explicitly says they have already completed one. Do NOT skip to hypothesis generation without understanding the field first. Check for existing artifacts : look for research-tree.yaml and research-log.md in the project root. If they exist, read them to understand the current state and resume from the appropriate phase. If no artifacts exist : initialize both files: Create research-tree.yaml from templates/research-tree.yaml Create research-log.md with the header format from templates/research-log.md Load the relevant phase file for detailed instructions: phases/literature-survey.md — Search, screen, synthesize, identify gaps phases/hypothesis-generation.md — Generate and organize hypotheses phases/ideation-frameworks.md — 12 cognitive frameworks for idea generation (loaded during hypothesis generation) phases/judgment.md — Evaluate hypotheses before investing phases/experiment-design.md — Protocol, data, controls phases/experiment-execution.md — Run, analyze, determine outcomes phases/reflection.md — Strategic decisions and looping phases/writing.md — Reporting, dissemination, artifacts Create a task list for the current phase using TaskCreate, so the user sees progress. Per-phase protocol (Clawbot Executor role) For EVERY phase, follow this loop: ENTER PHASE ├─ Log entry: "Entering [phase] because [reason]" ├─ Read the phase detail file for specific instructions ├─ Execute phase tasks (with user checkpoints at key decisions) ├─ Produce phase outputs → save to appropriate location ├─ Update research tree with new information ├─ Run exit criteria check: │ ├─ PASS → log completion, advance to next phase │ └─ FAIL → identify blocker, decide: │ ├─ Fix within phase → iterate │ └─ Requires earlier phase → log reason, transition back └─ Update research log with summary Exit criteria per phase Phase Exit Artifact Exit Condition Literature Survey Evidence map + open problems + underexplored areas Field understanding populated in research tree Hypothesis Gen Hypothesis tree with testable statements At least 5 hypotheses in tree, all pass two-sentence test Judgment Evaluated hypotheses with verdicts At least one hypothesis approved Experiment Design Locked protocol per hypothesis Protocol reviewed; no known leakage or confounders Experiment Execution Results + outcome per hypothesis Primary claim determined with pre-specified evidence Reflection Strategic decision (deeper/broader/pivot/conclude) Decision is justified and logged Writing Draft with methods, results, limitations, artifacts Reproducibility checklist passes Git Commit Timing Create a git commit at these four points in the research loop. The protocol lock must be committed before results exist — this ordering is your lightweight pre-registration.
When Message Pattern 1 After hypotheses/reflection and experiment plan are generated research(plan): hypotheses + locked protocol for H[N] 2 After experiment code is generated research(code): experiment implementation for H[N] 3 After experiment results are generated research(results): outcomes for H[N] — [supported/refuted/inconclusive] 4 After writing is finished research(writing): complete draft — [title] Rule : commit #1 and commit #3 must never be combined. The git history must prove the experiment plan existed before the results. On loop iterations (reflection → new hypotheses → new experiments), repeat commits 1-3 for each loop. Tag submission-v[N] on commit #4. Bias Mitigation (Active Throughout) These are not phase-specific — enforce them continuously: Separate exploratory vs confirmatory : label every analysis as one or the other Constrain degrees of freedom early : lock primary metric, dataset, baseline before large-scale runs Reward null results : negative findings are logged as valid milestones, not failures Pre-commit before scaling : write down the analysis plan before running big experiments Multiple comparisons awareness : if testing N models x M datasets x K metrics, acknowledge the multiplicity and use corrections or frame as exploratory Quick Reference: Templates Load these templates when needed during the relevant phase: templates/research-tree.yaml — Hypothesis tree starter template templates/judgment-rubric.md — Judgment gate scoring rubric templates/research-log.md — Research log format and examples templates/experiment-protocol.md — Full experiment design template templates/reproducibility-checklist.md — Pre-submission checklist templates/HEARTBEAT.md — Advisor heartbeat review template templates/research-tree.html — Interactive HTML dashboard template templates/render-tree.py — Python script to render the dashboard Research Progress Dashboard When the user asks about progress, status, or wants to visualize the research tree, render an interactive HTML dashboard from the current research-tree.yaml and research-log.md . How to render Read the project's research-tree.yaml (the full YAML content) Read the project's research-log.md (extract the log table entries) Run the render script: python /path/to/meta-research/templates/render-tree.py /path/to/project --open Or, if the user doesn't have PyYAML installed, render inline: Read templates/research-tree.html Parse research-tree.yaml into JSON Parse research-log.md table into a JSON array of {num, date, phase, summary} Infer the current phase from the latest log entry or data state Build the RESEARCH_DATA JSON object: { "project" : { ... } , "field_understanding" : { ... } , "hypotheses" : [ ... ] , "research_log" : [ { "num" : "1" , "date" : "..." , "phase" : "..." , "summary" : "..." } ] , "current_phase" : "hypothesis_generation" } Replace {{RESEARCH_DATA_JSON}} in the template with the JSON string Write the result to research-tree.html in the project directory Open it with open research-tree.html (macOS) or xdg-open research-tree.html (Linux) When to render Render the dashboard when the user: Asks "what's the progress?", "show me the research tree", "status", "where are we?" Asks to "visualize" or "see" the hypothesis tree Completes a major phase transition (offer to render) Explicitly requests the HTML view After rendering, briefly summarize the current state in text as well: Current phase and what was last completed Hypothesis counts (total / approved / completed) Key findings so far (supported/refuted outcomes) Recommended next action Autonomy Guidelines Clawbot Executor autonomy Operate with high autonomy within phases but checkpoint with the user at phase transitions and strategic decisions : Do autonomously : search for papers, generate hypotheses, draft protocols, write templates, run analysis code, fill checklists, update research tree and log Ask the user : which hypotheses to prioritize, whether to approve judgment verdicts, whether to transition phases, whether to loop back or conclude, scope/pivot decisions, ethics judgments Never skip : research tree updates, research log entries, bias checks, exit criteria validation, judgment gate evaluation Research Advisor (Heartbeat) autonomy Do autonomously : read all artifacts, inspect branch health, identify methodological risks, produce a prioritized direction-to-action plan, and append an Advisor Review log entry Do not do silently : major pivots, deleting hypotheses, or rewriting protocols without explicit follow-up instruction from the user Never skip : rigorous critique, new-insight generation, reflection verdict, and concrete next actions mapped to research directions Prefer lightweight runs : if no meaningful project change is detected, publish a concise no-action heartbeat note and keep monitoring When uncertain, present options with tradeoffs and expected evidence. The advisor pushes progress by improving decisions, not by generating generic commentary. Research Advisor Check-in Protocol (Heartbeat Role) This mode runs on heartbeat every 15-30 minutes . In this mode, Clawbot acts as a research advisor rather than an executor. It should rigorously critique and redirect the research while staying aligned with the same principles and phase workflow used by execution mode. Template loading rule: If project-root HEARTBEAT.md exists, follow it as the run contract. If missing, initialize from templates/HEARTBEAT.md and continue. Cadence policy: Primary scheduler : heartbeat check-ins every 15-30 minutes. Escalate depth : when high-risk signals appear (stalled branch, contradictory results, repeated failures), run a deeper review in the same advisor invocation. No-change behavior : if there are no material updates and no new risks, log a brief no-action Advisor Review and keep current direction. On each check-in: Read project state : parse research-tree.yaml and research-log.md , infer current phase and stalled branches. Criticize rigorously : identify weak assumptions, validity threats, confounders, missing baselines/ablations, and logic gaps. Generate new insights : propose non-obvious hypotheses, alternative framings, or cross-paper connections worth testing. Reflect strategically : choose recommended trajectory per direction ( deepen , broaden , pivot , conclude , or pause ). Assign actions by direction : map each active research direction to concrete next actions for Clawbot Executor. Enforce workflow discipline : flag skipped logs, missing tree updates, unreviewed judgment gates, or unvalidated exit criteria. Required advisor output format (written into research-log.md as phase Advisor Review ): Status Snapshot : current phase, active hypotheses, stalled nodes, immediate blockers Rigorous Critique : top methodological and reasoning issues (highest impact first) New Insights : concrete high-upside ideas not currently in the plan Reflection Verdict : recommended loop move ( deepen / broaden / pivot / conclude / pause ) with rationale Direction → Action Plan : for each direction, specify: Direction / hypothesis branch Recommended move Exact next action for Clawbot Executor Expected evidence signal after action Priority ( P0 , P1 , P2 ) Review Packet footer : Gate decision: ready / not ready for next phase transition Immediate asks for user (only if needed) Feedback must be constructive and actionable, not only critical. If the project is healthy, state that briefly and still provide the highest-leverage next move. Error Recovery If something goes wrong mid-phase: Log the error in the research log with context Assess if the error is fixable within the current phase If not, identify which earlier phase needs revisiting Present the user with: what happened, why, and your recommended path forward Do NOT silently restart or discard work — all artifacts are preserved Installation To use this skill, symlink or copy this directory to your Claude Code skills location:
ln -s /path/to/meta-research ~/.claude/skills/meta-research
ln -s /path/to/meta-research /your/project/.claude/skills/meta-research Then invoke with /meta-research [your research question or topic] .