Collaborative theorem proving orchestrator. Uses lc CLI for state, spawns parallel agents.
YOU ARE THE ORCHESTRATOR. Keep running until proof is complete.
./bin/lc status onlyWe combine the LLM's mathematical reasoning with Lean's verification to avoid wasted attempts.
The old suggest-first approach fails for complex goals:
suggest| What LLMs Do Best | What Lean Does Best |
|---|---|
| Proof architecture design | Type checking |
| Creative strategies | Finding direct lemmas |
| Understanding math concepts | Verification |
| Multi-step reasoning | Detecting sorry/errors |
Workflow:
sorry placeholders--skeleton flag)Before decomposing, validate your architecture compiles:
./bin/lc verify --goal $GOAL_ID --tactic "have h1 : <subgoal1> := sorry; have h2 : <subgoal2> := sorry; <final_step>" --skeleton
If skeleton fails → try different architecture BEFORE creating subgoals. If skeleton passes → safe to decompose.
MATH CARD (Prover - for complex tactics):
┌─MATH─────────────────────────────────────────┐
│ GOAL: <the goal> │
├──────────────────────────────────────────────┤
│ CLASS: <arith/analytical/geometric/etc> │
│ KEY: <core mathematical insight> │
│ WHY: <why this insight applies> │
├──────────────────────────────────────────────┤
│ TACTICS: <candidate tactics based on insight> │
│ PATTERN: <Lean pattern to follow> │
└──────────────────────────────────────────────┘
ARCHITECTURE CARD (Decomposer - before creating subgoals):
┌─ARCHITECTURE────────────────────────────────────────┐
│ GOAL: <the goal> │
│ PROOF IDEA: <1-2 sentence description> │
│ KEY INSIGHT: <what mathematical fact makes this work>│
│ SUBGOALS: │
│ 1. <subgoal1> - because: <why needed> │
│ 2. <subgoal2> - because: <why needed> │
│ ASSEMBLY: <how subgoals combine> │
│ LEAN SKETCH: have h1 := sorry; have h2 := sorry; ... │
│ POTENTIAL ISSUES: <circular deps? missing hyps?> │
└──────────────────────────────────────────────────────┘
We produce proofs that satisfy math professors. Axioms are unacceptable shortcuts.
| Depth | Axiom Allowed? | Action on Failure |
|---|---|---|
| < max_depth - 2 | NO | Always backtrack for better decomposition |
| >= max_depth - 2 | Only if atomic + cited | Prefer backtrack, axiom is last resort |
0 < Real.pi (atomic constant, Mathlib: Real.pi_pos)Real.sin 0 = 0 (atomic evaluation, Mathlib: Real.sin_zero)sin x ≤ f(x) - This is THE PROBLEM, not an axiom!When a prover fails, it should backtrack to let decomposer try a different strategy. Multiple backtrack cycles are GOOD - they explore the proof space. Axioms are BAD - they leave holes.
Before doing ANYTHING else, run:
./bin/lc init && ./bin/lc status
open_goals or working_goals exist → Resume immediately. Go to Main Loop.ready_to_compose is true → Run compose. You're done.DO NOT ask the user what to do if there's existing work. Just resume.
Do NOT read agent logs or output files to check progress.
All coordination happens through Ensue:
./bin/lc status → check goal states (open, working, solved, etc.)./bin/lc listen → receive real-time events when goals changeWhen you spawn an agent in the background, do NOT read its output file. Instead, run ./bin/lc listen to wait for goal state changes. The agent will update Ensue when it finishes, and you'll receive the event.
Only run this if no existing work was found above.
You are running from the plugin directory. The lc binary is at ./bin/lc.
./bin/lc init --create-root \
--theorem "<theorem goal statement>" \
--hypotheses "<hyp1>;;hyp2;;..."
CRITICAL: The --hypotheses flag is required for theorems with assumptions. Without it, child goals won't have access to the theorem's hypotheses and proofs will fail.
Example for Putnam B1:
./bin/lc init --create-root \
--theorem "∃ c : Bool, ∀ P : EuclideanSpace ℝ (Fin 2), color P = c" \
--hypotheses "color : EuclideanSpace ℝ (Fin 2) → Bool;;h : ∀ (s : Affine.Simplex ℝ (EuclideanSpace ℝ (Fin 2)) 2), (∀ i j : Fin 3, color (s.points i) = color (s.points j)) → color s.circumcenter = color (s.points 0)"
Output:
{
"success": true,
"theorem_id": "putnam-2025-a2",
"workspace": "/path/to/workspace/putnam-2025-a2",
"ensue_url": "https://api.ensue-network.ai",
"config": {
"max_parallel_agents": 12,
"max_depth": 12,
"claim_ttl_seconds": 300
},
"active_subscriptions": 5
}
All persistent state lives in Ensue:
proofs/{theorem_id}/
├── goals/
│ └── {goal_id} # Goal JSON object
├── solutions/
│ └── {goal_id} # Verified tactic string
└── final-proof # Composed Lean proof
strategies/{goal_hash}/
└── {strategy_id} # Failed/succeeded strategy records
Each goal at proofs/{tid}/goals/{gid} contains: id, goal_type, state, depth, parent, hypotheses, has_quantifiers, has_transcendentals, is_numeric, strategy_attempts, attempt_count.
| State | Fields | Meaning |
|---|---|---|
open | — | Available for work |
working | agent, claimed_at, expires_at | Claimed by agent |
solved | tactic, imports[], solved_at | Proof verified |
decomposed | children[], strategy, decomposed_at | Split into subgoals |
axiom | justification, axiomatized_at | Accepted as axiom |
backtracked | attempt, backtracked_at | Decomposition failed, retry |
exhausted | attempts, exhausted_at | All strategies failed |
abandoned | parent_backtrack_attempt, abandoned_at | Goal invalidated (see below) |
parent_backtrack_attempt Values:
| Value | Meaning |
|---|---|
0 | Manual abandonment (prover called abandon directly) - should be rare |
1-N | Cascade from parent backtrack attempt N - normal backtrack flow |
4294967295 (u32::MAX) | Auto-orphan cleanup (invalid ancestor detected during claim) |
State Transitions:
open ──────────┬──► working ──► solved
│ │
│ └──► open (claim expired)
│
└──► (via decomposer) decomposed ──► backtracked ──► open
│
children ──► abandoned
axiom ◄── (last resort, depth >= max-2, atomic + cited)
exhausted ◄── (all strategies failed, no more options)
Do NOT Spawn Agents For:
abandoned - Parent was backtracked; these goals are preserved for history onlyexhausted - All strategies failed; requires human intervention or proof redesignsolved - Already provenaxiom - Accepted as axiomdecomposed - Already split into children (work on children instead)Cascading Invalidation:
When a parent is backtracked, its direct children are abandoned. Grandchildren may still appear as open/axiom/etc., but the claim command will detect the invalid ancestor and auto-abandon them. This is lazy evaluation - descendants are cleaned up when accessed, not eagerly.
CLI detects these syntactically from goal_type:
| Flag | Detection |
|---|---|
has_quantifiers | Contains ∀, ∃, forall, exists |
has_transcendentals | Contains Real.sin, Real.cos, Real.exp, Real.log, Real.pi, etc. |
is_numeric | Only numbers and arithmetic operators |
is_numeric → Trivial (prover: norm_num, decide)
has_quantifiers → Structural (decomposer: intro, cases)
has_transcendentals → Analytical (decomposer: calculus setup)
otherwise → NeedsJudgment (YOU MUST CLASSIFY - see below)
When complexity: "needs_judgment", YOU must analyze the goal before spawning. Don't guess - think:
┌─CLASSIFY────────────────────────────────────────┐
│ GOAL: <goal_type> │
│ HYPOTHESES: <list them> │
├──────────────────────────────────────────────────┤
│ Q1: Can any hypothesis directly prove this? │
│ Q2: Do I need to CONSTRUCT something first? │
│ Q3: Is this a single tactic or multi-step? │
├──────────────────────────────────────────────────┤
│ VERDICT: SIMPLE → prover | COMPLEX → decomposer │
└──────────────────────────────────────────────────┘
Examples:
| Goal | Hypotheses | Verdict | Why |
|---|---|---|---|
2 + 2 = 4 | none | SIMPLE | arithmetic |
color P = color 0 | h : monochromatic simplex → ... | COMPLEX | need to construct simplices |
x ∈ S | hx : x ∈ S | SIMPLE | direct hypothesis |
P ∧ Q | hP : P, hQ : Q | SIMPLE | constructor with both available |
f x = g x | h : ∀ y, f y = g y | SIMPLE | direct apply |
dist A B = dist C D | geometric setup | COMPLEX | likely needs construction |
Event-driven with timeout fallback. React to events, but don't block forever.
./bin/lc listen --prefix proofs/{theorem_id}/ --output workspace/events.jsonl
Run with run_in_background=true. This runs for the entire session.
./bin/lc status
ready_to_compose → run composer, DONEopen_goals → spawn appropriate agent (background)Wait for new events, but don't block forever:
timeout 45 tail -f workspace/events.jsonl
Events look like:
{"method":"notifications/resources/updated","params":{"uri":"memory:///proofs/.../goals/root"}}
# WRONG - tight polling loop
sleep 5 && ./bin/lc status
sleep 5 && ./bin/lc status
sleep 5 && ./bin/lc status
# 1. Start listener once
./bin/lc listen --prefix proofs/theorem/ --output workspace/events.jsonl &
# 2. Loop: check status → spawn → wait for events (with timeout) → repeat
while not ready_to_compose:
./bin/lc status → spawn agents for open_goals
timeout 45 tail -f workspace/events.jsonl # blocks until event OR timeout
Key principle: One status check per iteration, not one per agent. Stay active via timeout fallback.
./bin/lc status
Returns: summary (counts by state), open_goals, working_goals, ready_to_compose, config.
./bin/lc status <goal_id>
Returns: goal_type, state, depth, hypotheses, has_quantifiers, has_transcendentals, is_numeric, strategy_attempts.
You decide the agent type based on goal properties from ./bin/lc status.
1. state is "backtracked"?
→ decomposer (try different strategy)
2. complexity is "needs_judgment"?
→ Check DEPTH first:
a) depth >= 3 → prover FIRST (try ≥3 tactics before giving up)
b) depth < 3 → YOU CLASSIFY (see below)
→ At deeper depths, goals are likely leaves that just need the right tactic
→ At shallow depths, goals often need decomposition
3. can_decompose is TRUE and complexity is NOT needs_judgment?
→ decomposer (goal needs to be split)
4. can_decompose is FALSE?
→ prover (goal is a leaf, try tactics)
The CLI computes can_decompose based on:
IMPORTANT: When complexity: "needs_judgment":
This prevents the failure mode where agents decompose endlessly without ever trying to prove anything.
When has_transcendentals is true, ask: "Point evaluation or behavior analysis?"
| Question | Example | Agent |
|---|---|---|
| Can this be solved by evaluating at specific points? | sin 0 = 0, cos π = -1, π > 3 | prover |
| Does this require understanding how the function behaves across a domain? | sin x ≤ f(x) for x ∈ [0,π], bounds requiring calculus | decomposer |
The test: "Does proving this require reasoning about derivatives, extrema, or function shape?"
state | complexity | depth | → Agent |
|---|---|---|---|
| backtracked | — | — | decomposer |
| open/working | needs_judgment | ≥ 3 | prover (try ≥3 tactics first) |
| open/working | needs_judgment | < 3 | YOU CLASSIFY → likely decomposer |
| open/working | structural/analytical | — | decomposer |
| open/working | trivial | — | prover |
When complexity: "needs_judgment" at depth ≥ 3: Send to prover. Don't overthink it - deeper goals are usually leaves that need the right tactic found.
When complexity: "needs_judgment" at depth < 3: Use the CLASSIFY card. Shallow goals often need decomposition before proving.
Use run_in_background=true for parallelism.
Task(
subagent_type="lean-collab:decomposer",
prompt="Goal ID: root\nType: ∀ x ∈ [0,π], sin x ≤ f(x)",
run_in_background=true
)
Just pass goal context. The prover agent has REPL exploration instructions.
Task(
subagent_type="lean-collab:lean-prover",
prompt="Goal ID: <id>
Goal: <goal_type>
Hypotheses:
<hyp1>
<hyp2>
...",
run_in_background=true
)
The prover will:
./bin/lc explore to test tactics and see remaining goalsSpawn multiple in ONE message for parallelism.
./bin/lc listen
Outputs JSON lines:
{"event":"goal_updated","key":"proofs/test/goals/root","goal_id":"root"}
When ready_to_compose is true:
./bin/lc compose
Output: Writes verified proof to workspace/{theorem_id}/output/proof.lean
Tell the user the file path when complete so they can access it.
| Command | Purpose |
|---|---|
./bin/lc init --create-root --theorem "..." --hypotheses "..." | Setup + create root goal with hypotheses |
./bin/lc status | Theorem summary with open/working goals |
./bin/lc status <gid> | Full goal details |
./bin/lc create-goal --id X --goal-type T --depth D | Create goal (auto-subscribes, checks for duplicates) |
./bin/lc decompose <gid> --children X,Y --strategy S | Mark goal as decomposed |
./bin/lc search "query" --prefix tactics/ | Search collective intelligence |
./bin/lc listen | SSE event stream |
./bin/lc claim <gid> | Claim goal (checks ancestors, auto-abandons if invalid) |
./bin/lc unclaim <gid> | Release goal (hooks use this) |
./bin/lc verify --goal <gid> --tactic <t> [--imports X,Y] [--skeleton] | Verify tactic (--skeleton allows sorry for architecture validation) |
./bin/lc axiomatize <gid> --reason <r> [--force] | Mark as axiom (REFUSES invalid reasons/depth) |
./bin/lc backtrack <gid> [--reason <r>] | Undo decomposition, abandon children |
./bin/lc abandon <gid> [--reason <r>] | Abandon goal (auto-backtracks parent for leaf goals) |
./bin/lc compose [-o path] | Build final proof |
create-goal checks for duplicates/circular decompositions (NEW):
--check-similar=false to disable (not recommended)abandon does NOT cascade:
decomposed state./bin/lc backtrack <parent> to abandon entire decomposition strategyaxiomatize REFUSES if:
depth < max_depth - 2 (must decompose further)--force to override (logged as warning)claim checks ancestors:
abandoned or backtracked → auto-abandons the goal{"error": "invalid_ancestor", "action_taken": "auto_abandoned"}verify error handling:
error field (fixed bug where errors went to stdout only)error field to understand WHY tactics failSuccessful tactics and failed strategies are recorded:
tactics/solutions/{hash}-{ts} # Successful proofs (from ./bin/lc verify)
strategies/{hash}/{strategy}-{ts} # Failed decompositions (from ./bin/lc backtrack)
Different agents use this differently:
./bin/lc search "numeric conjunction" --prefix tactics/solutions/
Finds tactics that worked on similar goals. Adapts the reasoning to the current goal.
./bin/lc status $GOAL_ID
# Returns strategy_attempts[] with error reasons
When a goal is backtracked, the decomposer reads strategy_attempts to understand WHY previous decompositions failed, then tries a different approach. The error message guides the next attempt.
For backtracked goals, check ./bin/lc status <gid> for strategy_attempts array. Include failed strategies in decomposer prompt so it tries a different approach.
Environment variables override .lean-collab.json:
export LEAN_COLLAB_MAX_PARALLEL_AGENTS=6
export LEAN_COLLAB_MAX_DEPTH=8
export LEAN_COLLAB_CLAIM_TTL=600