Name: Codex Orchestration
Author: Develata

mcp__codex__codex(
  prompt="Read src/engine_health/diagnosis/track_b_sequence_detection.py lines 28-50 and explain the TrackBSequenceDetectionConfig fields. Answer in under 200 words.",
  sandbox="read-only",
  approval-policy="never",
  cwd="/Users/charles/Desktop/BYSJ/projectv2"
)

mcp__codex__codex-reply(
  threadId="019d70b2-...",
  prompt="Now check what default threshold_mode is used and whether it matches the formal package."
)

Task Type	Executor	Reviewer	Rationale
Implementation	Codex	Claude	Codex is cheaper for bounded execution; Claude has deeper context for correctness review.
Analysis / research	Codex	Claude	Claude has stronger critical judgment for synthesis and review.
Critical decisions	Both independently	Claude synthesizes	Avoids single-model blind spots.
Mechanical / low-priority	Codex alone	None	Not worth reviewer overhead.

Two-file task handoff: Always use two separate files to dispatch tasks to Codex:
1. /tmp/codex_user_context_<ID>.md: the user's recent messages (verbatim, the last 1-3 relevant turns). This is the raw requirement source.
2. /tmp/codex_task_<ID>.md: Claude's analysis, scope, constraints, expected output, and non-goals. This is Claude's instruction layer.
<ID> is a short unique suffix (e.g., the first 8 characters of a UUID, or a timestamp) that prevents collisions between concurrent sessions. Then pass Codex a short prompt: "Read /tmp/codex_user_context_<ID>.md for the raw user request and /tmp/codex_task_<ID>.md for your task instructions. Execute accordingly." This keeps Claude's conversation context small and preserves a clean separation between raw user intent and Claude's framing.
Always pass approval-policy: Every mcp__codex__codex call must include approval-policy="never" to prevent interactive approval popups. Omitting this parameter may cause Codex to prompt for every shell command, blocking autonomous execution.
Reuse sessions: After the first mcp__codex__codex call returns a threadId, use mcp__codex__codex-reply with that threadId for follow-up questions in the same domain. This avoids cold-start overhead.
Prefer read-only sandbox: For pure analysis/inspection tasks, pass sandbox: "read-only". This reduces sandbox overhead.
Narrow the prompt: Each Codex prompt should target 1-3 specific files or one specific question. Never send "analyze the entire X module" — instead send "read X.py lines 100-200 and explain function Y."
Specify file paths in prompt: When you already know the relevant files, include their paths directly in the Codex prompt so it doesn't need to search. When you don't know the paths, use the Iterative Retrieval Protocol (§6) instead of guessing.
Cap output expectations: Tell Codex "answer in under 300 words" or "list only file paths" when full prose is unnecessary.
Avoid redundant delegation: If Claude already read a file this conversation, do not ask Codex to re-read it. Synthesize from existing context.
Parallel over serial: When 2-4 independent questions are needed, launch parallel mcp__codex__codex calls rather than sequential ones.
Fail fast: If a Codex call is taking too long or returns partial results, do not retry the same broad prompt. Narrow scope and retry, or fall back to Claude's own Read/Grep tools for the specific data needed.
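The two-file handoff rule at the top of this list can be sketched as a small helper that writes both files and returns the short prompt to pass to Codex. This is a minimal sketch under stated assumptions: `write_handoff` is a hypothetical name, the ID is the first 8 hex characters of a UUID, and the helper only prepares files and text; the actual `mcp__codex__codex` call is still made by Claude as a tool call.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Tuple


def write_handoff(user_context: str, task_spec: str, base_dir: str = "/tmp") -> Tuple[str, str]:
    """Hypothetical helper: write both handoff files, return (task_id, prompt)."""
    # Short unique suffix: first 8 hex characters of a random UUID.
    task_id = uuid.uuid4().hex[:8]
    base = Path(base_dir)
    context_path = base / f"codex_user_context_{task_id}.md"
    task_path = base / f"codex_task_{task_id}.md"
    # File 1: the user's recent messages, verbatim (raw requirement source).
    context_path.write_text(user_context, encoding="utf-8")
    # File 2: Claude's scope, constraints, expected output, and non-goals.
    task_path.write_text(task_spec, encoding="utf-8")
    prompt = (
        f"Read {context_path} for the raw user request and "
        f"{task_path} for your task instructions. Execute accordingly."
    )
    return task_id, prompt


# Usage: a temporary directory stands in for /tmp so the demo cleans up after itself.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_id, demo_prompt = write_handoff(
        "User: which default threshold_mode does Track B use?",
        "Scope: src/engine_health/diagnosis/track_b_sequence_detection.py",
        base_dir=demo_dir,
    )
    print(demo_prompt)
```

Because each session gets its own suffix, two concurrent Claude sessions never overwrite each other's task files.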

Task Profile	Route To	Reason
High-stakes (architecture, tradeoff, review, final synthesis)	Claude directly	Quality matters more than cost.
Low-priority mechanical (bulk inspection, formatting, docs, simple summaries)	Codex	Save Claude token budget.
Bounded execution (implementation, testing, debugging)	Codex	Well-scoped; Codex is sufficient.
Open-ended planning, synthesis, multi-source integration	Claude	Requires deep reasoning.

Scope:           [files, directories, or modules]
Goal:            [what to accomplish]
Constraints:     [what not to change, time/size limits]
Expected output: [format and content of the result]
Non-goals:       [what this task explicitly does NOT cover]
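One way to keep task files consistent is to render the five template fields from a single structure. A minimal sketch, assuming a hypothetical `TaskFrame` dataclass; the check that rejects empty fields (including Non-goals) is an added assumption, not a rule stated here.

```python
from dataclasses import dataclass, fields

# Labels in template order; values start at column 17 as in the template.
LABELS = {
    "scope": "Scope:",
    "goal": "Goal:",
    "constraints": "Constraints:",
    "expected_output": "Expected output:",
    "non_goals": "Non-goals:",
}


@dataclass
class TaskFrame:
    """Hypothetical container for the five framing fields."""
    scope: str
    goal: str
    constraints: str
    expected_output: str
    non_goals: str

    def render(self) -> str:
        # Assumption: an empty field means the task is under-specified.
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError(f"empty framing fields: {empty}")
        return "\n".join(
            f"{LABELS[f.name]:<17}{getattr(self, f.name)}" for f in fields(self)
        )


frame = TaskFrame(
    scope="src/engine_health/diagnosis/track_b_sequence_detection.py",
    goal="Explain the TrackBSequenceDetectionConfig fields",
    constraints="Read-only; answer in under 200 words",
    expected_output="Bullet list, one line per field",
    non_goals="Changing any defaults",
)
print(frame.render())
```

The rendered text is what would go into the task file of a two-file handoff.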

Round 1 — Broad discovery (Codex searches, Claude evaluates)
  Claude → Codex: "Search <directory/module> for <keywords/patterns>.
                   List files with: path, one-line summary, relevance tier, mtime (if artifact).
                   Cap at 15 files."
  Codex → Claude: file list + relevance notes

  Claude evaluates: assign relevance tiers, pick top 3-5 files for Round 2.
  Claude may add new keywords discovered from Codex's file list.

Round 2 — Narrowed inspection (standard Two-File Handoff resumes)
  Claude writes discovered paths into /tmp/codex_task_<ID>.md as the Scope field.
  Claude → Codex: "Read <specific files/line ranges>.
                   Answer <targeted analytical question>."
  Codex → Claude: detailed findings

Round 3 — Targeted verification (only if Round 2 reveals a gap)
  Allowed only for: verifying a specific claim, chasing one newly discovered dependency.
  NOT allowed for: introducing a new hypothesis or widening scope.
  Claude → Codex (via codex-reply on same threadId):
                   "Read <newly discovered file> lines X-Y.
                   Verify whether <specific claim from Round 2>."
  Codex → Claude: verification result
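The round structure above can be read as a bounded loop. The sketch below is a hedged illustration: every callable is a hypothetical stand-in for one Codex dispatch or one Claude evaluation step (none is a real MCP API), and returning early when no file is chosen is an assumption rather than a stated rule.

```python
from typing import Callable, List, Optional


def iterative_retrieval(
    discover: Callable[[], List[str]],          # Round 1: Codex lists candidate files
    pick_top: Callable[[List[str]], List[str]],  # Claude assigns tiers, picks top files
    inspect: Callable[[List[str]], str],         # Round 2: Codex reads the chosen files
    find_gap: Callable[[str], Optional[str]],    # Claude: one claim to verify, or None
    verify: Callable[[str], str],                # Round 3: codex-reply on the same thread
) -> List[str]:
    """Run at most three rounds and return the findings in order."""
    candidates = discover()[:15]          # Round 1 output is capped at 15 files
    chosen = pick_top(candidates)[:5]     # keep the top 3-5 for Round 2
    if not chosen:
        return []                         # assumption: stop rather than widen scope
    findings = [inspect(chosen)]
    gap = find_gap(findings[0])
    if gap is not None:
        # Round 3 only verifies one specific claim; it never adds a new hypothesis.
        findings.append(verify(gap))
    return findings


findings = iterative_retrieval(
    discover=lambda: ["a.py", "b.py", "c.py"],
    pick_top=lambda files: files[:2],
    inspect=lambda files: "inspected " + ", ".join(files),
    find_gap=lambda finding: None,
    verify=lambda claim: "verified: " + claim,
)
print(findings)  # ['inspected a.py, b.py']
```

There is deliberately no fourth round: if Round 3 cannot settle the claim, the loop ends and Claude decides what to do next.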

Tier	Meaning	Action
High	Directly implements or contains the target logic/data	Read in Round 2
Medium	Tangentially related (imports, configs, test files)	Read only if High files are insufficient
Low	Unlikely relevant (naming coincidence, unrelated module)	Drop unless no better candidates exist
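The tier table translates into a short selection rule. A minimal sketch, assuming a hypothetical `select_for_round2` helper; treating "no High files at all" as "High files are insufficient" is an interpretation of the table, and `Med` is accepted because the Round 1 template abbreviates it that way.

```python
from typing import Dict, List

TIER_ALIASES = {"Med": "Medium"}  # the Round 1 template writes High/Med/Low


def select_for_round2(tiers: Dict[str, str], high_is_sufficient: bool) -> List[str]:
    """Choose Round 2 files from a {path: tier} mapping (hypothetical helper)."""
    by_tier: Dict[str, List[str]] = {"High": [], "Medium": [], "Low": []}
    for path, tier in tiers.items():
        by_tier[TIER_ALIASES.get(tier, tier)].append(path)
    # High: always read in Round 2.
    selected = list(by_tier["High"])
    # Medium: only when there are no High files or they are not enough.
    if not selected or not high_is_sufficient:
        selected += by_tier["Medium"]
    # Low: dropped unless nothing better exists.
    if not selected:
        selected = list(by_tier["Low"])
    return selected


tiers = {"a.py": "High", "b.py": "Med", "c.py": "Low"}
print(select_for_round2(tiers, high_is_sufficient=False))  # ['a.py', 'b.py']
```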

Scope:    <top-level directory or module — intentionally broad>
Goal:     Find files related to <topic/keywords/patterns>.
          For each file report: path | one-line summary | relevance (High/Med/Low) | mtime (artifacts only).
Constraints:
  - Cap at 15 files
  - Do NOT read file contents — list only
  - Include mtime for result artifacts (JSON/CSV/NPZ) so staleness can be assessed
Expected output: Markdown table
Non-goals: Deep analysis — that comes in Round 2.
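Before Round 2, Claude can check that the Round 1 table respects the template. The sketch below is a hedged illustration with a hypothetical `parse_round1_table` helper; it assumes the four columns path | summary | relevance | mtime, does not handle `|` inside a cell, and the example paths other than the Track B module are made up.

```python
from typing import List, NamedTuple


class Round1Row(NamedTuple):
    path: str
    summary: str
    relevance: str
    mtime: str  # empty for non-artifact files


def parse_round1_table(markdown: str, cap: int = 15) -> List[Round1Row]:
    """Parse a Round 1 Markdown table and enforce the template's constraints."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row, the |---| separator row, and malformed lines.
        if len(cells) < 3 or cells[0].lower() == "path" or set(cells[0]) <= set("-: "):
            continue
        cells += [""] * (4 - len(cells))  # tolerate a missing mtime column
        row = Round1Row(*cells[:4])
        if row.relevance not in {"High", "Med", "Low"}:
            raise ValueError(f"unexpected relevance {row.relevance!r} for {row.path}")
        rows.append(row)
    if len(rows) > cap:
        raise ValueError(f"Round 1 returned {len(rows)} files; the cap is {cap}")
    return rows


sample = """| path | summary | relevance | mtime |
|------|---------|-----------|-------|
| src/engine_health/diagnosis/track_b_sequence_detection.py | Track B config and detector | High | |
| results/example.json | hypothetical result artifact | Med | 2025-01-01 12:00 |
"""
rows = parse_round1_table(sample)
print([(r.path, r.relevance) for r in rows])
```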

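Level 1: Triage (decide whether the paper is relevant)
  Feed each paper's abstract + introduction only.
  Source priority: arXiv TeX source > journal HTML/Markdown > first ~2 pages of a local PDF.
  Codex output (one entry per paper):
    - go / kill
    - one-sentence rationale
    - relevance tag (which of this project's research lines the paper relates to)

Level 2: Method Reading (papers that passed triage)
  Feed only the relevant sections (method / experiments); never load the full text.
  Same source priority as Level 1.

Level 3: Full Paper (needed for reproduction or in-depth citation)
  Load the full text, but register the paper in
  `docs/10_diagnosis/literature_*/literature_manifest.json` in the same round.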

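Level	TeX / HTML / Markdown	PDF
1. Triage	✅ Abstract + intro only	❌ Whole PDF forbidden. If only a PDF exists, first extract the abstract + intro paragraphs (manually or by script), then feed plain text
2. Method reading	✅ Relevant sections only	⚠️ Only page-range excerpts allowed (e.g., pp. 3–7); whole PDF forbidden
3. Full paper	✅	✅ But `literature_manifest.json` must be updated in the same round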
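The source priority and the admission table can be combined into a pre-dispatch check. A minimal sketch under stated assumptions: `pick_source` and `admit` are hypothetical helpers, format names are simplified strings, HTML and Markdown share a tier (their relative order here is arbitrary), and a PDF whose abstract + intro was already extracted is passed as `"text"`. The gate only checks format rules; the section limits of Levels 1 and 2 still apply to what is fed.

```python
from typing import Optional, Set, Tuple

# Best first: arXiv TeX > journal HTML/Markdown > local PDF.
SOURCE_PRIORITY = ("tex", "html", "markdown", "pdf")


def pick_source(available: Set[str]) -> Optional[str]:
    """Return the highest-priority available format, or None."""
    for fmt in SOURCE_PRIORITY:
        if fmt in available:
            return fmt
    return None


def admit(level: int, fmt: str, *, pdf_page_range: Optional[Tuple[int, int]] = None,
          manifest_updated: bool = False) -> bool:
    """Apply the PDF Admission Gate to one planned read."""
    if level == 1:
        # Whole PDFs are forbidden; extracted plain text is fine.
        return fmt != "pdf"
    if level == 2:
        # A PDF is allowed only as a page-range excerpt, e.g. pp. 3-7.
        return fmt != "pdf" or pdf_page_range is not None
    if level == 3:
        # Any format, but the manifest must be updated in the same round.
        return manifest_updated
    raise ValueError(f"unknown level: {level}")


fmt = pick_source({"pdf", "markdown"})
print(fmt, admit(2, fmt))  # markdown True
```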

Stage 1 — Divergent Generator (Codex session A)
  Input: raw user request + baseline context + constraints only.
  NO "prior attempts", NO "previously rejected ideas", NO hint of Claude's own lean.
  Ask: "Generate N=10 candidate approaches, each with: name, one-paragraph
        rationale, key assumption, failure mode. Do not rank. Do not self-filter."
  Output: 10 candidates, flat list.

Stage 2 — Strict Scorer (Codex session B, isolated from A)
  Input: raw user request + baseline context + ONE candidate at a time.
  Session B sees NO other candidates, NO generator's self-assessment.
  Ask: "Score this candidate on: novelty, feasibility, baseline-compatibility,
        implementation complexity, expected gain. Give go / revise / kill plus
        rationale. Cite specific code or artifact paths from baseline when
        claiming compatibility."
  Output: 10 independent score cards (parallel dispatch, different threadIds).

Stage 3 — Decisive Synthesis (Claude, main thread)
  Read all 10 score cards. Rank by score + fit with project constraints
  (journal bar / Track B scope / comparability_impact risk).
  Produce a single ranked short-list (top 2-3) with rationale.
  If top candidate has UNVERIFIED compatibility claim, route to Codex
  for verification before committing.
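The isolation rules are easiest to see in the shape of the inputs each stage receives. This hedged sketch replaces Codex sessions with plain callables (`generate`, `score`, and `rank_key` are hypothetical stand-ins, not MCP APIs) and uses a thread pool to mirror the parallel Stage 2 dispatch; each scorer input contains exactly one candidate.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

ScoreCard = Dict[str, Any]


def divergent_strict_decisive(
    request: str,
    baseline: str,
    constraints: str,
    generate: Callable[[str], List[str]],
    score: Callable[[str], ScoreCard],
    rank_key: Callable[[ScoreCard], float],
    top_k: int = 3,
) -> List[Tuple[str, ScoreCard]]:
    # Stage 1 (fresh session A): only request, baseline, and constraints.
    # Prior attempts, rejected ideas, and Claude's own lean are never included.
    candidates = generate(f"{request}\n\n{baseline}\n\n{constraints}")
    # Stage 2 (one fresh session per candidate): each input holds exactly one
    # candidate and none of the generator's self-assessment.
    inputs = [f"{request}\n\n{baseline}\n\nCandidate:\n{c}" for c in candidates]
    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        cards = list(pool.map(score, inputs))  # map() keeps input order
    # Stage 3 (Claude, main thread): rank the independent cards, keep a short-list.
    ranked = sorted(zip(candidates, cards), key=lambda pair: rank_key(pair[1]), reverse=True)
    return ranked[:top_k]


def demo_generate(text: str) -> List[str]:
    return [f"idea{i}" for i in range(10)]


def demo_score(text: str) -> ScoreCard:
    name = text.rsplit("\n", 1)[-1]  # the single candidate in this input
    return {"candidate": name, "expected_gain": int(name[len("idea"):])}


shortlist = divergent_strict_decisive(
    "req", "baseline", "constraints", demo_generate, demo_score,
    lambda card: card["expected_gain"],
)
print([name for name, _ in shortlist])  # ['idea9', 'idea8', 'idea7']
```

In the real pattern `rank_key` is Claude's judgment of score plus project fit; a single number is used here only to keep the sketch runnable.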

## Divergent-Strict-Decisive Decision Log

Stage 1 candidates (Codex session A, threadId=<...>):
1. <name> — <one-sentence>
2. ...
10. ...

Stage 2 score cards (parallel, 10 independent sessions):
| Candidate | novelty | feasibility | compat | complexity | gain | verdict | key evidence |
|-----------|---------|-------------|--------|------------|------|---------|--------------|

Stage 3 ranked short-list (Claude):
1. <top pick> — rationale + UNVERIFIED claims to resolve
2. <runner-up>

Tool	Purpose	Key Return
`mcp__codex__codex`	Start a new Codex session	`{ threadId, content }`
`mcp__codex__codex-reply`	Continue an existing session	`{ threadId, content }`

Parameters for mcp__codex__codex

Parameter	Required	Description
`prompt`	Yes	The task description. Be specific, include file paths.
`sandbox`	No	`"read-only"` (analysis/inspection) or `"workspace-write"` (file changes). Default: . Prefer `"read-only"` when possible.

Parameters for mcp__codex__codex-reply

Parameter	Required	Description
`threadId`	Yes	The `threadId` returned by the initial `mcp__codex__codex` call.
`prompt`	Yes	The follow-up question or instruction.
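The parameter tables, together with the rule that every call passes `approval-policy="never"`, suggest a small argument builder. This sketch only assembles dicts: `codex_args` and `codex_reply_args` are hypothetical helpers, the keys mirror the parameter names used in this document's examples, and the accepted sandbox modes are limited to the two this document mentions. How the dict reaches the MCP client is not shown.

```python
from typing import Any, Dict, Optional

SANDBOX_MODES = {"read-only", "workspace-write"}  # the two modes used in this skill


def codex_args(prompt: str, *, sandbox: str = "read-only",
               cwd: Optional[str] = None) -> Dict[str, Any]:
    """Build arguments for a new mcp__codex__codex session."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if sandbox not in SANDBOX_MODES:
        raise ValueError(f"unsupported sandbox mode: {sandbox!r}")
    args: Dict[str, Any] = {
        "prompt": prompt,
        "sandbox": sandbox,
        "approval-policy": "never",  # always set, to avoid interactive approval prompts
    }
    if cwd is not None:
        args["cwd"] = cwd
    return args


def codex_reply_args(thread_id: str, prompt: str) -> Dict[str, Any]:
    """Build arguments for mcp__codex__codex-reply on an existing thread."""
    if not thread_id:
        raise ValueError("threadId from the initial mcp__codex__codex call is required")
    return {"threadId": thread_id, "prompt": prompt}


print(codex_args(
    "Read src/engine_health/diagnosis/track_b_sequence_detection.py lines 28-50 "
    "and explain the TrackBSequenceDetectionConfig fields. Answer in under 200 words.",
    cwd="/Users/charles/Desktop/BYSJ/projectv2",
))
```

Defaulting `sandbox` to `"read-only"` here encodes the "prefer read-only" rule; switching to `"workspace-write"` has to be an explicit choice.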

Codex Orchestration

Codex MCP Orchestration

1. How to Call Codex MCP

Tool Names

Parameters for mcp__codex__codex

Parameters for mcp__codex__codex-reply

Example: New Session

Example: Continue Session

2. Executor / Reviewer Role Assignment

Reviewer Selection Principles

3. Call Efficiency Rules

4. Cost-Aware Routing

General Rules

5. Task Framing Template

Framing Rules

6. Iterative Retrieval Protocol

When to use

Protocol: DISPATCH → EVALUATE → REFINE → LOOP

Relevance tiers

Integration with existing protocols

Dispatch template — Round 1

When iterative retrieval ends early

Anti-patterns for iterative retrieval

7. Anti-Patterns

8. Literature Triage Dispatch

When to use

Three-Level Reading Strategy

Source Priority

Registration Requirement

Preventing False Kills

PDF Admission Gate (Mandatory)

9. Divergent–Strict–Decisive Dispatch Pattern

When to use

Three-stage protocol (each Codex stage runs in a new mcp__codex__codex session)

Context isolation rules (hard constraints)

Relation to existing rules

Output format (final output of Stage 3)

10. Long-Running Task Discipline

When to create a TaskCreate chain

Rules

Why

Anti-patterns
