Guidelines for using Zen MCP tools effectively in this repo. Use for complex multi-model tasks, architectural decisions, or when cross-model validation adds value.
This Skill defines when and how to use Zen MCP tools in the hti-zen-harness project.
Zen provides multi-model orchestration (planner, consensus, codereview, thinkdeep, debug, clink). Use them deliberately when they add real value, not reflexively.
Prefer direct Zen MCP tools over clink:
- chat, thinkdeep, consensus, codereview, precommit, debug, planner

Avoid clink unless absolutely necessary:

Bottom line: Direct API tools (mcp__zen__chat, mcp__zen__consensus, etc.) do everything you need without the CLI overhead.
Use Zen when the task involves:
- Complex architectural work
- Safety-critical code
- Ambiguous or contentious decisions
- Deep investigation needed

Skip Zen for:
- Simple changes

For these, direct implementation is faster and more appropriate.
planner - Multi-step planning with reflection
Use when:
Example: "Plan migration of adapter interface to support streaming responses"
consensus - Multi-model debate and synthesis
Use when:
Example: "Should we use async generators or callback patterns for streaming? Get consensus from multiple models."
Models to include: At least 2, typically 3-4. Mix code-specialized models with general reasoning models.
codereview - Systematic code analysis
Use when:
Example: "Review the new HTI band scheduler implementation for correctness and edge cases."
thinkdeep - Hypothesis-driven investigation
Use when:
Example: "Investigate why adapter timeout logic behaves differently under load."
debug - Root cause analysis
Use when:
Example: "Debug why HTI band transitions occasionally skip validation steps."
clink - Delegating to external CLI tools
Use when:
Example: "Use clink with gemini CLI for large-scale codebase exploration."
chat - General-purpose thinking partner
Use for:
When calling Zen tools, choose models deliberately based on the task:
consensus or sequential codereview

When model selection matters for auditability:
```python
# HTI-NOTE: Implementation reviewed by code-specialized models (consensus check).
# No race conditions detected in band transition logic.
def transition_band(current: Band, target: Band) -> Result:
    ...
```
clink - Zen's clink tool can execute shell commands. Use it responsibly.
Safe to run without approval:
- ls, pwd, cat, head, tail, find
- git status, git diff, git log, git branch
- pytest, python -m pytest, test runners
- ruff check, black --check, mypy, static analysis
- python --version, uv --version, dependency checks

Require approval:
- pip install, uv add, npm install
- git commit, git push, git reset, git checkout -b, git rebase
- rm, mv, file deletions/moves
- curl, wget, API calls
- .env files

How to ask:
I need to run: `pip install pytest-asyncio`
Reason: Required for testing async adapter implementations
Approve?
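The approval policy above can be encoded as a small pre-check. This is a sketch, not part of the harness: the prefix lists mirror this document's safe/approval lists, and `approval_request` is a hypothetical helper that formats the ask template shown above.

```python
# Prefixes taken from this Skill's "safe to run" list; anything else fails closed.
SAFE_PREFIXES = (
    "ls", "pwd", "cat", "head", "tail", "find",
    "git status", "git diff", "git log", "git branch",
    "pytest", "python -m pytest",
    "ruff check", "black --check", "mypy",
    "python --version", "uv --version",
)

def needs_approval(cmd: str) -> bool:
    """Return True when a command requires explicit user approval.

    Unknown commands default to requiring approval (fail closed).
    """
    return not cmd.strip().startswith(SAFE_PREFIXES)

def approval_request(cmd: str, reason: str) -> str:
    """Format an approval request following the template above."""
    return f"I need to run: `{cmd}`\nReason: {reason}\nApprove?"
```

Failing closed matters here: a command that matches neither list (say, a new package manager) should be asked about, not silently run.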
When Zen tools or model calls fail, follow these rules (aligned with hti-fallback-guard):
1. Report clearly:
Zen `codereview` call failed:
Tool: codereview
Model: <model-name>
Error: Rate limit exceeded (429)
Step: Reviewing src/adapters/openai.py
2. Propose alternatives:
3. Document in code if relevant:
```python
# HTI-TODO: Codereview via Zen failed (rate limit).
# Manual review needed for thread safety in adapter pool.
```
When appropriate, return explicit error states:

```python
from dataclasses import dataclass

@dataclass
class ZenResult:
    ok: bool
    tool: str
    data: dict | None = None
    error: str | None = None
    # Never set ok=True when the Zen call actually failed
```
For non-trivial work (multi-file refactors, new features, safety-critical edits):
- planner for complex, multi-faceted tasks
- hti-fallback-guard principles
- codereview for:
- precommit before finalizing

Tell the user:
When working on tests or CI:
Prefer changes that tighten guarantees:
Use Zen tools to:
- codereview with focus on testing
- thinkdeep on pipeline behavior
- consensus on approach

Document how changes affect:
Ask yourself:
- Is this complex enough to need multi-model orchestration?
- Does this change affect safety or timing? If so, reach for consensus or codereview.
- Am I using Zen to avoid thinking, or to think better?
The goal is thoughtful tool use, not tool maximalism.
Available Models (as of 2025-11-30):
- Gemini: gemini-2.5-pro (1M context, deep reasoning), gemini-2.5-flash (ultra-fast)
- OpenAI: gpt-5.1, gpt-5.1-codex, gpt-5-pro, o3, o3-mini, o4-mini

Recommended by Task:
- gpt-5.1-codex (code-focused structured planning)
- gemini-2.5-pro (deep reasoning, 1M context)
- o3 (strong logical analysis)
- gpt-5.1 (comprehensive reasoning)
- gemini-2.5-flash (ultra-fast, 1M context)
- gpt-5.1 + gemini-2.5-pro + o3 (consensus mix)

Use Case: Starting v0.X implementation (5+ files, new subsystems)
Pattern:
Use planner with gpt-5.1-codex to design [FEATURE]:
Context:
- Current state: [what exists now]
- Goal: [what we're building]
- Constraints: [HTI invariants, backward compatibility]
Plan should include:
1. Architecture changes needed
2. File modifications (existing + new)
3. Testing strategy
4. Migration path (if breaking changes)
Example:
Use planner with gpt-5.1-codex to design v0.6 RL policy integration:
Context:
- Current: PD/PID controllers via ArmBrainPolicy protocol
- Goal: Support stateful RL policies (PPO, SAC, DQN)
- Constraints: Zero harness changes, brain-agnostic design
Plan should include:
1. BrainPolicy extension for stateful policies
2. Episode buffer interface
3. Checkpoint loading/saving
4. Testing with dummy RL brain
Use Case: Multiple valid approaches, safety-critical choices
Pattern:
Use consensus to decide: [QUESTION]
Models:
- gpt-5.1 with stance "for" [OPTION A]
- gemini-2.5-pro with stance "against" [OPTION A, argue for OPTION B]
- o3 with stance "neutral" (objective analysis)
Context:
[Relevant technical details]
Criteria:
- [Criterion 1]
- [Criterion 2]
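The stance layout in this pattern can be expressed as a small data structure before it becomes a prompt. The names here (`ModelStance`, `ConsensusRequest`) are illustrative; the actual mcp__zen__consensus argument schema may differ.

```python
from dataclasses import dataclass, field

@dataclass
class ModelStance:
    model: str
    stance: str  # "for", "against", or "neutral"
    note: str = ""  # optional steering, e.g. "argue for OPTION B"

@dataclass
class ConsensusRequest:
    question: str
    models: list[ModelStance]
    context: str = ""
    criteria: list[str] = field(default_factory=list)

# Example request mirroring the streaming question from earlier in this Skill
request = ConsensusRequest(
    question="Async generators vs callbacks for streaming?",
    models=[
        ModelStance("gpt-5.1", "for", "argue for async generators"),
        ModelStance("gemini-2.5-pro", "against", "argue for callbacks"),
        ModelStance("o3", "neutral"),
    ],
    criteria=["ease of integration", "maintainability"],
)
```

Keeping the request structured makes the auditability note above easy to satisfy: the chosen models and stances can be logged verbatim alongside the decision.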
Example:
Use consensus to decide: RL framework for HTI v0.6
Models:
- gpt-5.1 with stance "for" Stable-Baselines3
- gemini-2.5-pro with stance "against" SB3, argue for CleanRL
- o3 with stance "neutral"
Context:
- Need PPO, SAC, DQN implementations
- Must integrate with HTI ArmBrainPolicy protocol
- Want good documentation and active maintenance
Criteria:
- Ease of integration with HTI
- Code quality and maintainability
- Performance and stability
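One way to synthesize the models' answers against these criteria is a tiny weighted decision matrix. The weights and scores below are illustrative placeholders, not real model outputs.

```python
# Weights and scores are made-up illustrations of the criteria above.
weights = {"integration": 0.5, "maintainability": 0.3, "stability": 0.2}
scores = {
    "Stable-Baselines3": {"integration": 8, "maintainability": 7, "stability": 9},
    "CleanRL": {"integration": 6, "maintainability": 9, "stability": 7},
}

def rank(scores: dict, weights: dict) -> list[str]:
    """Rank options by weighted score, best first."""
    return sorted(
        scores,
        key=lambda opt: -sum(weights[c] * scores[opt][c] for c in weights),
    )
```

A matrix like this is a tie-breaker, not a substitute for the debate itself: the value of consensus is in the arguments each stance surfaces.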
Use Case: Complex questions about control theory, physics, tuning
Pattern:
Use thinkdeep with [MODEL] to investigate: [QUESTION]
Known evidence:
- [Observation 1]
- [Observation 2]
Initial hypothesis:
[What you think might be happening]
Files to examine:
[Absolute paths]
Example:
Use thinkdeep with o3 to investigate: Why does PD with Kd=2.0 converge faster than Kd=3.0?
Known evidence:
- Kd=2.0: avg 455 ticks to converge
- Kd=3.0: avg 520 ticks to converge
- Both use same Kp=8.0
Initial hypothesis:
Over-damping (Kd too high) slows response
Files to examine:
/home/john2/claude-projects/hti-zen-harness/hti_arm_demo/brains/arm_pd_controller.py
/home/john2/claude-projects/hti-zen-harness/hti_arm_demo/env.py
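To make the damping hypothesis testable outside the real harness, here is a toy PD loop on a unit mass that measures ticks to converge for a given Kd. The dynamics are hypothetical, not the project's env.py; it only illustrates the kind of experiment thinkdeep might propose.

```python
def ticks_to_converge(kp: float = 8.0, kd: float = 2.0, target: float = 1.0,
                      tol: float = 0.01, dt: float = 0.01,
                      max_ticks: int = 10_000) -> int:
    """Euler-simulate PD control of a unit mass; return ticks until
    both |error| and |velocity| drop below tol."""
    pos, vel = 0.0, 0.0
    for tick in range(1, max_ticks + 1):
        err = target - pos
        accel = kp * err - kd * vel  # PD law: proportional on error, derivative damping
        vel += accel * dt
        pos += vel * dt
        if abs(err) < tol and abs(vel) < tol:
            return tick
    return max_ticks

# Compare convergence for the two gains under investigation
for kd in (2.0, 3.0):
    print(f"Kd={kd}: {ticks_to_converge(kd=kd)} ticks")
```

Note that for a unit mass, critical damping sits at Kd = 2·sqrt(Kp) ≈ 5.7 when Kp = 8.0, so both measured gains are underdamped in this toy model; whether raising Kd speeds or slows convergence depends on how close the real plant is to critical damping, which is exactly what the investigation should establish.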
Use Case: Before committing v0.X release (10+ files changed)
Pattern:
Use codereview with gpt-5.1 to review [SCOPE]:
Review type: full
Focus areas:
- Code quality and maintainability
- Security (HTI safety invariants)
- Performance (timing band compliance)
- Architecture (brain-agnostic design preserved)
Files to review:
[List of absolute file paths]
Example:
Use codereview with gpt-5.1 to review HTI v0.5 implementation:
Review type: full
Focus areas:
- Brain-agnostic design preserved
- EventPack metadata extension correct
- No timing band violations
- Fallback logic compliance (hti-fallback-guard)
Files to review:
/home/john2/claude-projects/hti-zen-harness/hti_arm_demo/brains/arm_imperfect.py
/home/john2/claude-projects/hti-zen-harness/hti_arm_demo/run_v05_demo.py
/home/john2/claude-projects/hti-zen-harness/hti_arm_demo/shared_state.py
/home/john2/claude-projects/hti-zen-harness/hti_arm_demo/bands/control.py
Use Case: Large codebase exploration, heavy reviews, save tokens
Pattern:
Use clink with [CLI] [ROLE] to [TASK]
Available CLIs: gemini, codex, claude
Available Roles: default, planner, codereviewer
Examples:
# Code review in isolated context (saves our tokens)
Use clink with gemini codereviewer to review hti_arm_demo/ for safety issues
# Large codebase exploration
Use clink with gemini to map all brain implementations and document their interfaces
# Strategic planning
Use clink with gemini planner to design phase-by-phase migration to MuJoCo physics
Why use clink:
Use Case: Before git commit on major changes
Pattern:
Use precommit with gpt-5.1 to validate changes in [PATH]:
Focus:
- Security issues
- Breaking changes
- Missing tests
- Documentation completeness
Example:
Use precommit with gpt-5.1 to validate changes in /home/john2/claude-projects/hti-zen-harness:
Focus:
- HTI safety invariants preserved
- No regressions in existing tests
- New tests for v0.6 features
- CHANGELOG and SPEC updated
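Before spending a Zen precommit call, a cheap local gate can run the safe checks from the command policy earlier in this Skill. This is a sketch; the default commands assume ruff and pytest are installed, and the gate is not part of the harness.

```python
import subprocess

# Default checks drawn from this Skill's "safe to run" list (assumed installed).
CHECKS = [
    ["ruff", "check", "."],
    ["python", "-m", "pytest", "-q"],
]

def local_gate(checks: list[list[str]] = CHECKS) -> bool:
    """Run local checks in order; return True only if every command exits 0."""
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True
```

Running the gate first keeps the expensive multi-model precommit review focused on the items above (invariants, regressions, docs) rather than on lint errors a local tool would have caught.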