Bootstrap skill — teaches Multi-Fleet cross-machine coordination, fleet roles, transport layer, and when to invoke other skills
Multi-Fleet is a cross-machine coordination layer for Claude Code. It enables multiple machines running independent Claude Code sessions to collaborate in real-time through peer-to-peer messaging, autonomous task agents, and fleet-wide visibility.
Layer 5: ContextDNA Chief --- authoritative memory, evidence synthesis, merge adjudication
Layer 4: Multi-Fleet -------- cross-machine coordination (THIS PLUGIN)
Layer 3: Superset ----------- local parallel execution (worktrees, parallel agents)
Layer 2: 3-Surgeons --------- local truth protocol (3 LLMs cross-examine)
Layer 1: Superpowers -------- local captain (discipline, workflow invariance)
Operating rule: Every machine stays independently capable. The chief machine is preferred for synthesis, not required for basic operation.
The fleet operates as 3 machines x 3 surgeons x 3 phases.
Discipline invariant: This is 3 disciplined cells + 1 chief, NOT 27 peers chatting freely. Only Head Surgeons speak cross-machine. Local surgeons stay local.
| Role | What it does | Example |
|---|---|---|
| Chief | Runs ContextDNA, Redis, MLX LLM. Authoritative memory. Collects verdicts, requests rebuttals, synthesizes merge decisions with evidence. | mac1 |
| Worker | Runs local 3-Surgeons cell, produces verdicts, critiques other machines' work. Full local capability. Independent when chief is down. | mac2, mac3 |
| Head Surgeon | Per-machine authority. Only role that emits cross-machine packets. Produces local_verdict with confidence + dissent. | One per machine |
| Coordinator Agent | Dedicated spawned agent per machine for LAN packet handling. Never edits code. Only relays, summarizes, compares, escalates. | One per machine |
The fleet uses structured JSON packets (protocol 3s-lan/v1) for all cross-machine communication:
Outbound from each machine:
- `local_verdict` — summary, confidence, files touched, risks, dissent from local surgeons
- `branch_status` — branch name, commit, goal, changed files, status
- `critique` — review of another machine's verdict with blocking concerns

From Chief:

- `task_brief` — role-specific assignment to each machine
- `rebuttal_request` — asks a head to critique another machine's verdict
- `chief_decision` — winner branch, cherry-picks, followups, unresolved dissent

Phase flow:

1. Chief sends a `task_brief` to each machine with the same objective but freedom to pursue different approaches
2. Each head returns a `local_verdict` + `branch_status`
3. Chief sends a `rebuttal_request`; each head returns a `critique`
4. Chief emits a `chief_decision`

These are non-negotiable for ANY multi-fleet installation. Violating any invariant is a system failure.
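A `local_verdict` packet under this scheme could be sketched as follows. This is a hypothetical construction: the field names beyond those listed above (summary, confidence, files touched, risks, dissent) are assumptions, not the plugin's actual `3s-lan/v1` schema.

```python
import json

# Hypothetical 3s-lan/v1 local_verdict packet. Field names beyond the
# ones the protocol summary lists (summary, confidence, files touched,
# risks, dissent) are illustrative assumptions, not the real schema.
def make_local_verdict(node, summary, confidence, files, risks, dissent):
    packet = {
        "protocol": "3s-lan/v1",
        "type": "local_verdict",
        "from": node,
        "payload": {
            "summary": summary,
            "confidence": confidence,   # e.g. 0.0 - 1.0
            "files_touched": files,
            "risks": risks,
            "dissent": dissent,         # dissent from local surgeons
        },
    }
    # Round-trip through JSON so only wire-safe types are emitted.
    return json.loads(json.dumps(packet))

verdict = make_local_verdict(
    "mac2", "Fixed token race in middleware.py", 0.82,
    ["middleware.py"], ["session invalidation edge case"], [],
)
```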
Every message MUST attempt delivery through ALL channels in this exact order. Never stop at first failure. Never skip channels. Never reorder.
P1: NATS pub/sub (2s budget) -- real-time, bidirectional
P2: HTTP direct (3s budget) -- peer-to-peer, no central dependency
P3: Chief relay (3s budget) -- queues for offline nodes
P4: Seed file (2s budget) -- SSH write to target's seed dir
P5: SSH direct (8s budget) -- reliable, works through firewalls
P6: Wake-on-LAN (3s budget) -- wake sleeping machines
P7: Git push (10s budget) -- guaranteed delivery (eventual)
P8: Direct text injection (LAST RESORT ONLY) -- into user's typing space
Every message tries all channels in order. First success wins. 30s timeout per channel.
| Priority | Channel | Latency | Description |
|---|---|---|---|
| P0 | Cloud relay | <200ms | AWS/cloud bridge — works across networks |
| P1 | NATS pub/sub | <100ms | Real-time, bidirectional |
| P2 | HTTP direct | <1s | Peer-to-peer, no central dependency |
| P3 | Chief relay | 1-2s | Queues for offline nodes |
| P4 | Seed file | next prompt | SSH write to target's seed dir |
| P5 | SSH direct | 2-5s | Reliable, works through firewalls |
| P6 | Wake-on-LAN | 10-60s | Wake sleeping machines |
| P7 | Git push | async | Guaranteed delivery (eventual) |
User input injection is LAST PRIORITY ONLY — the system exhausts all automated channels before falling back to human-visible seed files or git.
Secrets: Keychain/AWS/env vars only. Never hardcode credentials in config or messages. SSH keys via ~/.ssh/, API keys via env.
Total budget: 30 seconds. P8 (direct text injection into a user's active typing space) is ALWAYS last priority — it disrupts flow and is only acceptable when P1-P7 have all failed.
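The cascade above reduces to a simple loop: try each channel in strict priority order with its budget, stop at the first success, and record every failure for the self-heal step. A minimal sketch, assuming stand-in channel names and `send` callables rather than the plugin's real transport API:

```python
# Sketch of the P1-P7 cascade: strict priority order, per-channel
# budget, first success wins, every failure recorded for self-heal.
# Channel names and send callables are stand-ins, not the plugin's
# real transport API.
def deliver(message, channels):
    """channels: ordered list of (name, budget_seconds, send_fn)."""
    failures = []
    for name, budget, send in channels:
        try:
            ok = send(message, timeout=budget)
        except Exception:
            ok = False
        if ok:
            # First success wins. The failures list is the self-heal
            # work queue: every channel tried before this one is broken.
            return {"delivered_via": name, "failed": failures}
        failures.append(name)
    return {"delivered_via": None, "failed": failures}

# Simulate NATS and HTTP down, chief relay up.
down = lambda msg, timeout: False
up = lambda msg, timeout: True
result = deliver({"type": "context"}, [
    ("nats", 2, down), ("http", 3, down), ("chief-relay", 3, up),
])
```

Note that a `None` result does not mean the message is dropped: in the real cascade, git push (P7) provides eventual delivery before P8 is ever considered.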
When a message succeeds on P3 or a lower-priority channel, the higher-priority channels (P1, P2) are broken. The system MUST self-heal them in the background.
This is automatic — not a suggestion, not a best-effort. Every fallback delivery triggers self-heal. If self-heal fails, escalate via the fleet-repair skill (4-level escalation: notify → guide → assist → remote).
When a fleet node has been idle >5 minutes with no active task, it MUST scan `docs/plans/` and active task registries for work. The fleet is proactively productive, not just reactive. Idle time is exploration time. No node should sit silent when there is discoverable work.
| Secret Type | Lookup Order | Never |
|---|---|---|
| API keys | $ENV_VAR → macOS Keychain → AWS Secrets Manager | Never in code, never in config files |
| IPs | .multifleet/config.json only | Never hardcoded in scripts or skills |
| Usernames | .multifleet/config.json or $USER | Never hardcoded |
If a key is loaded from a fallback source (Keychain instead of env var, AWS instead of Keychain): flag it in the delivery log and recommend migration to the ideal source. Silent fallback without flagging is a violation.
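The lookup order and fallback-flagging rule can be sketched as below. The `keychain_lookup` and `aws_lookup` callables are hypothetical stand-ins for the real Keychain and Secrets Manager calls:

```python
import os

def load_api_key(name, keychain_lookup, aws_lookup, log):
    """Resolve a secret in the documented order: env var, then
    Keychain, then AWS Secrets Manager. The two lookup callables are
    hypothetical stand-ins for the real Keychain / Secrets Manager
    calls. Any fallback is flagged; silent fallback is a violation."""
    value = os.environ.get(name)
    if value:
        return value
    value = keychain_lookup(name)
    if value:
        log.append(f"FLAG: {name} loaded from Keychain; migrate to env var")
        return value
    value = aws_lookup(name)
    if value:
        log.append(f"FLAG: {name} loaded from AWS; migrate to Keychain/env var")
        return value
    raise KeyError(f"{name} not found in any secret source")

os.environ.pop("FLEET_API_KEY", None)  # ensure the env var is unset for the demo
log = []
key = load_api_key("FLEET_API_KEY",
                   keychain_lookup=lambda n: "kc-secret",  # stubbed Keychain hit
                   aws_lookup=lambda n: None,
                   log=log)
```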
Every single message delivery attempt MUST satisfy all five delivery conditions. A delivery that does not satisfy all five is not a valid delivery, regardless of whether the payload reached the target.
The fleet MUST converge to ideal communication state.
This is the core principle. Every session checks channel health on start. Every message delivery that falls through to non-ideal channels triggers background healing. The only acceptable steady state is all channels working on all nodes.
Concretely:

- Trigger `fleet-repair` via the working channel.
- `fleet-check` against all peers verifies the full chain. Run after any network change.

| Type | What happens | Disrupts session? |
|---|---|---|
| `context` | Seed file, injected on next prompt | No |
| `task` | Spawns autonomous `claude -p` agent | No |
| `reply` | Seed file with ref to original | No |
| `alert` | Focuses VS Code + macOS notification | Yes |
| `sync` | Silent state bookkeeping | No |
| `broadcast` | Seed file on all nodes | No |
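Routing on these six types might look like the sketch below. The handler return values are placeholders mirroring the table, not the plugin's implementation; only `alert` is allowed to interrupt the session.

```python
# Sketch of type-based routing for inbound fleet messages. Handler
# actions mirror the message-type table; the bodies are placeholders,
# not the plugin's implementation.
DISRUPTIVE_TYPES = {"alert"}  # only alert interrupts the session

def route(message, handlers):
    msg_type = message["type"]
    if msg_type not in handlers:
        raise ValueError(f"unknown fleet message type: {msg_type}")
    disrupts = msg_type in DISRUPTIVE_TYPES
    return handlers[msg_type](message), disrupts

handlers = {
    "context":   lambda m: "seed-file",         # injected on next prompt
    "task":      lambda m: "spawn-agent",       # autonomous claude -p agent
    "reply":     lambda m: "seed-file-ref",     # references the original
    "alert":     lambda m: "focus-and-notify",  # focuses VS Code, notifies
    "sync":      lambda m: "bookkeeping",       # silent state update
    "broadcast": lambda m: "seed-all-nodes",    # seed file on all nodes
}
action, disrupts = route({"type": "task"}, handlers)
```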
| Skill | What It Does | When to Invoke |
|---|---|---|
| using-multi-fleet | This skill. Architecture overview, role guide, skill index. | First time using multi-fleet, need orientation |
| fleet-protocol | CORE — self-healing communication invariant, channel priority (P0-P7), repair escalation, idle productivity, secrets protection. | Understanding how fleet communication works, debugging delivery, adding channels |
| fleet-send | Send a message (context, task, alert, broadcast) to another machine. | Need to communicate with another node |
| fleet-task | Dispatch autonomous work to another machine with session-aware agents. | Need work done on another node without human interaction |
| fleet-chain | Chain orchestration — multi-step task dependencies with automatic sequencing across nodes. | Multi-node pipelines, ordered deployments, sequential workflows |
| fleet-dispatch | Remote worker dispatch with tracked lifecycle and result polling. | Delegating tasks where you need delivery confirmation and result tracking |
| fleet-ack | Delivery confirmation protocol — ACK tracking, retry on timeout, failure alerting. | Verifying message delivery, debugging undelivered tasks, tuning retry timing |
| fleet-idle | Productive idle protocol — automatic work discovery when nodes are idle. | Checking suggested work, tuning idle thresholds, understanding suggestions |
| fleet-security | HMAC signing, replay prevention, peer validation, session gold sanitization. | Setting up fleet auth, debugging rejected messages, auditing security posture |
| fleet-status | Quick health check: who's online, idle, working. | Before dispatching work, checking connectivity |
| fleet-check | Run the full 7-channel communication test to a target. | Diagnosing delivery failures, verifying setup |
| fleet-repair | 4-level repair escalation for broken channels (notify, guide, assist, remote). | Channels are broken, self-healing triggered |
| fleet-wake | Wake a sleeping machine via health check, SSH, or WoL magic packet. | Target node is offline, need it for a task |
| fleet-tunnel | SSH tunnel management for restricted networks. | Firewall blocks ports 4222/8844/8855 |
| fleet-worker | tmux-isolated worker pool for tasks that shouldn't disrupt interactive sessions. | Running fleet tasks without IDE lag |
| fleet-watchdog | Continuous background health monitoring with auto-repair triggers. | Understanding why nodes go offline, tuning thresholds |
| productivity-view | Live fleet-wide dashboard showing nodes, agents, backlog, and coordination. | Watching fleet operations in real-time |
```bash
curl -s http://127.0.0.1:8855/fleet/live | python3 -c "import sys,json;print(json.load(sys.stdin)['fleetStatus'])"
# -> "3/3 online: mac1 active, mac2 idle 5m, mac3 working"
```
```
/loop 1m /fleet-dashboard
```
This makes your screen a live fleet monitor. You see tasks dispatching, agents working, replies arriving. Silent when idle.
```bash
# Context message (passive)
curl -sf -X POST http://127.0.0.1:8855/message -H "Content-Type: application/json" \
  -d '{"type":"context","from":"mac3","to":"mac1","payload":{"subject":"Auth fix","body":"Fixed the token race in middleware.py"}}'

# Task message (spawns autonomous agent)
curl -sf -X POST http://127.0.0.1:8855/message -H "Content-Type: application/json" \
  -d '{"type":"task","from":"mac3","to":"mac1","payload":{"subject":"Run tests","body":"Run the full test suite and report results"}}'
```

```bash
# On the target machine
curl -s http://127.0.0.1:8855/tasks/live | python3 -m json.tool

# Or read the log file
cat /tmp/atlas-agent-results/fleet-task-<id>.log
```
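The curl calls above can be wrapped in a small stdlib-only helper. This sketch assumes the daemon's `/message` endpoint accepts exactly the JSON shape shown in the curl examples:

```python
import json
import urllib.request

def fleet_send(msg_type, sender, to, subject, body,
               host="127.0.0.1", port=8855):
    """POST a fleet message to the local daemon, mirroring the curl
    examples above. Returns the decoded JSON response body."""
    payload = {
        "type": msg_type, "from": sender, "to": to,
        "payload": {"subject": subject, "body": body},
    }
    req = urllib.request.Request(
        f"http://{host}:{port}/message",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=3) as resp:
        return json.loads(resp.read() or b"{}")
```

Usage would look like `fleet_send("task", "mac3", "mac1", "Run tests", "Run the full test suite and report results")`, assuming the daemon is up on port 8855.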
- `fleet-wake`
- `fleet-send` with type `task`
- `/fleet-dashboard`
- `fleet-check` against the target
- `fleet-tunnel` to set up SSH tunnels

```bash
# Wake all nodes, dispatch tasks, monitor progress
bash scripts/fleet-swarm.sh "Your task description here"
```
This handles wake, dispatch, monitoring, and result collection automatically.
| Symptom | Diagnosis | Skill to Use |
|---|---|---|
| `curl: (7) Failed to connect to 127.0.0.1 port 8855` | Local daemon not running | Restart daemon manually or via LaunchAgent |
| Node shows offline in fleet-status | Target unreachable | fleet-wake then fleet-check |
| Message delivered via P5 (SSH) instead of P1 (NATS) | NATS or HTTP broken | fleet-check then fleet-repair |
| Task agent spawned but no results | Agent may be stuck or timed out | Check /tasks/live, read /tmp/atlas-agent-results/ |
| All nodes show offline except self | Network issue or config wrong | Check .multifleet/config.json peer entries |
The fleet uses a P0-P8 channel cascade with self-healing. Every session MUST follow this protocol.
Messages try channels in strict priority order. 30-second timeout per channel. First success wins.
P0: In-process (same machine) -- instant
P1: NATS pub/sub (<100ms) -- real-time, bidirectional
P2: HTTP direct (<1s) -- peer-to-peer, no central dependency
P3: Chief relay (1-2s) -- queues for offline nodes
P4: Seed file (next prompt) -- SSH write to target's seed dir
P5: SSH direct (2-5s) -- reliable, works through firewalls
P6: Wake-on-LAN (10-60s) -- wake sleeping machines
P7: Git push (async) -- guaranteed delivery (eventual)
P8: Direct text (manual) -- LAST RESORT — copy/paste between sessions
P8 direct text is LAST RESORT. Never skip to P8. Always try P1-P7 first.
When a message succeeds on P3 or a lower-priority channel, it means P1 and/or P2 are broken. The system MUST trigger background repair (`fleet-repair`).
The fleet is not healthy until ALL nodes have P1+P2 operational. A message that delivers via P5 is a success AND a symptom. Treat both.
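The "success AND a symptom" rule can be sketched as: any delivery that lands on P3 or below both reports success and enqueues repair work for the channels above it. The repair queue here is illustrative; the real path is the fleet-repair skill's 4-level escalation (notify, guide, assist, remote).

```python
# Sketch: a fallback delivery is a success AND a symptom. Any delivery
# via P3 or a lower-priority channel succeeds AND enqueues repair for
# the higher-priority channels that must have failed first.
CHANNEL_ORDER = ["P0", "P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]

def record_delivery(delivered_via, repair_queue):
    idx = CHANNEL_ORDER.index(delivered_via)
    if idx >= CHANNEL_ORDER.index("P3"):
        # Every cross-machine channel tried before the one that worked
        # is broken (P0 is in-process, so it is skipped).
        repair_queue.extend(CHANNEL_ORDER[1:idx])
    return delivered_via

queue = []
record_delivery("P5", queue)  # SSH fallback: P1-P4 need healing
```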
Any session idle for >5 minutes MUST pick up fleet work.
```bash
# Check channel state for all peers
curl -s http://127.0.0.1:8855/channels | python3 -m json.tool

# Check specific peer
curl -s http://127.0.0.1:8855/channels/<peer-id> | python3 -m json.tool
```
## Idle Protocol
**When peers are idle**: suggest productive work from `docs/plans/`. Check backlog, propose next tasks, dispatch low-priority work to idle nodes.
**When YOU are idle**: study the big picture, check fleet health, propose improvements. Never just say "waiting" — always be productive:
- Run `fleet-check` against all peers
- Review `docs/plans/` for unstarted work
- Check `/tasks/live` for stuck agents
- Propose fleet improvements or optimizations
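The work-discovery step above can be sketched minimally: scan `docs/plans/` and surface plan files as candidate work. Treating every `*.md` file as a candidate is a simplifying assumption of this sketch.

```python
import pathlib

def discover_idle_work(plans_dir):
    """Scan docs/plans/ for candidate work during idle time. Treating
    every *.md plan file as a candidate is a simplifying assumption;
    a real node would also check task registries and /tasks/live for
    stuck agents."""
    root = pathlib.Path(plans_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.md"))
```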
## Plugin Ecosystem
Multi-Fleet works alongside two other plugin systems. They complement each other:
| Plugin | Role | What It Does |
|--------|------|-------------|
| **Superpowers** | Process discipline | brainstorming → writing-plans → executing-plans → verification |
| **3-Surgeons** | Truth verification | sentinel for blast radius, cross-exam for decisions, gains-gate for quality |
| **Multi-Fleet** | Cross-machine coordination | distribute execution, aggregate results, fleet-wide visibility |
**How they interact:**
- Superpowers plans the work locally → Multi-Fleet distributes it across machines
- 3-Surgeons sentinel checks blast radius BEFORE fleet-wide operations
- Cross-exam validates architectural decisions that affect multiple nodes
- Each node runs its own Superpowers + 3-Surgeons independently; Multi-Fleet coordinates between them
## Quick Start for New Installs
### Step 1: Install the plugin
```bash
claude plugin add multi-fleet   # or clone into ~/.claude/plugins/

# Create .multifleet/config.json in your project root
# Set your node ID:
export MULTIFLEET_NODE_ID=$(hostname -s | tr '[:upper:]' '[:lower:]')
# Edit .multifleet/config.json — add your node to the "nodes" object

# The daemon runs on port 8855 — manages all message routing
node multi-fleet/bin/fleet-nerve-mcp   # or via LaunchAgent for persistence

bash scripts/fleet-test-fallback.sh        # Test all channels to all peers
bash scripts/fleet-test-fallback.sh mac1   # Test channels to specific node
```

```
/loop 1m /fleet-dashboard
```
Live fleet monitor — shows tasks dispatching, agents working, replies arriving. Silent when idle.
```bash
curl -s http://127.0.0.1:8855/health | python3 -m json.tool
curl -s http://127.0.0.1:8855/fleet/live | python3 -m json.tool
```

Note: `alert` interrupts the target session.