Coordinates heterogeneous AI agents with dynamic role assignment and state synchronization across distributed workspaces
Orchestrates heterogeneous AI agents with dynamic role assignment, state synchronization, and fault tolerance across distributed workspaces. Manages agent lifecycle, task delegation, and cross-agent data flow for complex multi-system workflows.
Real use cases:
maestro agent assign <agent-type> --to-workspace <path> [--role <role>] [--priority <1-5>]
Assigns an agent instance to a workspace with specific role and priority.
maestro agent list [--status <active|idle|failed|all>] [--format <json|table>]
Lists all registered agents with current state and assignments.
maestro task dispatch <prompt> --to <agent1,agent2,...> [--sync|--async] [--timeout <seconds>] [--depends-on <task-id>]
Dispatches a task to multiple agents with dependency graph support.
maestro workflow create <name> --definition <yaml-path> [--vars <json-path>]
Creates a reusable workflow from YAML definition with variable substitution.
maestro workflow run <workflow-id> [--param key=value ...] [--trace] [--resume-from <step-id>]
Executes a workflow with parameter injection and trace logging.
maestro state snapshot [--workspace <path>] [--include-logs] [--output <tar.gz>]
Captures complete state of all agents and workspaces for rollback.
maestro state restore <snapshot-path> [--skip-failed] [--dry-run]
Restores agents and workspaces to snapshot state.
maestro monitor start [--interval <seconds>] [--alert-webhook <url>] [--metrics-port <port>]
Starts real-time monitoring with health checks and alerting.
maestro agent health <agent-id> [--detail] [--log-lines <n>]
Checks health status of specific agent with optional log tail.
maestro queue status [--agent <id>] [--task <id>] [--waiting|--processing|--completed]
Shows task queue state across all agents.
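The dispatch flags listed above can be checked before anything is sent to an agent. The sketch below is a hypothetical wrapper, not part of Maestro: the name `dispatch_checked` and its positional arguments are assumptions made for illustration. It only uses the `maestro task dispatch` flags documented above.

```shell
# Hypothetical wrapper (not part of Maestro): validate arguments, then call
# `maestro task dispatch` with the flags documented above.
# Usage: dispatch_checked <prompt> <agent1,agent2,...> [sync|async] [timeout-seconds]
dispatch_checked() {
    local prompt agents mode timeout
    prompt=${1:-}
    agents=${2:-}
    mode=${3:-sync}
    timeout=${4:-}

    if [ -z "$prompt" ] || [ -z "$agents" ]; then
        echo "usage: dispatch_checked <prompt> <agents> [sync|async] [timeout]" >&2
        return 2
    fi

    # --sync and --async are alternatives, so accept exactly one mode.
    case $mode in
        sync|async) ;;
        *) echo "mode must be sync or async, got '$mode'" >&2; return 2 ;;
    esac

    # A timeout, when given, must be a whole number of seconds.
    case $timeout in
        *[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 2 ;;
    esac

    # Rebuild the positional parameters as the final argument list.
    set -- task dispatch "$prompt" --to "$agents" "--$mode"
    if [ -n "$timeout" ]; then
        set -- "$@" --timeout "$timeout"
    fi
    maestro "$@"
}
```

A call such as `dispatch_checked "Run unit and integration tests for PR #123" test-engineer sync 600` would then expand to the equivalent `maestro task dispatch` invocation, while a malformed mode or timeout is rejected before any agent is contacted.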
MAESTRO_WORKSPACE_ROOT - Root directory for all isolated workspaces (default: /tmp/maestro-workspaces)
MAESTRO_REDIS_URL - Redis connection for state sharing (default: redis://localhost:6379)
MAESTRO_LOG_LEVEL - Logging verbosity (DEBUG, INFO, WARN, ERROR)
MAESTRO_MAX_CONCURRENT - Maximum parallel agent executions (default: 4)
MAESTRO_SNAPSHOT_RETENTION - Days to keep automatic snapshots (default: 7)
MAESTRO_KILOCODE_BIN - Path to Kilocode CLI binary (default: /usr/local/bin/kilo)
MAESTRO_AGENT_TIMEOUT - Default agent task timeout in seconds (default: 1800)
MAESTRO_FAILSAFE_MODE - Enable aggressive rollback on any agent failure (true/false)
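As a minimal sketch of how a wrapper script might apply these settings, the function below fills in the documented defaults and rejects obviously malformed values. The function names are hypothetical and not part of Maestro. No default is documented for MAESTRO_LOG_LEVEL or MAESTRO_FAILSAFE_MODE, so the sketch leaves them unset and only validates them when present.

```shell
# Hypothetical helpers (not part of Maestro).
is_whole_number() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

maestro_env_defaults() {
    # Apply the documented defaults only where the variable is unset or empty.
    : "${MAESTRO_WORKSPACE_ROOT:=/tmp/maestro-workspaces}"
    : "${MAESTRO_REDIS_URL:=redis://localhost:6379}"
    : "${MAESTRO_MAX_CONCURRENT:=4}"
    : "${MAESTRO_SNAPSHOT_RETENTION:=7}"
    : "${MAESTRO_KILOCODE_BIN:=/usr/local/bin/kilo}"
    : "${MAESTRO_AGENT_TIMEOUT:=1800}"
    export MAESTRO_WORKSPACE_ROOT MAESTRO_REDIS_URL MAESTRO_MAX_CONCURRENT \
        MAESTRO_SNAPSHOT_RETENTION MAESTRO_KILOCODE_BIN MAESTRO_AGENT_TIMEOUT

    # Numeric settings must be whole numbers.
    is_whole_number "$MAESTRO_MAX_CONCURRENT" ||
        { echo "MAESTRO_MAX_CONCURRENT must be a whole number" >&2; return 1; }
    is_whole_number "$MAESTRO_SNAPSHOT_RETENTION" ||
        { echo "MAESTRO_SNAPSHOT_RETENTION must be a whole number" >&2; return 1; }
    is_whole_number "$MAESTRO_AGENT_TIMEOUT" ||
        { echo "MAESTRO_AGENT_TIMEOUT must be a whole number" >&2; return 1; }

    # Optional settings: validated only when set.
    case ${MAESTRO_LOG_LEVEL:-} in
        ''|DEBUG|INFO|WARN|ERROR) ;;
        *) echo "MAESTRO_LOG_LEVEL must be DEBUG, INFO, WARN or ERROR" >&2; return 1 ;;
    esac
    case ${MAESTRO_FAILSAFE_MODE:-} in
        ''|true|false) ;;
        *) echo "MAESTRO_FAILSAFE_MODE must be true or false" >&2; return 1 ;;
    esac
}
```

Calling `maestro_env_defaults || exit 1` at the top of a script keeps a typo such as `MAESTRO_MAX_CONCURRENT=four` from reaching the agents.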
Agent Discovery & Registration
/etc/maestro/agents.d/*.yaml
maestro agent register or auto-register on first use
Workspace Isolation
MAESTRO_WORKSPACE_ROOT/<agent-id>/<task-id>/
agents.d/<agent>.yaml env block
maestro-<workspace-hash> unless shared_network: true
Task Dispatching
maestro task dispatch validates target agents are registered and healthy
--sync: blocks until all agents complete, aggregates results, exits with first non-zero code if any fail
--async: returns immediately with task-tracker-id, user polls maestro task status <id>
--depends-on tasks completed before dispatch
Execution & Monitoring
kilo <mode> "<prompt>" <workspace-dir>
workspace/agent.log
--trace: wraps prompt with "TRACE: [timestamp] step: " prefixes for each stdout line
Health Checking
--interval):
agent:<id>:last_heartbeat (must be < 60s old)
FAILSAFE_MODE=true
State Synchronization
maestro state set <key> <value> (stored in Redis, TTL 24h by default)
{{ state.<key> }} in YAML definitions
task:<id>:result in state
Snapshot & Rollback
maestro state snapshot creates tar.gz with:
maestro state restore stops all agents, restores workspaces from tar, loads Redis RDB, restarts agents to previous assignments
Shutdown & Cleanup
maestro shutdown [--grace-period <seconds>] sends SIGTERM to all agents, waits, then SIGKILL
SNAPSHOT_RETENTION days (unless --keep specified)
latest symlink
--to-workspace pointing outside Maestro's own root to prevent recursive workspace creation.
maestro agent health <id> must show status: healthy.
--sync for dependent tasks, --async for fire-and-forget. Never assume async task completes successfully without explicit status check.
MAESTRO_WORKSPACE_ROOT to a disk with sufficient space (≥ 10GB per concurrent agent). Default /tmp may be small.
maestro workflow create/run instead of manual maestro task dispatch to capture reproducible definitions.
redis-cli ping must return PONG.
maestro agent list to ensure no critical agents are active (unless intentional rollback during incident).
--trace in debugging multi-agent failures; the trace log shows inter-agent message timing and payloads.
Example 1: Coordinate code review and testing on a PR
# Assign code-reviewer and test-engineer agents to fresh workspaces
maestro agent assign code-reviewer --to-workspace /tmp/pr-123-review --priority 1
maestro agent assign test-engineer --to-workspace /tmp/pr-123-tests --priority 2
# Dispatch review task to code-reviewer; run tests only if the review passes (--sync exits non-zero on failure)
if maestro task dispatch "Review PR #123, check for security issues and code style" \
    --to code-reviewer --sync; then
    # Review succeeded (exit code 0): dispatch tests
    maestro task dispatch "Run unit and integration tests for PR #123" \
        --to test-engineer --depends-on "$(maestro task list --last | jq -r '.id')" --sync
fi
# Capture final state for audit
maestro state snapshot --output /backups/pr-123-snapshot-$(date +%s).tar.gz
Example 2: FlickClaw feature with parallel frontend and backend development
export MAESTRO_WORKSPACE_ROOT=/opt/flickclaw-workspaces
maestro agent assign flickclaw-bot --to-workspace /opt/flickclaw-workspaces/backend --role backend-dev
maestro agent assign frontend-specialist --to-workspace /opt/flickclaw-workspaces/frontend --role ui-dev
# Run both agents async, they coordinate via shared state
maestro task dispatch "Implement user authentication API endpoint with JWT" \
--to flickclaw-bot --async
maestro task dispatch "Create login form component with validation and error handling" \
--to frontend-specialist --async --timeout 3600
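# Poll until both tasks complete; give up after 3600 s, the longer of the two task timeouts
deadline=$(( $(date +%s) + 3600 ))
while true; do
    backend_status=$(maestro task status --last flickclaw-bot | jq -r '.status')
    frontend_status=$(maestro task status --last frontend-specialist | jq -r '.status')
    [[ "$backend_status" == "completed" && "$frontend_status" == "completed" ]] && break
    if (( $(date +%s) >= deadline )); then
        echo "Timed out: backend=$backend_status frontend=$frontend_status" >&2
        exit 1
    fi
    sleep 30
done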
Example 3: Incident rollback with snapshot restore
# Emergency: agents diverged, restore last known good state
maestro agent list # see active agents
maestro shutdown --grace-period 10 # stop all agents gracefully
# Find latest snapshot
latest=$(ls -t /backups/maestro-snapshots/*.tar.gz | head -1)
maestro state restore "$latest" --dry-run # verify what will change
maestro state restore "$latest" # perform restore
# Restart agents to previous assignments
maestro agent assign --from-snapshot "$latest"
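The `ls -t | head` lookup in Example 3 produces an empty or misleading result when the snapshot directory is empty or a filename contains a newline. One possible alternative, sketched here with a hypothetical helper name (`latest_snapshot` is not a Maestro command), compares modification times directly and fails when no snapshot exists. It relies on the `-nt` file test, which bash, dash and most other shells provide.

```shell
# Hypothetical helper (not part of Maestro): print the most recently modified
# *.tar.gz file in the given directory, or return 1 if there is none.
latest_snapshot() {
    local dir newest candidate
    dir=$1
    newest=
    for candidate in "$dir"/*.tar.gz; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$candidate" ] || continue
        if [ -z "$newest" ] || [ "$candidate" -nt "$newest" ]; then
            newest=$candidate
        fi
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}
```

In the rollback flow this could be used as `latest=$(latest_snapshot /backups/maestro-snapshots) || { echo "no snapshot found" >&2; exit 1; }`, so the restore never runs with an empty path.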
Example 4: Create reusable workflow for RPGCLAW deployment
workflows/deploy-rpgclaw.yaml: