# 4-level repair escalation for broken fleet channels — notify, guide, assist, remote
Repair is not optional — it is triggered automatically when P3+ channels are used for delivery. When a message falls back to P3+ channels, it means P1-P2 are broken. The repair system diagnoses and fixes connectivity issues through a 4-level escalation chain.
Goal: get every peer to P1+P2 operational. The fleet is not healthy until all nodes can communicate via P1 (NATS) and P2 (HTTP direct). Every other channel is a fallback, and relying on fallbacks means degraded performance and reliability.
| Level | Action | Automation | Human needed? |
|---|---|---|---|
| L1 Notify | Log the failure, alert the fleet dashboard | Fully automatic | No |
| L2 Guide | Send repair instructions to the target node's session | Fully automatic | Human reads instructions |
| L3 Assist | Spawn a repair agent on the target via SSH | Fully automatic | No |
| L4 Remote | Execute repair commands directly via SSH | Requires approval | Yes (approve) |
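The escalation table can be modeled as a small lookup. The `RepairLevel` structure and function below are assumptions for illustration, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RepairLevel:
    name: str        # level name from the table above
    action: str      # what the level does
    automatic: bool  # True if it runs without human approval

# Hypothetical encoding of the L1-L4 escalation table
LEVELS = [
    RepairLevel("notify", "log failure, alert fleet dashboard", True),
    RepairLevel("guide",  "send repair instructions to target session", True),
    RepairLevel("assist", "spawn repair agent on target via SSH", True),
    RepairLevel("remote", "execute repair commands via SSH", False),
]

def needs_approval(level_name: str) -> bool:
    """Only L4 (remote) requires a human to approve."""
    return not next(l for l in LEVELS if l.name == level_name).automatic
```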
```
Message delivery falls back to P3+
  → L1: Log to fleet dashboard + local repair log
  → L2: Send context message with repair steps to target node
  → Wait 120s — did P1/P2 recover?
  → L3: SSH into target, spawn repair agent (claude -p)
  → Wait 300s — did P1/P2 recover?
  → L4: Present remote fix commands to human for approval
```
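The flow above can be sketched as a loop. This is an illustrative sketch only: `trigger` and `probe` are hypothetical callables standing in for the real repair dispatch and the P1/P2 health check.

```python
import time

def escalate(node, trigger, probe, wait_guide=120, wait_assist=300):
    """Walk the escalation chain, stopping as soon as P1/P2 recover.

    trigger(node, level) fires one repair level; probe(node) returns True
    once P1 (NATS) and P2 (HTTP) are both healthy again.
    """
    trigger(node, "notify")        # L1: log + dashboard alert
    trigger(node, "guide")         # L2: send repair steps to the target
    time.sleep(wait_guide)         # wait 120s by default
    if probe(node):
        return "recovered after L2"
    trigger(node, "assist")        # L3: SSH in, spawn repair agent
    time.sleep(wait_assist)        # wait 300s by default
    if probe(node):
        return "recovered after L3"
    return "needs L4 (human approval)"
```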
Repairs trigger automatically when the fallback delivery system detects degraded channels. No manual action needed for L1-L3.
```bash
# Send a repair request to a specific node
curl -sf -X POST http://127.0.0.1:8855/repair -H "Content-Type: application/json" \
  -d '{"target": "<node-id>", "channel": "nats", "level": "assist"}'

# Check repair status
curl -s http://127.0.0.1:8855/repair/status | python3 -m json.tool
```
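A caller can poll `/repair/status` until work finishes. The payload shape assumed below (a `repairs` list with a `state` field per entry) is a guess at the API, so treat this as a sketch:

```python
import json
import time
import urllib.request

def repairs_pending(status: dict) -> bool:
    """True while any repair in the status payload is still running.
    The 'repairs' list and 'state' field names are assumptions."""
    return any(r.get("state") == "running" for r in status.get("repairs", []))

def wait_for_repairs(base="http://127.0.0.1:8855", timeout=600, poll=10):
    """Poll the repair status endpoint until nothing is pending."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(f"{base}/repair/status") as resp:
            status = json.load(resp)
        if not repairs_pending(status):
            return status
        time.sleep(poll)
    raise TimeoutError("repairs still pending after timeout")
```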
```bash
# Manual repair via the fleet CLI
python3 tools/fleet_nerve_nats.py repair <node-id>
python3 tools/fleet_nerve_nats.py repair <node-id> --level guide
python3 tools/fleet_nerve_nats.py repair <node-id> --channel http --level remote
```
L2 Guide sends a context message to the target with diagnostic steps:

- Check the nerve process is running (`pgrep -f fleet_nerve`)
- Check the HTTP port is bound (`lsof -i :8855`)
- Test NATS publishing (`nats pub test "ping"`)

L3 Assist spawns a `claude -p` repair agent with a diagnostic prompt on the target; agent output is written to `/tmp/atlas-agent-results/fleet-repair-<id>.log`.

L4 Remote executes fix commands directly via SSH (after approval):

```bash
ssh <user>@<host> "launchctl kickstart -k gui/$(id -u)/com.multifleet.nerve"
ssh <user>@<host> "brew services restart nats-server"
ssh <user>@<host> "bash ~/fleet/scripts/fleet-tunnel.sh restart"
```

All repair actions are logged to the fleet dashboard and locally:
```bash
# View recent repairs
curl -s http://127.0.0.1:8855/repair/log | python3 -m json.tool

# Filter by node
curl -s "http://127.0.0.1:8855/repair/log?node=<node-id>" | python3 -m json.tool
```
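For a quick overview, log entries can be tallied per node. The `node` and `result` field names are assumptions about the log entry shape:

```python
from collections import Counter

def summarize_repairs(entries):
    """Count (node, result) pairs across repair log entries."""
    tally = Counter()
    for e in entries:
        # 'result' may be absent for in-flight repairs; bucket as 'unknown'
        tally[(e["node"], e.get("result", "unknown"))] += 1
    return tally
```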
In `.multifleet/config.json`:

```json
{
  "repair": {
    "autoEscalate": true,
    "l2WaitSeconds": 120,
    "l3WaitSeconds": 300,
    "l4RequiresApproval": true,
    "maxRetriesPerHour": 3
  }
}
```
| Key | Default | Description |
|---|---|---|
| `autoEscalate` | `true` | Automatically escalate through levels |
| `l2WaitSeconds` | `120` | Wait time before escalating L2 to L3 |
| `l3WaitSeconds` | `300` | Wait time before escalating L3 to L4 |
| `l4RequiresApproval` | `true` | Require human approval for remote execution |
| `maxRetriesPerHour` | `3` | Rate limit to prevent repair storms |
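The `maxRetriesPerHour` guard can be implemented as a per-node sliding-window counter. The class below is a minimal sketch of that idea, not the shipped code:

```python
import time
from collections import deque

class RepairRateLimit:
    """Per-node sliding-window limiter mirroring maxRetriesPerHour."""

    def __init__(self, max_per_hour=3, window_seconds=3600):
        self.max = max_per_hour
        self.window = window_seconds
        self.attempts = {}  # node -> deque of attempt timestamps

    def allow(self, node, now=None):
        """Record an attempt; return False once the hourly cap is hit."""
        now = time.time() if now is None else now
        q = self.attempts.setdefault(node, deque())
        while q and now - q[0] >= self.window:  # drop attempts older than the window
            q.popleft()
        if len(q) >= self.max:
            return False  # would be a repair storm: refuse
        q.append(now)
        return True
```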
For fleet-wide repair, spawn background agents to fix broken channels across all peers:
```bash
# Check which peers have broken channels
curl -s http://127.0.0.1:8855/channels | python3 -c "
import sys, json
data = json.load(sys.stdin)
for peer, ch in data.get('peers', {}).items():
    broken = [p for p, v in ch.items() if v.get('status') == 'fail' and p in ('p1_nats','p2_http')]
    if broken:
        print(f'{peer}: {broken} — needs repair')
"
```
```bash
# Trigger repair for all broken peers
for node in $(curl -s http://127.0.0.1:8855/channels | python3 -c "
import sys, json
data = json.load(sys.stdin)
for peer, ch in data.get('peers', {}).items():
    if any(v.get('status') == 'fail' for p, v in ch.items() if p in ('p1_nats','p2_http')):
        print(peer)
"); do
  curl -sf -X POST http://127.0.0.1:8855/repair -H "Content-Type: application/json" \
    -d "{\"target\": \"$node\", \"level\": \"assist\"}"
done
```
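The two-step shell pipeline above can also run as a single Python pass. Endpoints and payload mirror the curl calls on this page; the function names are my own:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8855"

def broken_peers(channels: dict):
    """Yield peers whose P1 (NATS) or P2 (HTTP) probe reports 'fail'."""
    for peer, ch in channels.get("peers", {}).items():
        if any(v.get("status") == "fail"
               for p, v in ch.items() if p in ("p1_nats", "p2_http")):
            yield peer

def repair_all():
    """Fetch channel health, then request an assist-level repair per broken peer."""
    with urllib.request.urlopen(f"{BASE}/channels") as resp:
        channels = json.load(resp)
    for node in broken_peers(channels):
        body = json.dumps({"target": node, "level": "assist"}).encode()
        req = urllib.request.Request(
            f"{BASE}/repair", data=body,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).close()
```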
While repairs are in progress, the affected channels remain marked degraded in the fleet dashboard.