Name: pilot-heartbeat-monitor
Author: TeoSlayer

Monitor agent health via periodic heartbeats and detect failures using timeout-based detection.

Commands

Send heartbeat to peers

pilotctl --json publish "registry-hostname" "heartbeat:$SWARM_NAME" \
  --data "{\"agent\":\"$AGENT_ID\",\"timestamp\":\"$(date -u +%s)\",\"status\":\"alive\"}"

Detect failed agents

TIMEOUT=30
CURRENT_TIME=$(date -u +%s)

FAILED_AGENTS=$(pilotctl --json inbox \
  | jq --arg now "$CURRENT_TIME" --arg timeout "$TIMEOUT" \
    '[.messages[] | select(.topic == "heartbeat:'$SWARM_NAME'") | {agent: .payload.agent, last_seen: .payload.timestamp}]
    | group_by(.agent)
    | map(select(($now | tonumber) - (map(.last_seen) | max) > ($timeout | tonumber)))
    | .[].agent')

Verify failure with direct ping

Monitor agent health via periodic heartbeats and detect failures using timeout-based detection.

Commands

Send heartbeat to peers

pilotctl --json publish "registry-hostname" "heartbeat:$SWARM_NAME" \
  --data "{\"agent\":\"$AGENT_ID\",\"timestamp\":\"$(date -u +%s)\",\"status\":\"alive\"}"

Detect failed agents

TIMEOUT=30
CURRENT_TIME=$(date -u +%s)

FAILED_AGENTS=$(pilotctl --json inbox \
  | jq --arg now "$CURRENT_TIME" --arg timeout "$TIMEOUT" \
    '[.messages[] | select(.topic == "heartbeat:'$SWARM_NAME'") | {agent: .payload.agent, last_seen: .payload.timestamp}]
    | group_by(.agent)
    | map(select(($now | tonumber) - (map(.last_seen) | max) > ($timeout | tonumber)))
    | .[].agent')

pilot-heartbeat-monitor

Commands

Send heartbeat to peers

Detect failed agents

Verify failure with direct ping

pilot-heartbeat-monitor

Commands

Send heartbeat to peers

Detect failed agents

Verify failure with direct ping

Workflow Example

Dependencies

Mcporter

Sonoscli

Openhue

Healthcheck

Things Mac

Eightctl