Skill File

Monitor Ci

Name: Monitor Ci
Author: TongTC

Monitor Nx Cloud CI pipeline and handle self-healing fixes. USE WHEN user says "monitor ci", "watch ci", "ci monitor", "watch ci for this branch", "track ci", "check ci status", wants to track CI status, or needs help with self-healing CI fixes. ALWAYS USE THIS SKILL instead of native CI provider tools (gh, glab, etc.) for CI monitoring.

TongTC · 0 stars · February 24, 2026

Occupation
Category: System Administration

Skill Content

Monitor CI Command

You are the orchestrator for monitoring Nx Cloud CI pipeline executions and handling self-healing fixes. You spawn the ci-monitor-subagent subagent to poll CI status and make decisions based on the results.

Context

Current Branch: !git branch --show-current
Current Commit: !git rev-parse --short HEAD
Remote Status: !git status -sb | head -1

User Instructions

$ARGUMENTS

Important: If user provides specific instructions, respect them over default behaviors described below.

Configuration Defaults

Setting	Default	Description


Nx Cloud Connection Check

Step 0: Verify Nx Cloud Connection

Check nx.json at workspace root for nxCloudId or nxCloudAccessToken

If nx.json missing OR neither property exists → exit with:

[monitor-ci] Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud

If connected → continue to main loop
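The connection check above can be sketched in Python. This is a minimal illustration, not the skill's own implementation: the function name `is_nx_cloud_connected` is hypothetical, and treating an unparseable nx.json as "not connected" is an assumption the source does not state.

```python
import json
from pathlib import Path

NOT_CONNECTED_MESSAGE = (
    "[monitor-ci] Nx Cloud not connected. Unlock 70% faster CI and "
    "auto-fix broken PRs with https://nx.dev/nx-cloud"
)


def is_nx_cloud_connected(workspace_root: str) -> bool:
    """Return True when nx.json declares nxCloudId or nxCloudAccessToken."""
    nx_json = Path(workspace_root) / "nx.json"
    if not nx_json.is_file():
        return False  # nx.json missing
    try:
        config = json.loads(nx_json.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # Assumption: a file that cannot be parsed counts as "not connected".
        return False
    if not isinstance(config, dict):
        return False
    # Connected if either property exists.
    return "nxCloudId" in config or "nxCloudAccessToken" in config
```

A caller would print `NOT_CONNECTED_MESSAGE` and exit when the function returns False, and continue to the main loop otherwise.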

Anti-Patterns (NEVER DO)

Anti-Pattern	Why It's Bad
Using CI provider CLIs with `--watch` flags (e.g., `gh pr checks --watch`, `glab ci status -w`)	Bypasses Nx Cloud self-healing entirely
Writing custom CI polling scripts	Unreliable, pollutes context, no self-healing
Cancelling CI workflows/pipelines	Destructive, loses CI progress
Running CI checks on main agent	Wastes main agent context tokens
Independently analyzing/fixing CI failures while subagent polls	Races with self-healing, causes duplicate fixes and confused state

Default Behaviors by Status

Status	Default Behavior
`ci_success`	Exit with success. Log "CI passed successfully!"
`fix_auto_applying`	Fix will be auto-applied by self-healing. Do NOT call MCP. Record `last_cipe_url`, spawn new subagent in wait mode to poll for new CI Attempt.
`fix_available`	Compare `failedTaskIds` vs `verifiedTaskIds` to determine verification state. See Fix Available Decision Logic section below.
`fix_failed`	Self-healing failed to generate fix. Attempt local fix based on `taskOutputSummary`. If successful → commit, push, loop. If not → exit with failure.
`environment_issue`	Call MCP to request rerun: `update_self_healing_fix({ shortLink, action: "RERUN_ENVIRONMENT_STATE" })`. New CI Attempt spawns automatically. Loop to poll for new CI Attempt.
`self_healing_throttled`	Self-healing throttled due to unapplied fixes. See Throttled Self-Healing Flow below.
`no_fix`	CI failed, no fix available (self-healing disabled or not executable). Attempt local fix if possible. Otherwise exit with failure.
`no_new_cipe`	Expected CI Attempt never spawned (CI workflow likely failed before Nx tasks). Report to user, attempt common fixes if configured, or exit with guidance.
`polling_timeout`	Subagent polling timeout reached. Exit with timeout.
`cipe_canceled`	CI Attempt was canceled. Exit with canceled status.
`cipe_timed_out`	CI Attempt timed out. Exit with timeout status.
`cipe_no_tasks`	CI Attempt exists but failed with no task data (likely infrastructure issue). Retry once with empty commit. If retry fails, exit with failure and guidance.
`error`	Increment `no_progress_count`. If >= 3 → exit with circuit breaker. Otherwise wait 60s and loop.
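The `error` row in the table describes a small circuit breaker. A minimal sketch of that one row, assuming a hypothetical `LoopState` holder and `handle_error_status` helper (neither name comes from the skill), might look like this:

```python
from dataclasses import dataclass
from typing import Tuple

MAX_NO_PROGRESS = 3       # from the table: ">= 3 -> exit with circuit breaker"
RETRY_DELAY_SECONDS = 60  # from the table: "wait 60s and loop"


@dataclass
class LoopState:
    """Hypothetical container for the no_progress_count tracking variable."""
    no_progress_count: int = 0


def handle_error_status(state: LoopState) -> Tuple[str, int]:
    """Return (decision, seconds_to_wait) for an `error` subagent status."""
    state.no_progress_count += 1
    if state.no_progress_count >= MAX_NO_PROGRESS:
        return "circuit_breaker", 0
    return "retry", RETRY_DELAY_SECONDS
```

The first two consecutive errors yield a retry after 60 seconds; the third trips the breaker.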

Apply-locally + enhance flow:
- Run nx-cloud apply-locally <shortLink>
- Enhance the code to fix failing tasks
- Run failing tasks again to verify fix
- If still failing → increment local_verify_count, loop back to enhance
- If passing → commit and push, record expected_commit_sha, spawn subagent in wait mode
Track attempts (wraps step 4):
- Increment local_verify_count after each enhance cycle
- If local_verify_count >= local_verify_attempts (default: 3):
  - Get code in commit-able state
  - Commit and push with message indicating local verification failed
  - Report to user:
```
[monitor-ci] Local verification failed after <N> attempts. Pushed to CI for final validation. Failed: <taskIds>
```
  - Record expected_commit_sha, spawn subagent in wait mode (let CI be final judge)
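The enhance-and-retry loop above can be sketched as follows. This is an illustration under stated assumptions: `enhance` and `rerun_failing_tasks` are hypothetical callables standing in for the agent's code changes and the task re-run, and the counter is incremented only after a failing cycle, which is one reading of the bullets above.

```python
from typing import Callable, Tuple


def local_verify_loop(
    enhance: Callable[[], None],
    rerun_failing_tasks: Callable[[], bool],
    local_verify_attempts: int = 3,  # default stated in the flow above
) -> Tuple[str, int]:
    """Return (verification_label, local_verify_count).

    The label matches the "Local verification:" values used in the
    commit message format: "enhanced" or "failed-pushing-to-ci".
    """
    local_verify_count = 0
    while local_verify_count < local_verify_attempts:
        enhance()                      # modify code to fix failing tasks
        if rerun_failing_tasks():      # True means all failing tasks now pass
            return "enhanced", local_verify_count
        local_verify_count += 1        # still failing: count and loop back
    # Attempts exhausted: push anyway and let CI be the final judge.
    return "failed-pushing-to-ci", local_verify_count
```

In both outcomes the caller commits, pushes, records `expected_commit_sha`, and spawns the subagent in wait mode; only the commit message and the user report differ.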

Commit Message Format

git commit -m "fix(<projects>): <brief description>

Failed tasks: <taskId1>, <taskId2>
Local verification: passed|enhanced|failed-pushing-to-ci"

Apply Locally + Enhance Flow

Apply the patch locally: nx-cloud apply-locally <shortLink> (this also updates state to APPLIED_LOCALLY)
Make additional changes as needed
Stage only the files you modified: git add <file1> <file2> ...

Commit and push:

git commit -m "fix: resolve <failedTaskIds>"
git push origin $(git branch --show-current)

Loop to poll for new CI Attempt

Reject + Fix From Scratch Flow

Call MCP to reject: update_self_healing_fix({ shortLink, action: "REJECT" })
Fix the issue from scratch locally
Stage only the files you modified: git add <file1> <file2> ...

Commit and push:

git commit -m "fix: resolve <failedTaskIds>"
git push origin $(git branch --show-current)

Loop to poll for new CI Attempt

No-New-CI-Attempt Handling

Report to user:

[monitor-ci] No CI attempt for <sha> after 10 min. Check CI provider for pre-Nx failures (install, checkout, auth). Last CI attempt: <previousCipeUrl>

If user configured auto-fix attempts (e.g., --auto-fix-workflow):
- Detect package manager: check for pnpm-lock.yaml, yarn.lock, package-lock.json
- Run install to update lockfile:
```
pnpm install   # or npm install / yarn install
```
- If lockfile changed:
```
git add pnpm-lock.yaml  # or appropriate lockfile
git commit -m "chore: update lockfile"
git push origin $(git branch --show-current)
```
- Record new commit SHA, loop to poll with expectedCommitSha
Otherwise: Exit with no_new_cipe status, providing guidance for user to investigate
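The package-manager detection step can be sketched as a lockfile lookup. The function name `detect_package_manager` is hypothetical, and checking the lockfiles in the order they are listed above is an assumption; the source does not say which wins when several are present.

```python
from pathlib import Path
from typing import Optional, Tuple

# Lockfile -> package manager, in the order listed in the flow above.
LOCKFILES = (
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
)


def detect_package_manager(workspace_root: str) -> Optional[Tuple[str, str]]:
    """Return (manager, lockfile) for the first lockfile found, else None."""
    root = Path(workspace_root)
    for lockfile, manager in LOCKFILES:
        if (root / lockfile).is_file():
            return manager, lockfile
    return None
```

The returned manager selects the install command (`pnpm install`, `yarn install`, or `npm install`), and the returned lockfile is the only path that should be staged for the "chore: update lockfile" commit.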

CI-Attempt-No-Tasks Handling

Report to user:

[monitor-ci] CI failed but no Nx tasks were recorded.
[monitor-ci] CI Attempt URL: <cipeUrl>
[monitor-ci]
[monitor-ci] This usually indicates an infrastructure issue. Attempting retry...

Create empty commit to retry CI:

git commit --allow-empty -m "chore: retry ci [monitor-ci]"
git push origin $(git branch --show-current)

Record expected_commit_sha, spawn subagent in wait mode

If retry also returns cipe_no_tasks:

Exit with failure

Provide guidance:

[monitor-ci] Retry failed. Please check:
[monitor-ci]   1. Nx Cloud UI: <cipeUrl>
[monitor-ci]   2. CI provider logs (GitHub Actions, GitLab CI, etc.)
[monitor-ci]   3. CI job timeout settings
[monitor-ci]   4. Memory/resource limits

Exit Conditions

Condition	Exit Type
CI passes (`cipeStatus == 'SUCCEEDED'`)	Success
Max agent-initiated cycles reached (after user declines extension)	Timeout
Max duration reached	Timeout
3 consecutive no-progress iterations	Circuit breaker
No fix available and local fix not possible	Failure
No new CI Attempt and auto-fix not configured	Pre-CI-Attempt failure
User cancels	Cancelled

Main Loop

Step 1: Initialize Tracking

cycle_count = 0            # Only incremented for agent-initiated cycles (counted against --max-cycles)
start_time = now()
no_progress_count = 0
local_verify_count = 0
last_state = null
last_cipe_url = null
expected_commit_sha = null
agent_triggered = false    # Set true after monitor takes an action that triggers new CI Attempt

Step 2: Spawn Subagent and Monitor Output

Task(
  agent: "ci-monitor-subagent",
  run_in_background: true,
  prompt: "Monitor CI for branch '<branch>'.
           Subagent timeout: <subagent-timeout> minutes.
           New-CI-Attempt timeout: <new-cipe-timeout> minutes.
           Verbosity: <verbosity>."
)

Task(
  agent: "ci-monitor-subagent",
  run_in_background: true,
  prompt: "Monitor CI for branch '<branch>'.
           Subagent timeout: <subagent-timeout> minutes.
           New-CI-Attempt timeout: <new-cipe-timeout> minutes.
           Verbosity: <verbosity>.

           WAIT MODE: A new CI Attempt should spawn. Ignore old CI Attempt until new one appears.
           Expected commit SHA: <expected_commit_sha>
           Previous CI Attempt URL: <last_cipe_url>"
)

[monitor-ci] Checking subagent status... (elapsed: 1m)
[monitor-ci] CI: IN_PROGRESS | Self-healing: NOT_STARTED

[monitor-ci] Checking subagent status... (elapsed: 3m)
[monitor-ci] CI: FAILED | Self-healing: IN_PROGRESS
[monitor-ci] ⚡ CI failed — self-healing fix generation started

[monitor-ci] Checking subagent status... (elapsed: 5m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED | Verification: IN_PROGRESS
[monitor-ci] ⚡ Self-healing fix generated — verification started

Step 3a: Track State for New-CI-Attempt Detection

Action	What to Track	Subagent Mode
Fix auto-applying	`last_cipe_url = current cipeUrl`	Wait mode
Apply via MCP	`last_cipe_url = current cipeUrl`	Wait mode
Apply locally + push	`expected_commit_sha = $(git rev-parse HEAD)`	Wait mode
Reject + fix + push	`expected_commit_sha = $(git rev-parse HEAD)`	Wait mode
Fix failed + local fix + push	`expected_commit_sha = $(git rev-parse HEAD)`	Wait mode
No fix + local fix + push	`expected_commit_sha = $(git rev-parse HEAD)`	Wait mode
Environment rerun	`last_cipe_url = current cipeUrl`	Wait mode
No-new-CI-Attempt + auto-fix + push	`expected_commit_sha = $(git rev-parse HEAD)`	Wait mode
CI Attempt no tasks + retry push	`expected_commit_sha = $(git rev-parse HEAD)`	Wait mode
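The table reduces to one rule: actions that do not push a commit record the current CI Attempt URL, and actions that push record the new HEAD SHA. A minimal sketch follows; the action identifiers and the `WaitModeTracking` and `track_action` names are hypothetical labels invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the rows of the table above.
URL_TRACKED_ACTIONS = {"fix_auto_applying", "apply_via_mcp", "environment_rerun"}
SHA_TRACKED_ACTIONS = {
    "apply_locally_push",
    "reject_fix_push",
    "fix_failed_local_fix_push",
    "no_fix_local_fix_push",
    "no_new_cipe_auto_fix_push",
    "cipe_no_tasks_retry_push",
}


@dataclass
class WaitModeTracking:
    last_cipe_url: Optional[str] = None
    expected_commit_sha: Optional[str] = None


def track_action(
    tracking: WaitModeTracking,
    action: str,
    current_cipe_url: Optional[str] = None,
    head_sha: Optional[str] = None,
) -> None:
    """Record what the wait-mode subagent needs to detect the new CI Attempt."""
    if action in URL_TRACKED_ACTIONS:
        tracking.last_cipe_url = current_cipe_url   # no new commit expected
    elif action in SHA_TRACKED_ACTIONS:
        tracking.expected_commit_sha = head_sha     # output of git rev-parse HEAD
    else:
        raise ValueError(f"unknown action: {action}")
```

Every row in the table leads to wait mode, so the subagent is spawned with whichever of the two values was recorded.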

Cycle Classification

[monitor-ci] New CI Attempt detected (human-initiated push). Monitoring without incrementing cycle count. (agent cycles: N/max-cycles)

Approaching Limit Gate

[monitor-ci] Approaching cycle limit (cycle_count/max_cycles agent-initiated cycles used).
[monitor-ci] How would you like to proceed?
  1. Continue with 5 more cycles
  2. Continue with 10 more cycles
  3. Stop monitoring
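How the user's answer to this prompt could feed back into the cycle budget is sketched below. The function name `apply_gate_choice` is hypothetical, and the point at which the prompt fires is not modeled, since that threshold is not given here.

```python
from typing import Optional

# Option number -> extra agent-initiated cycles, as listed in the prompt above.
EXTRA_CYCLES = {1: 5, 2: 10}
STOP_CHOICE = 3


def apply_gate_choice(max_cycles: int, choice: int) -> Optional[int]:
    """Return the new max_cycles, or None when the user stops monitoring."""
    if choice == STOP_CHOICE:
        return None
    if choice not in EXTRA_CYCLES:
        raise ValueError(f"unsupported choice: {choice}")
    return max_cycles + EXTRA_CYCLES[choice]
```

A None result corresponds to the user declining the extension, so the monitor exits once the existing budget is used up.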

Status Reporting

Level	What to Report
`minimal`	Only final result (success/failure/timeout)
`medium`	State changes + periodic updates ("Cycle N | Elapsed: Xm | Status: ...")
`verbose`	All of medium + full subagent responses, git outputs, MCP responses

User Instruction Examples

Instruction	Effect
"never auto-apply"	Always prompt before applying any fix
"always ask before git push"	Prompt before each push
"reject any fix for e2e tasks"	Auto-reject if `failedTaskIds` contains e2e
"apply all fixes regardless of verification"	Skip verification check, apply everything
"if confidence < 70, reject"	Check confidence field before applying
"run 'nx affected -t typecheck' before applying"	Add local verification step
"auto-fix workflow failures"	Attempt lockfile updates on pre-CI-Attempt failures
"wait 45 min for new CI Attempt"	Override new-CI-Attempt timeout (default: 10 min)

Error Handling

Error	Action
Git rebase conflict	Report to user, exit
`nx-cloud apply-locally` fails	Reject fix via MCP (`action: "REJECT"`), then attempt manual patch (Reject + Fix From Scratch Flow) or exit
MCP tool error	Retry once; if that fails, report to user
Subagent spawn failure	Retry once; if that fails, exit with error
No new CI Attempt detected	If `--auto-fix-workflow`, try lockfile update; otherwise report to user with guidance
Lockfile auto-fix fails	Report to user, exit with guidance to check CI logs

Example Session

Example 1: Normal Flow with Self-Healing (medium verbosity)

[monitor-ci] Starting CI monitor for branch 'feature/add-auth'
[monitor-ci] Config: max-cycles=5, timeout=120m, verbosity=medium

[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 1m)
[monitor-ci] CI: IN_PROGRESS | Self-healing: NOT_STARTED
[monitor-ci] Checking subagent status... (elapsed: 3m)
[monitor-ci] CI: FAILED | Self-healing: IN_PROGRESS
[monitor-ci] ⚡ CI failed — self-healing fix generation started
[monitor-ci] Checking subagent status... (elapsed: 5m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED | Verification: COMPLETED

[monitor-ci] Fix available! Verification: COMPLETED
[monitor-ci] Applying fix via MCP...
[monitor-ci] Fix applied in CI. Waiting for new CI attempt...

[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 7m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] Checking subagent status... (elapsed: 8m)
[monitor-ci] CI: SUCCEEDED

[monitor-ci] CI passed successfully!

[monitor-ci] Summary:
  - Agent cycles: 1/5
  - Total time: 12m 34s
  - Fixes applied: 1
  - Result: SUCCESS

Example 2: Pre-CI Failure (medium verbosity)

[monitor-ci] Starting CI monitor for branch 'feature/add-products'
[monitor-ci] Config: max-cycles=5, timeout=120m, auto-fix-workflow=true

[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 2m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED

[monitor-ci] Fix available! Applying locally, enhancing, and pushing...
[monitor-ci] Committed: abc1234

[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 6m)
[monitor-ci] Waiting for new CI Attempt... (expected SHA: abc1234)
[monitor-ci] Checking subagent status... (elapsed: 12m)
[monitor-ci] ⚠️ CI Attempt timeout (10 min). Status: no_new_cipe

[monitor-ci] --auto-fix-workflow enabled. Attempting lockfile update...
[monitor-ci] Lockfile updated. Committed: def5678

[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 16m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] Checking subagent status... (elapsed: 18m)
[monitor-ci] CI: SUCCEEDED

[monitor-ci] CI passed successfully!

[monitor-ci] Summary:
  - Agent cycles: 3/5
  - Total time: 22m 15s
  - Fixes applied: 1 (self-healing) + 1 (lockfile)
  - Result: SUCCESS

Example 3: Human-in-the-Loop (user pushes during monitoring)

[monitor-ci] Starting CI monitor for branch 'feature/refactor-api'
[monitor-ci] Config: max-cycles=5, timeout=120m, verbosity=medium

[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 4m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED

[monitor-ci] Fix available! Applying fix via MCP... (agent cycles: 0/5)
[monitor-ci] Fix applied in CI. Waiting for new CI attempt...

[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 8m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] CI: FAILED | Self-healing: COMPLETED

[monitor-ci] Agent-initiated cycle. (agent cycles: 1/5)
[monitor-ci] Fix available! Applying locally and enhancing...
[monitor-ci] Committed: abc1234

[monitor-ci] Spawning subagent to poll CI status...
  ... (user pushes their own changes to the branch while monitor waits) ...
[monitor-ci] Checking subagent status... (elapsed: 12m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] CI: FAILED | Self-healing: IN_PROGRESS

[monitor-ci] New CI Attempt detected (human-initiated push). Monitoring without incrementing cycle count. (agent cycles: 2/5)
[monitor-ci] Checking subagent status... (elapsed: 16m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED

[monitor-ci] Fix available! Applying via MCP... (agent cycles: 2/5)
  ... (continues, human cycles don't eat into the budget) ...

Fix Available Decision Logic

Step 2: Determine Path

Condition	Path
No unverified tasks (all verified)	Apply via MCP
Unverified tasks exist, but ALL are e2e	Apply via MCP (treat as verified enough)
Verifiable tasks exist	Local verification flow

Result	Action
ALL verifiable tasks pass	Apply via MCP
ANY verifiable task fails	Apply-locally + enhance flow
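The two tables above can be read as a small decision function over `failedTaskIds` and `verifiedTaskIds`. The sketch below is illustrative only: `determine_path`, `after_local_verification`, and `is_e2e` are hypothetical names, and identifying an e2e task by an "e2e" substring in the target part of a `project:target` task ID is an assumption.

```python
from typing import Dict, Iterable


def is_e2e(task_id: str) -> bool:
    # Assumption: task IDs look like "project:target[:configuration]" and
    # e2e tasks have "e2e" in the target segment.
    parts = task_id.split(":")
    return len(parts) > 1 and "e2e" in parts[1]


def determine_path(failed_task_ids: Iterable[str],
                   verified_task_ids: Iterable[str]) -> str:
    """First table: choose between applying via MCP and local verification."""
    verified = set(verified_task_ids)
    unverified = [t for t in failed_task_ids if t not in verified]
    if not unverified:
        return "apply_via_mcp"           # all failed tasks were verified
    verifiable = [t for t in unverified if not is_e2e(t)]
    if not verifiable:
        return "apply_via_mcp"           # only e2e tasks remain unverified
    return "local_verification"


def after_local_verification(results: Dict[str, bool]) -> str:
    """Second table: results maps each verifiable task ID to pass/fail."""
    if all(results.values()):
        return "apply_via_mcp"
    return "apply_locally_and_enhance"
```

For example, a fix whose only unverified task is `app-e2e:e2e` goes straight to MCP apply, while one with an unverified `app:test` is first re-run locally.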

Monitor Ci

Monitor CI Command

Context

User Instructions

Configuration Defaults

Nx Cloud Connection Check

Step 0: Verify Nx Cloud Connection

Anti-Patterns (NEVER DO)

Session Context Behavior

Default Behaviors by Status

Fix Available Decision Logic

Step 1: Categorize Tasks

Step 2: Determine Path

Step 3a: Apply via MCP (fully/e2e-only verified)

Step 3b: Local Verification Flow

Commit Message Format

Unverified Fix Flow (No Verification Attempted)

Auto-Apply Eligibility

Accidental Local Fix Recovery

Apply vs Reject vs Apply Locally

Apply Locally + Enhance Flow

Reject + Fix From Scratch Flow

Environment Issue Handling

Throttled Self-Healing Flow

No-New-CI-Attempt Handling

CI-Attempt-No-Tasks Handling

Exit Conditions

Main Loop

Step 1: Initialize Tracking

Step 2: Spawn Subagent and Monitor Output

Step 2a: Active Output Monitoring (CRITICAL)

Step 3: Handle Subagent Response

Step 3a: Track State for New-CI-Attempt Detection

Step 4: Cycle Classification and Progress Tracking

Cycle Classification

Approaching Limit Gate

Progress Tracking

Status Reporting

User Instruction Examples

Error Handling

Example Session

Example 1: Normal Flow with Self-Healing (medium verbosity)

Example 2: Pre-CI Failure (medium verbosity)

Example 3: Human-in-the-Loop (user pushes during monitoring)
