Systematic workflow for debugging remote CI test failures across multiple OS runners (Ubuntu, Windows, macOS). Covers artifact analysis, failure categorization, root cause isolation, and iterative fix/push/verify cycles.
This skill documents the systematic process for debugging CI test failures that occur on remote GitHub Actions runners but may not reproduce locally. It was distilled from debugging interpretune PR #197 (nnsight-support branch) across Ubuntu 22.04, Windows 2022, and macOS 14 runners.
Use this skill when:
Prerequisite: gh CLI authenticated with repo access.

# Find the failed workflow run
gh run list --branch <branch-name> --limit 10
# Download all artifacts from the run
mkdir -p /tmp/ci_artifacts_<run_id>
gh run download <run_id> --dir /tmp/ci_artifacts_<run_id>
# If artifacts aren't available, use run logs directly
gh run view <run_id> --log > /tmp/ci_run_<run_id>.log 2>&1
# For runner-shutdown cases, download the raw job log directly
job_id=$(gh api "repos/<owner>/<repo>/actions/runs/<run_id>/jobs" --jq '.jobs[] | select(.name | contains("ubuntu")) | .id')
gh api "repos/<owner>/<repo>/actions/jobs/${job_id}/logs" > /tmp/ci_job_<job_id>.log
When artifacts contain raw pytest output, extract the failure summary:
# Get just the FAILURES section
grep -A 200 "^FAILURES" /tmp/ci_artifacts_<run_id>/<artifact>/output.txt
# Count failures per OS
grep -c "FAILED" /tmp/ci_artifacts_<run_id>/<os>-*/output.txt
# Get unique test names that failed
grep "^FAILED" /tmp/ci_artifacts_<run_id>/<os>-*/output.txt | sort -u
When resource-monitor or debug artifacts are skipped, grep the raw job log for inline markers:
grep -nE "analysis_resource_debug|op_serialization_resource_debug|shutdown signal|exit code 143" /tmp/ci_job_<job_id>.log
When a prior successful run uploaded ci_resource_monitor.log, treat it as a coarse baseline only:
- It captures only %MEM, whole-system free memory, and disk usage.
- For per-test RSS analysis, rely on the log_resource_snapshot(...) markers instead.

Group failures by root cause pattern, not by test name. Common categories:
Pattern: UnicodeEncodeError: 'charmap' codec can't encode character
Typical OS: Windows only (cp1252 encoding)
Root Cause: Source files contain Unicode characters (arrows, checkmarks, emojis) that Windows default encoding cannot handle.
Fix: Replace Unicode symbols with ASCII equivalents in test output strings:
- \u2713 (checkmark) -> [OK]
- \u2717 (cross) -> [FAIL]
- \u2705 (green check) -> [PASS]
- \u26A0 (warning) -> [WARN]
- \u2192 (arrow) -> ->

Example from PR #197:
# Before (Windows fails)
summary.append(f" \u2713 {name}: validated")
# After (cross-platform)
summary.append(f" [OK] {name}: validated")
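To apply these substitutions consistently across many output strings, a small helper can do the mapping in one place. This is an illustrative sketch, not part of interpretune; `ASCII_SUBSTITUTIONS` and `to_ascii_status` are hypothetical names:

```python
# Hypothetical helper: maps common Unicode status symbols to ASCII
# equivalents so Windows cp1252 consoles can encode test output.
ASCII_SUBSTITUTIONS = {
    "\u2713": "[OK]",    # checkmark
    "\u2717": "[FAIL]",  # cross
    "\u2705": "[PASS]",  # green check
    "\u26a0": "[WARN]",  # warning sign
    "\u2192": "->",      # rightwards arrow
}

def to_ascii_status(text: str) -> str:
    """Replace known Unicode status symbols with ASCII equivalents."""
    for symbol, replacement in ASCII_SUBSTITUTIONS.items():
        text = text.replace(symbol, replacement)
    return text
```

Routing all test summary strings through such a helper keeps the substitution table in one place instead of scattering replacements across call sites.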
Pattern: Shape mismatches, missing methods/attributes, unexpected API behavior
Typical OS: All platforms
Root Cause: CI installs different dependency versions than local development. Common when:
- pyproject.toml pins point to old commits
- Override files (requirements/ci/overrides.txt) are out of sync with pyproject.toml

Diagnosis:
# Compare local vs CI package versions
# Local:
pip show transformer-lens sae-lens circuit-tracer nnsight
# CI (from log artifacts):
grep "transformer.lens\|sae.lens\|circuit.tracer\|nnsight" /tmp/ci_artifacts_<run_id>/*/output.txt
Fix: Update dependency pins in pyproject.toml and ensure override-dependencies use the same fork/commit:
# Check for discrepancies between pyproject.toml pins and override files
diff <(grep "TransformerLens\|SAELens" pyproject.toml) <(cat requirements/ci/overrides.txt)
# Regenerate lock file after updating pins
./requirements/utils/lock_ci_requirements.sh
Key Lesson from PR #197: The override-dependencies section in pyproject.toml must match the actual dependency pins. Having TransformerLensOrg in overrides while the main dependency points to speediedan fork causes CI to install the wrong version, even though both forks share commit history.
Pattern: AssertionError: Tensor close check failed: max diff = 0.09...
Typical OS: All platforms (may vary by CPU architecture)
Root Cause: Different backends (TransformerBridge vs NNsight vs HookedTransformer) produce slightly different numerical results due to:
Fix Options:
Pattern: Varied — fixture resolution failures, import errors, mock failures
Root Cause: Test helpers or fixtures that make assumptions about the environment.
Fix: Address the specific infrastructure issue. See fixture_usage.instructions.md for fixture patterns.
Pattern: MemoryError during dill.Pickler → _save_torchTensor → write_large_bytes
Typical OS: Windows (most memory-constrained CI runners), potentially Ubuntu
Root Cause: Dataset.from_generator() (HuggingFace datasets library) hashes all gen_kwargs via dill serialization to build a fingerprint for caching. When gen_kwargs contains a full model module (with all weights), dill tries to serialize the entire model into memory — causing MemoryError on memory-constrained CI runners.
Fix: Keep Dataset.from_generator() but provide an explicit fingerprint= so HuggingFace datasets does not dill-hash gen_kwargs:
# Before (OOMs on CI when datasets hashes gen_kwargs):
dataset = Dataset.from_generator(generator_fn, gen_kwargs=gen_kwargs, features=features, ...)
# After (generator-backed, no dill hashing of module/model state):
dataset = Dataset.from_generator(
generator_fn,
gen_kwargs=gen_kwargs,
features=features,
fingerprint=generate_random_fingerprint(),
)
This preserves generator-backed writes and avoids the old CI failure mode without materializing the full dataset in memory first.
Example from PR #197: generate_analysis_dataset() in analysis.py passes gen_kwargs containing the ITModule and datamodule. Supplying an explicit per-run fingerprint= prevents datasets from hashing those heavyweight objects.
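`generate_random_fingerprint()` is referenced above but not shown. A minimal sketch, assuming HF datasets only needs a unique string here (the project's actual helper may differ):

```python
import uuid

def generate_random_fingerprint() -> str:
    """Sketch of a per-run fingerprint: a random 16-char hex token.

    Supplying any explicit fingerprint string makes datasets skip
    dill-hashing gen_kwargs, which is the behavior relied on above.
    """
    return uuid.uuid4().hex[:16]
```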
Pattern: runtime dataset creation succeeds, but there is pressure to reuse the same fingerprint across runs for caching.
Typical OS: All (macOS, Windows, Ubuntu)
Root Cause: the explicit fingerprint used to avoid dill-hashing gen_kwargs is a runtime safety mechanism, not a semantically meaningful cache key. Reusing it across runs would risk stale AnalysisStore data unless the fingerprint accounts for the full analysis contract.
Fix: Use a per-run fingerprint for the immediate runtime path and keep deterministic cross-run caching as a separate design problem.
Lesson: treat the runtime fingerprint and the future AnalysisStore cache key as two different concerns. The runtime fix prevents memory blowups; it does not solve persistent cache reuse.
Pattern: Segmentation fault (exit code 139) during nnsight thread-interleaved analysis. Faulthandler dump shows multiple threads stuck in interleaver.py wait → send → request → envoy.py inputs/input chains.
Typical OS: Windows (observed on 3.13.12). May affect other platforms under memory pressure.
Root Cause: The nnsight interleaver creates worker threads for each trace invocation. During multi-trace analysis operations (e.g., SAE ablation attribution, which runs one trace per alive latent), many threads accumulate. On Windows, this can trigger a CPython-level segfault in the threading layer — not a Python exception, but a process crash.
Diagnosis:
# Download full job log (artifacts may not contain useful data for segfaults)
job_id=$(gh api "repos/<owner>/<repo>/actions/runs/<run_id>/jobs" --jq '.jobs[] | select(.name | contains("windows")) | .id')
gh api "repos/<owner>/<repo>/actions/jobs/${job_id}/logs" > /tmp/windows_log.txt
# Look for segfault and thread dumps
grep -n "Segmentation fault\|Current thread\|faulthandler" /tmp/windows_log.txt
Fix: This is an upstream nnsight issue, not fixable in interpretune code. The multi-trace ablation analysis is also memory-intensive enough to OOM GitHub CI runners (~7GB RAM on Ubuntu). Mark the affected tests as standalone (@RunIf(standalone=True)) so they only run with IT_RUN_STANDALONE_TESTS=1 (locally or on GPU CI). The non-ablation NNsight tests (base, SAE, gradient attribution) typically pass fine in regular CI.
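The effect of `@RunIf(standalone=True)` can be modeled as an env-gated decorator. The real interpretune helper is presumably a pytest mark supporting more conditions; this is only a behavioral sketch of the standalone gate:

```python
import functools
import os

def RunIf(standalone: bool = False):
    """Sketch of the standalone gate: the test body runs only when
    IT_RUN_STANDALONE_TESTS=1 is exported (assumed semantics)."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if standalone and os.environ.get("IT_RUN_STANDALONE_TESTS") != "1":
                return None  # stand-in for pytest.skip(...)
            return test_fn(*args, **kwargs)
        return wrapper
    return decorator
```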
Pattern: A CI step shows conclusion: "cancelled" with startedAt: null and zero log output, OR conclusion: "failure" with ##[error]The runner has received a shutdown signal
Typical OS: Ubuntu (observed), potentially any
Root Cause: Usually OOM-kill by the Linux kernel or a transient runner infrastructure issue. The process is killed before producing any output. Key indicators:
Diagnosis:
# Check step conclusions
gh run view <run_id> --json jobs --jq '.jobs[] | select(.name | contains("ubuntu")) | .steps[] | {name: .name, conclusion: .conclusion}'
# Compare job duration (suspiciously round = likely killed)
gh run view <run_id> --json jobs --jq '.jobs[] | {name: .name, started: .startedAt, completed: .completedAt}'
Fix: Often transient — push a new commit and re-run. If persistent, investigate memory usage during test collection (may need to add resource monitoring or reduce parallel test load).
If the exact same commit previously passed and the rerun dies at a different point without a rising resource trend in the inline markers, treat that as likely runner instability first. Still prefer a small stabilizing change if you can reduce peak fixture retention, but do not assume every cancellation is a deterministic code regression.
Important follow-up from PR #197: GitHub Actions can report the pytest step as failure with
Process completed with exit code 143 and The runner has received a shutdown signal, then still skip
every downstream if: always() upload step while the job remains temporarily marked in_progress.
When this happens, do not wait on artifacts that will never appear. Pull the raw job log directly and rely on
inline debug markers.
Pattern: A targeted analysis test logs the test name, then the runner dies before the existing
before_save_reload marker appears.
Typical OS: Ubuntu (observed on GitHub-hosted 22.04)
Root Cause: The crash occurs during analysis setup or run_op_with_config(...), not during the later
dataset serialization boundary you were already instrumenting.
Fix / Diagnosis:
- Add inline markers around run_op_with_config(...).
- Add inline markers around save_reload_results_dataset(...).
- Emit them with log_resource_snapshot(...) so they survive missing artifacts.

Relevant helpers now live in tests/analysis_resource_utils.py:
- log_resource_snapshot(...)
- analysis_resource_debug_enabled()
- the IT_ANALYSIS_RESOURCE_DEBUG=1 gate

This lets you answer three distinct questions from the raw log:
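A minimal sketch of these two helpers, assuming the env-var gate described here and using the stdlib resource module (Unix-only). The real versions in tests/analysis_resource_utils.py may record more fields:

```python
import os
import resource
import sys

def analysis_resource_debug_enabled() -> bool:
    """Gate matching the IT_ANALYSIS_RESOURCE_DEBUG=1 convention above."""
    return os.environ.get("IT_ANALYSIS_RESOURCE_DEBUG") == "1"

def log_resource_snapshot(marker: str) -> None:
    """Emit a greppable inline marker with the process's peak RSS.

    Sketch only. Note ru_maxrss is KiB on Linux and bytes on macOS;
    printing to stderr with flush=True ensures the marker survives a
    hard kill of the runner mid-test.
    """
    if not analysis_resource_debug_enabled():
        return
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"analysis_resource_debug marker={marker} ru_maxrss={peak}",
          file=sys.stderr, flush=True)
```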
Pattern: Switching an analysis fixture from class scope to function scope avoids OOM, but a follow-on test class becomes too slow or re-instantiates the same heavyweight Bridge/NNsight fixture for each method.
Root Cause: The test needs only a tiny projection of the fixture result, but it keeps asking pytest to build the full fixture again for each assertion.
Fix: Use the resource-aware extraction helpers from tests/analysis_resource_utils.py:
- build_analysis_fixture_payload_extractor(...)
- AnalysisExtractionMixin.extract_cached_fixture_data(...)
- ExtractedFixturePayload

This pattern caches one lightweight payload per fixture key and reuses it across test methods while still
allowing low-RAM runners to clean up the heavyweight result / runner / it_session graph immediately.
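The caching pattern can be sketched as follows. The names mirror the helpers above, but the implementation is an illustrative assumption, not the project's code:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class ExtractedFixturePayload:
    """Lightweight projection of a heavyweight fixture result (sketch)."""
    fixture_key: str
    data: Any

_PAYLOAD_CACHE: dict[str, ExtractedFixturePayload] = {}

def extract_cached_fixture_data(
    fixture_key: str,
    build_heavy_fixture: Callable[[], Any],
    project: Callable[[Any], Any],
) -> ExtractedFixturePayload:
    """Build the heavyweight fixture at most once per key, keep only the
    projected payload, and let the heavy object be garbage-collected."""
    if fixture_key not in _PAYLOAD_CACHE:
        heavy = build_heavy_fixture()   # expensive: model / session graph
        payload = project(heavy)        # tiny projection for assertions
        _PAYLOAD_CACHE[fixture_key] = ExtractedFixturePayload(fixture_key, payload)
        del heavy                       # drop the heavyweight graph now
    return _PAYLOAD_CACHE[fixture_key]
```

Test methods then assert against the cached payload instead of re-triggering the full fixture build per assertion.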
Apply fixes in dependency order:
Is the failure caused by wrong dependency version?
Yes -> Update pins, push, wait for CI before fixing other failures
No -> Is it a platform encoding issue?
Yes -> Replace Unicode with ASCII, commit with other changes
No -> Is it a test infrastructure issue?
Yes -> Fix test code, not application code
No -> Investigate application code
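The triage above can be approximated with a pattern-based categorizer. The regexes come from the failure patterns documented in this skill; the helper itself and its match ordering are hypothetical:

```python
import re

# Ordered (category, regex) pairs; patterns taken from the failure
# categories described in this document.
FAILURE_CATEGORIES = [
    ("platform_encoding", re.compile(r"UnicodeEncodeError: 'charmap' codec")),
    ("memory", re.compile(r"MemoryError|exit code 143|shutdown signal")),
    ("numerical_parity", re.compile(r"Tensor close check failed")),
    ("dependency_version", re.compile(r"has no attribute|shape mismatch", re.IGNORECASE)),
]

def categorize_failure(traceback_text: str) -> str:
    """Map a failure's traceback text to a root-cause category."""
    for category, pattern in FAILURE_CATEGORIES:
        if pattern.search(traceback_text):
            return category
    return "investigate_application_code"
```

Running this over each FAILED entry's traceback groups the 20+ symptoms into a handful of root causes before you start fixing anything.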
# Commit and push
git add -A && git commit -m "fix: <descriptive message>"
git push origin <branch>
# Find the new Test full run
gh run list --branch <branch> --limit 5
# Monitor progress
gh run view <run_id> --json jobs --jq '.jobs[] | {name: .name, status: .status, conclusion: .conclusion}'
# Once complete, check results
gh run view <run_id> --json jobs --jq '.jobs[] | select(.conclusion == "failure") | .name'
The interpretune pre-commit hooks include end-of-file-fixer and trailing-whitespace. These may modify files during the first commit attempt:
# First attempt may fail
git add -A && git commit -m "fix: ..."
# Output: "fix end of files... Failed"
# Files were auto-fixed — just re-add and commit
git add -A && git commit -m "fix: ..."
GitHub Actions auto-cancels in-progress runs when a new push arrives on the same branch (if configured with concurrency groups). If you need results from a specific run:
# Get pass/fail counts from a completed run
gh run view <run_id> --log 2>/dev/null | grep -E "passed|failed|error" | tail -5
# Download artifacts from failed run
gh run download <run_id> --dir /tmp/ci_artifacts_<run_id>
# Parse test results
for f in /tmp/ci_artifacts_<run_id>/*/output.txt; do
echo "=== $(dirname $f | xargs basename) ==="
grep "^FAILED" "$f" | wc -l
grep "^FAILED" "$f"
done
# Compare failures across OS runners
diff <(grep "^FAILED" /tmp/ci_artifacts/windows-*/output.txt | sed 's/.*FAILED //' | sort) \
<(grep "^FAILED" /tmp/ci_artifacts/macos-*/output.txt | sed 's/.*FAILED //' | sort)
The override-dependencies section in pyproject.toml serves a specific purpose: it replaces git URL dependencies with version constraints so that editable installations work correctly. If these overrides point to a different fork than the main dependency pins, CI will install the wrong package version silently.
Always verify: pyproject.toml dependency URLs, requirements/ci/overrides.txt, and override-dependencies all reference the same fork and commit.
When dependency versions are wrong, failures cascade. A single wrong TransformerLens version can cause 20+ tensor shape failures, 3 cache key failures, and 3 numerical parity failures — all from the same root cause. Fix dependency pins first and re-run CI before investigating individual test failures.
Even when tests pass on Linux and macOS, Windows CI runners use cp1252 encoding by default. Any Unicode characters in test assertion messages, logging output, or summary strings will cause UnicodeEncodeError. Use ASCII-only characters in all test output.
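A cheap guard for this failure mode is to round-trip candidate output strings through cp1252 before they ship; `windows_safe` is a hypothetical helper:

```python
def windows_safe(text: str) -> bool:
    """True if text can be encoded with Windows' default cp1252 codec."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False
```

Note that cp1252 does cover Latin-1 accents, so this check rejects only the symbols (checkmarks, arrows, emojis) that actually break Windows runners, rather than all non-ASCII text.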
The interpretune CI runs pyright type checking as a separate job from pytest. Adding new methods or modifying existing ones may trigger type errors even if tests pass. Check the "Stale Stubs and Type Checks" workflow alongside "Test full".
The end-of-file-fixer and trailing-whitespace hooks can modify files during commit, causing the first git commit to fail. This is expected — just re-stage and commit again.
TransformerBridge resolves hook aliases to canonical names (e.g., blocks.0.hook_resid_pre -> blocks.0.hook_in), while HookedTransformer uses aliases directly. Any code that constructs SAE hook paths must account for this via SAELensTLModuleMixin.resolve_sae_hook_name() (or equivalent) rather than hardcoding the alias + suffix pattern.
Dataset.from_generator() uses dill to serialize all gen_kwargs for cache fingerprinting. If gen_kwargs contains a PyTorch module with model weights, this serializes the entire model into memory — an instant OOM on CI runners. Prefer keeping Dataset.from_generator() and supplying an explicit per-run fingerprint= so the runtime path stays generator-backed without hashing heavyweight objects.
When a CI step shows conclusion: "cancelled" with zero log lines, the process was likely killed by the OS (e.g., Linux OOM killer). This is different from GitHub Actions "cancel-in-progress" which cancels the entire run. Look for round job durations (exactly 60m, 90m) and check if other OS jobs in the same run completed normally.
An explicit runtime fingerprint= solves the immediate Dataset.from_generator() dill-serialization problem, but it should not be reused as a persistent AnalysisStore cache key. The runtime fingerprint is intentionally per-run right now; deterministic cross-run cache reuse still requires a separate, semantically meaningful hashing design.
A segfault in an upstream library's threading layer (e.g., nnsight's interleaver on Windows) is not reproducible locally if your development machine runs Linux/macOS. When you see a segfault with faulthandler output showing threads stuck in wait/send chains, it's likely an upstream platform-specific bug. Similarly, memory-intensive upstream operations (multi-trace ablation creating many threads) may OOM CI runners without affecting local machines with more RAM. The correct fix is @RunIf(standalone=True) — moving resource-heavy tests out of standard CI — rather than a code change in your project.
- .github/workflows/ci_test-full.yml — Main CI workflow
- .github/workflows/ci_stubs_types.yml — Type checking workflow
- pyproject.toml — Dependency pins and override-dependencies
- requirements/ci/overrides.txt — CI dependency overrides
- requirements/ci/requirements.txt — Locked CI requirements
- .github/instructions/fixture_usage.instructions.md — Test fixture patterns