Name: Pr Test
Author: Significant-Gravitas


Use Redis CLI to set counters directly:

# Find the Redis container
REDIS_CONTAINER=$(docker ps --format '{{.Names}}' | grep redis | head -1)
# Set a key with expiry
docker exec $REDIS_CONTAINER redis-cli SET key value EX ttl
# Example: Set rate limit counter to near-limit
docker exec $REDIS_CONTAINER redis-cli SET "rate_limit:user:[email protected]" 99 EX 3600
# Example: Check current value
docker exec $REDIS_CONTAINER redis-cli GET "rate_limit:user:[email protected]"
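
To confirm that a seeded counter really sits at the boundary you intended, it can help to read it back and compare it with the limit. The following is a minimal sketch, not part of the original skill: `redis_cmd` and `counter_at_limit` are hypothetical helper names, and it assumes `REDIS_CONTAINER` is set as above and that the counter holds an integer.

```shell
# Hypothetical helper: run a redis-cli command inside the Redis container.
redis_cmd() {
  docker exec "$REDIS_CONTAINER" redis-cli "$@"
}

# Hypothetical helper: succeed (exit 0) when the counter is at or above the limit.
# Args: <key> <limit>
counter_at_limit() {
  local key="$1" limit="$2" value
  value=$(redis_cmd GET "$key")
  # A missing key comes back as an empty line or "(nil)"; treat both as 0.
  case "$value" in
    ''|'(nil)') value=0 ;;
  esac
  [ "$value" -ge "$limit" ]
}

# Usage (KEY and the limit of 100 are assumptions for illustration):
#   redis_cmd INCR "$KEY"    # push a counter seeded at 99 up to the limit
#   counter_at_limit "$KEY" 100 && echo "limit reached"
```

Wrapping `docker exec` in one function keeps the container lookup in a single place, so the same check can be reused before and after the action under test.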

Use API calls to check before/after state:

# BEFORE: Record current state
BEFORE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
echo "Credits BEFORE: $BEFORE"

# Perform the action...

# AFTER: Record new state and compare
AFTER=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
echo "Credits AFTER: $AFTER"
echo "Delta: $(( BEFORE - AFTER ))"

Take screenshots BEFORE and AFTER every state change; the UI must reflect the backend state change.
Never rely on mocked or injected browser state; always use real backend state. Do NOT use agent-browser eval to fake UI state. The backend must be the source of truth.

Use direct DB queries when needed:

# Query via Supabase's PostgREST or docker exec into the DB
docker exec supabase-db psql -U supabase_admin -d postgres -c "SELECT credits FROM user_credits WHERE user_id = '...';"

After every API test, verify the state change actually persisted:

# Example: After a credits purchase, verify DB matches API
API_CREDITS=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
DB_CREDITS=$(docker exec supabase-db psql -U supabase_admin -d postgres -t -c "SELECT credits FROM user_credits WHERE user_id = '...';" | tr -d ' ')
[ "$API_CREDITS" = "$DB_CREDITS" ] && echo "CONSISTENT" || echo "MISMATCH: API=$API_CREDITS DB=$DB_CREDITS"

# If argument is a PR number, find its worktree
gh pr view {N} --json headRefName --jq '.headRefName'
# If argument is a path, use it directly

PR_NUMBER=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
# Slugify the title: lowercase, non-alphanumerics to hyphens, truncate to 50 chars, then trim edge hyphens
PR_TITLE=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | head -c 50 | sed 's/^-//;s/-$//')
RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
mkdir -p "$RESULTS_DIR"

cd $WORKTREE_PATH

# Read PR description to understand the WHY
gh pr view {N} --json body --jq '.body'

git log --oneline dev..HEAD | head -20
git diff dev --stat

# Test Plan: PR #{N} — {title}

## Scenarios
1. [Scenario name] — [what to verify]
2. ...

## API Tests (if applicable)
1. [Endpoint] — [expected behavior]
   - Before state: [what to check before]
   - After state: [what to verify changed]

## UI Tests (if applicable)
1. [Page/component] — [interaction to test]
   - Screenshot before: [what to capture]
   - Screenshot after: [what to capture]

## Negative Tests (REQUIRED — at least one per feature)
1. [What should NOT happen] — [how to trigger it]
   - Expected error: [what error message/code]
   - State unchanged: [what to verify did NOT change]
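
A negative test from the plan above usually comes down to two checks: the request is rejected with the expected status code, and the state is unchanged afterwards. The sketch below covers the first check. `expect_status` is a hypothetical helper name, and the 401 in the usage comment is an assumption about how the backend rejects a bad token, not something the source confirms.

```shell
# Hypothetical helper: assert that a request returns the expected HTTP status.
# Args: <expected_status> <curl arguments...>
expect_status() {
  local expected="$1" actual
  shift
  actual=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 "$@")
  if [ "$actual" = "$expected" ]; then
    echo "PASS: got $actual"
  else
    echo "FAIL: expected $expected, got $actual"
    return 1
  fi
}

# Usage (assumption: an invalid bearer token is rejected with 401):
#   expect_status 401 -H "Authorization: Bearer invalid" http://localhost:8006/api/credits
```

Pair it with a before/after read of the affected resource to show that the rejected request changed nothing.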

# CRITICAL: .env files are NOT checked into git. They must be copied manually.
cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env

# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env

# In $BACKEND_DIR/.env, ensure these are set:
CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
CHAT_BASE_URL=https://openrouter.ai/api/v1
CHAT_USE_CLAUDE_AGENT_SDK=true

# -f2- keeps the whole value even if the key itself contains '='
ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" "$BACKEND_DIR/.env" | cut -d= -f2-)
[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
# Add or update CHAT_API_KEY and CHAT_BASE_URL
grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env

# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
  docker stop "$name" 2>/dev/null
done

cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi

cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi

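# Poll until backend and frontend respond (up to 60 attempts, 5 seconds apart)
READY=false
for i in $(seq 1 60); do
  BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
  FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
  if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
    echo "Services ready"
    READY=true
    break
  fi
  sleep 5
done
# Fail loudly instead of falling through when the services never came up
if [ "$READY" != true ]; then echo "ERROR: Services not ready after 5 minutes (backend=$BACKEND frontend=$FRONTEND)"; exit 1; fi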

ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')

# Signup (idempotent — returns "User already registered" if exists)
RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
  -H "apikey: $ANON_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"email":"[email protected]","password":"testtest123"}')

# If "Database error finding user", restart supabase-auth and retry
if echo "$RESULT" | grep -q "Database error"; then
  docker restart supabase-auth && sleep 5
  curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
    -H "apikey: $ANON_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"email":"[email protected]","password":"testtest123"}'
fi

# Get auth token
TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
  -H "apikey: $ANON_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"email":"[email protected]","password":"testtest123"}' | jq -r '.access_token // ""')
# Stop early if login failed; every later API call depends on this token
[ -n "$TOKEN" ] || { echo "ERROR: could not obtain an auth token; check the signup response and the supabase-auth logs"; exit 1; }

curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...

ONBOARDING_RESULT=$(curl -s --max-time 30 -X POST \
  "http://localhost:8006/api/onboarding/step?step=VISIT_COPILOT" \
  -H "Authorization: Bearer $TOKEN")
echo "Onboarding bypass: $ONBOARDING_RESULT"

# Verify it took effect
ONBOARDING_STATUS=$(curl -s --max-time 30 \
  "http://localhost:8006/api/onboarding/completed" \
  -H "Authorization: Bearer $TOKEN" | jq -r '.is_completed')
echo "Onboarding completed: $ONBOARDING_STATUS"
if [ "$ONBOARDING_STATUS" != "true" ]; then
  echo "ERROR: onboarding bypass failed — browser tests will hit /onboarding instead of the target feature. Investigate before proceeding."
  exit 1
fi

| Service | Port | URL |
|---------|------|-----|
| Frontend | 3000 | http://localhost:3000 |
| Backend REST | 8006 | http://localhost:8006 |
| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
| Executor | 8002 | http://localhost:8002 |
| Copilot Executor | 8008 | http://localhost:8008 |
| WebSocket | 8001 | http://localhost:8001 |
| Database Manager | 8005 | http://localhost:8005 |
| Redis | 6379 | localhost:6379 |
| RabbitMQ | 5672 | localhost:5672 |

# Example: List agents
curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20

# Example: Create an agent
curl -s -X POST http://localhost:8006/api/graphs \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{...}' | jq .

# Example: Run an agent
curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"data": {...}}'

# Example: Get execution results
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .

# 1. Record BEFORE state
BEFORE_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
echo "BEFORE: $BEFORE_STATE"

# 2. Perform the action
ACTION_RESULT=$(curl -s -X POST ... | jq .)
echo "ACTION RESULT: $ACTION_RESULT"

# 3. Record AFTER state
AFTER_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
echo "AFTER: $AFTER_STATE"

# 4. Log the comparison
echo "=== STATE CHANGE VERIFICATION ==="
echo "Before: $BEFORE_STATE"
echo "After: $AFTER_STATE"
echo "Expected change: {describe what should have changed}"
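
Logging the before and after values is more convincing when the script also asserts the expected change. A minimal sketch follows, assuming the field being compared is an integer already extracted with jq (as in the credits example). `assert_delta` is a hypothetical helper name, not part of the original skill.

```shell
# Hypothetical helper: check that a numeric field changed by the expected amount.
# Args: <before> <after> <expected_delta>   (integers; delta = after - before)
assert_delta() {
  local before="$1" after="$2" expected="$3" actual
  actual=$(( after - before ))
  if [ "$actual" -eq "$expected" ]; then
    echo "PASS: delta $actual"
  else
    echo "FAIL: expected delta $expected, got $actual (before=$before after=$after)"
    return 1
  fi
}

# Usage with integer values captured before and after the action:
#   assert_delta "$BEFORE" "$AFTER" -5
```

The printed PASS or FAIL line can go straight into the "API Evidence" column of the results table.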

# Close any existing session
agent-browser close 2>/dev/null || true

# Use --session-name to persist cookies across navigations
# This means login only needs to happen once per test session
agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000

# Get interactive elements
agent-browser --session-name pr-test snapshot | grep "textbox\|button"

# Login
agent-browser --session-name pr-test fill {email_ref} "[email protected]"
agent-browser --session-name pr-test fill {password_ref} "testtest123"
agent-browser --session-name pr-test click {login_button_ref}
sleep 5

# Dismiss cookie banner if present
agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true

# Navigate — cookies are preserved so login persists
agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000

# Take screenshot
agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png

# Interact with elements
agent-browser --session-name pr-test fill {ref} "text"
agent-browser --session-name pr-test press "Enter"
agent-browser --session-name pr-test click {ref}
agent-browser --session-name pr-test click 'text=Button Text'

# Read page content
agent-browser --session-name pr-test snapshot | grep "text:"

# Backend REST server
docker logs autogpt_platform-rest_server-1 2>&1 | tail -30

# Executor (runs agent graphs)
docker logs autogpt_platform-executor-1 2>&1 | tail -30

# Copilot executor (runs copilot chat sessions)
docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30

# Frontend
docker logs autogpt_platform-frontend-1 2>&1 | tail -30

# Filter for errors
docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20

# Create a session
SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{}' | jq -r '.id // .session_id // ""')

# Stream a message (SSE - will stream chunks)
curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"message": "Hello, what can you help me with?"}' \
  --max-time 60 2>/dev/null | head -50

agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
# ... fill chat input and press Enter, wait 20-30s for response

# BEFORE the action
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-before.png

# Perform the action...

# AFTER the action
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-after.png

# Examples:
# $RESULTS_DIR/01-login-page-before.png
# $RESULTS_DIR/02-login-page-after.png
# $RESULTS_DIR/03-credits-page-before.png
# $RESULTS_DIR/04-credits-purchase-after.png
# $RESULTS_DIR/05-negative-insufficient-credits.png
# $RESULTS_DIR/06-error-state.png

### Screenshot 1: {descriptive title}
[Read the PNG file here]

**What it shows:** {1-2 sentence explanation of what this screenshot proves}

---

# Build these variables during Step 6 — they are required by Step 7's script
# NOTE: declare -A requires Bash 4.0+. Linux typically has Bash 5.x. On macOS the system
# /bin/bash is 3.2 (zsh is the default shell), so run this with Homebrew bash (5.x) there.
# If running on Bash <4, use a plain variable with a lookup function instead.
declare -A SCREENSHOT_EXPLANATIONS=(
  ["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
  ["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
  # ... one entry per screenshot, using the same explanations you showed the user above
)

TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
| 2 | Credits purchase | PASS | Before: 100, After: 95 | 03-credits-before.png, 04-credits-after.png |
| 3 | Insufficient credits (negative) | PASS | Credits: 0, rejected | 05-insufficient-credits-error.png |"
# ... one row per test scenario with actual results

# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"

# Step 1: Create blobs for each screenshot and build tree JSON
# Retry each blob upload up to 3 times. If still failing, list them at end of report.
shopt -s nullglob
SCREENSHOT_FILES=("$RESULTS_DIR"/*.png)
if [ ${#SCREENSHOT_FILES[@]} -eq 0 ]; then
  echo "ERROR: No screenshots found in $RESULTS_DIR. Test run is incomplete."
  exit 1
fi
TREE_JSON='['
FIRST=true
FAILED_UPLOADS=()
for img in "${SCREENSHOT_FILES[@]}"; do
  BASENAME=$(basename "$img")
  B64=$(base64 < "$img")
  BLOB_SHA=""
  for attempt in 1 2 3; do
    BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha' 2>/dev/null || true)
    [ -n "$BLOB_SHA" ] && break
    sleep 1
  done
  if [ -z "$BLOB_SHA" ]; then
    FAILED_UPLOADS+=("$img")
    continue
  fi
  if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
  TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
done
TREE_JSON+=']'

# Step 2: Create tree, commit, and branch ref
TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')

# Resolve parent commit so screenshots are chained, not orphan root commits
PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || echo "")
if [ -n "$PARENT_SHA" ]; then
  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
    -f tree="$TREE_SHA" \
    -f "parents[]=$PARENT_SHA" \
    --jq '.sha')
else
  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
    -f tree="$TREE_SHA" \
    --jq '.sha')
fi

gh api "repos/${REPO}/git/refs" \
  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
  -f sha="$COMMIT_SHA" 2>/dev/null \
  || gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
    -X PATCH -f sha="$COMMIT_SHA" -F force=true

REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"

# Build image markdown using uploaded image URLs; skip FAILED_UPLOADS (listed separately)

IMAGE_MARKDOWN=""
for img in "${SCREENSHOT_FILES[@]}"; do
  BASENAME=$(basename "$img")
  TITLE=$(echo "${BASENAME%.png}" | sed 's/^[0-9]*-//' | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
  # Skip images that failed to upload — they will be listed at the end
  IS_FAILED=false
  for failed in "${FAILED_UPLOADS[@]}"; do
    [ "$(basename "$failed")" = "$BASENAME" ] && IS_FAILED=true && break
  done
  if [ "$IS_FAILED" = true ]; then
    continue
  fi
  EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"
  if [ -z "$EXPLANATION" ]; then
    echo "ERROR: Missing screenshot explanation for $BASENAME. Add it to SCREENSHOT_EXPLANATIONS in Step 6."
    exit 1
  fi
  IMAGE_MARKDOWN="${IMAGE_MARKDOWN}
### ${TITLE}
![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})
${EXPLANATION}
"
done

# Write comment body to file to avoid shell interpretation issues with special characters
COMMENT_FILE=$(mktemp)
# If any uploads failed, append a section listing them with instructions
FAILED_SECTION=""
if [ ${#FAILED_UPLOADS[@]} -gt 0 ]; then
  FAILED_SECTION="
## ⚠️ Failed Screenshot Uploads
The following screenshots could not be uploaded via the GitHub API after 3 retries.
**To add them:** drag-and-drop or paste these files into a PR comment manually:
"
  for failed in "${FAILED_UPLOADS[@]}"; do
    FAILED_SECTION="${FAILED_SECTION}
- \`$(basename "$failed")\` (local path: \`$failed\`)"
  done
  FAILED_SECTION="${FAILED_SECTION}

**Run status:** INCOMPLETE until the files above are manually attached and visible inline in the PR."
fi

cat > "$COMMENT_FILE" <<INNEREOF
## E2E Test Report

| # | Scenario | Result | API Evidence | Screenshot Evidence |
|---|----------|--------|-------------|-------------------|
${TEST_RESULTS_TABLE}

${IMAGE_MARKDOWN}
${FAILED_SECTION}
INNEREOF

gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
rm -f "$COMMENT_FILE"

# Verify the posted comment contains inline images — exit 1 if none found
# Use separate --paginate + jq pipe: --jq applies per-page, not to the full list
LAST_COMMENT=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" --paginate 2>/dev/null | jq -r '.[-1].body // ""')
if ! echo "$LAST_COMMENT" | grep -q '!\['; then
  echo "ERROR: Posted comment contains no inline images (![). Bare directory links are not acceptable." >&2
  exit 1
fi
echo "✓ Inline images verified in posted comment"

gh pr view "$PR_NUMBER" --json body --jq '.body' --repo "$REPO"

| Criterion | Pass condition |
|-----------|----------------|
| Coverage | Every feature/change described in the PR has at least one test scenario |
| All scenarios pass | No FAIL rows in the results table |
| Negative tests | At least one failure-path test per feature (invalid input, unauthorized, edge case) |
| Before/after evidence | Every state-changing API call has before/after values logged |
| Screenshots are meaningful | Screenshots show the actual state change, not just a loading spinner or blank page |
| No regressions | Existing core flows (login, agent create/run) still work |

ALL criteria pass                            → APPROVE
Any scenario FAIL or missing PR feature      → REQUEST_CHANGES (list gaps)
Evidence weak (no before/after, vague shots) → REQUEST_CHANGES (list what's missing)
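
The decision rules above can be reduced to one check over three counts. The sketch below is one possible encoding, not the skill's own script: `decide_review` is a hypothetical function, and the three counts are assumed to be tallied by hand during evaluation.

```shell
# Hypothetical helper: map evaluation counts to a review decision.
# Args: <failed scenarios> <coverage gaps> <weak-evidence findings>
decide_review() {
  local fails="$1" gaps="$2" weak="$3"
  if [ "$fails" -eq 0 ] && [ "$gaps" -eq 0 ] && [ "$weak" -eq 0 ]; then
    echo "APPROVE"
  else
    echo "REQUEST_CHANGES"
  fi
}

# Usage (WEAK_EVIDENCE_COUNT is an assumed variable for illustration):
#   DECISION=$(decide_review "$FAIL_COUNT" "${#COVERAGE_GAPS[@]}" "$WEAK_EVIDENCE_COUNT")
```

Computing the decision once and branching on it avoids posting both an approval and a change request for the same run.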

REVIEW_FILE=$(mktemp)

# Count results
PASS_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "PASS" || true)
FAIL_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "FAIL" || true)
TOTAL=$(( PASS_COUNT + FAIL_COUNT ))

# List any coverage gaps found during evaluation (populate this array as you assess)
# e.g. COVERAGE_GAPS=("PR claims to add X but no test covers it")
COVERAGE_GAPS=()

# Only if ALL criteria pass: write and post an approving review
cat > "$REVIEW_FILE" <<REVIEWEOF
## E2E Test Evaluation — APPROVED

**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed.

**Coverage:** All features described in the PR were exercised.

**Evidence:** Before/after API values logged for all state-changing operations; screenshots show meaningful state transitions.

**Negative tests:** Failure paths tested for each feature.

No regressions observed on core flows.
REVIEWEOF

gh pr review "$PR_NUMBER" --repo "$REPO" --approve --body "$(cat "$REVIEW_FILE")"
echo "✅ PR approved"

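# Otherwise (any FAIL, coverage gap, or weak evidence): write and post a change request instead of approving
FAIL_LIST=$(echo "$TEST_RESULTS_TABLE" | grep "FAIL" | awk -F'|' '{print "- Scenario" $2 "failed"}' || true)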

cat > "$REVIEW_FILE" <<REVIEWEOF
## E2E Test Evaluation — Changes Requested

**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed, ${FAIL_COUNT} failed.

### Required before merge

${FAIL_LIST}
$(for gap in "${COVERAGE_GAPS[@]}"; do echo "- $gap"; done)

Please fix the above and re-run the E2E tests.
REVIEWEOF

gh pr review "$PR_NUMBER" --repo "$REPO" --request-changes --body "$(cat "$REVIEW_FILE")"
echo "❌ Changes requested"

rm -f "$REVIEW_FILE"

1. Identify the root cause in the code: read the relevant source files.
2. Write a failing test first (TDD): For backend bugs, write a test marked with pytest.mark.xfail(reason="..."). For frontend/Playwright bugs, write a test with .fixme annotation. Run it to confirm it fails as expected.
3. Screenshot the broken state: agent-browser screenshot $RESULTS_DIR/{NN}-broken-{description}.png
4. Fix the code in the worktree.
5. Rebuild ONLY the affected service (not the whole stack):

cd $PLATFORM_DIR && docker compose up --build -d {service_name}
# e.g., docker compose up --build -d rest_server
# e.g., docker compose up --build -d frontend

6. Wait for the service to be ready (poll health endpoint).
7. Re-test the same scenario.
8. Screenshot the fixed state: agent-browser screenshot $RESULTS_DIR/{NN}-fixed-{description}.png
9. Remove the xfail/fixme marker from the test written in step 2, and verify it passes.
10. Verify the fix did not break other scenarios (run a quick smoke test).
11. Commit and push immediately:

cd $WORKTREE_PATH
git add -A
git commit -m "fix: {description of fix}"
git push
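
The "wait for the service to be ready" step in the protocol above can be sketched as a reusable polling function. `wait_for_http` is a hypothetical helper. The defaults of 60 attempts and 5 seconds mirror the startup polling loop, and treating only HTTP 200 as ready is an assumption carried over from that loop.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200 or attempts run out.
# Args: <url> [attempts, default 60] [delay in seconds, default 5]
wait_for_http() {
  local url="$1" attempts="${2:-60}" delay="${3:-5}" code i
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" 2>/dev/null)
    if [ "$code" = "200" ]; then
      echo "Ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  echo "ERROR: $url not ready after $attempts attempts" >&2
  return 1
}

# Usage after rebuilding rest_server:
#   wait_for_http http://localhost:8006/docs || exit 1
```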

test scenario → find issue (bug OR UX problem) → screenshot broken state
→ fix code → rebuild affected service only → re-test → screenshot fixed state
→ verify no regressions → commit + push
→ repeat for next scenario
→ after ALL scenarios pass, run full re-test to verify everything together

| # | Scenario | Result | API Evidence | Screenshot Evidence |
|---|----------|--------|-------------|-------------------|
| 1 | {name} | PASS/FAIL | Before: X, After: Y | 01-before.png, 02-after.png |
| 2 | ... | ... | ... | ... |

Pr Test

Manual E2E Test

Critical Requirements

1. Screenshots at Every Step

2. Screenshots MUST Be Posted to PR

3. State Verification with Before/After Evidence

4. Negative Test Cases Are Mandatory

5. Test Report Must Include Full Evidence

State Manipulation for Realistic Testing

Arguments

Step 0: Resolve the target

Step 1: Understand the PR

Step 2: Write test scenarios

Step 3: Environment setup

3a. Copy .env files from the root worktree

3b. Configure copilot authentication

Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)

Option 2: OpenRouter API key mode (fallback)

3c. Stop conflicting containers

3e. Build and start

3f. Wait for services to be ready

3h. Create test user and get auth token

3i. Disable onboarding for test user

Step 4: Run tests

Service ports reference

API testing

Browser testing with agent-browser

Checking logs

Copilot chat testing

Step 5: Record results and take screenshots

Step 6: Show results to user with screenshots

Step 7: Post test report as PR comment with screenshots

Step 8: Evaluate and post a formal PR review

Evaluation criteria

Decision logic

Post the review

Fix mode (--fix flag)

Fix protocol for EVERY issue found (including UX issues):

Fix loop (like pr-address)

Known issues and workarounds

Problem: "Database error finding user" on signup

Problem: Copilot returns auth errors in subscription mode

Problem: agent-browser can't find chromium

Problem: agent-browser selector matches multiple elements

Problem: Frontend shows cookie banner blocking interaction

Problem: Container loses npm packages after rebuild

Problem: Services not starting after docker compose up

Problem: Docker uses cached layers with old code (PR changes not visible)

Problem: agent-browser open loses login session

Problem: Supabase auth returns "Database error querying schema"

Flags
