Two-phase cleanup of duplicate and outdated issue files in docs/issues/. Phase 1 uses a Python script for fast pattern matching. Phase 2 uses claude -p for semantic analysis on suspects only.
# Phase 1: Run Python scanner (fast, free)
python3 .github/scripts/issue-scanner.py
# Phase 1: Get suspects only (for Phase 2 input)
python3 .github/scripts/issue-scanner.py --suspects-only
# Phase 1: JSON output (for automation)
python3 .github/scripts/issue-scanner.py --json
# Phase 1: Validation check (CI integration, exit 1 if errors)
python3 .github/scripts/issue-scanner.py --check
docs/harness/automations.yml contains issue-gc-review
settings/harness → Cleanup & Correction
python3 .github/scripts/issue-scanner.py --suspects-only
Problem: Running deep AI analysis on every issue is expensive.
Solution: Two-phase approach:
claude -p only on suspects
┌─────────────────────────────────────────────────────────┐
│ All Issues (N files) │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Phase 1: Python Scanner (.github/scripts/issue-scanner.py)│ │
│ │ - Filename keyword extraction │ │
│ │ - YAML front-matter validation │ │
│ │ - Same area + keyword overlap detection │ │
│ │ - Age-based staleness check │ │
│ │ → Output: Suspect list (M files, M << N) │ │
│ └───────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Phase 2: Deep Analysis (claude -p, only M files) │ │
│ │ - Content similarity │ │
│ │ - Semantic duplicate detection │ │
│ │ - Merge recommendations │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Run python3 .github/scripts/issue-scanner.py to get:
====================================================================================================
📋 ISSUE SCANNER REPORT
====================================================================================================
📊 ISSUE TABLE:
----------------------------------------------------------------------------------------------------
Status Sev Date Area Title
----------------------------------------------------------------------------------------------------
✅ resolv 🟠 2026-03-02 background-worker HMR causes loss of the in-memory sessionToTask Map
🔴 open 🟡 2026-03-04 ui Task Execute button disabled
...
----------------------------------------------------------------------------------------------------
Total: 12 issues
📈 SUMMARY BY STATUS:
🔴 open: 5
✅ resolved: 7
If any issue has malformed front-matter, the scanner reports:
❌ VALIDATION ERRORS (need AI fix):
------------------------------------------------------------
2026-03-08-broken-issue.md:
- Missing required field: area
- Invalid status: pending (valid: ['open', 'investigating', 'resolved', 'wontfix', 'duplicate'])
Action: Ask AI to fix the file:
claude -p "Fix the front-matter in docs/issues/2026-03-08-broken-issue.md. Add missing 'area' field and change status to a valid value."
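The validation step above can be illustrated with a minimal sketch. It is not the scanner's actual code: the list of required fields is an assumption (only `area` and `status` appear in the report above), and the parser handles only flat `key: value` front-matter rather than full YAML.

```python
# Sketch only: not taken from issue-scanner.py.
# The valid statuses are copied from the error message shown above.
VALID_STATUSES = ["open", "investigating", "resolved", "wontfix", "duplicate"]
# Assumption: the real scanner may require more fields than these two.
REQUIRED_FIELDS = ["status", "area"]


def parse_front_matter(text):
    """Parse flat 'key: value' lines between the leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def validate(fields):
    """Return error messages in the style of the report above."""
    errors = []
    for name in REQUIRED_FIELDS:
        if name not in fields:
            errors.append(f"Missing required field: {name}")
    status = fields.get("status")
    if status is not None and status not in VALID_STATUSES:
        errors.append(f"Invalid status: {status} (valid: {VALID_STATUSES})")
    return errors


# Hypothetical file content that reproduces both errors from the example.
sample = "---\ntitle: Broken issue\nstatus: pending\n---\nBody text\n"
errors = validate(parse_front_matter(sample))
```

For the sample text, `errors` holds one message for the missing `area` field and one for the invalid `pending` status.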
The scanner automatically detects:
| Type | Detection Rule | Example |
|---|---|---|
| Duplicate | Same area + ≥2 common keywords | hmr-task vs task-hmr-recovery |
| Stale | open > 30 days | Issue from 2026-01-15 still open |
| Stale | investigating > 14 days | Stuck investigation |
Output:
⚠️ SUSPECTS (need Phase 2 deep analysis):
------------------------------------------------------------
🔗 Potential Duplicates:
- 2026-03-02-hmr-resets-session-to-task-map.md
↔ 2026-03-08-background-task-hmr-recovery.md
Reason: Same area 'background-worker', keywords: {'task', 'hmr'}
⏰ Stale Issues:
- 2026-02-01-old-bug.md: Open for 35 days (>30)
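The detection rules in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the scanner's implementation: it assumes the issue date is the `YYYY-MM-DD` filename prefix, that keywords are the hyphen-separated filename words, and that a small stopword list (hypothetical) is filtered out. The `area` and `status` values in the sample data are likewise assumed.

```python
import re
from datetime import date
from itertools import combinations

# Assumption: filenames look like YYYY-MM-DD-some-keywords.md
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})-")
STOPWORDS = {"a", "and", "in", "of", "on", "the", "to"}  # hypothetical list
STALE_LIMITS = {"open": 30, "investigating": 14}  # days, from the table


def filename_keywords(filename):
    """Strip the extension and date prefix, then split on hyphens."""
    stem = DATE_PREFIX.sub("", filename.rsplit(".", 1)[0])
    return {word for word in stem.split("-") if word and word not in STOPWORDS}


def find_duplicates(issues, min_common=2):
    """Flag pairs with the same area and at least min_common shared keywords."""
    suspects = []
    for a, b in combinations(issues, 2):
        if a["area"] != b["area"]:
            continue
        common = filename_keywords(a["file"]) & filename_keywords(b["file"])
        if len(common) >= min_common:
            suspects.append((a["file"], b["file"], sorted(common)))
    return suspects


def find_stale(issues, today):
    """Flag issues whose age in days exceeds the limit for their status."""
    stale = []
    for issue in issues:
        limit = STALE_LIMITS.get(issue["status"])
        match = DATE_PREFIX.match(issue["file"])
        if limit is None or match is None:
            continue
        age = (today - date.fromisoformat(match.group(1))).days
        if age > limit:
            stale.append((issue["file"], age, limit))
    return stale


# Sample data: filenames come from the report above, area/status are assumed.
issues = [
    {"file": "2026-03-02-hmr-resets-session-to-task-map.md",
     "area": "background-worker", "status": "resolved"},
    {"file": "2026-03-08-background-task-hmr-recovery.md",
     "area": "background-worker", "status": "open"},
    {"file": "2026-02-01-old-bug.md", "area": "ui", "status": "open"},
]
duplicates = find_duplicates(issues)
stale = find_stale(issues, today=date(2026, 3, 8))
```

With these inputs the two `background-worker` files share the keywords `hmr` and `task`, and `2026-02-01-old-bug.md` is 35 days old on 2026-03-08, which exceeds the 30-day limit for open issues.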
# Get suspects as JSON for scripting
python3 .github/scripts/issue-scanner.py --suspects-only
Output:
[
{
"file_a": "2026-03-02-hmr-resets-session-to-task-map.md",
"file_b": "2026-03-08-background-task-hmr-recovery.md",
"reason": "Same area 'background-worker', keywords: {'task', 'hmr'}",
"type": "duplicate"
}
]
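One way to feed this JSON into Phase 2 is sketched below. The function name and the prompt wording are hypothetical; only `claude -p` and the JSON keys come from the text above. The sketch builds the command lines without running them, so the prompts can be reviewed first.

```python
import json


def build_phase2_commands(suspects_json, issues_dir="docs/issues"):
    """Turn duplicate suspects into argv lists for `claude -p`.

    The prompt text is an illustrative assumption, not a required format.
    """
    commands = []
    for suspect in json.loads(suspects_json):
        if suspect.get("type") != "duplicate":
            continue  # this sketch handles duplicate pairs only
        prompt = (
            f"Compare {issues_dir}/{suspect['file_a']} and "
            f"{issues_dir}/{suspect['file_b']}. "
            "Are they semantic duplicates? If so, recommend how to merge them."
        )
        commands.append(["claude", "-p", prompt])
    return commands


# Same structure as the scanner output shown above.
suspects_json = """
[
  {
    "file_a": "2026-03-02-hmr-resets-session-to-task-map.md",
    "file_b": "2026-03-08-background-task-hmr-recovery.md",
    "reason": "Same area 'background-worker', keywords: {'task', 'hmr'}",
    "type": "duplicate"
  }
]
"""
commands = build_phase2_commands(suspects_json)
```

Each entry in `commands` can then be passed to `subprocess.run(cmd, check=True)` to start the deep analysis for that pair.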
IMPORTANT: After Phase 1, proceed automatically to Phase 2 without asking. Do NOT ask "Would you like me to proceed?" — just do it.
python3 .github/scripts/issue-scanner.py
Duplicates (read both files, compare content):
related_issues cross-reference
Open Issues (check if resolved):
Relevant Files in codebase
python3 .github/scripts/issue-scanner.py --resolve <file>
Stale Issues (open > 30 days):
--close
Use the scanner's update commands for fast changes:
# Resolve issues (status: open → resolved)
python3 .github/scripts/issue-scanner.py --resolve file1.md file2.md
# Close issues (status: open → wontfix)
python3 .github/scripts/issue-scanner.py --close file.md
# Generic field update
python3 .github/scripts/issue-scanner.py --set severity high --files file.md
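To make the effect of these commands concrete, here is a rough sketch of a front-matter field update. How the scanner actually rewrites files is not documented here, so the function below is a hypothetical stand-in that assumes flat `key: value` front-matter delimited by `---` lines.

```python
def set_front_matter_field(text, field, value):
    """Return text with `field` set to `value` inside the front-matter.

    Hypothetical helper: replaces the field if present, otherwise adds it
    just before the closing '---' marker.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("no front-matter block found")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Field not present: insert it before the closing marker.
            lines.insert(i, f"{field}: {value}\n")
            return "".join(lines)
        key, sep, _ = lines[i].partition(":")
        if sep and key.strip() == field:
            lines[i] = f"{field}: {value}\n"
            return "".join(lines)
    raise ValueError("front-matter block is not closed")


original = "---\nstatus: open\narea: ui\n---\nBody\n"
resolved = set_front_matter_field(original, "status", "resolved")
with_severity = set_front_matter_field(original, "severity", "high")
```

Under these assumptions, `resolved` mirrors what `--resolve` is described as doing (status: open → resolved), and `with_severity` mirrors `--set severity high`.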
_template.md
status: investigating (active work)
| Frequency | Action |
|---|---|
| After adding issues | Run python3 .github/scripts/issue-scanner.py |
| Weekly (active dev) | Full scan + Phase 2 on suspects |
| Monthly (stable) | Full scan + triage all open issues |
| Approach | Deep Analysis | Cost |
|---|---|---|
| Naive (all) | N files | 💰💰💰💰💰 |
| Two-phase | ~M suspects (M << N) | 💰 |
Savings: about 90% lower deep-analysis cost when Phase 1 narrows the set to roughly a tenth of the files (M ≈ 0.1 N).
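The savings figure follows from simple arithmetic, sketched below. It assumes the deep-analysis cost is proportional to the number of files analyzed and that Phase 1 is free, as stated earlier. The file counts are hypothetical.

```python
def phase2_savings(total_issues, suspects):
    """Fraction of deep-analysis cost avoided by analyzing only suspects.

    Assumes cost scales linearly with the number of files analyzed.
    """
    if total_issues <= 0:
        raise ValueError("total_issues must be positive")
    return 1 - suspects / total_issues


# Hypothetical counts: N = 100 issues, M = 10 suspects.
savings = phase2_savings(total_issues=100, suspects=10)
print(f"{savings:.0%}")  # prints 90%
```

The saving shrinks as the suspect list grows: with 50 suspects out of 100 issues the same formula gives 50%.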