Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories. Covers deleted commit recovery, force-push detection, IOC extraction, multi-source evidence collection, hypothesis formation/validation, and structured forensic reporting. Inspired by RAPTOR's 1800+ line OSS Forensics system.
A 7-phase multi-agent investigation framework for researching open-source supply chain attacks. Adapted from RAPTOR's forensics system. Covers GitHub Archive, Wayback Machine, GitHub API, local git analysis, IOC extraction, evidence-backed hypothesis formation and validation, and final forensic report generation.
Read these before every investigation step. Violating them invalidates the report.
- Every assertion must cite an evidence ID (`EV-XXXX`). Assertions without citations are forbidden.
- Unverified claims must be labeled `[HYPOTHESIS]`. Only statements verified against original sources may be stated as facts.
- Code from a target repository may only be run via `execute_code` in a sandboxed environment.

Example attack patterns:

- Dependency confusion: a package such as `internal-lib-v2` is uploaded to NPM with a higher version than the internal one. The investigator must track when this package was first seen and whether any PushEvents in the target repo updated `package.json` to this version.
- Maintainer account takeover: a malicious change lands in `.github/workflows/build.yml`. The investigator looks for PushEvents from the committing account after a long period of inactivity or from a new IP/location (if detectable via BigQuery).
- Force-push cover-up: use `git fsck` and GH Archive to recover the original commit SHA and verify what was leaked.

Path convention: Throughout this skill, `SKILL_DIR` refers to the root of this skill's installation directory (the folder containing this `SKILL.md`). When the skill is loaded, resolve `SKILL_DIR` to the actual path, e.g. `~/.hermes/skills/security/oss-forensics/` or the `optional-skills/` equivalent. All script and template references are relative to it.
```bash
# Create an investigation workspace named after the target repo
# (REPO_NAME is a placeholder like OWNER/REPO)
mkdir investigation_$(echo "REPO_NAME" | tr '/' '_')
cd investigation_$(echo "REPO_NAME" | tr '/' '_')

# Initialize / inspect the evidence store
python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list

# Start the report from the template
cp SKILL_DIR/templates/forensic-report.md ./investigation-report.md
```
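As a rough sketch of what one evidence entry carries, the fields below are assumptions inferred from conventions used later in this skill (`EV-XXXX` IDs, `content_sha256` hashes, `[VERIFIED]`/`[UNVERIFIED]` status); the authoritative schema is whatever `evidence-store.py` implements:

```python
import hashlib
import json

def make_evidence(seq, source, content, verified=False):
    """Build one evidence record. Field names are illustrative,
    not the actual evidence-store.py schema."""
    return {
        "id": f"EV-{seq:04d}",  # e.g. EV-0001
        "source": source,       # which investigator produced it
        "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
        "status": "[VERIFIED]" if verified else "[UNVERIFIED]",
        "content": content,
    }

record = make_evidence(1, "git-local", "dangling commit found by git fsck")
print(json.dumps(record, indent=2))
```

The `content_sha256` field is what later phases re-hash against the original source before anything can be promoted to `[VERIFIED]`.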
Maintain an `iocs.md` file to track Indicators of Compromise as they are discovered.

Goal: Extract all structured investigative targets from the user's request.
Actions:
- Parse repository identifiers (`owner/repo`)

Tools: Reasoning only, or `execute_code` for regex extraction from large text blocks.
Output: Populate iocs.md with extracted IOCs. Each IOC must have:
Reference: See evidence-types.md for IOC taxonomy.
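The regex extraction mentioned above might look like the following sketch. The patterns are illustrative and deliberately minimal, not the full taxonomy from evidence-types.md:

```python
import re

# Illustrative IOC patterns only; extend per evidence-types.md.
IOC_PATTERNS = {
    "commit_sha": re.compile(r"\b[0-9a-f]{40}\b"),
    "sha256": re.compile(r"\b[0-9a-f]{64}\b"),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def extract_iocs(text):
    """Return {ioc_type: sorted unique matches} for each pattern."""
    return {name: sorted(set(pat.findall(text)))
            for name, pat in IOC_PATTERNS.items()}

sample_text = ("Payload fetched from evil-c2.example.com (93.184.216.34), "
               "commit deadbeef" + "0" * 32)
found = extract_iocs(sample_text)
```

Each match still needs the per-IOC metadata required above before it goes into `iocs.md`.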
Spawn up to 5 specialist investigator sub-agents using delegate_task (batch mode, max 3 concurrent). Each investigator has a single data source and must not mix sources.
Orchestrator note: Pass the IOC list from Phase 1 and the investigation time window in the `context` field of each delegated task.
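A possible shape for that `context` payload is sketched below; the actual `delegate_task` schema is tool-specific, so every field name here is an assumption:

```python
import json

def build_context(iocs, window_start, window_end, source):
    """Sketch of a per-investigator context payload (shape assumed,
    not the real delegate_task schema)."""
    return json.dumps({
        "data_source": source,  # exactly one data source per investigator
        "iocs": iocs,           # from Phase 1 iocs.md
        "window": {"from": window_start, "to": window_end},
    })

ctx = build_context(["evil-c2.example.com"], "2024-01-01", "2024-03-31", "wayback")
```

Keeping one `data_source` per payload enforces the role boundaries below mechanically rather than by convention alone.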
ROLE BOUNDARY: You query the LOCAL GIT REPOSITORY ONLY. Do not call any external APIs.
Actions:
```bash
# Clone repository
git clone https://github.com/OWNER/REPO.git target_repo && cd target_repo

# Full commit log with stats
git log --all --full-history --stat --format="%H|%ae|%an|%ai|%s" > ../git_log.txt

# Detect force-push evidence (orphaned/dangling commits)
git fsck --lost-found --unreachable 2>&1 | grep commit > ../dangling_commits.txt

# Check reflog for rewritten history
# (note: a fresh clone has almost no reflog; this is most useful on an
# existing checkout)
git reflog --all > ../reflog.txt

# List ALL branches including remote-tracking refs
git branch -a -v > ../branches.txt

# Find suspicious binary additions
git log --all --diff-filter=A --name-only --format="%H %ai" -- "*.so" "*.dll" "*.exe" "*.bin" > ../binary_additions.txt

# Check for GPG signature anomalies
git log --show-signature --format="%H %ai %aN" > ../signature_check.txt 2>&1
```
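A small parser for the resulting `dangling_commits.txt` might look like this; it assumes the standard `git fsck` line format `dangling commit <sha>` / `unreachable commit <sha>`:

```python
import re

DANGLING_RE = re.compile(r"^(dangling|unreachable) commit ([0-9a-f]{40})$")

def parse_fsck(lines):
    """Extract orphaned commit SHAs from `git fsck` output lines."""
    shas = []
    for line in lines:
        m = DANGLING_RE.match(line.strip())
        if m:
            shas.append(m.group(2))
    return shas

sample = [
    "Checking object directories: 100% (256/256), done.",
    "dangling commit " + "a" * 40,
    "unreachable commit " + "b" * 40,
]
print(parse_fsck(sample))
```

Each recovered SHA becomes a candidate evidence entry and a lookup target for the GitHub API investigator's `git/commits/SHA` check.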
Evidence to collect (add via `python3 SKILL_DIR/scripts/evidence-store.py add`):

- `git fsck` dangling/unreachable commits (possible force-push victims)
- `git reflog` entries showing rewritten history
- `git log` binary additions (`*.so`, `*.dll`, `*.exe`, `*.bin`)
- `git log --show-signature` anomalies (unsigned or bad-signature commits)

Reference: See recovery-techniques.md for accessing force-pushed commits.
ROLE BOUNDARY: You query the GITHUB REST API ONLY. Do not run git commands locally.
Actions:
```bash
# Tip: add -H "Authorization: Bearer $GITHUB_TOKEN" to each call to get the
# authenticated 5,000/hour rate limit (see Rate Limits below).

# Commits (paginated)
curl -s "https://api.github.com/repos/OWNER/REPO/commits?per_page=100" > api_commits.json

# Pull requests, open and closed (state=all; deleted PRs will not appear here)
curl -s "https://api.github.com/repos/OWNER/REPO/pulls?state=all&per_page=100" > api_prs.json

# Issues
curl -s "https://api.github.com/repos/OWNER/REPO/issues?state=all&per_page=100" > api_issues.json

# Contributors and collaborator changes
curl -s "https://api.github.com/repos/OWNER/REPO/contributors" > api_contributors.json

# Repository events (last 300)
curl -s "https://api.github.com/repos/OWNER/REPO/events?per_page=100" > api_events.json

# Check specific suspicious commit SHA details
curl -s "https://api.github.com/repos/OWNER/REPO/git/commits/SHA" > commit_detail.json

# Releases
curl -s "https://api.github.com/repos/OWNER/REPO/releases?per_page=100" > api_releases.json

# Check if a specific commit exists (force-pushed commits may 404 on commits/
# but still succeed on git/commits/)
curl -s "https://api.github.com/repos/OWNER/REPO/commits/SHA" | jq .sha
```
Cross-reference targets (flag discrepancies as evidence):
Reference: See evidence-types.md for GH event types.
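One way to mechanize the cross-referencing, assuming the git-local and API investigators each hand back a list of commit SHAs:

```python
def cross_reference(local_shas, api_shas):
    """Flag commits present in one source but not the other.
    Discrepancies are evidence candidates: e.g. a commit recovered via
    git fsck that the API no longer lists suggests erased history."""
    local, api = set(local_shas), set(api_shas)
    return {
        "local_only": sorted(local - api),  # possible force-pushed-away history
        "api_only": sorted(api - local),    # possible shallow/partial clone
    }

diff = cross_reference(["aaa", "bbb"], ["bbb", "ccc"])
```

Any non-empty bucket should be logged as its own evidence entry rather than silently reconciled.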
ROLE BOUNDARY: You query the WAYBACK MACHINE CDX API ONLY. Do not use the GitHub API.
Goal: Recover deleted GitHub pages (READMEs, issues, PRs, releases, wiki pages).
Actions:
```bash
# Search for archived snapshots of the repo main page
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO&output=json&limit=100&from=YYYYMMDD&to=YYYYMMDD" > wayback_main.json

# Search for a specific deleted issue
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/issues/NUM&output=json&limit=50" > wayback_issue_NUM.json

# Search for a specific deleted PR
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/pull/NUM&output=json&limit=50" > wayback_pr_NUM.json

# Fetch the best snapshot of a page via the replay URL:
#   https://web.archive.org/web/TIMESTAMP/ORIGINAL_URL
# Snapshot listing example:
#   https://web.archive.org/web/20240101000000*/github.com/OWNER/REPO

# Advanced: search for deleted releases/tags (wildcard URL)
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/releases/tag/*&output=json" > wayback_tags.json

# Advanced: search for historical wiki changes
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/wiki/*&output=json" > wayback_wiki.json
```
Evidence to collect:
Reference: See github-archive-guide.md for CDX API parameters.
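With `output=json`, the CDX API returns a header row followed by capture rows. A sketch for picking the newest HTTP-200 snapshot and building its replay URL:

```python
def best_snapshot(cdx_rows):
    """Pick the latest 200-status capture from Wayback CDX output=json rows
    (first row is the header) and build its replay URL."""
    header, rows = cdx_rows[0], cdx_rows[1:]
    ts_i, url_i, code_i = (header.index(k)
                           for k in ("timestamp", "original", "statuscode"))
    ok = [r for r in rows if r[code_i] == "200"]
    if not ok:
        return None
    r = max(ok, key=lambda row: row[ts_i])
    return f"https://web.archive.org/web/{r[ts_i]}/{r[url_i]}"

cdx = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,github)/owner/repo", "20230101000000", "https://github.com/OWNER/REPO",
     "text/html", "200", "ABC", "1234"],
    ["com,github)/owner/repo", "20240101000000", "https://github.com/OWNER/REPO",
     "text/html", "301", "DEF", "10"],
]
```

Filtering on `statuscode == "200"` skips redirect captures, which often archive a deleted page's redirect rather than its content.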
ROLE BOUNDARY: You query GITHUB ARCHIVE via BIGQUERY ONLY. This is a tamper-proof record of all public GitHub events.
Prerequisites: Requires Google Cloud credentials with BigQuery access (`gcloud auth application-default login`). If unavailable, skip this investigator and note it in the report.
Cost Optimization Rules (MANDATORY):
- Run `--dry_run` before every query to estimate cost.
- Use `_TABLE_SUFFIX` to filter by date range and minimize scanned data.

```bash
# Template: safe BigQuery query for PushEvents to OWNER/REPO
bq query --use_legacy_sql=false --dry_run "
SELECT created_at, actor.login, payload.commits, payload.before, payload.head,
       payload.size, payload.distinct_size
FROM \`githubarchive.month.*\`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'
  AND type = 'PushEvent'
  AND repo.name = 'OWNER/REPO'
LIMIT 1000
"
```
```bash
# If the cost is acceptable, re-run without --dry_run.
# Detect force-pushes: payload.distinct_size = 0 AND payload.size > 0
# means previously pushed commits were force-erased.

# Check for deleted branch/tag events
bq query --use_legacy_sql=false "
SELECT created_at, actor.login, payload.ref, payload.ref_type
FROM \`githubarchive.month.*\`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'
  AND type = 'DeleteEvent'
  AND repo.name = 'OWNER/REPO'
LIMIT 200
"
```
Evidence to collect:
Reference: See github-archive-guide.md for all 12 event types and query patterns.
ROLE BOUNDARY: You enrich EXISTING IOCs from Phase 1 using passive public sources ONLY. Do not execute any code from the target repository.
Actions:
- Fetch raw commit patches (`github.com/OWNER/REPO/commit/SHA.patch`)
- Passive domain/IP enrichment (`web_extract` on public WHOIS services)

After all investigators complete:
- Run `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list` to review all collected evidence.
- Confirm each entry's `content_sha256` hash matches the original source.
- Mark each entry `[VERIFIED]` (confirmed from 2+ independent sources) or `[UNVERIFIED]` (single source only).

A hypothesis must:
- Cite specific evidence IDs (`EV-XXXX, EV-YYYY`)
- Stay labeled `[HYPOTHESIS]` until validated

Common hypothesis templates (see investigation-templates.md):
For each hypothesis, spawn a delegate_task sub-agent to attempt to find disconfirming evidence before confirming.
The validator sub-agent MUST mechanically check:
- Every cited evidence ID exists in `evidence.json` (hard failure: any missing ID rejects the hypothesis as potentially fabricated).
- Every `[VERIFIED]` piece of evidence was confirmed from 2+ sources.

Output: one verdict per hypothesis:

- VALIDATED: all evidence cited, verified, and logically consistent, with no plausible alternative explanation.
- INCONCLUSIVE: evidence supports the hypothesis, but alternative explanations exist or the evidence is insufficient.
- REJECTED: missing evidence IDs, unverified evidence cited as fact, or logical inconsistency detected.

Rejected hypotheses feed back into Phase 4 for refinement (max 3 iterations).
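The mechanical portion of these checks, together with the Phase 4 hash-and-source rule, can be sketched as follows. Field names follow this skill's conventions (`content_sha256`, `[VERIFIED]`) but the record shape is otherwise an assumption, and the "no plausible alternative explanation" judgment behind VALIDATED still requires the sub-agent's reasoning:

```python
import hashlib

def mark_status(entry, refetched_content, source_count):
    """Phase 4 rule: [VERIFIED] only if the re-fetched content still hashes
    to the stored content_sha256 AND 2+ independent sources confirm it."""
    ok = hashlib.sha256(refetched_content.encode()).hexdigest() == entry["content_sha256"]
    entry["status"] = "[VERIFIED]" if ok and source_count >= 2 else "[UNVERIFIED]"
    return entry["status"]

def validate_hypothesis(cited_ids, evidence_store):
    """Phase 5 mechanical checks. A missing ID is a hard failure (REJECTED:
    potentially fabricated). Citing [UNVERIFIED] evidence downgrades the
    verdict to INCONCLUSIVE (insufficient evidence); the logical-consistency
    and alternative-explanation checks are not modeled here."""
    if any(i not in evidence_store for i in cited_ids):
        return "REJECTED"
    if any(evidence_store[i]["status"] != "[VERIFIED]" for i in cited_ids):
        return "INCONCLUSIVE"
    return "VALIDATED"

store = {
    "EV-0001": {"content_sha256": hashlib.sha256(b"patch bytes").hexdigest()},
}
mark_status(store["EV-0001"], "patch bytes", 2)  # hash matches, 2 sources
```

Running the checks in this order means fabricated citations are caught before any verdict based on evidence quality is considered.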
Populate investigation-report.md using the template in forensic-report.md.
Mandatory sections:
- An evidence appendix: all `EV-XXXX` entries with source, type, and verification status

Report rules:
- Every factual claim carries an `[EV-XXXX]` citation.
- Secrets and personal data are `[REDACTED]`.
- Cross-check every citation against `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list`.
- Deliver the finished `investigation-report.md` to the user.

This skill is designed for defensive security investigation, protecting open-source software from supply chain attacks. It must not be used for:
Investigations should be conducted with the principle of minimal intrusion: collect only the evidence necessary to validate or refute the hypothesis. When publishing results, follow responsible disclosure practices and coordinate with affected maintainers before public disclosure.
If the investigation reveals a genuine compromise, follow the coordinated vulnerability disclosure process:
GitHub REST API enforces rate limits that will interrupt large investigations if not managed.
- Authenticated requests: 5,000/hour (requires `GITHUB_TOKEN` env var or `gh` CLI auth)
- Unauthenticated requests: 60/hour (unusable for investigations)
Best practices:
- `export GITHUB_TOKEN=ghp_...` or use the `gh` CLI (it authenticates automatically)
- Use conditional requests (`If-None-Match` / `If-Modified-Since` headers) to avoid consuming quota on unchanged data
- Watch the `X-RateLimit-Remaining` header; if it drops below 100, pause until the `X-RateLimit-Reset` timestamp

If rate-limited mid-investigation, record the partial results in the evidence store and note the limitation in the report.
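The pause rule above can be sketched with the standard library only; `X-RateLimit-Remaining` and `X-RateLimit-Reset` are the real GitHub response headers, and `X-RateLimit-Reset` is a Unix epoch timestamp:

```python
import time

def maybe_pause(headers, threshold=100):
    """If X-RateLimit-Remaining is below the threshold, sleep until the
    X-RateLimit-Reset epoch timestamp. Returns seconds slept."""
    remaining = int(headers.get("X-RateLimit-Remaining", "0"))
    if remaining >= threshold:
        return 0
    reset = int(headers.get("X-RateLimit-Reset", "0"))
    wait = max(0, reset - int(time.time()))
    time.sleep(wait)
    return wait
```

Call it after every API response so the investigation throttles itself instead of dying mid-phase.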