Performs comprehensive security-focused source-sink data flow analysis on codebases. Traces every untrusted input (user input, API payloads, LLM tool args, file metadata, external API responses) through transformations and validations to sensitive execution sinks (shell execution, prompt assembly, subprocess calls, SQL/query construction, template rendering, file writes, log output). Produces a structured markdown report with mermaid diagrams, vulnerability matrix, and remediation priorities. Use this skill whenever the user asks for source-sink analysis, data flow analysis, untrusted input analysis, injection analysis, attack surface mapping, threat modeling, security audit, taint analysis, or wants to understand how untrusted data flows through their codebase. Also trigger when users say things like 'where does user input end up', 'find injection vulnerabilities', 'trace data flows', 'what's the attack surface', 'security review this code', or 'how is input validated'.
You are performing a comprehensive security-focused source-sink data flow analysis. Your goal is to trace every path where untrusted data enters the system and follow it to every sensitive execution point, documenting what validation/escaping exists (and what's missing) at each step.
This is a taint analysis: treat every piece of data not generated internally as potentially hostile, and trace it until it either reaches a sink or is provably sanitized.
Source-sink analysis is the foundation of application security review. A vulnerability exists when untrusted data reaches a sensitive sink without adequate validation. By systematically mapping every flow, you find issues that code review alone misses -- the injection that spans 6 function calls across 4 files, the metadata field nobody thought to escape, the API endpoint someone forgot to authenticate.
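To make the source, transformation, and sink vocabulary concrete, here is a minimal sketch of one flow that crosses several functions. The function names and the `request` payload are hypothetical, and the example assumes a POSIX shell is available; it is an illustration of the concept, not code from any analyzed project.

```python
import shlex
import subprocess


def read_filename(request: dict) -> str:
    # SOURCE: attacker-controlled value from a hypothetical request payload.
    return request["filename"]


def build_command_unsafe(filename: str) -> str:
    # TRANSFORM: plain string interpolation, no escaping applied.
    return f"echo {filename}"


def build_command_quoted(filename: str) -> str:
    # TRANSFORM: shlex.quote() turns the value into a single shell word.
    return f"echo {shlex.quote(filename)}"


def run(command: str) -> str:
    # SINK: shell=True hands the whole string to the shell for parsing.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout


payload = {"filename": "report.txt; echo INJECTED"}
unsafe_output = run(build_command_unsafe(read_filename(payload)))
quoted_output = run(build_command_quoted(read_filename(payload)))
```

In the unsafe variant the shell treats `;` as a command separator, so the second `echo` runs as its own command. In the quoted variant the payload stays a single argument. Passing an argument list without `shell=True` avoids shell parsing altogether and is usually the simpler fix.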
Run four parallel Explore agents to cover different layers simultaneously. This is critical for thoroughness -- a single sequential pass will miss cross-layer flows.
Launch these four agents simultaneously:
Agent 1 -- Entry Points & Event Handlers: Find all places untrusted data enters the system:
Agent 2 -- Execution Sinks & Dangerous Functions: Find all sensitive execution points:
- subprocess, exec, eval, os.system, os.popen, shell=True
Agent 3 -- Client Layer & External Integrations: Find all API clients and external communication:
Agent 4 -- Config, Secrets & Sandbox Layer: Find all configuration and privilege boundaries:
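As one illustration of the sink discovery assigned to Agent 2, the sketch below walks a Python syntax tree and flags calls that look like execution sinks. The `DANGEROUS_CALLS` set and the `find_sinks` helper are assumptions made for this example; a real pass would cover more languages and more sink types, and would still need manual review of each hit.

```python
import ast

# Hypothetical starting list; extend it for the codebase under review.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "os.popen"}


def dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for Name/Attribute chains, or '' for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def find_sinks(source: str, filename: str = "<memory>") -> list[tuple[int, str]]:
    """List (line, description) pairs for calls that look like execution sinks."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name in DANGEROUS_CALLS:
            findings.append((node.lineno, name))
        # Any call passing a literal shell=True is treated as a sink.
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                findings.append((node.lineno, f"{name or '<call>'} with shell=True"))
    return sorted(findings)


sample = '''import os, subprocess
os.system("ls " + user_dir)
subprocess.run(cmd, shell=True)
subprocess.run(["ls", user_dir])
value = eval(expr)
'''
sinks = find_sinks(sample)
```

Here `sinks` contains the `os.system`, `shell=True`, and `eval` calls on lines 2, 3, and 5, while the list-form `subprocess.run` on line 4 is not flagged. Syntax-level matching of this kind misses aliased imports and dynamically built calls, so it is a starting inventory, not a complete one.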
For each source discovered in Phase 1, trace data through every function call to every reachable sink. This is where most of the analysis time should go.
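The tracing step can be pictured as a path search over a call graph. The sketch below is one way to enumerate source-to-sink paths, assuming a call graph has already been extracted; the graph, the sink set, and the sanitizer set are invented for illustration.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees that receive the data.
CALL_GRAPH = {
    "handle_upload": ["parse_metadata", "log_request"],
    "parse_metadata": ["build_prompt", "render_title"],
    "build_prompt": ["call_llm"],
    "render_title": ["escape_html"],
    "log_request": [],
}
SINKS = {"call_llm", "log_request"}
SANITIZERS = {"escape_html"}


def trace_flows(source, graph, sinks, sanitizers):
    """Return every call path from source to a sink, stopping at sanitizers."""
    flows = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in sinks:
            flows.append(path)
        if current in sanitizers:
            continue  # data is treated as clean past this point
        for callee in graph.get(current, []):
            if callee not in path:  # skip cycles
                queue.append(path + [callee])
    return flows


flows = trace_flows("handle_upload", CALL_GRAPH, SINKS, SANITIZERS)
```

With this graph the search finds two flows: `handle_upload` to `log_request`, and `handle_upload` through `parse_metadata` and `build_prompt` to `call_llm`. Treating a sanitizer as ending the path is a simplification, because escaping that is correct for one sink (HTML) can be wrong for another (shell), so each sanitizer still needs to be checked against the sink it protects.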
For each flow, document:
Pay special attention to these commonly-missed patterns:
--help, -X POST)
Rate each flow using this matrix:
| Severity | Criteria |
|---|---|
| CRITICAL | Untrusted data reaches shell execution, arbitrary code eval, or template injection with no/trivially-bypassable sanitization |
| HIGH | Untrusted data reaches dangerous sinks with partial or context-inappropriate sanitization; or completely unauthenticated API endpoints exposing sensitive operations |
| MEDIUM | Untrusted data reaches sinks with reasonable but imperfect sanitization; or query language injection where external API authorization provides secondary defense |
| LOW | Flows where exploitation requires compromising a trusted source (e.g., Git repo, config repo); or information disclosure via logs/error messages |
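A rough way to apply the matrix consistently is to encode it as a small function. The sketch below is a simplified reading of the table, not a replacement for judgment: the `Sanitization` levels, the sink labels, and the fallback for unsanitized non-execution sinks are assumptions made for this example, and it does not model the unauthenticated-endpoint or secondary-defense criteria.

```python
from enum import Enum


class Sanitization(Enum):
    NONE = "none"              # absent or trivially bypassable
    PARTIAL = "partial"        # partial, or wrong for the sink's context
    REASONABLE = "reasonable"  # reasonable but imperfect


# Hypothetical labels for the sinks named in the CRITICAL row.
CODE_EXECUTION_SINKS = {"shell", "eval", "template"}


def rate_flow(sink_kind: str, sanitization: Sanitization,
              source_is_trusted: bool = False) -> str:
    """Map one flow to a severity label using a simplified form of the matrix."""
    if source_is_trusted or sink_kind == "log":
        return "LOW"
    if sanitization is Sanitization.REASONABLE:
        return "MEDIUM"
    if sanitization is Sanitization.PARTIAL:
        return "HIGH"
    if sink_kind in CODE_EXECUTION_SINKS:
        return "CRITICAL"
    # Assumption: unsanitized data at any other dangerous sink is rated HIGH.
    return "HIGH"


rating = rate_flow("shell", Sanitization.NONE)
```

For an unsanitized flow into a shell sink, `rating` is `"CRITICAL"`. Encoding the rules this way mainly helps keep ratings consistent across many flows; borderline cases should still be argued in the report text.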
For each vulnerability, also assess:
Generate a single comprehensive markdown document with this structure:
# Source-Sink Security Analysis: {Project Name}
## 1. Executive Summary
- Total sources/sinks/flows analyzed
- Critical/High/Medium/Low finding counts
- Top 3 priority items
## 2. Master Attack Surface Map
- Mermaid flowchart: all sources → transformations → sinks
- Color-coded by severity
## 3. Source Classification
| Source | Controller | Trust Level | Entry Point | Section |
(table of all untrusted input sources)
## 4. Sink Classification
| Sink | Impact if Compromised | Protection Level | Section |
(table of all sensitive sinks)
## 5-N. Detailed Flow Analysis
One section per flow, containing:
- Trace (source → each hop with file:line → sink)
- Code snippets for critical points
- Escaping/validation analysis
- Injection vectors (with proof-of-concept payloads where applicable)
- Mitigating factors
- Residual risk assessment
- Mermaid diagram for complex flows
## N+1. Risk Assessment
- Mermaid quadrantChart (likelihood vs impact)
- Full vulnerability matrix table
## N+2. Remediation Priorities
- Ordered by risk, with specific fix recommendations
## N+3. File Reference Appendix
| File | Relevant Sections |
Use these patterns for consistency:
Master attack surface map (flowchart TB with subgraphs for Sources, Transforms, Sinks):
flowchart TB
subgraph Sources["UNTRUSTED SOURCES"]
S1["description"]
end
subgraph Sinks["SENSITIVE SINKS"]
K1["description"]
end
S1 --> K1
classDef critical fill:#ffcdd2,stroke:#b71c1c
classDef high fill:#ffe0b2,stroke:#e65100
classDef medium fill:#fff9c4,stroke:#f57f17
classDef low fill:#c8e6c9,stroke:#2e7d32
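When there are many flows, the flowchart above can be generated rather than written by hand. The following is a minimal sketch, assuming each flow is recorded as a `(source, sink, severity)` tuple; the `render_flowchart` helper and the sample flows are hypothetical, and labels containing double quotes would need escaping before use.

```python
SEVERITY_ORDER = ["critical", "high", "medium", "low"]
CLASS_DEFS = {
    "critical": "fill:#ffcdd2,stroke:#b71c1c",
    "high": "fill:#ffe0b2,stroke:#e65100",
    "medium": "fill:#fff9c4,stroke:#f57f17",
    "low": "fill:#c8e6c9,stroke:#2e7d32",
}


def render_flowchart(flows):
    """Build Mermaid flowchart text from (source, sink, severity) tuples."""
    sources, sinks, worst, edges = {}, {}, {}, []
    for src, sink, severity in flows:
        # Assign stable node ids (S1, S2, ... and K1, K2, ...) on first sight.
        s_id = sources.setdefault(src, f"S{len(sources) + 1}")
        k_id = sinks.setdefault(sink, f"K{len(sinks) + 1}")
        edges.append(f"    {s_id} --> {k_id}")
        # Color each sink by the most severe flow that reaches it.
        current = worst.get(k_id, "low")
        worst[k_id] = min(current, severity, key=SEVERITY_ORDER.index)
    lines = ["flowchart TB", '    subgraph Sources["UNTRUSTED SOURCES"]']
    lines += [f'        {node_id}["{label}"]' for label, node_id in sources.items()]
    lines += ["    end", '    subgraph Sinks["SENSITIVE SINKS"]']
    lines += [f'        {node_id}["{label}"]' for label, node_id in sinks.items()]
    lines += ["    end"] + edges
    lines += [f"    classDef {name} {style}" for name, style in CLASS_DEFS.items()]
    lines += [f"    class {node_id} {severity}" for node_id, severity in worst.items()]
    return "\n".join(lines)


flows = [
    ("HTTP body", "subprocess.run", "high"),
    ("LLM tool args", "subprocess.run", "critical"),
    ("HTTP body", "log output", "low"),
]
diagram = render_flowchart(flows)
```

The sketch colors sink nodes only, using the worst severity among incoming flows; coloring edges or sources instead is an equally reasonable choice.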
Risk quadrant chart:
quadrantChart
title Risk Assessment: Likelihood vs Impact
x-axis Low Likelihood --> High Likelihood
y-axis Low Impact --> High Impact
quadrant-1 Prioritize
quadrant-2 Monitor
quadrant-3 Accept
quadrant-4 Mitigate
Vuln Name: [likelihood, impact]
Flow-specific sequence diagrams for complex multi-component flows.
Python:
- subprocess.run/Popen with shell=True or unsanitized args
- os.system(), os.popen(), eval(), exec()
- yaml.load() vs yaml.safe_load()
- pickle.loads() with untrusted data
- Environment(autoescape=False) or Template() without sandbox
- .format() in SQL queries
- shlex.quote() usage (correct?) vs shlex.split() (dangerous with untrusted input)
- os.path.join() without traversal checks (it doesn't prevent ../)
- __import__() or importlib with user-controlled module names

JavaScript / Node.js:
- child_process.exec() vs execFile() vs spawn()
- eval(), Function(), vm.runInNewContext()
- innerHTML, dangerouslySetInnerHTML
- JSON.parse() with prototype pollution potential
- require() or dynamic import() with user-controlled paths

Go:
- os/exec.Command() with unsanitized args
- text/template vs html/template
- database/sql with string concatenation
- filepath.Join() without a traversal check (Join already applies filepath.Clean(), which does not stop ../ from escaping the base directory)
- unsafe package usage
- reflect with user-controlled type names

- Access-Control-Allow-Origin: *

After generating the report, ask the user how they want it delivered:
docs/SOURCE_SINK_ANALYSIS.md)
Default to asking, but if the user specified a preference upfront, honor it.