Audit a software project for hardening — security, AI gaps, test coverage, code quality, and decoupling. Use when user wants to harden a project, audit for vulnerabilities, check test coverage, or separate private data from code.
Perform a systematic hardening audit of this project. Work through each phase below, exploring the codebase to find real issues — not hypothetical ones. For each finding, explain the risk and suggest a concrete fix.
When to Use
When user types /harden
When user wants to audit a project for security, test coverage, or code quality
Before making a private repo public
After a major feature or refactor to check for regressions
When onboarding to an unfamiliar codebase to assess its health
Process
Step 1: Context Gathering (Phase 0)
Before scanning, ask these questions to calibrate the audit. The user can answer inline or say "skip" to assume worst-case (strictest ratings).
Visibility — Is this repo private, or could it go public?
Access — Just you, a team, or open source?
관련 스킬
Deployment — Local only, server, or cloud?
Compliance — Any regulatory or legal requirements? (government forms, PII rules, HIPAA, etc.)
Known issues — Areas you already know are problematic? (prioritize or skip)
Use the answers to calibrate severity ratings throughout the audit. For example:
"Private repo, solo access" — PII in git is medium, not critical
"Going public" — PII in git is critical
"Government forms" — output validation is high priority
If the user skips, assume: public visibility, shared access, compliance required.
After calibration, state assumptions:
Based on the answers, tell the user what this audit covers and what it doesn't. For example:
"Covering: source code, config files, git history, dependencies, test coverage"
"Does this match your expectations, or should I adjust?"
Tailor the scope assumptions to the project. A web app with a Dockerfile gets different assumptions than a CLI tool. Let the user confirm or adjust before proceeding.
Step 2: Audit Scopes
Work through these one at a time. After each scope, summarize findings and ask if the user wants to go deeper or move to the next scope.
Scope 1: Security
Input validation and output sanitization (including stderr/log leakage)
Secrets in code, config, and git history (not just current files — check committed history)
Permission and access control
Dependency hygiene (pinning, unused deps expanding attack surface, known CVEs)
File system access patterns (path traversal, unsafe reads/writes)
Scope 2: AI-Specific Gaps
Prompt injection risks (user input flowing into prompts unsanitized)
Data exposure through AI context (secrets, PII, or sensitive content visible to the model)
Access control on AI interfaces (who can invoke the AI, rate limits, cost controls)
Output validation (does the system verify AI outputs before acting on them?)
Fragile model assumptions (hardcoded model names, unpinned versions, deterministic output expectations)
Scope 3: Test Coverage
Map existing test coverage (what has tests, what doesn't)
Flag high-risk untested code (touches files, network, secrets, user input, money)
Verify security and AI findings from Scopes 1-2 have test coverage
Identify missing error path tests (what happens when things fail?)
Suggest highest-value tests to add first, prioritized by risk
Scope 4: Code Quality
Linter and formatter configuration (is one configured? If not, recommend one for the language/framework)
Error handling (silent failures, bare except, swallowed exceptions)
Dead code and stale config (unused functions, abandoned imports, orphaned files from refactors)
Consistency (naming, patterns, structure across similar components)
Network and retry behavior (connection limits, timeout handling, backoff)
Hardcoded values that should be configurable (paths, filenames, magic numbers)
Scope 5: Decoupling & Data Separation
Tightly coupled components that should be independent (shared state, circular dependencies, components modifying same files)
Private or sensitive data committed to the repo (PII, personal content, customer data)
.gitignore coverage for sensitive and generated files
Environment-specific config committed as if universal
Data that should live outside the repo (personal content, user-specific state, local config)
Clear boundaries between framework code and user data
Step 3: Checkpoint Questions
During each scope, if you encounter something ambiguous — ask before rating severity. Only ask questions the codebase can't answer.
Examples:
"This file has what looks like a real SSN — is this test data or production?"
"This API key is in a committed file — is this a throwaway/dev key or production?"
"Two skills modify the same file — is this intentional or a gap?"
If the codebase can answer it, don't ask. If only the user knows, ask.
Do NOT turn the audit into an interview. The skill's strength is autonomous exploration. Reserve questions for decisions that change severity or skip/prioritize a finding.
Step 4: Scorecard
After completing all scopes, present a scorecard with letter grades per scope.
Grading formula:
Points per finding: Critical = 4, High = 3, Medium = 2, Low = 1
Any critical finding in a scope = D minimum for that scope (cannot grade above D until critical is resolved)
Grade
Points
A
0
B
1-4
C
5-9
D
10-14
F
15+
Overall grade: Average of scope grades (A=4, B=3, C=2, D=1, F=0), rounded.
Step 5: Batch Plan
Propose batches for fixing the findings.
Batching rules:
Dependencies first — if finding X blocks finding Y, X goes in an earlier batch
Logical grouping — findings that touch the same files or fix related problems go together
Severity within batches — higher severity batches come first when no dependency constraint
Step 6: Issue Creation (Optional)
After presenting the batch plan, ask: "Ready to create GitHub issues? I'll file them in batch order."
Only create issues after the user reviews and approves the batches.
Output Format
Scorecard
| Scope | Grade | Findings |
|-------|-------|----------|
| Security | C | 1 critical, 2 high, 1 medium |
| AI | B | 2 high |
| Tests | D | 3 high |
| Code Quality | B | 1 medium, 2 low |
| Decoupling | C | 2 high |
Overall: C
Batch Plan
### Batch 1 — [description] ([count] issues)
Resolves: [finding numbers]
Dependency: [what must be done first, or "None — do this first"]
Effort: [Low / Medium / High]
Rules
Explore the codebase yourself before asking questions. Read files, check configs, scan for patterns.
Only flag real issues you find in the code — not theoretical risks.
Prioritize findings by severity: critical > high > medium > low, calibrated by Phase 0 context.
The scorecard and batch plan are mandatory outputs. Do not skip them.