Defines the standard 8-category taxonomy for classifying project files: Config, Tests, Docs, Scripts, Source Code, Data, AI Tooling, and Other. REFERENCE THIS SKILL whenever categorizing files, auditing project structure, or answering 'what types of files are here.' Apply these categories and patterns whether using the bundled script or manual analysis.
| Category | Description |
|---|---|
| Config | Configuration files for tools, environments, and build systems |
| Tests | Test files, fixtures, and testing utilities |
| Docs | Documentation, READMEs, and guides |
| Scripts | Standalone executable scripts and automation |
| Source Code | Core application/library source files |
| Data | Data files, datasets, and static assets |
| AI Tooling | AI/ML configs, prompts, and agent definitions |
| Other | Files that don't fit other categories (fallback) |
Always use exactly these 8 categories — do not invent new ones like "Schema", "Database", or "Reference Data." Map edge cases as follows:
| File Type | Category | Reasoning |
|---|---|---|
| SQL DDL (CREATE TABLE) | Docs | Documents database structure |
| SQL DML (INSERT/SELECT) | Data | Contains or queries data |
| .duckdb, .sqlite files | Data | Database storage |
| schema.json, openapi.yaml | Docs | Specification/contract files |
| Shell scripts (.sh) | Scripts | Executable automation |
| requirements.txt | Config | Dependency configuration |
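
The edge-case table above can be sketched as a small lookup. This is a minimal illustration, not the bundled script's code: the `Category` enum, the `classify_edge_case` helper, and the regex that treats `CREATE`/`ALTER`/`DROP` statements as DDL are all assumptions introduced here.

```python
import re
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class Category(Enum):
    """The fixed 8-category taxonomy."""
    CONFIG = "Config"
    TESTS = "Tests"
    DOCS = "Docs"
    SCRIPTS = "Scripts"
    SOURCE_CODE = "Source Code"
    DATA = "Data"
    AI_TOOLING = "AI Tooling"
    OTHER = "Other"


# Assumption: any line starting with CREATE/ALTER/DROP marks the file as DDL.
_DDL = re.compile(r"^\s*(CREATE|ALTER|DROP)\b", re.IGNORECASE | re.MULTILINE)


def classify_edge_case(path: str, content: str = "") -> Optional[Category]:
    """Return the category for a row of the edge-case table, or None if no row applies."""
    p = PurePosixPath(path)
    name, suffix = p.name.lower(), p.suffix.lower()
    if suffix == ".sql":
        # DDL documents structure (Docs); everything else is treated as DML (Data).
        return Category.DOCS if _DDL.search(content) else Category.DATA
    if suffix in {".duckdb", ".sqlite"}:
        return Category.DATA
    if name in {"schema.json", "openapi.yaml"}:
        return Category.DOCS
    if suffix == ".sh":
        return Category.SCRIPTS
    if name == "requirements.txt":
        return Category.CONFIG
    return None
```

A SQL file that mixes DDL and DML resolves to Docs in this sketch; that tie-break is a choice made here, not something the table specifies.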
When scanning directories, the script uses layered exclusion to prevent wasting tokens on useless output:
Layer 1: Always Excluded (non-negotiable)
- `node_modules/`, `bower_components/`, `jspm_packages/`
- `.git/`, `.svn/`, `.hg/`
- `__pycache__/`, `.pytest_cache/`, `.tox/`, `.mypy_cache/`
- `venv/`, `.venv/`

Layer 2: .gitignore (if present and parseable)
- Parsed with the pathspec library; warns if unavailable

Layer 3: Extended Defaults (fallback when no .gitignore)
- `dist/`, `build/`, `out/`, `_build/`, `target/`
- `vendor/`, `coverage/`, `.nyc_output/`, `htmlcov/`
- `env/`, `*.egg-info/`

Layer 4: Escape Hatches
- `--include-ignored`: Bypass Layers 2-3 (but NOT Layer 1)
- `--include-all`: Bypass ALL layers (use with extreme caution)

Important: `.env` as a FILE is categorized as Config. `.env` as a DIRECTORY is excluded.
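
The layer precedence described above can be sketched as a single decision function. This is an illustration under stated assumptions, not the script's actual code: `exclusion_layer`, its return labels, and the `gitignore_match` callable (which stands in for a matcher built from `.gitignore`, for example with the pathspec library) are hypothetical names. The source does not say which layer excludes a `.env` directory; this sketch places it with the extended defaults.

```python
from fnmatch import fnmatch
from typing import Callable, Optional

# Layer 1: always excluded, only --include-all can bypass it.
ALWAYS_EXCLUDED = {
    "node_modules", "bower_components", "jspm_packages",
    ".git", ".svn", ".hg",
    "__pycache__", ".pytest_cache", ".tox", ".mypy_cache",
    "venv", ".venv",
}

# Layer 3: fallback patterns, used only when no .gitignore matcher is available.
EXTENDED_DEFAULTS = [
    "dist", "build", "out", "_build", "target",
    "vendor", "coverage", ".nyc_output", "htmlcov",
    "env", "*.egg-info",
    ".env",  # assumption: the layer for a .env directory is not stated in the source
]


def exclusion_layer(
    dir_name: str,
    gitignore_match: Optional[Callable[[str], bool]] = None,
    include_ignored: bool = False,
    include_all: bool = False,
) -> Optional[str]:
    """Return the label of the layer that excludes dir_name, or None to scan it."""
    if include_all:
        return None  # --include-all bypasses every layer
    if dir_name in ALWAYS_EXCLUDED:
        return "layer1"  # not bypassed by --include-ignored
    if include_ignored:
        return None  # --include-ignored bypasses Layers 2-3
    if gitignore_match is not None:
        # Layer 2 replaces Layer 3 whenever a .gitignore could be parsed.
        return "gitignore" if gitignore_match(dir_name) else None
    if any(fnmatch(dir_name, pattern) for pattern in EXTENDED_DEFAULTS):
        return "layer3"
    return None
```

Returning a label rather than a boolean keeps enough information to report which exclusion path was used for each directory.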
When reporting results, explain which exclusion path was used:
"Excluded 3 directories via Layer 1 (always-exclude), 2 via .gitignore"
Apply rules in this order (first match wins):
1. `tests/` → Tests, `src/` → Source Code, `docs/` → Docs, `references/` → Docs
2. `schema.json`, `openapi.yaml` → Docs (see patterns.md for full list)
3. `test_*.py` → Tests, `*.config.js` → Config
4. `.sh` → Scripts, `.csv` → Data, `.py` → Source Code

function categorize(filePath, content):
    # PHASE 1: Filename + directory rules
    category = byLocationOrExtension(filePath)
    if category != UNKNOWN: return (category, "High")

    # PHASE 2: Frontmatter refinement
    fm = extractYAMLFrontmatter(content)
    if fm indicates config: return (CONFIG, "Medium")
    if fm indicates ai_tooling: return (AI_TOOLING, "Medium")

    # PHASE 3: Content structure analysis
    if looksLikeTest(content): return (TESTS, "Medium")
    if looksLikeScript(content): return (SCRIPTS, "Medium")
    if looksLikeSource(content): return (SOURCE_CODE, "Medium")
    if looksLikeDocs(content): return (DOCS, "Medium")
    if looksLikeData(content): return (DATA, "Medium")

    # PHASE 4: Keyword detection (fallback)
    kw_category = detectByKeywords(content)
    if kw_category: return (kw_category, "Low")

    return (OTHER, "Low")
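
Phase 1 of the pseudocode above, with its first-match-wins rule order, might look like the following in Python. It is a sketch that covers only the example rules listed in this document (the full list lives in patterns.md); the table names and `categorize_phase1` are hypothetical, and skipping Phases 2-4 when no rule matches is a simplification.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Rule tables in priority order; dicts keep insertion order in Python 3.7+.
DIRECTORY_RULES = {"tests": "Tests", "src": "Source Code", "docs": "Docs", "references": "Docs"}
SPECIAL_FILENAMES = {"schema.json": "Docs", "openapi.yaml": "Docs"}
FILENAME_PATTERNS = [("test_*.py", "Tests"), ("*.config.js", "Config")]
EXTENSIONS = {".sh": "Scripts", ".csv": "Data", ".py": "Source Code"}


def by_location_or_extension(file_path: str) -> Optional[str]:
    """Apply the rule groups in order and return the first match, or None."""
    p = PurePosixPath(file_path)
    parent_dirs = p.parts[:-1]
    # 1. Directory rules, checked in listed order (assumption: tests/ beats src/).
    for dir_name, category in DIRECTORY_RULES.items():
        if dir_name in parent_dirs:
            return category
    # 2. Special filenames.
    if p.name in SPECIAL_FILENAMES:
        return SPECIAL_FILENAMES[p.name]
    # 3. Filename patterns.
    for pattern, category in FILENAME_PATTERNS:
        if fnmatch(p.name, pattern):
            return category
    # 4. Plain extensions.
    return EXTENSIONS.get(p.suffix.lower())


def categorize_phase1(file_path: str) -> Tuple[str, str]:
    """Path-only categorization: High on a rule match, otherwise Other/Low."""
    category = by_location_or_extension(file_path)
    return (category, "High") if category else ("Other", "Low")
```

For example, `categorize_phase1("src/utils/helpers.py")` returns `("Source Code", "High")`, while `categorize_phase1("notes.txt")` falls through to `("Other", "Low")`.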
For the complete pattern reference, see references/patterns.md.
For each file, provide:
### [Filename]
- **Category**: [Config | Tests | Docs | Scripts | Source Code | Data | AI Tooling | Other]
- **Confidence**: [High | Medium | Low]
- **Reasoning**: [Why this category was chosen]
- **Recommended Location**: [Suggested directory if misfiled, or "Correct" if well-placed]
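
The per-file template above can be produced mechanically. The sketch below is illustrative only: `FileResult` and `render_result` are hypothetical names, and the validation simply enforces the 8 categories and 3 confidence levels named in this document.

```python
from dataclasses import dataclass

CATEGORIES = {
    "Config", "Tests", "Docs", "Scripts",
    "Source Code", "Data", "AI Tooling", "Other",
}
CONFIDENCE_LEVELS = {"High", "Medium", "Low"}


@dataclass
class FileResult:
    filename: str
    category: str
    confidence: str
    reasoning: str
    recommended_location: str = "Correct"

    def __post_init__(self) -> None:
        # Reject invented categories such as "Schema" or "Database".
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")


def render_result(result: FileResult) -> str:
    """Format one result using the per-file report template."""
    return "\n".join([
        f"### {result.filename}",
        f"- **Category**: {result.category}",
        f"- **Confidence**: {result.confidence}",
        f"- **Reasoning**: {result.reasoning}",
        f"- **Recommended Location**: {result.recommended_location}",
    ])
```

Validating at construction time means a result with an off-taxonomy category fails early instead of appearing in a report.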
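Confidence Guidelines:
- High: matched by a Phase 1 directory or filename rule
- Medium: matched by frontmatter (Phase 2) or content structure (Phase 3)
- Low: matched only by keyword detection (Phase 4), or no rule matched and the file falls back to Other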
When using --analyze-content (Phases 2-4):
- Files that merely contain test-like constructs (such as `assert` or `describe()`) may be miscategorized as Tests

Recommendation: Use Phase 1 (path-based) categorization for speed and reliability. Enable --analyze-content only for ambiguous files or when you need higher accuracy on edge cases.
User: What category is src/utils/helpers.py?
Response:
### src/utils/helpers.py
- **Category**: Source Code
- **Confidence**: High
- **Reasoning**: Located in `src/` directory; `.py` extension; module structure
- **Recommended Location**: Correct
User: Categorize customers.csv in root directory.
Response:
### customers.csv
- **Category**: Data
- **Confidence**: High
- **Reasoning**: CSV extension; tabular structure detected
- **Recommended Location**: `data/customers.csv`
User: What category is notes.txt?
Response:
### notes.txt
- **Category**: Other
- **Confidence**: Low
- **Reasoning**: Prose content; no structural markers; could be Docs if formalized
- **Recommended Location**: Manual review needed — consider `docs/` if documentation
User: Categorize all files in this project.
Response:
Config (5):
- .gitignore
- pyproject.toml
- docker-compose.yml
Source Code (12):
- src/main.py
- src/utils/helpers.py
Tests (4):
- tests/test_main.py
- tests/conftest.py
Docs (2):
- README.md
- docs/API.md
AI Tooling (1):
- .claude/skills/categorizing-files/SKILL.md
Other (1):
- notes.txt (Low confidence — review needed)
Excluded 2 directories: 1 via Layer 1 (always-exclude), 1 via .gitignore
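
A grouped report like the one above could be assembled as follows. This is a sketch with hypothetical names (`render_batch`, `exclusion_summary`, the `layer1`/`gitignore`/`layer3` keys); the category order mirrors the example, with Scripts and Data slotted in as an assumption, and the wording of the low-confidence flag and the Layer 3 label is illustrative.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple

CATEGORY_ORDER = [
    "Config", "Source Code", "Tests", "Docs",
    "Scripts", "Data", "AI Tooling", "Other",
]
LAYER_LABELS = {
    "layer1": "Layer 1 (always-exclude)",
    "gitignore": ".gitignore",
    "layer3": "Layer 3 (extended defaults)",
}


def exclusion_summary(layers: Iterable[str]) -> str:
    """Summarize how many directories each exclusion layer removed."""
    counts = Counter(layers)
    total = sum(counts[key] for key in LAYER_LABELS)
    if total == 0:
        return "Excluded 0 directories"
    parts = [f"{counts[key]} via {label}" for key, label in LAYER_LABELS.items() if counts[key]]
    noun = "directory" if total == 1 else "directories"
    return f"Excluded {total} {noun}: " + ", ".join(parts)


def render_batch(
    results: Iterable[Tuple[str, str, str]],
    excluded_layers: Iterable[str] = (),
) -> str:
    """Group (path, category, confidence) tuples by category and append the summary."""
    grouped = defaultdict(list)
    for path, category, confidence in results:
        grouped[category].append((path, confidence))
    lines: List[str] = []
    # Categories outside the taxonomy are never emitted.
    for category in CATEGORY_ORDER:
        items = grouped.get(category)
        if not items:
            continue
        lines.append(f"{category} ({len(items)}):")
        for path, confidence in sorted(items):
            flag = " (Low confidence: review needed)" if confidence == "Low" else ""
            lines.append(f"- {path}{flag}")
    lines.append(exclusion_summary(excluded_layers))
    return "\n".join(lines)
```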
For large directories or programmatic use, the bundled script implements this methodology with automatic directory exclusion (skips node_modules/, venv/, .git/, etc.):
# Basic usage
python scripts/categorize.py [path]
# Enable content analysis (Phases 2-4)
python scripts/categorize.py --analyze-content [path]
# Include .gitignore-excluded files (bypass Layers 2-3)
python scripts/categorize.py --include-ignored [path]
# Include ALL files including node_modules (use with caution)
python scripts/categorize.py --include-all [path]
For a single file:
python scripts/categorize.py myfile.py
# Output: myfile.py: Source Code (High)
For a directory:
python scripts/categorize.py .
# Output: Grouped list by category with exclusion summary
See scripts/categorize.py for implementation details and programmatic API.
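
When calling the script from other tooling, the single-file output can be parsed back into fields. The sketch below assumes the `path: Category (Confidence)` line format shown in the single-file example; `parse_result_line` and `ParsedResult` are hypothetical helpers, not part of the script's API.

```python
import re
from typing import NamedTuple, Optional


class ParsedResult(NamedTuple):
    path: str
    category: str
    confidence: str


# Assumed format, taken from the single-file example: "myfile.py: Source Code (High)"
_RESULT_LINE = re.compile(
    r"^(?P<path>.+): "
    r"(?P<category>Config|Tests|Docs|Scripts|Source Code|Data|AI Tooling|Other) "
    r"\((?P<confidence>High|Medium|Low)\)$"
)


def parse_result_line(line: str) -> Optional[ParsedResult]:
    """Parse one output line; return None if it does not match the assumed format."""
    match = _RESULT_LINE.match(line.strip())
    if match is None:
        return None
    return ParsedResult(match["path"], match["category"], match["confidence"])
```

The line could come from `subprocess.run(["python", "scripts/categorize.py", "myfile.py"], capture_output=True, text=True).stdout`; lines that do not match, such as the grouped directory output, return `None` rather than raising.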