Classify an AI/LLM infrastructure repository into a hierarchical module taxonomy and generate `module_map.json`, `file_index.json`, and `module_profile.json` using LLM semantic analysis. Use when software profiling needs stable module boundaries for AI infra repositories.
Use this skill when you need to classify an AI infra (LLM/multimodal) repository into standardized module categories.
Typical inputs:
- A local repository path (e.g., `data/repos/<repo-name>`).

Outputs:
- `module_map.json`: coarse module labels with evidence (scores + paths + counts).
- `file_index.json`: file → module assignment (coarse.fine labels).
- `module_profile.json`: module list in software-profile schema (name/category/description/paths).
- `MODULES.md`: human-readable summary.

Include files (code only):
- `.py`, `.pyi`, `.go`, `.rs`, `.java`, `.kt`, `.scala`, `.c`, `.cc`, `.cpp`, `.h`, `.hpp`, `.cu`, `.cuh`, `.sh`, `.bash`, `.ps1`

Exclude directories:
- VCS and tool caches: `.git`, `.hg`, `.svn`, `.tox`, `.venv`, `venv`, `__pycache__`, `.mypy_cache`, etc.
- Build artifacts: `node_modules`, `dist`, `build`, `target`, `out`, etc.
- Bazel outputs: `bazel-bin`, `bazel-out`, `bazel-testlogs`, `bazel-workspace`, etc.
- Editor/CI metadata: `.idea`, `.vscode`, `.pytest_cache`, `.github`, etc.
- `test`, `doc`

2.1 If `module_map.json` / `file_index.json` / `module_profile.json` already exist in your target output directory, you can skip directly to step 3.
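The include/exclude rules above can be expressed as a small path filter. This is only a sketch restating the lists in this document; the scanner's actual filtering logic may differ, and the helper name `should_scan` is hypothetical:

```python
from pathlib import Path

# Hedged re-statement of the scan filter described above;
# scan_repo.py may implement this differently.
CODE_EXTS = {".py", ".pyi", ".go", ".rs", ".java", ".kt", ".scala",
             ".c", ".cc", ".cpp", ".h", ".hpp", ".cu", ".cuh",
             ".sh", ".bash", ".ps1"}
EXCLUDED_DIRS = {".git", ".hg", ".svn", ".tox", ".venv", "venv",
                 "__pycache__", ".mypy_cache", "node_modules", "dist",
                 "build", "target", "out", "bazel-bin", "bazel-out",
                 "bazel-testlogs", "bazel-workspace", ".idea", ".vscode",
                 ".pytest_cache", ".github", "test", "doc"}

def should_scan(path: str) -> bool:
    """True if a repo-relative path passes the include/exclude rules."""
    p = Path(path)
    # Reject if any parent directory component is excluded.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    # Keep only code files by extension.
    return p.suffix in CODE_EXTS
```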
2.2 If not, run the scanner to produce LLM-driven module candidates:
```shell
python .claude/skills/ai-infra-module-modeler/scripts/scan_repo.py \
  --repo <repo-path> \
  --out <path-to-save> \
  --max-files 20000 \
  --max-bytes 200000 \
  --group-depth <to-be-determined> \
  --llm-provider deepseek \
  --max-workers 10 \
  --llm-model "deepseek-chat"
```
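For orientation, single entries in the three JSON outputs might look roughly like this. This is a hypothetical sketch built only from the field descriptions above (labels with evidence scores/paths/counts; `coarse.fine` assignments; name/category/description/paths); any field name or value not described in this document is an assumption:

```python
# Hypothetical illustrations of the output schemas; exact shapes may differ.
module_map_entry = {
    "module": "serving",                 # coarse module label (example value)
    "evidence": {
        "score": 0.92,                   # classification confidence
        "paths": ["server/", "api/"],    # supporting paths
        "file_count": 48,                # supporting file count
    },
}

file_index_entry = {
    "path": "server/openai_api.py",      # file being assigned
    "module": "serving.api",             # coarse.fine label
}

module_profile_entry = {                 # software-profile schema
    "name": "serving",
    "category": "inference-serving",
    "description": "Request handling and model serving endpoints.",
    "paths": ["server/", "api/"],
}
```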
Useful options:
- `--require-llm`: fail fast if the LLM is unavailable or returns invalid JSON.
- Tuning knobs: `--group-depth`, `--group-sample-files`, `--group-snippets`, `--snippet-bytes`, `--batch-size`. Choose `--group-depth` based on repository structure to balance runtime and classification accuracy.

2.3 Review the generated `<out>/MODULES.md`.

3. Validate `module_map.json`, `file_index.json`, and `module_profile.json` against real code files and the checklist criteria in `references/checklists/`. Cross-check signals such as `README*`, `pyproject.toml` / `requirements*`, `Dockerfile`, `helm/`, `k8s/`, `examples/`, and the top-level packages.
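Part of the validation against real code files can be automated. A minimal sketch (the function name is hypothetical, and it assumes `file_index.json` maps repo-relative paths to module labels as described above; adapt if the real schema nests this differently) that flags assignments pointing at files no longer on disk:

```python
import json
from pathlib import Path

def find_stale_assignments(repo_root: str, file_index: dict) -> list:
    """Return indexed paths that do not exist under repo_root.

    Assumes file_index is a flat mapping of repo-relative path ->
    module label; a non-empty result means the index is out of date.
    """
    root = Path(repo_root)
    return [p for p in file_index if not (root / p).is_file()]

# Usage sketch (paths are placeholders):
# file_index = json.loads(Path("<out>/file_index.json").read_text())
# stale = find_stale_assignments("<repo-path>", file_index)
```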