Explore a repository and generate architecture documentation. Reads directory structure, manifests, configs, and imports, then generates up to nine `.archeia/` docs: four always (Architecture.md, System.json, Standards.md, Guide.md), three attempted on every run but written only when evidence supports them (Containers.json, Components.json, DataFlow.json), and up to two conditional docs (Entities.json, StateMachine.json). It also generates AGENTS.md and CLAUDE.md. Uses structured templates, evidence grounding, and self-validation.
This skill explores a repository and generates 6–11 files:

- Four `.archeia/` docs from structured templates (Architecture.md, System.json, Standards.md, Guide.md).
- Containers.json, always attempted after System.json. Generate it only when one or more runtime units can be identified with evidence; otherwise emit an explicit insufficiency outcome and skip the file.
- Components.json, always attempted after the Containers.json decision. Generate it only when at least one generated container has evidence-backed internal source modules; otherwise emit an explicit insufficiency outcome and skip the file.
- DataFlow.json, always attempted after Architecture.md. Generate it only when a primary interaction flow can be traced with evidence; otherwise emit an explicit insufficiency outcome and skip the file.
- Up to two more `.archeia/` docs when evidence supports them (Entities.json, StateMachine.json).
- AGENTS.md: agent instructions synthesized from what was discovered.
- CLAUDE.md: points to AGENTS.md to avoid duplication.

Every claim in the generated docs must cite a file path as evidence. The
templates in assets/templates/ define the structure, required sections,
and quality rubrics for each .archeia/ file.
Read references/ARCHITECTURE_DOCS_PROTOCOL.md before starting. It explains
the philosophy behind .archeia/ docs: why they live in the repo, what goes
in vs stays out, the evidence principle, the template-as-exemplar pattern,
and how .archeia/ relates to READMEs, agents.md, and diagrams.
Read references/examples.md for gold-standard examples of the quality bar
this skill must hit. Study them before generating — they show the difference
between generic framework descriptions and genuinely useful architecture
documentation grounded in evidence.
Run `scripts/discover.sh <repo-root>` first and use its `files_to_read` array
as the default exploration plan. If .archeia/ScanReport.md exists, use it as
supporting context, not required input. You may read beyond the discovery
output when manifests, imports, schema files, README claims, or workspace
structure point to additional evidence.
Then use this priority order for follow-up reads and overrides. Within each category, read files alphabetically. Budget about 30 files in total; once the budget is reached, stop exploring and shift to generation. A critical file discovered late may still be read.
- Priority 1 (root manifests; read all that exist): `package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`, `Gemfile`, `composer.json`, `pom.xml`, `build.gradle`, `mix.exs`, `deno.json`
- Priority 2 (root configs; read all that exist): `tsconfig.json`, `tsconfig*.json`, `ruff.toml`, `pyproject.toml` (`[tool.*]` sections), `.eslintrc*`, `.prettierrc*`, `biome.json`, `Makefile`, `Justfile`, `Taskfile.yml`, `Dockerfile`, `docker-compose.yml`, `fly.toml`, `render.yaml`, `railway.json`, `vercel.json`, `netlify.toml`
- Priority 2.5 (schema and model files; read all that exist): `prisma/schema.prisma`, `**/models.py`, `**/models/*.py`, `schema.graphql`, `**/*.entity.ts`, `**/schema.ts` (Drizzle), `**/*.sql` in `migrations/` (first 2)
- Priority 3 (root docs; read all that exist): `README.md`, `CONTRIBUTING.md`, `CHANGELOG.md`
- Priority 4 (CI/CD; read the first 3 files alphabetically): `.github/workflows/*.yml`, `.gitlab-ci.yml`, `.circleci/config.yml`
- Priority 5 (test setup; read config files, not test bodies): `tests/conftest.py`, `jest.config.*`, `vitest.config.*`, `test/test_helper.*`, `.nycrc`, `pytest.ini`, `setup.cfg` (`[tool:pytest]`), `phpunit.xml`
- Priority 6 (source sampling; read the first 5 files alphabetically): files in `src/`, `lib/`, `app/`, or the primary source directory. Focus on entry points and module index files (`index.*`, `main.*`, `app.*`, `mod.rs`).
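The priority list and the ~30-file budget amount to a small ordering algorithm. The sketch below is a hypothetical illustration, not part of `discover.sh`: it assumes the discovered files arrive as repo-relative POSIX paths, encodes a subset of the patterns (it omits the `migrations/*.sql` rule and the `pyproject.toml`/`setup.cfg` section checks), and the names `PRIORITIES`, `matches`, and `plan_reads` are invented for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical encoding of the priority list: (priority, glob patterns, cap).
# cap=None means "read all that exist"; an int keeps only the first N matches.
PRIORITIES = [
    # Elixir's manifest file is lowercase mix.exs.
    (1, ["package.json", "pyproject.toml", "Cargo.toml", "go.mod", "Gemfile",
         "composer.json", "pom.xml", "build.gradle", "mix.exs", "deno.json"], None),
    (2, ["tsconfig*.json", "ruff.toml", ".eslintrc*", ".prettierrc*", "biome.json",
         "Makefile", "Justfile", "Taskfile.yml", "Dockerfile", "docker-compose.yml",
         "fly.toml", "render.yaml", "railway.json", "vercel.json", "netlify.toml"], None),
    (2.5, ["prisma/schema.prisma", "**/models.py", "**/models/*.py", "schema.graphql",
           "**/*.entity.ts", "**/schema.ts"], None),
    (3, ["README.md", "CONTRIBUTING.md", "CHANGELOG.md"], None),
    (4, [".github/workflows/*.yml", ".gitlab-ci.yml", ".circleci/config.yml"], 3),
    (5, ["tests/conftest.py", "jest.config.*", "vitest.config.*", "test/test_helper.*",
         ".nycrc", "pytest.ini", "phpunit.xml"], None),
    (6, ["src/*", "lib/*", "app/*"], 5),
]


def matches(path: str, pattern: str) -> bool:
    """Match a repo-relative POSIX path against one pattern from the list."""
    if pattern.startswith("**/"):
        # "**/x" matches x at the repo root or at any depth below it.
        tail = pattern[3:]
        return fnmatchcase(path, tail) or fnmatchcase(path, "*/" + tail)
    if "/" not in pattern:
        # Bare names refer to files in the repo root only.
        return "/" not in path and fnmatchcase(path, pattern)
    # fnmatch's "*" also crosses "/", so "src/*" matches nested files too.
    return fnmatchcase(path, pattern)


def plan_reads(paths: list[str], budget: int = 30) -> list[str]:
    """Order discovered paths by priority, alphabetically within each category,
    reading each file at most once and stopping when the budget is reached."""
    plan: list[str] = []
    seen: set[str] = set()
    for _priority, patterns, cap in PRIORITIES:
        hits = sorted(p for p in paths
                      if p not in seen and any(matches(p, pat) for pat in patterns))
        if cap is not None:
            hits = hits[:cap]
        for path in hits:
            if len(plan) >= budget:
                return plan  # budget reached: shift to generation
            plan.append(path)
            seen.add(path)
    return plan
```

A critical file found late can simply be appended to the returned plan; the budget is a guideline, not a hard stop.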
Use this table to map discovered files to conclusions. When signals conflict, follow the priority order: manifest content > file extensions > directory structure > README claims.
| Signal | Conclusion |
|---|---|
| `package.json` exists | Node.js/JavaScript project |
| `package.json` has `"type": "module"` | ESM modules |
| `package.json` → `dependencies` has `react` | React frontend |
| `package.json` → `dependencies` has `express`/`fastify`/`hono` | HTTP server framework |
| `package.json` → `dependencies` has `next` | Next.js full-stack |
| `package.json` → `devDependencies` has `typescript` | TypeScript project |
| `package.json` → `scripts` has `test` | Has test runner |
| `tsconfig.json` exists | TypeScript (confirms) |
| `pyproject.toml` exists | Python project |
| `pyproject.toml` → `[tool.ruff]` | Uses ruff linter |
| `pyproject.toml` → `[tool.black]` | Uses black formatter |
| `pyproject.toml` → `[tool.mypy]` | Uses mypy type checker |
| `pyproject.toml` → `[tool.pytest.ini_options]` | Uses pytest |
| `requirements.txt` / `setup.py` | Python (legacy packaging) |
| `Cargo.toml` exists | Rust project |
| `go.mod` exists | Go project |
| `Gemfile` exists | Ruby project |
| `composer.json` exists | PHP project |
| `pom.xml` / `build.gradle` | Java/JVM project |
| `mix.exs` exists | Elixir project |
| `deno.json` exists | Deno runtime |
| `Dockerfile` exists | Containerized deployment |
| `docker-compose.yml` exists | Multi-service local dev |
| `fly.toml` | Deploys to Fly.io |
| `vercel.json` / `netlify.toml` | Serverless/JAMstack deploy |
| `render.yaml` / `railway.json` | PaaS deployment |
| `.github/workflows/` | GitHub Actions CI/CD |
| `.gitlab-ci.yml` | GitLab CI |
| `Makefile` / `Justfile` / `Taskfile.yml` | Has task automation |
| `tests/` / `__tests__/` / `spec/` / `test/` | Has test directory |
| `.pre-commit-config.yaml` | Uses pre-commit hooks |
| `src/` directory | Standard source layout |
| `lib/` directory | Library-style source layout |
| `app/` directory | Application-style layout (Rails, Next.js, etc.) |
| `packages/` / `apps/` | Monorepo with workspaces |
| `.env.example` | Environment-variable configuration |
| `uv.lock` / `poetry.lock` | Python lockfile (uv or poetry) |
| `pnpm-lock.yaml` / `package-lock.json` / `yarn.lock` | JS lockfile |
| `prisma` in manifest dependencies | Node.js project uses Prisma ORM |
| `prisma/schema.prisma` exists | Prisma schema file (read for entities) |
| `django` in manifest dependencies | Python project uses Django ORM |
| `**/models.py` with `class X(models.Model)` | Django models (read for entities) |
| `sqlalchemy` in manifest dependencies | Python project uses SQLAlchemy |
| `typeorm` in manifest dependencies | TypeScript project uses TypeORM |
| `drizzle-orm` in manifest dependencies | TypeScript project uses Drizzle ORM |
| `xstate` in manifest dependencies | Has explicit state machine library |
| `django-fsm` in manifest dependencies | Has Django finite state machine |
| Enum field named `status`/`state`/`phase` in model | Potential state lifecycle |
Enum field named status/state/phase in model | Potential state lifecycle |
Generate .archeia/ docs in this order (respecting template `depends_on`):

1. Read assets/templates/Architecture.md frontmatter and body.
2. Generate .archeia/Architecture.md from collected signals.
3. Read the assets/templates/System.json example.
4. Generate .archeia/System.json. Follow the example structure exactly, replacing example data with evidence from this repo. Rules:
   - `system`: exactly one object, sourced from the manifest and README.
   - `people`: infer from README audience, auth roles, and route groups. Use an empty `[]` if none are found.
   - `external_systems`: every external service confirmed by manifest dependencies or docker-compose.
   - `relationships`: connect every entity. Every `source`/`target` must resolve to an `id` defined above.
   - Every entry must include an `evidence` array with at least one file path.
5. Containers.json attempt: read the assets/templates/Containers.json example on every run, then attempt to generate .archeia/Containers.json. If one or more runtime units can be identified with evidence, follow the example structure. Rules:
   - `system_boundary`: must match `system` from System.json (same `id`, `name`, `description`).
   - `containers`: runtime units (processes, databases, caches), NOT source directories. `type` is one of: `webapp` | `api` | `database` | `cache` | `queue` | `filesystem` | `worker` | `cli`.
   - `external_systems`: carried forward from System.json; must match exactly.
   - `relationships`: container-to-container and container-to-external.
   - `people_container_mappings`: include only if System.json has `people`; omit the key otherwise.

   If no runtime unit can be identified with evidence, do not write .archeia/Containers.json; emit `<!-- INSUFFICIENT EVIDENCE: runtime units for Containers.json -->` instead.
6. Components.json attempt: read the assets/templates/Components.json example on every run, then attempt to generate .archeia/Components.json after the Containers.json decision.
   - If Containers.json was generated and at least one container has evidence-backed internal source code, follow the example structure. Rules:
     - `containers`: one entry per container from Containers.json that has internal source code. Skip containers with no app-side code (e.g., managed databases without schema code).
     - `components`: code-level modules (directories/packages, not files). `type` is one of: `module` | `service` | `controller` | `repository` | `middleware` | `handler` | `library` | `config`. Prefix IDs with the container ID (e.g., `api-routes`, `worker-jobs`).
     - `external_systems`: include only if components talk directly to externals.
     - `relationships`: focus on architecturally significant connections (layer crossings, module boundaries, external integrations). Do NOT enumerate every import.
   - If Containers.json was skipped, do not write .archeia/Components.json; emit `<!-- INSUFFICIENT EVIDENCE: Containers.json unavailable for Components.json -->`.
   - If Containers.json was generated but no container has evidence-backed internal source modules, do not write .archeia/Components.json; emit `<!-- INSUFFICIENT EVIDENCE: internal source modules for Components.json -->`.
7. Conditional: Entities.json. Check whether ORM/schema evidence was found during exploration (Prisma schema, Django models, SQLAlchemy models, TypeORM entities, Drizzle schema, or migration files with `CREATE TABLE`). If yes:
   - Read the assets/templates/Entities.json example.
   - Generate .archeia/Entities.json following the example structure. Rules:
     - `source_type`: set based on the detected ORM (`prisma`, `django`, `sqlalchemy`, `typeorm`, `drizzle`, `sql`, `graphql`).
     - `entities`: read schema/model files and extract domain entities only (not join tables, audit logs, migration tracking, or session tables).
     - `fields`: 3–6 key fields per entity. Include PKs, FKs, and domain-significant fields. Omit timestamps (`created_at`, `updated_at`) unless domain-relevant. `constraints` is one or more of: `pk` | `fk:[entity_id]` | `unique` | `not_null` | `nullable` | `default:[value]`.
     - `relationships`: extract from FK fields and explicit relation decorators. `cardinality` is one of: `one_to_one` | `one_to_many` | `many_to_many`.
     - Every entity must include an `evidence` array.

   If no ORM/schema evidence is found, skip the file and print: "Skipped Entities.json — no ORM or schema files detected."
8. DataFlow.json attempt, after the Architecture.md Data Flow section is generated:
   - Read the assets/templates/DataFlow.json example. It is mandatory input on every run; the attempt is required even though the output file is evidence-gated.
   - Attempt .archeia/DataFlow.json on every run.
   - If a primary interaction flow can be traced with evidence, generate .archeia/DataFlow.json following the example structure. Rules:
     - `flows`: at least one flow marked `primary: true`, matching Architecture.md's Data Flow table.
     - `participants[].id` must resolve to an `id` in System.json, Containers.json, or Components.json. `type` is one of: `person` | `container` | `component` | `external_system`.
     - `steps[].type` is one of: `sync` | `async` | `response`.
     - `steps[].protocol` is one of: `HTTPS` | `SQL` | `gRPC` | `Redis` | `WebSocket` | `function call` | `message queue`.
   - Otherwise, do not write .archeia/DataFlow.json; emit `<!-- INSUFFICIENT EVIDENCE: primary interaction flow for DataFlow.json -->`.
9. Conditional: StateMachine.json. Check whether state machine evidence was found during exploration (an explicit library in the manifest, or clear enum fields with transition methods). If yes:
   - Read the assets/templates/StateMachine.json example.
   - Generate .archeia/StateMachine.json following the example structure. Rules:
     - Include only machines with high or medium confidence:
       - high: explicit library config (xstate `createMachine`, django-fsm decorator, aasm block).
       - medium: a clear enum type plus transition methods that reference the enum values.
     - Exclude low-confidence machines (inferred from if/else logic).
     - `state_machines[].entity` must resolve to an entity `id` in Entities.json (if generated). If there is no Entities.json, use a descriptive id.
     - `terminal_states`: states with no outgoing transitions.

   If no state machine evidence is found, skip the file and print: "Skipped StateMachine.json — no state machines or lifecycle enums detected."
10. Read assets/templates/Standards.md frontmatter and body.
11. Generate .archeia/Standards.md (may reference Architecture.md for topology).
12. Read assets/templates/Guide.md frontmatter and body.
13. Generate .archeia/Guide.md (may reference Architecture.md and Standards.md).
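Several of the rules above are cross-file invariants: System.json relationships must resolve to declared ids, Containers.json must carry the system boundary and external systems forward unchanged, and DataFlow.json participants must resolve to known ids. A minimal sketch of checking them mechanically, assuming the JSON shapes implied by the rules (entries keyed by `id`, containers carrying `type`); the function names and error strings are hypothetical and not part of the skill's scripts.

```python
CONTAINER_TYPES = {"webapp", "api", "database", "cache", "queue",
                   "filesystem", "worker", "cli"}


def check_system(system_doc: dict) -> list[str]:
    """Every relationship source/target must resolve to a declared id."""
    ids = {system_doc["system"]["id"]}
    for key in ("people", "external_systems"):
        ids.update(item["id"] for item in system_doc.get(key, []))
    errors = []
    for rel in system_doc.get("relationships", []):
        for end in ("source", "target"):
            if rel[end] not in ids:
                errors.append(f"System.json: unresolved {end} {rel[end]!r}")
    return errors


def check_containers(system_doc: dict, containers_doc: dict) -> list[str]:
    """system_boundary and external_systems must match System.json exactly."""
    errors = []
    boundary = containers_doc["system_boundary"]
    for field in ("id", "name", "description"):
        if boundary.get(field) != system_doc["system"].get(field):
            errors.append(f"Containers.json: system_boundary.{field} differs from System.json")
    if containers_doc.get("external_systems", []) != system_doc.get("external_systems", []):
        errors.append("Containers.json: external_systems differ from System.json")
    for container in containers_doc.get("containers", []):
        if container.get("type") not in CONTAINER_TYPES:
            errors.append(f"Containers.json: container {container['id']!r} "
                          f"has invalid type {container.get('type')!r}")
    return errors


def check_dataflow(dataflow_doc: dict, known_ids: set[str]) -> list[str]:
    """At least one primary flow; every participant id must be known."""
    flows = dataflow_doc.get("flows", [])
    errors = []
    if not any(flow.get("primary") is True for flow in flows):
        errors.append("DataFlow.json: no flow marked primary: true")
    for flow in flows:
        for participant in flow.get("participants", []):
            if participant["id"] not in known_ids:
                errors.append(f"DataFlow.json: unresolved participant {participant['id']!r}")
    return errors
```

Here `known_ids` would be the union of ids from System.json, Containers.json, and Components.json, whichever of them were generated.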
For each generated file:
- Mark gaps with `<!-- INSUFFICIENT EVIDENCE: [description] -->`.

After generating all .archeia/ docs, run a validation pass:
Step 1 — Rubric check: Re-read each generated file. For each template's quality rubric (listed at the bottom of the template), verify the output meets every criterion. Fix issues inline using the Edit tool. One fix pass maximum — if an issue persists after one fix attempt, note it and move on.
Step 2 — Canonical evidence validation: Run
`./scripts/validate-evidence.sh <repo-root>` after generating the .archeia/
docs. Fix any invalid_paths or malformed_evidence findings. The validator
checks only canonical evidence locations: Markdown **Evidence:** lines and
JSON evidence arrays.
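The two canonical locations are simple enough to scan directly. The sketch below is a hypothetical re-creation for illustration only; it is not the contents of `validate-evidence.sh`, which this document does not show. It assumes that `**Evidence:**` lines start at the beginning of a line and cite paths as inline code, that JSON `evidence` values are non-empty lists of plain repo-relative path strings, and that absolute paths or `..` segments count as malformed. The helper names are invented.

```python
import json
import re
from pathlib import Path
from typing import Callable

# Canonical evidence locations: Markdown "**Evidence:**" lines and JSON "evidence" arrays.
EVIDENCE_LINE = re.compile(r"^\*\*Evidence:\*\*(.*)$", re.MULTILINE)
INLINE_CODE = re.compile(r"`([^`]+)`")


def _check(raw: str, source: str, exists: Callable[[str], bool], report: dict) -> None:
    """Classify one cited path as malformed, missing, or valid."""
    path = raw.strip()
    if not path or path.startswith("/") or ".." in Path(path).parts:
        report["malformed_evidence"].append(f"{source}: {raw!r}")
    elif not exists(path):
        report["invalid_paths"].append(f"{source}: {path}")


def _walk(node, source: str, exists: Callable[[str], bool], report: dict) -> None:
    """Recursively find every "evidence" key in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key != "evidence":
                _walk(value, source, exists, report)
            elif (isinstance(value, list) and value
                  and all(isinstance(v, str) for v in value)):
                for v in value:
                    _check(v, source, exists, report)
            else:
                report["malformed_evidence"].append(
                    f"{source}: evidence must be a non-empty list of strings")
    elif isinstance(node, list):
        for item in node:
            _walk(item, source, exists, report)


def validate_docs(docs: dict[str, str], exists: Callable[[str], bool]) -> dict[str, list[str]]:
    """docs maps a file name (e.g. "System.json") to its text."""
    report: dict[str, list[str]] = {"invalid_paths": [], "malformed_evidence": []}
    for name, text in sorted(docs.items()):
        if name.endswith(".md"):
            for match in EVIDENCE_LINE.finditer(text):
                paths = INLINE_CODE.findall(match.group(1))
                if not paths:
                    report["malformed_evidence"].append(
                        f"{name}: **Evidence:** line has no inline-code paths")
                for p in paths:
                    _check(p, name, exists, report)
        elif name.endswith(".json"):
            _walk(json.loads(text), name, exists, report)
    return report


def validate_dir(repo_root: str, docs_dir: str = ".archeia") -> dict[str, list[str]]:
    """Filesystem wrapper: read the generated docs and resolve paths against the repo."""
    repo = Path(repo_root)
    docs = {p.name: p.read_text() for p in (repo / docs_dir).iterdir()
            if p.suffix in (".md", ".json")}
    return validate_docs(docs, lambda rel: (repo / rel).exists())
```

Keeping `validate_docs` free of file I/O (path existence is injected as `exists`) makes the rules easy to test in isolation; `validate_dir` is the thin wrapper you would call on a real checkout.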
Step 3 — Validation summary: Print a summary for the user:
After the .archeia/ docs are validated, generate AGENTS.md and CLAUDE.md.
AGENTS.md — The substantive file. Synthesize what was discovered into
practical instructions that a coding agent needs when working in this repo.
Draw from the generated .archeia/ docs:
Structure AGENTS.md following these best practices:
- Give each section an **Evidence:** line using repo-relative inline-code paths so the final validation pass can verify it.

Handling low-confidence areas: Architecture.md and Standards.md report confidence per section (high/medium/low). When generating AGENTS.md, read the "Low-confidence guidance" blocks from both documents and translate each one into a concrete agent instruction.
For each low-confidence area, AGENTS.md must explicitly state the absence and provide a fallback behavior. Do not skip areas just because no convention was detected — silence causes agents to hallucinate standards.
Example for a repo with mixed confidence:
```markdown
## Coding Standards
**Formatting:** No formatter configured. Match the style of the nearest
existing file. Do not introduce formatting tools without explicit approval.
**Linting:** ruff configured (`ruff.toml`). Run `ruff check` before
committing.
**Type checking:** No type checker configured. Do not add type annotations
unless the file already uses them.
**Testing:** pytest configured (`pyproject.toml`). Test files: `test_*.py`
in `tests/`. Run: `uv run pytest`.
**Naming:** Mostly snake_case. Some camelCase in `src/legacy/`. Match the
convention of the directory you're working in.
```
The pattern: for high-confidence areas, state the convention and the command. For low-confidence areas, state the absence and the fallback ("match nearest file," "don't introduce," "ask first"). Every section gets an entry — no gaps, no silence.
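That pattern is mechanical enough to sketch. The example below assumes each standards area has already been extracted into a small record (the `Area` dataclass, its fields, and the `render` helpers are hypothetical; the real confidence blocks in Architecture.md and Standards.md may be shaped differently). It states the convention and command when one was detected, states the absence plus the fallback when nothing was, and appends the fallback for anything below high confidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Area:
    """One standards area as reported by Architecture.md / Standards.md (assumed shape)."""
    name: str                          # e.g. "Formatting"
    confidence: str                    # "high" | "medium" | "low"
    convention: Optional[str] = None   # detected convention; None when nothing was found
    evidence: Optional[str] = None     # repo-relative config path backing the convention
    command: Optional[str] = None      # command the agent should run, if any
    fallback: str = "Match the style of the nearest existing file."


def render(area: Area) -> str:
    """Turn one area into an AGENTS.md line; every area gets an entry."""
    if area.convention is None:
        # Absence: say so explicitly, then give the fallback behavior.
        return f"**{area.name}:** Nothing detected. {area.fallback}"
    text = f"**{area.name}:** {area.convention}"
    if area.evidence:
        text += f" (`{area.evidence}`)"
    text += "."
    if area.command:
        text += f" Run: `{area.command}`."
    if area.confidence != "high":
        # Partial signal: state what was seen, but still give a fallback.
        text += f" {area.fallback}"
    return text


def render_section(areas: list[Area]) -> str:
    return "\n".join(["## Coding Standards"] + [render(a) for a in areas])
```

For instance, `render(Area("Linting", "high", "ruff configured", "ruff.toml", "ruff check"))` yields the Linting line from the example above.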
CLAUDE.md — A thin pointer file. Its only job is to direct Claude to
AGENTS.md so instructions are not duplicated across files. Format:
```markdown
# Claude Instructions
See [AGENTS.md](AGENTS.md) for full project instructions.
```
Add Claude-specific configuration below the pointer only if the repo has Claude-specific needs (e.g., tool permissions, model preferences). Otherwise, keep it minimal.
After generating AGENTS.md and CLAUDE.md, run
`./scripts/validate-evidence.sh <repo-root> <repo-root>/.archeia --include-root-docs`
so the final validation pass includes the generated root docs. This second pass
exists because root docs are produced in Phase 4, after the .archeia/ docs.
Where evidence is missing, write `<!-- INSUFFICIENT EVIDENCE: ... -->`. If Containers.json, Components.json, or DataFlow.json cannot be evidenced, skip the file and emit an explicit insufficiency outcome rather than inventing structure.

.archeia/ docs (always generated):
- .archeia/Architecture.md
- .archeia/System.json
- .archeia/Standards.md
- .archeia/Guide.md

.archeia/ docs (always attempted):
- .archeia/Containers.json (generate when one or more runtime units can be identified with evidence; otherwise emit an explicit insufficiency outcome)
- .archeia/Components.json (generate when at least one generated container has evidence-backed internal source modules; otherwise emit an explicit insufficiency outcome)
- .archeia/DataFlow.json (generate when a primary interaction flow can be traced with evidence; otherwise emit an explicit insufficiency outcome)

Conditional .archeia/ docs (when evidence supports them):
- .archeia/Entities.json (when an ORM or schema is detected)
- .archeia/StateMachine.json (when state machines are detected)

Agent instructions:
- AGENTS.md
- CLAUDE.md
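The output rules reduce to a small decision function. This is a sketch under stated assumptions: the `Evidence` summary object and its field names are hypothetical stand-ins for whatever exploration actually concluded. The marker strings are the ones specified in this document. Conditional docs that are skipped produce no marker here, because the skill prints a note for them instead.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """What exploration found (hypothetical summary object, not part of the skill)."""
    runtime_units: list[str] = field(default_factory=list)
    containers_with_source: list[str] = field(default_factory=list)
    primary_flow_traced: bool = False
    orm_schema_found: bool = False
    state_machine_found: bool = False


def marker(what: str) -> str:
    return f"<!-- INSUFFICIENT EVIDENCE: {what} -->"


def plan_outputs(ev: Evidence) -> tuple[list[str], list[str]]:
    """Return (files to write in generation order, insufficiency markers to emit)."""
    files = [".archeia/Architecture.md", ".archeia/System.json"]
    markers: list[str] = []
    # Containers.json and Components.json: always attempted, evidence-gated.
    if ev.runtime_units:
        files.append(".archeia/Containers.json")
        if ev.containers_with_source:
            files.append(".archeia/Components.json")
        else:
            markers.append(marker("internal source modules for Components.json"))
    else:
        markers.append(marker("runtime units for Containers.json"))
        markers.append(marker("Containers.json unavailable for Components.json"))
    # Entities.json and StateMachine.json are conditional: when skipped, the
    # skill prints a note rather than emitting a marker, so they are just omitted.
    if ev.orm_schema_found:
        files.append(".archeia/Entities.json")
    # DataFlow.json: always attempted, evidence-gated.
    if ev.primary_flow_traced:
        files.append(".archeia/DataFlow.json")
    else:
        markers.append(marker("primary interaction flow for DataFlow.json"))
    if ev.state_machine_found:
        files.append(".archeia/StateMachine.json")
    files += [".archeia/Standards.md", ".archeia/Guide.md", "AGENTS.md", "CLAUDE.md"]
    return files, markers
```

With an empty `Evidence()` this yields the six-file minimum; with every signal present it yields eleven, matching the 6–11 range stated at the top.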