Name: Codebase Educator
Author: steveKit

Codebase Educator | Skills Pool

Register	Written by	Purpose	Schema
`state.yaml`	Orchestrator	Workflow progress + restartability	`references/registers/state.md`
`gather.yaml`	Phase 1	Structured codebase data	`references/registers/gather.md`
`url-index.yaml`	Phase 1	Technology link lookup table	`references/registers/url-index.md`
`quality.yaml`	Phase 1.5	Quality assessment + tone guidance	`references/registers/quality.md`
`sections.yaml`	Phase 2	Per-section metadata (concepts, URLs, counts)	`references/registers/sections-manifest.md`

Phase	Load these references	Load these registers
1 (Gather)	nothing extra	-- (creating them)
1.5 (Quality + Depth)	`quality-assessment.md`	`gather.yaml`
2 (Write)	`sections/_shared.md` + the ONE section template	`gather.yaml`, `url-index.yaml`, `quality.yaml`
2.5 (Sweep)	nothing extra	`sections.yaml`
3 (Concepts)	`concept-template.md`	`sections.yaml`, registry + connections + vault-state on disk
4-5 (Commit/Report)	nothing extra	`state.yaml`, `sections.yaml`

Input	Type	Method
No argument	Local project	Analyze current working directory
`/path`, `./path`, `~/path`	Local path	Analyze directory at that path
`https://github.com/...`	GitHub repo	`git clone --depth 1` to `/tmp/educator-<name>`
`https://...` (non-GitHub)	Website	Discover repo first (see Website Sources below), fall back to external observation
`npm:<package>`	npm package	`npm pack` to `/tmp/educator-<name>`, extract
`pypi:<package>`	PyPI package	`pip download --no-deps --no-binary :all:` to `/tmp/educator-<name>`, extract
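
The detection table above can be sketched as a small classifier. This is a minimal illustration, not the skill's actual implementation: the function name `detect_source`, the returned labels, and the `unknown` fallback are all assumptions made for this example.

```python
def detect_source(arg=None):
    """Classify a skill argument into (source_type, method) per the table above.

    Hypothetical helper: the labels and the "unknown" fallback are
    illustrative assumptions, not part of the skill itself.
    """
    if not arg:
        return ("local-project", "analyze current working directory")
    if arg.startswith("npm:"):
        return ("npm-package", "npm pack, then extract")
    if arg.startswith("pypi:"):
        return ("pypi-package", "pip download, then extract")
    # GitHub must be checked before the generic https:// case.
    if arg.startswith("https://github.com/"):
        return ("github-repo", "git clone --depth 1")
    if arg.startswith("https://"):
        return ("website", "discover repo first, else external observation")
    if arg.startswith(("/", "./", "~/")):
        return ("local-path", "analyze directory at that path")
    return ("unknown", "no rule in the table matches")


if __name__ == "__main__":
    for example in (None, "./api", "https://github.com/pallets/flask", "npm:express"):
        print(example, "->", detect_source(example)[0])
```

The order of the checks matters: the GitHub prefix is a special case of `https://`, so it has to be tested first.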

Source type	Base URL	File link pattern
GitHub repo	Input URL	`<base>/blob/<branch>/<filepath>`
Local with GitHub remote	`git remote get-url origin`, converted to an HTTPS base URL	Same blob pattern
npm/PyPI with `repository`	From package manifest	Same if GitHub
Local, no remote	`null`	Relative paths only
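
As a rough sketch of the resolution rules above, the following converts a git remote into a base URL and builds file links from it. The helper names `remote_to_base` and `file_link` are hypothetical, and the regular expression only covers the two common GitHub remote shapes (SSH and HTTPS); anything else is treated as "no remote".

```python
import re

# Matches git@github.com:owner/repo(.git) and https://github.com/owner/repo(.git)
_GITHUB_REMOTE = re.compile(
    r"(?:git@github\.com:|https://github\.com/)([^/]+)/(.+?)(?:\.git)?/?$"
)


def remote_to_base(remote):
    """Return an https GitHub base URL for a remote, or None if it is not GitHub."""
    match = _GITHUB_REMOTE.match(remote.strip())
    if not match:
        return None
    owner, repo = match.groups()
    return f"https://github.com/{owner}/{repo}"


def file_link(base, branch, filepath):
    """Build <base>/blob/<branch>/<filepath>; fall back to the relative path."""
    if base is None:
        return filepath  # "Local, no remote": relative paths only
    return f"{base}/blob/{branch}/{filepath.lstrip('/')}"


if __name__ == "__main__":
    base = remote_to_base("git@github.com:expressjs/express.git")
    print(file_link(base, "master", "lib/express.js"))
```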

Resolve source -- detect type, acquire code if needed
Scan structure -- ls top-level, read key files:
- Entry points (main, index, app, server files)
- Config files (package.json, Cargo.toml, go.mod, pyproject.toml, etc.)
- README, CLAUDE.md, ARCHITECTURE.md, docs/ if they exist
- CI/CD config (.github/workflows, Dockerfile, docker-compose)
- Test directories and config
Map dependency graph -- read imports in key files, trace module structure
Sample depth -- read files from different codebase areas:

Project size	Sample target	Strategy
Small (<20 files)	Read most files	Near-complete coverage
Medium (20-100)	8-15 files	Cover every layer/module
Large (100+)	15-25 files	Glob to discover, sample each subsystem
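
The sampling table can be read as a simple lookup on file count. The sketch below is illustrative only: `sample_plan` is a hypothetical name, and because the table lists 100 under both "20-100" and "100+", treating exactly 100 files as large is an assumption made here.

```python
def sample_plan(file_count):
    """Map a file count to the size band, sample target, and strategy.

    target is a (min, max) tuple of files to read, or None when the table
    says "read most files". Assumption: exactly 100 files counts as large.
    """
    if file_count < 20:
        return {"size": "small", "target": None,
                "strategy": "near-complete coverage"}
    if file_count < 100:
        return {"size": "medium", "target": (8, 15),
                "strategy": "cover every layer/module"}
    return {"size": "large", "target": (15, 25),
            "strategy": "glob to discover, sample each subsystem"}


if __name__ == "__main__":
    for count in (12, 64, 480):
        print(count, sample_plan(count))
```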

Batch Read calls. Prioritize: hub modules, architectural boundaries, unusual files. For each file, write a structured summary (not raw content) into the gather register, noting key snippets by line range.
Read test files -- 2-3 real test files for core logic (not trivial utils)
Check history (if git) -- `git log --oneline -20` for activity; `git log --diff-filter=A --name-only --format="" | head -30` for file creation order
Build URL index -- For every significant technology, resolve:
- Official docs URL (construct from known patterns for well-known tech)
- Registry URL (npm/PyPI/crates.io/pkg.go.dev patterns)
- Repo URL (from package manifest or known GitHub paths)
WebFetch budget: Only verify a URL when the docs site isn't obvious from the name or registry metadata. Never WebFetch well-known tech URLs. When uncertain, link to the registry page.

Minimum coverage: language, framework, database, major libraries (3+ imports), build tools, test framework.
Write registers:
- Write gather.yaml following references/registers/gather.md schema
- Write url-index.yaml following references/registers/url-index.md schema
- Update state.yaml: state -> GATHERED
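
The "Registry URL" step of the URL index can be approximated with fixed patterns, which avoids spending WebFetch calls on well-known registries. This is a sketch under stated assumptions: the ecosystem keys (`npm`, `pypi`, `crates`, `go`) and the function name `registry_url` are hypothetical, and the real `url-index.yaml` layout is defined by `references/registers/url-index.md`, which is not reproduced here.

```python
# Public package-page patterns for the four registries named above.
REGISTRY_PATTERNS = {
    "npm": "https://www.npmjs.com/package/{name}",
    "pypi": "https://pypi.org/project/{name}/",
    "crates": "https://crates.io/crates/{name}",
    "go": "https://pkg.go.dev/{name}",  # name is the full module path
}


def registry_url(ecosystem, name):
    """Return the registry page for a package, or None for unknown ecosystems."""
    pattern = REGISTRY_PATTERNS.get(ecosystem)
    return pattern.format(name=name) if pattern else None


if __name__ == "__main__":
    print(registry_url("npm", "express"))
    print(registry_url("go", "github.com/gin-gonic/gin"))
```

Returning `None` for an unknown ecosystem leaves the decision (verify or skip) to the caller rather than guessing a URL.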

strategy-pattern:
  category: pattern
  projects:
    - expressjs--express
    - pallets--flask

Load registry into memory
Collect concept list from sections.yaml -- union of all concepts-linked across sections. Deduplicate.
Process concepts -- for each:
- Normalize to kebab-case: lowercase, hyphens only. This is the filename AND registry key AND wikilink target.
- Check registry (not filesystem) for existence
- New concept: Create page from references/concept-template.md, add to registry with category (from frontmatter) and this project as first entry
- Existing concept: Append backlink under "## Seen In", add project to registry's projects list
Backlink format: [[<project>/_<project>_overview|<project>]] (bare [[project]] resolves to nothing -- it's a folder).
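
The concept-processing rules above could look roughly like this in code. It is a minimal sketch, not the skill's implementation: the function names are hypothetical, the registry is assumed to be a plain dict shaped like the YAML example (`category` plus `projects`), and splitting camelCase and keeping digits during normalization are assumptions beyond "lowercase, hyphens only".

```python
import re


def to_kebab(name):
    """Normalize a concept name to kebab-case (filename, registry key, wikilink)."""
    split = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", name)  # camelCase -> camel-Case
    return re.sub(r"[^a-z0-9]+", "-", split.lower()).strip("-")


def backlink(project):
    """Alias link to the project's overview page; bare [[project]] is a folder."""
    return f"[[{project}/_{project}_overview|{project}]]"


def record_concept(registry, concept, category, project):
    """Check the in-memory registry (not the filesystem) and update it."""
    key = to_kebab(concept)
    entry = registry.get(key)
    if entry is None:
        # New concept: this project becomes the first entry.
        registry[key] = {"category": category, "projects": [project]}
        return "created"
    if project not in entry["projects"]:
        entry["projects"].append(project)
    return "updated"


if __name__ == "__main__":
    registry = {"strategy-pattern": {"category": "pattern",
                                     "projects": ["expressjs--express"]}}
    print(record_concept(registry, "Strategy Pattern", "pattern", "pallets--flask"))
    print(backlink("pallets--flask"))
```

Creating the concept page itself and appending under "## Seen In" are file operations left out of this sketch.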

Batch writes: Group new concepts and write in quick succession. Batch Edit calls for existing concept backlinks.
Cross-project connections -- Use the connection index + registry:
- Load _connections.yaml for existing connection data
- From registry, find concepts where projects list contains both the new project and at least one other project
- For each connected project, collect the shared concept list
- Read "## Seen In" in shared concept pages for usage descriptions
- Write "## Cross-Project Connections" in new project's overview
- Update each connected project's overview with reciprocal entry
- Update _connections.yaml: add new project's entries and add the new project to each connected project's entry (both directions)
- Format:
```
Concepts shared with [[other/_other_overview|Display Name]]:
- **Concept** -- how this project uses it; how the other uses it.
```
Write registry back to disk (v2 enriched format)
Update _index.md -- add project row, update Concepts by Category (use registry category field rather than reading concept frontmatter)
Quick link checks:
- All wikilinks are lowercase-with-hyphens
- Prefer alias links to broad concepts over near-duplicates
- No speculative links -- only wikilink concepts actually discussed
Update state.yaml: state -> CONCEPTS_DONE, populate concepts-created/updated/connections
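
The cross-project connection step (find concepts whose `projects` list contains both the new project and at least one other) can be sketched as a single pass over the registry. The function name `shared_concepts` is hypothetical, and the registry is again assumed to be a dict shaped like the YAML example; the layout of `_connections.yaml` is not shown in this document, so it is not modelled here.

```python
def shared_concepts(registry, new_project):
    """Return {other_project: [concepts shared with new_project]}."""
    connections = {}
    for concept, entry in registry.items():
        projects = entry.get("projects", [])
        if new_project not in projects:
            continue
        for other in projects:
            if other != new_project:
                connections.setdefault(other, []).append(concept)
    # Sort for stable output when writing overview sections.
    return {project: sorted(names) for project, names in connections.items()}


if __name__ == "__main__":
    registry = {
        "strategy-pattern": {"category": "pattern",
                             "projects": ["expressjs--express", "pallets--flask"]},
        "middleware": {"category": "pattern",
                       "projects": ["expressjs--express"]},
    }
    print(shared_concepts(registry, "pallets--flask"))
```

Each key in the result corresponds to one "Concepts shared with ..." block in the new project's overview, and to one reciprocal entry in the other project's overview.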

<project-name>/
+-- _<project-name>_overview.md
+-- architecture.md
+-- technology-choices.md
+-- design-patterns.md
+-- key-decisions.md
+-- gaps-vulnerabilities.md
+-- dependencies.md
+-- evolution.md
+-- testing-strategy.md
+-- if-starting-over.md
+-- learning-path.md
+-- glossary.md
+-- resources.md

Phase	Scope
1, 1.5, 2, 2.5	Per source sequentially
3 (Concepts)	Once across all sources
4 (Commit)	Once -- single branch, single commit
5 (Report)	Once -- combined report

Codebase Educator

Architecture: State Machine + Registers

State Machine

Registers

Context Loading Rules

Source Detection

Source URL Resolution

Multi-Source Batching

Phase 1: Gather

Phase 1.5: Quality Assessment & Depth Profiling

Phase 2: Write Sections

Vault Bootstrap (first-ever run only)

Section Writing Protocol

Write Order and Parallelism

Mermaid Validation

Inline Link Checklist

Phase 2.5: Concept Sweep

Phase 3: Concepts & Index

Registry (v2 — Enriched)

Vault Metadata Files

Steps

Phase 4: Commit & Push

Phase 5: Report

Cleanup

Website Sources

Repo Discovery

Repo Found -> Code Analysis

No Repo -> External Observation

Project Subfolder Naming

Project Subfolder Layout

Resume Protocol

Guidelines
