Data Dictionary Generator

Generates or updates a structured, human-readable markdown data dictionary by auto-detecting the schema source and database engine, then querying the live database for statistics, distributions, sample rows, and data quality metrics.

When to Use

User asks to "create a data dictionary", "document the database", or "generate schema docs"
User asks to "update the data dictionary", "refresh the stats", or "sync the data dictionary"
User wants to onboard new team members to the database structure
User wants an up-to-date reference for all tables, columns, and relationships
User wants a data quality overview (null rates, value distributions)

Example Invocations

/data-dictionary
/data-dictionary --update
/data-dictionary --update --output reports/schema_ref.md
/data-dictionary --output reports/schema_ref.md
/data-dictionary --lang zh
/data-dictionary --output docs/db.md --lang es
create a data dictionary for this project
update the data dictionary
refresh the stats in the data dictionary
document the database schema
generate schema docs

# Data Dictionary

**Project**: <project name from package.json / pyproject.toml / directory name>
**Database**: <engine> — `<connection string or file path>`
**Schema source**: <file path and type> · git `<short hash>` (<date>)
**Generated**: <YYYY-MM-DD>
**Language**: <language>

> ⚠️ [Only include if DB was not accessible]: Statistics are unavailable —
> schema-only documentation generated.

---

## Quick Reference

| Table | Rows | Description |
|-------|------|-------------|
| [`table_name`](#section-anchor) | N | One-line description |
...

---

## Table of Contents

[grouped by logical domain, one bullet per table]

---

## Domain: <Group Name>

### `table_name`

**Description**: [what this entity represents, its role, and how it relates to the broader domain]
**Row count**: N
**Schema**: `<source_file>:<line_number>` (if available)

#### Columns

| Column | Type | Nullable | Default | Indexed | Description |
|--------|------|----------|---------|---------|-------------|
| `col` | Type | Yes/No | value | PK / FK→`table` / ✓ / — | Description. Null rate: X% |

#### Statistics

> Only include sections that apply to the table.

**Numeric columns** (min / max / mean):

| Column | Min | Max | Mean | Null % |
|--------|-----|-----|------|--------|

**Categorical distributions** (for columns with ≤ 30 distinct values):

| `col_name` value | Count | % |
|-----------------|-------|---|

**Date ranges**:

| Column | Earliest | Latest |
|--------|----------|--------|

#### Sample Data

> `birth_date` and `email` redacted.

| id | col1 | col2 | ... |
|----|------|------|-----|
| 1 | val | val | ... |

---

## Enumerations

### `EnumName`

> Used in: `table.column`

| Value | Description |
|-------|-------------|
| `VALUE` | [what it means in business terms] |

---

## Entity Relationship Diagram

```mermaid
erDiagram
    TABLE_A {
        int id PK
        string name
        int b_id FK
    }
    TABLE_B {
        int id PK
        string label
    }
    TABLE_A }o--|| TABLE_B : "belongs to"
```

---

### Step 6: Mermaid ER Diagram

Generate an `erDiagram` block. Rules:

- Include all non-system tables
- Show only the most important columns per table: PK, FKs, and 1–2 key business columns (skip audit fields like `created_at`)
- Use Mermaid cardinality notation:
  - `||--||` one-to-one
  - `||--o{` one-to-many (optional)
  - `||--|{` one-to-many (required)
  - `}o--o{` many-to-many
- Label each relationship with a short verb phrase (e.g. `"belongs to"`, `"has many"`)
- If there are > 20 tables, split into domain sub-diagrams

---

### Step 7: Quality Checklist

Before saving, verify:

- [ ] Every non-system table is documented
- [ ] Every column has a description
- [ ] Null rates are shown for every nullable column
- [ ] Numeric stats are shown for all numeric columns
- [ ] Categorical distributions are shown for all enum/low-cardinality columns
- [ ] Sample rows are shown for every table (or the table is noted as empty)
- [ ] All enums are documented in the Enumerations section
- [ ] Mermaid ER diagram renders (valid syntax)
- [ ] Quick Reference table at the top covers all tables
- [ ] Data Quality Notes section lists any anomalies found
- [ ] PII columns are redacted in sample rows

---

## Notes for Schema-Specific Parsing

### Prisma

- `@id` → Primary Key
- `@unique` / `@@unique([...])` → Unique constraint
- `@@index([...])` → Non-unique index
- `@default(...)` → Default value
- Relations defined with `@relation` → Foreign key; note `fields` and `references`
- Enums → Document in the Enumerations section

### SQL DDL

- `PRIMARY KEY` → PK
- `REFERENCES table(col)` → FK
- `NOT NULL` → Nullable: No
- `DEFAULT val` → Default value
- `CREATE INDEX` / `CREATE UNIQUE INDEX` → Index

### Django models.py

- `ForeignKey(Model, ...)` → FK; note the `on_delete` behavior
- `null=True` → Nullable: Yes
- `unique=True` → Unique index
- `db_index=True` → Non-unique index
- `choices=` → Document as categorical with allowed values

### SQLAlchemy

- `primary_key=True` → PK
- `ForeignKey('table.col')` → FK
- `nullable=False` → Nullable: No
- `index=True` → Non-unique index
- `unique=True` → Unique index
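
The SQL DDL mapping above can be cross-checked against a live database. The following is a minimal sketch for SQLite only, using the standard `sqlite3` module and SQLite's `PRAGMA` introspection. The function name `describe_columns`, the returned dict keys, and the sample tables are hypothetical, chosen to mirror the Columns table of the template. They are not part of the original instructions.

```python
import sqlite3

NO_INDEX = "\u2014"  # the template's "no index" marker

def describe_columns(conn, table):
    """Return one dict per column, shaped like a row of the Columns table."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # foreign_key_list rows: (id, seq, table, from, to, ...)
    fks = {r[3]: r[2]
           for r in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()}
    # Collect every column covered by an explicit or implicit index.
    indexed = set()
    for _, idx_name, *_ in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        for _, _, col in conn.execute(f'PRAGMA index_info("{idx_name}")').fetchall():
            indexed.add(col)

    rows = []
    for _, name, col_type, notnull, default, pk in cols:
        if pk:
            marker = "PK"
        elif name in fks:
            marker = f"FK\u2192{fks[name]}"  # FK→table
        elif name in indexed:
            marker = "\u2713"  # ✓
        else:
            marker = NO_INDEX
        rows.append({
            "column": name,
            "type": col_type,
            "nullable": "No" if notnull else "Yes",  # NOT NULL -> Nullable: No
            "default": default,                      # DEFAULT val, as written in the DDL
            "indexed": marker,
        })
    return rows

# Hypothetical schema used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    status TEXT DEFAULT 'pending',
    note TEXT
);
CREATE INDEX idx_orders_status ON orders(status);
""")
order_columns = {c["column"]: c for c in describe_columns(conn, "orders")}
```

Other engines expose the same facts through different catalogs (for example `information_schema`), so this sketch would need an engine-specific variant for each.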

| Priority | Schema Type | Detection |
|----------|-------------|-----------|
| 1 | Prisma | `prisma/schema.prisma` or any `**/*.prisma` file |
| 2 | SQL DDL | `schema.sql`, `**/schema.sql`, or `**/*.sql` with `CREATE TABLE` |
| 3 | Django | `**/models.py` files |
| 4 | SQLAlchemy | Python files containing `Column(`, `relationship(` |
| 5 | No schema file | Fall back to introspecting the live DB directly |
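
The priority order above could be applied with a short filesystem scan. This is a sketch under a few assumptions: the function name `detect_schema_source` is hypothetical, a file named `schema.sql` is accepted without inspecting its contents, other `.sql` files must contain `CREATE TABLE`, and vendored directories are not skipped.

```python
from pathlib import Path

def detect_schema_source(root):
    """Return (schema_type, matching_paths) for the highest-priority match."""
    root = Path(root)

    # Priority 1: any Prisma schema file.
    prisma = sorted(root.rglob("*.prisma"))
    if prisma:
        return "Prisma", prisma

    # Priority 2: schema.sql by name, or any .sql file with CREATE TABLE.
    sql = [
        p for p in sorted(root.rglob("*.sql"))
        if p.name == "schema.sql"
        or "create table" in p.read_text(errors="ignore").lower()
    ]
    if sql:
        return "SQL DDL", sql

    # Priority 3: Django model modules.
    django = sorted(root.rglob("models.py"))
    if django:
        return "Django", django

    # Priority 4: Python files that look like SQLAlchemy models.
    markers = ("Column(", "relationship(")
    sqlalchemy = [
        p for p in sorted(root.rglob("*.py"))
        if any(m in p.read_text(errors="ignore") for m in markers)
    ]
    if sqlalchemy:
        return "SQLAlchemy", sqlalchemy

    # Priority 5: nothing found, so the caller introspects the live DB.
    return "No schema file", []
```

Calling `detect_schema_source(".")` from the project root returns the schema type plus the files that triggered the match, which can then be recorded in the **Schema source** header field.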

| Pattern | Engine | Query tool |
|---------|--------|------------|
| `file:` / `.db` / `.sqlite` | SQLite | `sqlite3 <file> "<sql>"` |
| `postgresql://` / `postgres://` | PostgreSQL | `psql "$DATABASE_URL" -t -A -c "<sql>"` |
| `mysql://` | MySQL | `mysql -e "<sql>" <db_name>` |
| `.duckdb` / `duckdb://` | DuckDB | `duckdb <file> "<sql>"` |
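
As an illustration, the pattern table above can be turned into a small dispatcher that returns the engine name and the command-line arguments for one query. The function name `query_command` is hypothetical. The sketch also assumes that the `file:` and `duckdb://` prefixes can simply be stripped to obtain a file path, that extensions are matched at the end of the string, and that MySQL host and credentials come from the client's own configuration, since the table only passes the database name.

```python
from urllib.parse import urlparse

def query_command(conn, sql):
    """Map a connection string or file path to (engine, argv) for one query."""
    lower = conn.lower()

    if lower.startswith(("postgresql://", "postgres://")):
        # psql accepts the connection URL directly; -t -A gives bare, unaligned rows.
        return "PostgreSQL", ["psql", conn, "-t", "-A", "-c", sql]

    if lower.startswith("mysql://"):
        # Only the database name is taken from the URL (assumption, see above).
        db_name = urlparse(conn).path.lstrip("/")
        return "MySQL", ["mysql", "-e", sql, db_name]

    if lower.startswith("duckdb://") or lower.endswith(".duckdb"):
        path = conn[len("duckdb://"):] if lower.startswith("duckdb://") else conn
        return "DuckDB", ["duckdb", path, sql]

    if lower.startswith("file:") or lower.endswith((".db", ".sqlite")):
        path = conn[len("file:"):] if lower.startswith("file:") else conn
        return "SQLite", ["sqlite3", path, sql]

    raise ValueError(f"Unrecognized database location: {conn!r}")
```

The returned list can be passed to `subprocess.run(argv, capture_output=True, text=True)`. Building an argument list rather than a shell string avoids quoting problems when the SQL itself contains quotes.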

| List | Contents |
|------|----------|
| New tables | Tables in current schema NOT present in existing file |
| Removed tables | Tables in existing file NOT present in current schema |
| Changed tables | Tables present in both, but with added or removed columns |
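
The three lists above amount to set differences over table and column names. A minimal sketch, assuming both the existing file and the current schema have already been parsed into a mapping of table name to column names (the function name `diff_schema` and the sample data are hypothetical):

```python
def diff_schema(existing, current):
    """Compare two {table: [columns]} mappings and return the three diff lists."""
    existing_tables, current_tables = set(existing), set(current)

    new_tables = sorted(current_tables - existing_tables)
    removed_tables = sorted(existing_tables - current_tables)

    # A table counts as changed only if columns were added or removed.
    changed_tables = {}
    for table in sorted(existing_tables & current_tables):
        added = sorted(set(current[table]) - set(existing[table]))
        removed = sorted(set(existing[table]) - set(current[table]))
        if added or removed:
            changed_tables[table] = {"added": added, "removed": removed}

    return {"new": new_tables, "removed": removed_tables, "changed": changed_tables}

# Hypothetical before/after schemas.
documented = {"users": ["id", "email"], "legacy_log": ["id"], "orders": ["id", "total"]}
live = {"users": ["id", "email", "locale"], "orders": ["id", "total"], "invoices": ["id"]}
result = diff_schema(documented, live)
```

This comparison only sees column names. A column whose type or nullability changed without being renamed would not appear in the changed list, which matches the definition in the table.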

Instructions

Step 0: Parse Arguments & Decide Mode

Step 1: Auto-Detect Schema Source

Step 2: Locate & Connect to the Database

Step 3: Query Live Data Statistics

3a. Row count & basic completeness

3b. Numeric column statistics

3c. Categorical column distributions

3d. DateTime ranges

3e. Sample rows (PII-aware)

Step 4: Index & Constraint Collection

Update Mode

UM-1: Parse the Existing File

UM-2: Diff Schema — Detect Changes

UM-3: Merge & Write

UM-4: Print Update Summary

Step 5: Write the Data Dictionary

File structure:

Glossary

Data Quality Notes
