Guided interactive walkthrough of the synthdata plugin's capabilities. Use this skill whenever the user asks for a "synthdata tutorial", "walkthrough", "demo", "guided tour", "show me what synthdata can do", "how does synthdata work", "getting started with synthdata", or wants to explore the synthetic-data skills step by step. Also trigger on "onboarding", "product tour", "feature overview", or "teach me how to generate fake data" in the context of synthdata / synthetic data generation.
A hands-on walkthrough of the synthdata skills. The tutorial covers five core skills (generate, extract, extend, anonymize, serve) and introduces the compute and prompt-builder skills at the end. Rather than explaining features abstractly, the user works with real datasets as they go, and each step produces a file they can inspect. The arc: tour templates → generate from template → write a custom schema → extract → extend → anonymize → serve as MCP server.
Check that the Python dependencies are installed:
python3 -c "import openpyxl, faker, numpy, pandas, yaml, mcp" 2>&1 && echo "DEPS_READY" || echo "DEPS_MISSING"
If DEPS_MISSING, tell the user to run:
pip install openpyxl faker numpy pandas pyyaml mcp --break-system-packages
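When a per-module report is more useful than a single pass/fail line, the same check can be run from Python. This is a minimal sketch, not part of synthdata: the helper name missing_modules is hypothetical, and it relies only on importlib.util.find_spec from the standard library. Note that the pip package pyyaml is imported as yaml.

```python
import importlib.util

# Import names from the one-line check above (the pyyaml package installs as "yaml").
REQUIRED = ["openpyxl", "faker", "numpy", "pandas", "yaml", "mcp"]


def missing_modules(names):
    """Return the subset of names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    # Mirror the DEPS_READY / DEPS_MISSING markers used by the shell check.
    print("DEPS_READY" if not missing else "DEPS_MISSING: " + " ".join(missing))
```

Listing the missing names lets the user install only what is absent instead of rerunning the full pip command.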
Pick a scratch directory for tutorial outputs (default: /tmp/synthdata-tutorial/). Create it with mkdir -p before Step 1.
When triggered, introduce the walkthrough conversationally:
I'll walk you through synthdata's key capabilities with a working example. We'll tour the built-in templates, generate a dataset, write a custom YAML schema for something outside the catalog, extract it to JSON, extend it with more rows, anonymize a "real-looking" file, and serve a dataset as an MCP server that Claude can query. About 15-20 minutes. Ready?
Wait for confirmation before proceeding. Also ask: "Any particular domain you'd rather use?"
— if they say e-commerce, healthcare, IoT, etc., swap the template in Step 1 accordingly. Otherwise
default to hr-directory (small, fast, easy to reason about).
Skill used: synthdata-generate
What it demonstrates: The 12 built-in templates — each is a production-ready schema you can
use as-is or fork.
List the templates:
python3 <synthdata-generate>/scripts/generate.py --list-templates
Walk the user through the catalog. Group them so it's easier to scan:
People & orgs
- hr-directory: employees, departments (2 tables, classic starter)
- crm-pipeline: contacts, companies, deals, activities (sales pipeline)
Commerce & finance
- ecommerce-orders: customers, products, orders, order_items (4 tables, FK chains)
- financial-transactions: accounts, customers, transactions
- saas-metrics: accounts, users, events, subscriptions (usage + billing)
Healthcare
- healthcare-patients: patients, providers, encounters, claims
- healthcare-hrm-security: users, threat events, phishing sims, training, DLP, monthly risk (7 tables, richest template)
Operations & telemetry
- security-events: users, devices, alerts, incidents
- log-events: services, requests, errors
- iot-sensors: devices, readings, events
- survey-responses: respondents, questions, responses
Custom starter
- blank-slate: minimal schema to fork for your own domain
Explain the three knobs that apply to every template:
- --effort: quick (~50-100 rows), medium (~1K), thorough (~5K+). Scales row counts and event density but leaves schema shape intact.
Offer to peek inside a template YAML so they see what a schema actually looks like:
cat <synthdata-generate>/templates/hr-directory.yaml
Point out the anatomy: name:, tables:, per-table rows: (with quick/medium/thorough tiers),
columns: with typed fields (id, faker, choice, int, float, date), optional
profiles: (behavioral segments), foreign_key: links between tables, and writers: at the
bottom. This is the same format they'll write from scratch in Step 2.
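To make the anatomy concrete, the sketch below checks that a parsed schema carries the pieces listed above. It is illustrative only: the nesting (tables as a mapping from table name to a dict with rows and columns), the row counts, and the writers entry are assumptions, not the actual synthdata format, and check_anatomy is a hypothetical helper. In practice the dict would come from yaml.safe_load on the template file.

```python
# Hypothetical parsed schema; the exact layout and values are assumptions.
schema = {
    "name": "hr-directory",
    "tables": {
        "departments": {
            "rows": {"quick": 5, "medium": 10, "thorough": 20},
            "columns": {"department_id": {"type": "id"}},
        },
        "employees": {
            "rows": {"quick": 50, "medium": 1000, "thorough": 5000},
            "columns": {
                "employee_id": {"type": "id"},
                "department_id": {"foreign_key": "departments.department_id"},
            },
        },
    },
    "writers": [{"format": "xlsx"}],
}

TIERS = ("quick", "medium", "thorough")


def check_anatomy(schema):
    """Return a list of problems; an empty list means the expected keys are present."""
    problems = []
    for key in ("name", "tables", "writers"):
        if key not in schema:
            problems.append(f"missing top-level key: {key}")
    for table_name, table in schema.get("tables", {}).items():
        for key in ("rows", "columns"):
            if key not in table:
                problems.append(f"table {table_name}: missing {key}")
        rows = table.get("rows")
        if rows is not None:
            # Each table is expected to define a row count for every effort tier.
            for tier in TIERS:
                if tier not in rows:
                    problems.append(f"table {table_name}: rows missing tier {tier}")
    return problems


print(check_anatomy(schema) or "schema has the expected anatomy")
```

A check like this is most useful in Step 2, where a hand-written schema is easy to get slightly wrong.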
Now generate the chosen template at medium effort:
python3 <synthdata-generate>/scripts/generate.py --template hr-directory --effort medium \
--output /tmp/synthdata-tutorial/hr.xlsx --seed 42
Show the user the output: row counts per table, foreign-key integrity (e.g., every
employees.department_id resolves to a departments.department_id), and column distributions
(department weights, tenure mean, salary lognormal).
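The foreign-key check described above amounts to a set-membership test. The sketch below runs it on a few hand-written rows so it stays self-contained; the rows are made up for illustration, and orphaned_keys is a hypothetical helper rather than part of synthdata. Against the real workbook, the same idea applies after loading each sheet (for example with pandas.read_excel and sheet_name=None, which returns one DataFrame per sheet).

```python
def orphaned_keys(child_rows, parent_rows, child_col, parent_col):
    """Return the sorted child_col values that have no match in parent_col."""
    parent_keys = {row[parent_col] for row in parent_rows}
    child_keys = {row[child_col] for row in child_rows}
    return sorted(child_keys - parent_keys)


# Made-up sample rows standing in for the departments and employees sheets.
departments = [{"department_id": 1}, {"department_id": 2}]
employees = [
    {"employee_id": 10, "department_id": 1},
    {"employee_id": 11, "department_id": 2},
    {"employee_id": 12, "department_id": 2},
]

orphans = orphaned_keys(employees, departments, "department_id", "department_id")
print("FK integrity OK" if not orphans else f"orphaned department_id values: {orphans}")
```

Reporting the orphaned values, not just a pass/fail flag, makes it easy to show the user exactly which rows would break a join.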
Bridge:
Twelve templates cover a lot of ground, but not everything. What if you need something that isn't in the catalog — say, a fleet of delivery drivers with route assignments?
Skill used: synthdata-generate
What it demonstrates: The schema format is the real product. Templates are just pre-written
YAML — users can write their own for any domain.
Write a minimal two-table schema to a file. Use something concrete that isn't in the catalog so it feels distinct from Step 1 — delivery drivers + trips works well:
cat > /tmp/synthdata-tutorial/drivers.yaml << 'EOF'