Design and generate synthetic datasets, personas, event streams, fixtures, labels, and sandbox records that preserve useful structure without exposing real people or production secrets. Use when the task is to simulate realistic-but-safe data for testing, demos, analytics development, ML experiments, privacy-preserving sharing, edge-case coverage, or scenario generation. Do not use for simple anonymization-only requests, live data migration, or statistical analysis of existing datasets.
Generate synthetic data that is usable, explainable, and hard to confuse with real records.
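Use this skill for: testing, demos, analytics development, ML experiments, privacy-preserving sharing, edge-case coverage, and scenario generation that need realistic-but-safe data.
Do not use this skill for: simple anonymization-only requests, live data migration, or statistical analysis of existing datasets.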
Synthetic data is only good if it serves a job. Optimize for the stated purpose:
Follow the fidelity / utility / privacy triad. Do not maximize one dimension blindly at the expense of the others.
Clarify:
If the use case is fuzzy, pick a conservative target and state assumptions.
Preserve the parts of reality that the downstream task depends on:
Do not generate decorative randomness. Generate structure.
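As a minimal sketch of what "structure" can mean in practice, the generator below draws fields that depend on one another instead of sampling each column independently. The tier names, weights, price ranges, and the `SYN-` prefix are all hypothetical choices made for illustration; a real recipe would take them from the downstream task.

```python
import random

def make_orders(n, seed=0):
    """Generate n synthetic orders whose fields depend on each other."""
    rng = random.Random(seed)  # seeded, so the same recipe reproduces the same data
    # Hypothetical tiers: (sampling weight, (min unit price, max unit price)).
    tiers = {"basic": (0.7, (5, 40)), "premium": (0.3, (30, 200))}
    names = list(tiers)
    weights = [tiers[t][0] for t in names]
    orders = []
    for i in range(n):
        tier = rng.choices(names, weights=weights)[0]
        low, high = tiers[tier][1]
        items = rng.randint(1, 5)
        unit_price = round(rng.uniform(low, high), 2)  # price range depends on tier
        orders.append({
            "order_id": f"SYN-ORD-{i:05d}",  # prefix marks the record as synthetic
            "tier": tier,
            "items": items,
            "unit_price": unit_price,
            "total": round(items * unit_price, 2),  # derived, never drawn on its own
        })
    return orders

orders = make_orders(1000, seed=42)
```

Because `total` is computed from `items` and `unit_price`, and the price range follows the tier, a downstream test or dashboard sees relationships it can rely on, which independent random columns would not provide.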
State which method is being used:
Watch for indirect disclosure through rare combinations, long tails, free text, geolocation, timestamps, and identifiers that can be linked externally. If privacy risk is meaningful, call out membership-inference or linkage concerns instead of implying safety.
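One coarse way to screen for rare combinations is to count how many records share each combination of quasi-identifiers and flag any group smaller than a threshold. The sketch below assumes records are plain dictionaries; the field names (`zip3`, `age_band`, `job`) and the threshold `k=5` are illustrative assumptions, not recommended values. Passing this check does not establish safety, and it does not address membership inference.

```python
from collections import Counter

def rare_combinations(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical sample: six records share a common profile, one is unique.
records = (
    [{"zip3": "100", "age_band": "30-39", "job": "nurse"}] * 6
    + [{"zip3": "994", "age_band": "80-89", "job": "astronaut"}]
)

flagged = rare_combinations(records, ["zip3", "age_band", "job"], k=5)
```

Here `flagged` contains only the single-record combination, which is the kind of row that could be linked to an external source and is worth generalizing, suppressing, or reporting as a residual risk.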
Prefer a documented recipe over opaque filler:
For related tables or streams, keep referential integrity and lifecycle logic intact.
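The following sketch illustrates one way to keep referential integrity and lifecycle order across two related tables: child rows are generated from their parent, and timestamps are built forward from the parent's. The table shapes, field names, ID prefixes, and probabilities are hypothetical, chosen only to make the example concrete.

```python
import random
from datetime import datetime, timedelta

def make_customers_and_orders(n_customers, seed=0):
    """Generate customers and their orders with consistent keys and timestamps."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    customers, orders = [], []
    for c in range(n_customers):
        cid = f"SYN-CUST-{c:04d}"
        signup = start + timedelta(days=rng.randint(0, 300))
        customers.append({"customer_id": cid, "signup_at": signup})
        for _ in range(rng.randint(0, 3)):
            # Orders are created only after the owning customer signed up.
            created = signup + timedelta(hours=rng.randint(1, 24 * 30))
            shipped = None
            if rng.random() < 0.8:  # some orders are still unshipped
                shipped = created + timedelta(hours=rng.randint(2, 72))
            orders.append({
                "order_id": f"SYN-ORD-{len(orders):05d}",
                "customer_id": cid,  # foreign key always points at a real parent
                "created_at": created,
                "shipped_at": shipped,
            })
    return customers, orders

def check_integrity(customers, orders):
    """Return a list of (order_id, problem) for broken keys or lifecycle order."""
    signup = {c["customer_id"]: c["signup_at"] for c in customers}
    problems = []
    for o in orders:
        if o["customer_id"] not in signup:
            problems.append((o["order_id"], "orphan order"))
        elif o["created_at"] < signup[o["customer_id"]]:
            problems.append((o["order_id"], "order precedes signup"))
        if o["shipped_at"] is not None and o["shipped_at"] < o["created_at"]:
            problems.append((o["order_id"], "shipped before created"))
    return problems

customers, orders = make_customers_and_orders(50, seed=7)
problems = check_integrity(customers, orders)
```

Keeping the checker separate from the generator means the same function can validate data after later transformations, such as sampling or edge-case injection, which are common places for orphaned rows to appear.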
Check at least these dimensions:
Prefer:
Avoid:
Return one or more of:
When useful, structure the answer as:
prompt.md for execution posture and response style
examples/README.md for representative requests and output shape
guides/qa-checklist.md for final review standards
meta/skill.json for machine-readable metadata