Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.
Do not explore the workspace first. The workflow's Learn step gives you everything you need.
Build a synthetic dataset using the Data Designer library that matches this description:
$ARGUMENTS
Use Autopilot mode if the user implies they don't want to answer questions (e.g., they say something like "be opinionated", "you decide", "make reasonable assumptions", "just build it", or "surprise me"). Otherwise, use Interactive mode (the default).
Read only the workflow file that matches the selected mode, then follow it:
- Interactive mode: `workflows/interactive.md`
- Autopilot mode: `workflows/autopilot.md`

Reference material:

- `references/seed-datasets.md`
- `references/person-sampling.md`

API notes:

- Category samplers use `sampler_type="category"` with `params=dd.CategorySamplerParams(...)`.
- `prompt`, `system_prompt`, and `expr` fields: reference columns with `{{ column_name }}`, nested fields with `{{ column_name.field }}`.
- `SamplerColumnConfig` takes `params`, not `sampler_params`.
- `LLMJudgeColumnConfig` produces a nested dict where each score name maps to `{reasoning: str, score: int}`. To get the numeric score, use the `.score` attribute. For example, for a judge column named `quality` with a score named `correctness`, use `{{ quality.correctness.score }}`. Using `{{ quality.correctness }}` returns the full dict, not the numeric score.

Troubleshooting:

- `data-designer` CLI not found: tell the user that data-designer is not installed in this environment (requires Python >= 3.10). Ask if they would like you to create a virtual environment and install it, or if they prefer to do it themselves. Do not install anything without the user's permission.

Output:

- Write a Python file to the current directory with a `load_config_builder()` function returning a `DataDesignerConfigBuilder`. Name the file descriptively (e.g., `customer_reviews.py`). Use PEP 723 inline metadata for dependencies:
```python
# /// script
# dependencies = [
#     "data-designer",  # always required
#     "pydantic",  # only if this script imports from pydantic
#     # add additional dependencies here
# ]
# ///
import data_designer.config as dd
from pydantic import BaseModel, Field


# Use Pydantic models when the output needs to conform to a specific schema
class MyStructuredOutput(BaseModel):
    field_one: str = Field(description="...")
    field_two: int = Field(description="...")


# Use custom generators when built-in column types aren't enough
@dd.custom_column_generator(
    required_columns=["col_a"],
    side_effect_columns=["extra_col"],
)
def generator_function(row: dict) -> dict:
    # add custom logic here that depends on "col_a" and update row in place
    row["name_in_custom_column_config"] = "custom value"
    row["extra_col"] = "extra value"
    return row


def load_config_builder() -> dd.DataDesignerConfigBuilder:
    config_builder = dd.DataDesignerConfigBuilder()
    # Seed dataset (only if the user explicitly mentions a seed dataset path)
    # config_builder.with_seed_dataset(dd.LocalFileSeedSource(path="path/to/seed.parquet"))
    # config_builder.add_column(...)
    # config_builder.add_processor(...)
    return config_builder
```
Only include Pydantic models, custom generators, seed datasets, and extra dependencies when the task requires them.
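As a mental model, the nested judge output described in the API notes can be pictured as a plain Python dict. This is a sketch: the `quality` and `correctness` names follow the example above, and the reasoning text and score value are made up for illustration.

```python
# Hypothetical row value for an LLMJudgeColumnConfig column named "quality"
# with a single score named "correctness".
judge_row = {
    "quality": {
        "correctness": {
            "reasoning": "The response matches the reference answer.",
            "score": 4,
        },
    },
}

# "{{ quality.correctness.score }}" in a template resolves to the number:
numeric_score = judge_row["quality"]["correctness"]["score"]

# "{{ quality.correctness }}" resolves to the full dict (reasoning + score):
full_result = judge_row["quality"]["correctness"]

print(numeric_score)  # 4
```

This is why templates that need the numeric value must end in `.score`; stopping one level short yields the whole `{reasoning, score}` dict.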
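For orientation, here is a minimal `load_config_builder()` sketch that ties the API notes together using only the names mentioned above (`SamplerColumnConfig`, `CategorySamplerParams`, `add_column`). It is a config fragment, not verified against the installed library; in particular, the `values` field on `CategorySamplerParams` and the column name are assumptions.

```python
import data_designer.config as dd


def load_config_builder() -> dd.DataDesignerConfigBuilder:
    config_builder = dd.DataDesignerConfigBuilder()
    # Sampler column: note the keyword is `params`, not `sampler_params`.
    # `values` is an assumed field name on CategorySamplerParams.
    config_builder.add_column(
        dd.SamplerColumnConfig(
            name="topic",
            sampler_type="category",
            params=dd.CategorySamplerParams(values=["billing", "shipping", "returns"]),
        )
    )
    return config_builder
```

Downstream LLM columns could then reference this column in their `prompt` with `{{ topic }}`.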