Conversation-driven specification and execution of healthcare data generation at scale
Use these skills when building specifications for data generation or executing batch generation.
| Skill | Use When | Location |
|---|---|---|
| Profile Builder | Defining population characteristics for batch generation | builders/profile-builder.md |
| Journey Builder | Defining temporal event sequences | builders/journey-builder.md |
| Quick Generate | Simple single-entity generation | builders/quick-generate.md |
| Profile Executor | Executing a profile specification | executors/profile-executor.md |
| Journey Executor | Executing a journey specification | executors/journey-executor.md |
| Cross-Domain Sync | Coordinating cross-product generation | executors/cross-domain-sync.md |
Build a profile for 200 Medicare Advantage members:
- Age 65-85, normal distribution centered at 74
- 40% with Type 2 diabetes
- 30% with heart failure
- San Diego County geography
Create a first-year diabetic journey:
- Initial diagnosis visit with labs
- Metformin prescription
- Quarterly follow-ups with A1c
- Possible titration to second agent
Generate the cohort using this profile and journey
Save as cohort "ma-diabetic-cohort-2025"
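As a rough illustration, the profile request above could be captured in a structure like the one below. This is a sketch only: the field names (`count`, `age`, `conditions`, `geography`), the `std_dev` value, and the `expected_counts` helper are assumptions made for this example, not the framework's actual specification format.

```python
# Hypothetical profile structure; every field name here is an illustrative assumption.
profile = {
    "count": 200,
    "plan_type": "Medicare Advantage",
    "age": {
        "type": "normal",
        "mean": 74,
        "min": 65,
        "max": 85,
        "std_dev": 5,  # assumed; the request does not state a spread
    },
    "conditions": {"type_2_diabetes": 0.40, "heart_failure": 0.30},
    "geography": {"county": "San Diego County"},
}


def expected_counts(spec):
    """Return the expected number of members per condition."""
    return {
        name: round(spec["count"] * rate)
        for name, rate in spec["conditions"].items()
    }


print(expected_counts(profile))
```

For the request above, this works out to 80 members with Type 2 diabetes and 60 with heart failure in expectation; an actual generated cohort may vary around those figures.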
| Type | Use Case | Example |
|---|---|---|
| categorical | Discrete choices | Gender: M/F/Other |
| normal | Bell curve | Age centered at 72 |
| log_normal | Skewed positive | Healthcare costs |
| uniform | Equal probability | Random day in range |
| explicit | Specific values | Exactly these NDCs |
See distributions/distribution-types.md for details.
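To make the five types concrete, here is a minimal sketch of how each might be sampled using Python's standard `random` module. The spec dictionary layout (`type`, `weights`, `mean`, `std_dev`, `mu`, `sigma`, `min`, `max`, `values`) is an assumption for this example and may not match the format documented in distributions/distribution-types.md.

```python
import math
import random


def sample(spec, rng):
    """Draw one value from a distribution spec (hypothetical dict layout)."""
    kind = spec["type"]
    if kind == "categorical":
        # Weighted choice among discrete labels.
        labels = list(spec["weights"].keys())
        weights = list(spec["weights"].values())
        return rng.choices(labels, weights=weights, k=1)[0]
    if kind == "normal":
        value = rng.gauss(spec["mean"], spec["std_dev"])
        # Clamp to optional bounds so values stay in a plausible range.
        low = spec.get("min", -math.inf)
        high = spec.get("max", math.inf)
        return min(max(value, low), high)
    if kind == "log_normal":
        # mu and sigma describe the underlying normal distribution.
        return rng.lognormvariate(spec["mu"], spec["sigma"])
    if kind == "uniform":
        return rng.uniform(spec["min"], spec["max"])
    if kind == "explicit":
        # Pick only from the exact values supplied.
        return rng.choice(spec["values"])
    raise ValueError(f"Unknown distribution type: {kind}")


rng = random.Random(42)  # seeded so runs are reproducible
age = sample({"type": "normal", "mean": 72, "std_dev": 6, "min": 65, "max": 95}, rng)
gender = sample({"type": "categorical", "weights": {"M": 0.48, "F": 0.50, "Other": 0.02}}, rng)
```

Clamping is the simplest way to respect bounds, but it piles probability mass onto the limits; resampling until a value falls in range is an alternative when that matters.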
Generated cohorts should use statistical distributions that match real-world population characteristics. The goal is realistic synthetic data, not random noise.
How to match a target population:
skills/populationsim/ provides county- and tract-level benchmarks for prevalence and social determinants.

| Pattern | Use Case | Example |
|---|---|---|
| linear | Simple sequence | Office visit → Lab → Follow-up |
| branching | Decision points | ER → Admit OR Discharge |
| protocol | Trial schedules | Cycle 1 Day 1, Day 8, Day 15 |
| lifecycle | Long-term patterns | New member first year |
See the journeys/ folder for pattern details.
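The linear and branching patterns can be sketched in a few lines. The helper names (`schedule_linear`, `choose_branch`), the day gaps, and the branch probabilities below are all assumptions chosen for illustration; the journeys/ folder remains the reference for the real pattern definitions.

```python
import random
from datetime import date, timedelta


def schedule_linear(start, steps):
    """Place steps on a timeline; each step is (name, days_after_previous_step)."""
    events = []
    current = start
    for name, gap_days in steps:
        current = current + timedelta(days=gap_days)
        events.append((current, name))
    return events


def choose_branch(branches, rng):
    """Pick one outcome at a decision point; branches maps outcome -> probability."""
    outcomes = list(branches.keys())
    weights = list(branches.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


# Linear: Office visit -> Lab -> Follow-up (gaps are assumed values).
timeline = schedule_linear(
    date(2025, 1, 1),
    [("Office visit", 0), ("Lab", 7), ("Follow-up", 30)],
)

# Branching: ER -> Admit OR Discharge (probabilities are assumed values).
rng = random.Random(7)
er_outcome = choose_branch({"Admit": 0.3, "Discharge": 0.7}, rng)
```

A protocol schedule such as Cycle 1 Day 1, Day 8, Day 15 fits the same `schedule_linear` shape with gaps of 0, 7, and 7 days.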
The Generative Framework orchestrates all HealthSim products:
| When Generating | Products Involved | Cross-Domain Triggers |
|---|---|---|
| Patient cohort | PatientSim, NetworkSim | Provider assignment |
| Member claims | MemberSim, PatientSim, NetworkSim | Encounter → Claim |
| Pharmacy fills | RxMemberSim, PatientSim, NetworkSim | Rx → Fill, DUR check |
| Trial subjects | TrialSim, PatientSim, NetworkSim | Subject ↔ Patient linking |
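One way to picture the table above is as a lookup from generation target to the products it involves. The dictionary keys and the `products_for` helper are hypothetical names introduced for this sketch; only the product names come from the table.

```python
# Generation target -> products involved, transcribed from the table above.
# The snake_case keys are hypothetical identifiers for this example.
PRODUCTS_BY_TARGET = {
    "patient_cohort": ("PatientSim", "NetworkSim"),
    "member_claims": ("MemberSim", "PatientSim", "NetworkSim"),
    "pharmacy_fills": ("RxMemberSim", "PatientSim", "NetworkSim"),
    "trial_subjects": ("TrialSim", "PatientSim", "NetworkSim"),
}


def products_for(targets):
    """Return the distinct products needed for the given targets, in first-seen order."""
    needed = []
    for target in targets:
        for product in PRODUCTS_BY_TARGET[target]:
            if product not in needed:
                needed.append(product)
    return needed


print(products_for(["member_claims", "pharmacy_fills"]))
```

PatientSim and NetworkSim appear in every row, so a request spanning several targets only needs them once.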
Pre-built profiles and journeys for common use cases:
All generated data is synthetic and fictional. HealthSim produces simulated test data only. Never present generated records as real patient data.
Generated data should reference recognized healthcare code systems:
| System | Use |
|---|---|
| ICD-10 | Diagnosis codes |
| CPT / HCPCS | Procedure codes |
| LOINC | Lab / observation codes |
| RxNorm / NDC | Medication identifiers |
| SNOMED CT | Clinical terminology |
| NPI | Provider identifiers |
Real reference data (NPI registry, CMS facility files, published code sets) is safe to use. Synthetic data is generated for all patient/member-level entities.
When generating data, handle incomplete or invalid inputs gracefully:
| Situation | How to Handle |
|---|---|
| No age range specified | Default to plan-appropriate range (Medicare: 65-95, Commercial: 18-64, Medicaid Pediatric: 0-18) |
| No geography specified | Omit geographic constraints; generate nationally representative distribution |
| Missing condition prevalence | Use published population baselines (e.g., CDC PLACES prevalence rates) |
| Incomplete journey steps | Generate the specified steps; warn the user about gaps rather than inventing steps silently |
| Unknown or invalid ICD-10/CPT code | Reject the code and ask the user to verify; never silently substitute a different code |
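Two of the rules above, defaulting the age range and rejecting unknown codes, might look like this in practice. The function names, the plan-type keys, and the `known_codes` set are assumptions for this sketch; a real implementation would check codes against a published code set rather than a hand-written one.

```python
# Default age ranges from the table above; the dictionary keys are hypothetical.
DEFAULT_AGE_RANGES = {
    "medicare": (65, 95),
    "commercial": (18, 64),
    "medicaid_pediatric": (0, 18),
}


def resolve_age_range(plan_type, age_range=None):
    """Use the caller's range if given, otherwise the plan-appropriate default."""
    if age_range is not None:
        return age_range
    if plan_type not in DEFAULT_AGE_RANGES:
        raise ValueError(f"No default age range for plan type: {plan_type}")
    return DEFAULT_AGE_RANGES[plan_type]


def validate_codes(codes, known_codes):
    """Reject unknown codes outright; never substitute a different code."""
    unknown = [code for code in codes if code not in known_codes]
    if unknown:
        raise ValueError(f"Unrecognized codes, please verify: {unknown}")
    return list(codes)


# Tiny stand-in for a real ICD-10 code set (assumption for the example).
known_icd10 = {"E11.9", "I50.9"}
print(resolve_age_range("medicare"))
print(validate_codes(["E11.9"], known_icd10))
```

Raising an error keeps the failure visible to the caller, who can then ask the user to verify the code instead of generating data with a guessed replacement.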
These are common mistakes to avoid:
Before returning generated data, verify:
Implementation Status: Foundation phase. See GENERATIVE-FRAMEWORK-PROGRESS.md for details.