Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detecting outliers in lab results, or converting raw CRF data to CDISC format. Cleans and standardizes clinical trial data for regulatory compliance with audit trails.
Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.
- `scripts/main.py`.
- `references/` for task-specific guidance.
- Python: 3.10+. Repository baseline for current packaged skills.
- numpy: unspecified. Declared in requirements.txt.
- pandas: unspecified. Declared in requirements.txt.
- scipy: unspecified. Declared in requirements.txt.

cd "20260318/scientific-skills/Data Analytics/clinical-data-cleaner"
python -m py_compile scripts/main.py
python scripts/main.py --help
Example run plan:
- CONFIG block or documented parameters if the script uses fixed settings.
- python scripts/main.py with the validated inputs.

See ## Workflow above for related details.
- `scripts/main.py`.
- `references/` contains supporting rules, prompts, or checklists.

Use this command to verify that the packaged script entry point can be parsed before deeper execution.
python -m py_compile scripts/main.py
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --input "Audit validation sample with explicit symptoms, history, assessment, and next-step plan."
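import pandas as pd

from scripts.main import ClinicalDataCleaner

# Load raw data (file name taken from the CLI example; clean() is assumed to accept a DataFrame)
raw_data = pd.read_csv('dm_raw.csv')
# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')
# Clean data with default settings
cleaned = cleaner.clean(raw_data)
# Save with audit trail
cleaner.save_report('output.csv')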
cleaner = ClinicalDataCleaner(domain='DM') # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)
Required Fields:
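The authoritative field lists are in references/domain_specs.json. As a rough illustration of the kind of check `validate_domain` appears to perform, here is a minimal sketch that reports which required columns are absent. The `REQUIRED_FIELDS` map and the standalone function are hypothetical and show only a subset of DM variables; they are not the packaged implementation.

```python
import pandas as pd

# Hypothetical, partial required-field map for illustration only;
# the real lists are kept in references/domain_specs.json.
REQUIRED_FIELDS = {
    "DM": ["STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SEX"],
}


def validate_domain(data: pd.DataFrame, domain: str) -> tuple[bool, list[str]]:
    """Return (is_valid, missing), where missing lists absent required columns."""
    required = REQUIRED_FIELDS.get(domain, [])
    missing = [col for col in required if col not in data.columns]
    return (not missing, missing)


sample = pd.DataFrame(
    {"STUDYID": ["S1"], "DOMAIN": ["DM"], "USUBJID": ["S1-001"]}
)
is_valid, missing = validate_domain(sample, "DM")
print(is_valid, missing)  # False ['SUBJID', 'SEX']
```

Returning the missing names along with the boolean lets the caller report exactly what needs to be fixed before cleaning continues.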
cleaner = ClinicalDataCleaner(
domain='DM',
missing_strategy='median' # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
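To make the five `missing_strategy` options concrete, the following is a minimal pandas sketch of what each one could mean. It is an assumption about the behavior, not the code in scripts/main.py; in particular, restricting mean and median to numeric columns is a choice made here for illustration.

```python
import pandas as pd


def handle_missing(data: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Illustrative handling of missing values for the five named strategies."""
    out = data.copy()
    numeric = out.select_dtypes(include="number").columns
    if strategy == "mean":
        out[numeric] = out[numeric].fillna(out[numeric].mean())
    elif strategy == "median":
        out[numeric] = out[numeric].fillna(out[numeric].median())
    elif strategy == "mode":
        # The mode is defined for categorical columns too, so apply it to all.
        for col in out.columns:
            modes = out[col].mode()
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    elif strategy == "forward":
        out = out.ffill()  # carry the last observed value forward
    elif strategy == "drop":
        out = out.dropna()  # remove any row with a missing value
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return out


df = pd.DataFrame({"AGE": [30.0, None, 50.0], "SEX": ["M", "F", None]})
filled = handle_missing(df, "median")
print(filled["AGE"].tolist())  # [30.0, 40.0, 50.0]
```

Note that in this sketch the median strategy leaves the missing `SEX` value untouched, because a median is not defined for text columns.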
cleaner = ClinicalDataCleaner(
domain='LB',
outlier_method='domain', # iqr, zscore, domain
outlier_action='flag' # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)
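The `iqr` and `zscore` methods are standard statistical rules. A minimal sketch of both follows, assuming the conventional 1.5 × IQR fences and a |z| > 3 cutoff; the actual multipliers and cutoffs used by the script are not stated in this document, and the function names are hypothetical.

```python
import pandas as pd


def flag_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def flag_outliers_zscore(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values whose absolute z-score exceeds the threshold."""
    z = (values - values.mean()) / values.std(ddof=0)
    return z.abs() > threshold


glucose = pd.Series([90, 95, 100, 105, 110, 400])
iqr_flags = flag_outliers_iqr(glucose)
z_flags = flag_outliers_zscore(glucose)
print(iqr_flags.tolist())  # [False, False, False, False, False, True]
print(z_flags.tolist())    # [False, False, False, False, False, False]
```

On this six-value sample the IQR rule flags 400 but the z-score rule does not: with only six observations, a single extreme value inflates the standard deviation enough that no |z| can exceed about 2.24. This is one reason to prefer IQR or domain thresholds for small datasets.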
Clinical Thresholds:
| Parameter | Range | Unit |
|---|---|---|
| Glucose | 50-500 | mg/dL |
| Hemoglobin | 5-20 | g/dL |
| Systolic BP | 70-220 | mmHg |
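The thresholds in the table can be combined with the three `outlier_action` values as sketched below. The ranges are copied from the table; the column names (`GLUCOSE`, `HGB`, `SYSBP`), the `_OUTLIER` suffix, and the function itself are assumptions made for illustration.

```python
import pandas as pd

# Ranges from the Clinical Thresholds table; column names are hypothetical.
THRESHOLDS = {
    "GLUCOSE": (50, 500),  # mg/dL
    "HGB": (5, 20),        # g/dL
    "SYSBP": (70, 220),    # mmHg
}


def apply_domain_thresholds(data: pd.DataFrame, action: str = "flag") -> pd.DataFrame:
    """Flag, remove, or cap values that fall outside the clinical ranges."""
    out = data.copy()
    for col, (low, high) in THRESHOLDS.items():
        if col not in out.columns:
            continue
        outside = (out[col] < low) | (out[col] > high)
        if action == "flag":
            out[f"{col}_OUTLIER"] = outside  # keep the value, add a marker
        elif action == "remove":
            out = out[~outside]              # drop the offending rows
        elif action == "cap":
            out[col] = out[col].clip(lower=low, upper=high)
        else:
            raise ValueError(f"Unknown action: {action}")
    return out


labs = pd.DataFrame({"GLUCOSE": [95, 620, 40], "HGB": [13.5, 14.0, 4.2]})
flagged_labs = apply_domain_thresholds(labs, "flag")
capped_labs = apply_domain_thresholds(labs, "cap")
print(flagged_labs["GLUCOSE_OUTLIER"].tolist())  # [False, True, True]
print(capped_labs["GLUCOSE"].tolist())           # [95, 500, 50]
```

Flagging preserves the original values, which is usually the safer default when the data must remain traceable to the source records.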
standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
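For readers who want to see what a conversion to ISO 8601 can look like, here is a minimal pandas sketch. It is an assumption about the approach, not the body of `standardize_dates`; the helper names and the `RFSTDTC` sample column are chosen for illustration, and unparseable values are left missing rather than guessed.

```python
import pandas as pd


def to_iso8601(value):
    """Parse one value and format it as YYYY-MM-DDTHH:MM:SS, or None if unparseable."""
    ts = pd.to_datetime(value, errors="coerce")
    return None if pd.isna(ts) else ts.strftime("%Y-%m-%dT%H:%M:%S")


def standardize_date_columns(data: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy with the given columns converted to ISO 8601 strings."""
    out = data.copy()
    for col in columns:
        out[col] = out[col].map(to_iso8601)
    return out


visits = pd.DataFrame(
    {"RFSTDTC": ["2023-01-15 09:30", "2023/01/16 14:05:00", "not a date"]}
)
iso = standardize_date_columns(visits, ["RFSTDTC"])["RFSTDTC"]
print(iso.iloc[0])  # 2023-01-15T09:30:00
print(iso.iloc[1])  # 2023-01-16T14:05:00
```

Parsing each value separately tolerates mixed input formats within one column. Ambiguous day/month orders (for example `01/02/2023`) still need an explicit format agreed with the data source.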
cleaner = ClinicalDataCleaner(
domain='DM',
missing_strategy='median',
outlier_method='iqr',
outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')
Output Files:
- output.csv - Cleaned SDTM data
- output.report.json - Audit trail for regulatory submission
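The structure of output.report.json is not documented here. Purely as an illustration of what an audit trail can capture, the sketch below records one entry per cleaning step and serializes the result to JSON. The `AuditTrail` class and every field name in it are hypothetical and should not be read as the script's actual report schema.

```python
import json
from datetime import datetime, timezone


class AuditTrail:
    """Hypothetical collector for cleaning steps; not the packaged report format."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, step: str, column: str, rows_affected: int, detail: str = "") -> None:
        """Append one timestamped entry describing a change to the data."""
        self.entries.append(
            {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "step": step,
                "column": column,
                "rows_affected": rows_affected,
                "detail": detail,
            }
        )

    def to_json(self) -> str:
        """Serialize all entries as a JSON document."""
        return json.dumps({"entries": self.entries}, indent=2)


trail = AuditTrail()
trail.record("handle_missing_values", "AGE", 1, "filled with median")
trail.record("detect_outliers", "GLUCOSE", 2, "flagged outside 50-500 mg/dL")
report = json.loads(trail.to_json())
print(len(report["entries"]))  # 2
```

Recording the step, the affected column, and the row count for every change is what lets a reviewer reconstruct how the cleaned dataset was derived from the raw one.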
# Clean demographics
python scripts/main.py \
--input dm_raw.csv \
--domain DM \
--output dm_clean.csv \
--missing-strategy median \
--outlier-method iqr \
--outlier-action flag
# Clean lab data with clinical thresholds
python scripts/main.py \
--input lb_raw.csv \
--domain LB \
--output lb_clean.csv \
--outlier-method domain
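The flags used in these commands map naturally onto an `argparse` definition. The sketch below shows one way such a parser could be declared. The flag names and choice lists come from this document, while the defaults and the decision about which flags are required are assumptions, not a description of scripts/main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Clean clinical trial data.")
    parser.add_argument("--input", required=True, help="Raw input CSV")
    parser.add_argument("--domain", choices=["DM", "LB", "VS"], help="SDTM domain")
    parser.add_argument("--output", help="Path for the cleaned CSV")
    parser.add_argument(
        "--missing-strategy",
        choices=["mean", "median", "mode", "forward", "drop"],
        default="median",  # assumed default
    )
    parser.add_argument(
        "--outlier-method",
        choices=["iqr", "zscore", "domain"],
        default="iqr",  # assumed default
    )
    parser.add_argument(
        "--outlier-action",
        choices=["flag", "remove", "cap"],
        default="flag",  # assumed default
    )
    return parser


args = build_parser().parse_args(
    ["--input", "lb_raw.csv", "--domain", "LB",
     "--output", "lb_clean.csv", "--outlier-method", "domain"]
)
print(args.domain, args.outlier_method, args.outlier_action)  # LB domain flag
```

Declaring `choices` makes the parser reject a misspelled strategy or method before any data is read.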
See references/common-patterns.md for detailed examples:
See references/troubleshooting.md for solutions to:
Pre-Cleaning:
Post-Cleaning:
- references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
- references/domain_specs.json - Domain-specific field requirements
- references/outlier_thresholds.json - Clinical outlier thresholds
- references/common-patterns.md - Detailed usage patterns
- references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT
Every final response should make these items explicit when they are relevant:
If scripts/main.py fails, report the failure point, summarize what can still be completed safely, and provide a manual fallback.

This skill accepts requests that match the documented purpose of clinical-data-cleaner and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
clinical-data-cleaner only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Use the following fixed structure for non-trivial requests:
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.