Name: Setup
Author: olaTechie

mltoolkit Setup Playbook

Workflow

Check environment: run bash {PLUGIN_ROOT}/scripts/check-env.sh. Report any missing required packages.
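The contents of check-env.sh are not shown here, so the following is only a minimal Python sketch of the same idea: try to locate each required package and report the ones that are missing. The package list and the function name missing_packages are hypothetical; the entries are import names (for example sklearn, not scikit-learn).

```python
import importlib.util

# Hypothetical requirement list: the real one lives in check-env.sh, which is not shown.
# These are top-level import names, not pip distribution names.
REQUIRED_PACKAGES = ["pandas", "numpy", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the top-level import names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```

Using find_spec rather than a plain import checks availability without executing each package's import-time code.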
Locate data: confirm the data path with the user.
Create workspace: create .mltoolkit/ in the user's current working directory (CWD) if it is missing. Add .mltoolkit/ to .gitignore, creating .gitignore if it is absent.
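The workspace step should be safe to repeat. A minimal sketch, assuming a hypothetical helper named ensure_workspace, creates the directory only if needed and appends the ignore entry only when neither .mltoolkit/ nor .mltoolkit is already listed:

```python
from pathlib import Path

WORKSPACE = ".mltoolkit/"


def ensure_workspace(cwd="."):
    """Create .mltoolkit/ under `cwd` and make sure .gitignore lists it once."""
    root = Path(cwd)
    workspace = root / WORKSPACE
    workspace.mkdir(exist_ok=True)  # no error if it already exists

    gitignore = root / ".gitignore"
    # Treat a missing .gitignore as empty; writing below creates it.
    text = gitignore.read_text() if gitignore.exists() else ""
    entries = {line.strip() for line in text.splitlines()}
    if WORKSPACE not in entries and WORKSPACE.rstrip("/") not in entries:
        # Keep existing content and start the new entry on its own line.
        if text and not text.endswith("\n"):
            text += "\n"
        gitignore.write_text(text + WORKSPACE + "\n")
    return workspace
```

Running it twice leaves a single .mltoolkit/ line in .gitignore, so the setup workflow can be re-run without editing the file by hand.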
Run setup_reference: python {SKILL_DIR}/references/setup_reference.py --data <DATA> [--target <TARGET>] --output-dir .mltoolkit
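If the call is made from Python rather than typed at a shell, a sketch like the one below builds the same argument list and adds --target only when a target column is known. The helper names are hypothetical, and it uses sys.executable in place of a bare python so the script runs under the current interpreter; the flags themselves are exactly the ones in the command above.

```python
import subprocess
import sys


def setup_reference_cmd(skill_dir, data, target=None, output_dir=".mltoolkit"):
    """Build the setup_reference.py argument list; --target is optional."""
    cmd = [sys.executable, f"{skill_dir}/references/setup_reference.py", "--data", data]
    if target is not None:
        cmd += ["--target", target]
    cmd += ["--output-dir", output_dir]
    return cmd


def run_setup_reference(skill_dir, data, target=None):
    # check=True raises CalledProcessError if the script exits non-zero.
    return subprocess.run(setup_reference_cmd(skill_dir, data, target), check=True)
```

Passing a list instead of a single shell string avoids quoting problems when the data path contains spaces.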
Review outputs: read schema.csv and the generated figures, then present them to the user.
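The columns of schema.csv are not documented here, so the sketch below reads whatever header the file has and renders the rows as an aligned table for the user. The default path (schema.csv inside .mltoolkit/, since that is the output directory) and both function names are assumptions.

```python
import csv
from pathlib import Path


def load_schema(path=".mltoolkit/schema.csv"):
    """Return schema.csv as a list of dicts keyed by the file's own header row."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


def format_schema(rows):
    """Render rows as left-aligned columns, one line per row, header first."""
    if not rows:
        return ""
    headers = list(rows[0].keys())
    # Width of each column is the longest of its header and its values.
    widths = {h: max(len(h), *(len(r[h] or "") for r in rows)) for h in headers}
    lines = ["  ".join(h.ljust(widths[h]) for h in headers).rstrip()]
    for r in rows:
        lines.append("  ".join((r[h] or "").ljust(widths[h]) for h in headers).rstrip())
    return "\n".join(lines)
```

Because nothing about the header is hard-coded, the same helper keeps working if setup_reference adds or renames schema columns.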
Fill out datasheet: complete .mltoolkit/datasheet.md together with the user, asking about each topic in this order:
a. Data provenance (source, collection dates, sampling)
b. Consent & ethics (IRB status, consent basis, PHI/PII presence)
c. Protected attributes (race, ethnicity, sex, gender, age, zip, religion, disability, national origin, pregnancy, sexual orientation). Record each column name the user identifies as sensitive; these are later passed to classify/regress as .
d. Known limitations (bias, coverage gaps, measurement issues)
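The layout of datasheet.md is not specified here. As a minimal sketch, assuming the section headings simply mirror the four topics above (the titles, including "Known limitations", and the helper names are hypothetical), the answers and the sensitive column list could be written out like this:

```python
from pathlib import Path

# Hypothetical headings that follow the question order in this playbook.
SECTIONS = [
    ("Data provenance", "Source, collection dates, sampling"),
    ("Consent & ethics", "IRB status, consent basis, PHI/PII presence"),
    ("Protected attributes", "Columns the user identifies as sensitive"),
    ("Known limitations", "Bias, coverage gaps, measurement issues"),
]


def render_datasheet(answers, sensitive_columns):
    """Build datasheet text; unanswered sections are marked TODO."""
    parts = ["# Datasheet", ""]
    for title, prompt in SECTIONS:
        parts += [f"## {title}", f"_{prompt}_", "", answers.get(title, "TODO"), ""]
        if title == "Protected attributes":
            cols = ", ".join(sensitive_columns) if sensitive_columns else "none identified"
            parts += [f"Sensitive columns: {cols}", ""]
    return "\n".join(parts)


def write_datasheet(answers, sensitive_columns, path=".mltoolkit/datasheet.md"):
    # Assumes .mltoolkit/ was already created in the workspace step.
    Path(path).write_text(render_datasheet(answers, sensitive_columns))
```

Keeping the sensitive columns as a plain list makes it easy to hand the same values to classify/regress later.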