Prepare the mltoolkit session — load data, run EDA, auto-detect task type, create .mltoolkit scratchpad.
1. Run bash {PLUGIN_ROOT}/scripts/check-env.sh and report any missing required packages.
2. Create .mltoolkit/ in the user's CWD if missing. Add .mltoolkit/ to .gitignore (create .gitignore if absent).
3. Run python {SKILL_DIR}/references/setup_reference.py --data <DATA> [--target <TARGET>] --output-dir .mltoolkit
4. Read schema.csv and the generated figures. Present them to the user.
5. Fill in .mltoolkit/datasheet.md with the user. Ask them in order:
   a. Data provenance (source, collection dates, sampling)
   b. Consent & ethics (IRB status, consent basis, PHI/PII presence)
   c. Protected attributes (race, ethnicity, sex, gender, age, zip, religion, disability, national origin, pregnancy, sexual orientation). Record each column name the user identifies as sensitive; these are later passed to classify/regress as --sensitive-features.
   d. Known limitations (bias, coverage gaps, measurement issues)
6. Based on the detected task type, recommend the next skill: mltoolkit:classify, mltoolkit:regress, mltoolkit:cluster, or mltoolkit:anomaly.
7. Confirm the session outputs are saved under .mltoolkit/ before any downstream skill runs.
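
The scratchpad step above (create .mltoolkit/ if missing, then list it in .gitignore, creating that file if absent) could be sketched roughly as follows. This is a minimal illustration, not the toolkit's own code: the helper name ensure_scratchpad is hypothetical, and it assumes the entry is written as a plain ".mltoolkit/" line.

```python
from pathlib import Path


def ensure_scratchpad(cwd: Path, dirname: str = ".mltoolkit") -> Path:
    """Create the scratchpad directory and make sure .gitignore lists it.

    Hypothetical helper for illustration; safe to call repeatedly.
    """
    scratch = cwd / dirname
    scratch.mkdir(exist_ok=True)  # no error if it already exists

    entry = f"{dirname}/"
    gitignore = cwd / ".gitignore"
    existing = gitignore.read_text() if gitignore.exists() else ""

    # Only append when the exact entry is not already on its own line.
    if entry not in [line.strip() for line in existing.splitlines()]:
        # Preserve existing content; add a newline first if the file lacks one.
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        gitignore.write_text(existing + prefix + entry + "\n")
    return scratch
```

Calling ensure_scratchpad(Path.cwd()) twice leaves a single .mltoolkit/ entry in .gitignore, so the step can be re-run at the start of every session without side effects.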
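
For the protected-attributes question, it may help to offer the user a starting list of candidate columns instead of asking cold. The sketch below flags column names that contain one of the protected-attribute terms listed above as a whole token. It is an assumption-laden convenience: suggest_sensitive_columns and the keyword matching are hypothetical and not part of mltoolkit, and the user's own answer remains the record of which columns are sensitive.

```python
import re

# Terms taken from the protected-attributes list above, normalized to snake_case.
PROTECTED_KEYWORDS = (
    "race", "ethnicity", "sex", "gender", "age", "zip", "religion",
    "disability", "national_origin", "pregnancy", "sexual_orientation",
)


def suggest_sensitive_columns(columns):
    """Return columns whose name contains a protected-attribute keyword.

    Hypothetical helper: matches whole tokens only, so "patient_age"
    is flagged but "mileage" and "page" are not.
    """
    suggestions = []
    for col in columns:
        # Normalize "Zip Code" / "national-origin" to "zip_code" / "national_origin".
        normalized = re.sub(r"[^a-z0-9]+", "_", col.lower()).strip("_")
        padded = f"_{normalized}_"
        if any(f"_{kw}_" in padded for kw in PROTECTED_KEYWORDS):
            suggestions.append(col)
    return suggestions
```

Name matching misses camelCase names such as patientAge and any sensitive column with an uninformative name, so the output is only a prompt for the conversation with the user.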