Build a tabular ML project with scikit-learn Pipelines, MLflow tracking, model artifacts, and a marimo demo notebook. Use when starting any new tabular classification or regression bundle so all bundles share the same plumbing.
The reference layout every tabular bundle in ManagerPack copies. The goal is consistency: same project structure, same MLflow conventions, same load/predict path, same notebook style. Whatever the actual problem, the plumbing is identical.
The worked example is "is this coin fair?" — a logistic regression on (flip_index, outcome). The model is trivial; the plumbing is the point.
```
<bundle>/
├── README.md          # what this bundle does + how to run it
├── SKILL.md           # this file (or specialized for the bundle)
├── src/
│   ├── train.py       # train + log to MLflow
│   ├── predict.py     # load model from MLflow and predict
│   └── plots.py       # plot helpers, logged as MLflow artifacts
├── notebooks/
│   └── <name>_demo.py # marimo notebook with mo.ui.slider
└── mlruns/            # MLflow tracking store (gitignored)
```
The data lives outside the bundle, in `studio/data/<problem>.parquet`, generated by `datagen <problem>`. Bundles never carry their own data — they consume parquet from the studio's shared data directory.
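As a sketch of that contract (the column names are assumptions from the fair-coin example; a real bundle would call `pd.read_parquet` on the shared path):

```python
import numpy as np
import pandas as pd

# In a real bundle: df = pd.read_parquet("studio/data/<problem>.parquet")
# In-memory stand-in with the fair-coin example's assumed schema:
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "flip_index": np.arange(200),        # which flip in the sequence
    "outcome": rng.integers(0, 2, 200),  # 1 = heads, 0 = tails
})
X, y = df[["flip_index"]], df["outcome"]
```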
Always wrap preprocessing inside the sklearn Pipeline so it travels with the model on save/load:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

pipeline = Pipeline([
    ("preprocess", ColumnTransformer([
        ("scaled", StandardScaler(), numeric_cols),
        ("encoded", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
```
Never separate the preprocessing step from the model — the loaded artifact must work standalone with raw input.
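A minimal end-to-end sketch of why this matters (synthetic data, hypothetical column names): the fitted Pipeline accepts raw rows directly, so the saved artifact needs no external preprocessing code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "flip_index": np.arange(100),
    "rig": rng.choice(["a", "b"], 100),  # hypothetical categorical column
})
y = rng.integers(0, 2, 100)

pipeline = Pipeline([
    ("preprocess", ColumnTransformer([
        ("scaled", StandardScaler(), ["flip_index"]),
        ("encoded", OneHotEncoder(handle_unknown="ignore"), ["rig"]),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Raw, unscaled input goes straight in: preprocessing travels with the model
probs = pipeline.predict_proba(X.head(3))
```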
Every run logs:
| Kind | What |
|---|---|
| params | data path, `n_rows`, seed, `test_size`, `cv_folds`, model name, hyperparameters |
| metrics | CV mean & std of the held-out metric, test-set score, recovery error when ground truth is known |
| tags | `data_hash` (sha256 prefix), `true_*` ground-truth values from the sidecar |
| artifacts | model (via `mlflow.sklearn.log_model`), `plots/`, `data/sidecar.json` |
Recovery error against ground truth is the most important metric for template runs because it answers "did the model recover what we know to be true?"
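For the fair-coin case, a minimal sketch of what "recovery error" means (the metric name and the illustrative values are assumptions, not the bundle's real API):

```python
import math

true_p_heads = 0.5      # ground truth, read from the sidecar in a real run
intercept_logit = 0.04  # e.g. clf.intercept_[0] from the fitted model

# Map the logit back to a probability and compare against ground truth
p_hat = 1.0 / (1.0 + math.exp(-intercept_logit))
recovery_error = abs(p_hat - true_p_heads)
# mlflow.log_metric("recovery_error_p_heads", recovery_error)
```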
Use `mlflow.sklearn.log_model(sk_model=pipeline, name="model", input_example=X_train.head(5))`. Never use bare `joblib.dump()` or `pickle.dump()`. For internal templates MLflow's pickle-based format is fine; for security-sensitive deployments, use the skops format instead.

The MLflow load path:

```python
import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri(f"file:{template_dir / 'mlruns'}")
model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
predictions = model.predict_proba(new_data)
```
The Pipeline (preprocessing + classifier) comes back as one object. No need to re-import the training code.
When the model is linear, log the interpretable quantities as metrics so they show up in the MLflow UI:

```python
clf = pipeline.named_steps["clf"]
mlflow.log_metric("intercept_logit", float(clf.intercept_[0]))
mlflow.log_metric("coef_some_feature", float(clf.coef_[0][feature_idx]))
```
For the fair-coin template, the intercept (in logit space) maps directly
to $P(\text{heads})$ at the mean of the standardized index, and the slope
on flip_index is a non-stationarity detector. Always interpret what
the coefficients mean in domain terms.
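A runnable sketch of that interpretation on synthetic fair-coin data (seed and sample size are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = np.arange(2000).reshape(-1, 1)  # flip_index
y = rng.integers(0, 2, size=2000)   # a genuinely fair coin

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X, y)

clf = pipe.named_steps["clf"]
p_heads = 1.0 / (1.0 + np.exp(-clf.intercept_[0]))  # sigmoid of the intercept
slope = clf.coef_[0][0]  # near 0 when the coin's bias is stationary over time
```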
Generate matplotlib figures in src/plots.py, save them to a temp
directory, and log them as MLflow artifacts under plots/. Always