Build a production-ready multiclass classifier on tabular data using XGBoost. Use when the user needs to predict one of several discrete classes from tabular features (product category, sentiment level, customer segment, intent, fault type). Covers per-class metrics, confusion matrix analysis, sample weighting for imbalance, top-K accuracy, and SHAP. Default to this for any tabular multiclass problem.
For tabular multiclass classification, default to XGBoost. Same
reasoning as binary: it dominates Kaggle and real-world tabular
benchmarks, handles class imbalance with sample_weight, and gives
you SHAP-based explanations as a side effect.
The differences from binary are real and worth understanding:

- objective="multi:softprob" + num_class=N instead of binary:logistic
- sample_weight per row for imbalance (no scale_pos_weight for multiclass)
- predict_proba returns shape (n_samples, n_classes) and you often want top-K, not just argmax

<project>/
├── data/              # input parquet/csv
├── src/
│   ├── train.py       # Pipeline + XGBClassifier(multi:softprob) + MLflow
│   ├── predict.py     # reload, return top-K predictions per row
│   └── plots.py       # confusion matrix, per-class metrics, ROC OvR, SHAP
├── notebooks/
│   └── demo.py        # marimo walkthrough
└── mlruns/
Same pattern as the other tabular bundles. Use ibis to read; materialize
once with .execute() for sklearn:
import ibis
table = ibis.duckdb.connect().read_parquet("data/train.parquet")
feature_cols = [c for c in table.columns if c.startswith("feature_")]
data = (
    table
    .select(*feature_cols, "target")
    .execute()
)
X = data[feature_cols]
y = data["target"].astype(int)
n_classes = int(y.max()) + 1  # assumes labels are encoded 0..N-1
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
def build_pipeline(feature_cols, n_classes, seed):
    return Pipeline([
        ("preprocess", ColumnTransformer([("num", StandardScaler(), feature_cols)])),
        ("clf", XGBClassifier(
            n_estimators=300,
            max_depth=5,
            learning_rate=0.05,
            subsample=0.8,
            colsample_bytree=0.8,
            reg_lambda=1.0,
            objective="multi:softprob",
            num_class=n_classes,
            eval_metric="mlogloss",
            random_state=seed,
            n_jobs=-1,
        )),
    ])
The only changes from binary are objective, num_class, and
eval_metric.
Accuracy on a 5-class problem can be 80% while the model completely fails on one class. You need per-class precision, recall, F1, and the confusion matrix.
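A sketch of those diagnostics, plus top-K accuracy, on synthetic labels and probabilities; in plots.py the inputs would instead be a held-out y_true and pipe.predict_proba(X_valid):

```python
import numpy as np
from sklearn.metrics import (classification_report, confusion_matrix,
                             top_k_accuracy_score)

rng = np.random.default_rng(0)
# Stand-ins for held-out labels and predicted class probabilities
y_true = rng.integers(0, 5, size=200)
proba = rng.dirichlet(np.ones(5), size=200)  # rows sum to 1
y_pred = proba.argmax(axis=1)

# Per-class precision/recall/F1 -- exposes a class the argmax never predicts
print(classification_report(y_true, y_pred, zero_division=0))

# Rows = true class, columns = predicted class
print(confusion_matrix(y_true, y_pred))

# Top-K: is the true class among the K highest-probability predictions?
print(top_k_accuracy_score(y_true, proba, k=2, labels=np.arange(5)))
```

The confusion matrix shows *which* classes get confused with each other, which is often more actionable than any single score.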