Implement and review base model training for tabular ML competitions: CatBoost, LightGBM, XGBoost, and Neural Networks. Use when: writing or reviewing trainer files and model entrypoints; implementing per-framework competition metric wrappers (CB/LGB/XGB/NN); aligning early-stopping metric with the competition objective; identifying correct prediction type for submission (probability vs class label vs value); debugging train OOF vs LB gaps caused by wrong metric or output format; handling auxiliary/prior data. NOT for hyperparameter tuning, ensemble, or pseudo-labeling.
This skill covers three tightly coupled concerns for base model training: competition_score as the single source of truth; per-framework metric wrappers; and OOF collection patterns.

The two most expensive silent bugs: the CatBoost logit-vs-prediction API difference, which silently destroys scores, and a prediction type that does not match sample_submission.csv.
Identify your task type first — it determines which parameters, objectives, and techniques apply.
| Task Type | Framework objectives | Binary-only params to remove |
|---|---|---|
| Binary classification | XGB: binary:logistic · LGB: binary · CB: CatBoostClassifier | — (all apply) |
| Regression | XGB: reg:squarederror · LGB: regression · CB: CatBoostRegressor | scale_pos_weight, is_unbalance, auto_class_weights: Balanced, threshold=0.5 |
| Multiclass | XGB: multi:softprob · LGB: multiclass · CB: MultiClass loss | scale_pos_weight, is_unbalance |
| Multi-label | N independent binary models OR single NN with N sigmoid heads | — (each target is binary) |
| Ranking | XGB: rank:pairwise · LGB: lambdarank · CB: YetiRank | all imbalance params |
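The "binary-only params to remove" column can be enforced mechanically. A minimal sketch, assuming a shared params dict per framework — the parameter names are real XGB/LGB/CB options from the table, but the params_for_task helper itself is illustrative:

```python
# Binary-only parameters from the table above; meaningless (or harmful)
# outside binary classification.
BINARY_ONLY = {"scale_pos_weight", "is_unbalance", "auto_class_weights"}

def params_for_task(params: dict, task: str) -> dict:
    # Keep everything for binary tasks; strip imbalance params otherwise.
    if task == "binary":
        return dict(params)
    return {k: v for k, v in params.items() if k not in BINARY_ONLY}

xgb_params = {"objective": "reg:squarederror", "scale_pos_weight": 3.0, "max_depth": 6}
reg_params = params_for_task(xgb_params, "regression")
# reg_params keeps objective and max_depth but drops scale_pos_weight
```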
| | Tree models (CB/LGB/XGB) | Neural Network |
|---|---|---|
| Training objective | Built-in BCE / squared-error (internal to framework) | Custom loss: FocalLoss, SmoothBCE, MSELoss, BCELoss |
| Eval metric | Custom competition metric wrapper | Competition score computed per epoch |
| Early stopping driven by | Eval metric | Competition score |
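The NN column's pattern — early stopping driven by the competition score, not the training loss — can be sketched framework-free. Here epoch_scores stands in for the per-epoch validation competition scores; the weights checkpointed at the returned epoch are what model.load_state_dict(best_state) restores:

```python
def early_stop(epoch_scores, patience=3):
    # Select the best epoch by competition score (higher is better, since
    # competition_score always maximizes); stop after `patience` epochs
    # without improvement.
    best_score, best_epoch, bad = float("-inf"), -1, 0
    for epoch, score in enumerate(epoch_scores):
        if score > best_score:
            best_score, best_epoch, bad = score, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch, best_score
```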
Getting the eval metric wrong wastes all training — early stopping fires at the wrong iteration.
| Framework | Metric injection | Input type in callback |
|---|---|---|
| CatBoost binary | eval_metric=CatBoostCompMetric() | Raw logits → apply sigmoid |
| CatBoost regression | eval_metric=CatBoostCompMetric() | Raw predictions → use directly |
| CatBoost multiclass | eval_metric=CatBoostCompMetric() | K logit arrays → apply softmax |
| LightGBM | "metric": "None" + feval=make_lgb_feval() | Already-transformed probs/values |
| XGBoost | "disable_default_eval_metric": 1 + custom_metric=make_xgb_eval() + maximize=True | Already-transformed probs/values |
| NN | Compute in epoch loop; model.load_state_dict(best_state) | You control activation in forward() |
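The LightGBM and XGBoost rows above can be sketched as thin factories around competition_score (passed in as an argument here so the snippet is self-contained; in a trainer it would be imported from base/metrics.py):

```python
def make_lgb_feval(competition_score):
    # LightGBM feval signature: (preds, eval_data) -> (name, value, is_higher_better).
    # preds are already probabilities/values, so no transform is needed.
    def feval(preds, eval_data):
        return "comp", competition_score(eval_data.get_label(), preds), True
    return feval

def make_xgb_eval(competition_score):
    # XGBoost native-API custom_metric signature: (preds, dtrain) -> (name, value).
    # Pair with maximize=True, since competition_score always maximizes.
    def xgb_eval(preds, dtrain):
        return "comp", competition_score(dtrain.get_label(), preds)
    return xgb_eval
```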
Single source of truth: define competition_score(y_true, y_pred) -> float once in base/metrics.py. competition_score always maximizes — negate RMSE/MAE/logloss when the leaderboard is "lower is better".
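A minimal base/metrics.py sketch, shown here with a "lower is better" leaderboard metric (RMSE) negated so the rest of the pipeline can always maximize — the actual body depends on the competition:

```python
import math

def competition_score(y_true, y_pred) -> float:
    # Example body for an RMSE leaderboard: negate so "higher is better"
    # holds everywhere (early stopping, model selection, logging).
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return -math.sqrt(mse)
```

Every trainer and wrapper then compares scores with a plain `>`, regardless of the underlying metric's direction.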
Critical CatBoost API difference:

- CatBoostClassifier: approxes[0] = raw logits → must apply sigmoid (binary) or softmax (multiclass)
- CatBoostRegressor: approxes[0] = raw prediction values → no activation needed

The metric determines the prediction type. The target column values in train.csv do NOT.
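A sketch of the binary CatBoostCompMetric named in the tables, following CatBoost's custom eval metric protocol (evaluate / is_max_optimal / get_final_error). The stand-in competition_score (simple accuracy) keeps the snippet self-contained; the sigmoid on approxes[0] is the classifier-specific step — a regressor variant would use approxes[0] directly:

```python
import math

def competition_score(y_true, y_pred) -> float:
    # Stand-in for base/metrics.py; replace with the real competition metric.
    return sum(int(p > 0.5) == int(t) for t, p in zip(y_true, y_pred)) / len(y_true)

class CatBoostCompMetric:
    def is_max_optimal(self):
        return True  # competition_score always maximizes

    def evaluate(self, approxes, target, weight):
        # CatBoostClassifier passes raw logits in approxes[0]: apply sigmoid.
        probs = [1.0 / (1.0 + math.exp(-a)) for a in approxes[0]]
        return competition_score(target, probs), 1.0

    def get_final_error(self, error, weight):
        return error
```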
| Metric | Prediction type | predict call |
|---|---|---|
| AUC-ROC, PR-AUC, Log Loss, Brier | float [0, 1] | model.predict_proba(X)[:, 1] |
| Accuracy, F1, Cohen's Kappa, QWK | class label | model.predict(X) |
| RMSE, MAE, RMSLE, MAPE | continuous value | model.predict(X) |
| Multi-class log loss | prob matrix (n, K) | model.predict_proba(X) |
| MAP@K, NDCG@K | ranked list | task-specific |
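OOF collection matching the table above, sketched for a binary task (predict_proba column 1). The point is the single full-length array written once per validation index — never fold-by-fold averages. collect_oof is an illustrative helper name:

```python
import numpy as np

def collect_oof(fold_models, folds, X):
    # One full-length array; each validation row is written exactly once,
    # by the model that did NOT train on it.
    oof = np.zeros(len(X))
    for model, (tr_idx, va_idx) in zip(fold_models, folds):
        oof[va_idx] = model.predict_proba(X[va_idx])[:, 1]
    return oof
```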
Submission and OOF rules:

- The prediction type follows sample_submission.csv, not the train.csv target dtype.
- Match the format of sample_submission.csv exactly.
- Collect OOF as oof[val] = model.predict_proba(X[val])[:, 1], never fold-by-fold averages.
- Unsure of the prediction type? Check sample_submission.csv, not train.csv.
- With auxiliary/prior rows appended to train, write OOF for the original rows only: va_idx[va_idx < n_train].

| File | What it covers |
|---|---|
| model-training.md | CB/LGB/XGB/NN params by task type, trainer architecture, training objective vs eval metric |
| competition-metrics.md | competition_score pattern, per-framework metric wrappers (CB/LGB/XGB/NN), training losses |
| output-format.md | Metric → prediction type table, submission format by task, OOF collection patterns, scout checklist |
| Skill | When to use it instead |
|---|---|
| ml-competition | Full pipeline overview, task type decision guide, first-principles checklist |
| ml-competition-setup | Project structure, RunConfig, process management |
| ml-competition-features | Feature engineering, validation strategy |
| ml-competition-tuning | Optuna hyperparameter tuning |
| ml-competition-advanced | Pseudo-labeling, ensemble, post-processing, experiment tracking |
| ml-competition-quality | Coding rules, common pitfalls |