Evaluate trained machine learning models with the right metrics and comparison logic. Use for benchmark review, threshold selection, calibration, validation, and model comparison; not for feature engineering or leakage auditing.
Use this skill when the model already exists and the question is whether it is good enough. The focus is on choosing and interpreting evaluation metrics appropriate to the problem, then comparing candidate models or decision thresholds against them.
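As a concrete instance of threshold comparison, the sketch below sweeps candidate decision thresholds over predicted scores and picks the one that maximizes F1. This is an illustrative, self-contained example (the helper name `best_f1_threshold` and the toy data are assumptions, not part of this skill's interface); in practice the metric and the sweep strategy should match the problem's costs.

```python
import numpy as np

def best_f1_threshold(y_true, scores):
    """Illustrative helper: sweep each observed score as a threshold
    and return the (threshold, F1) pair with the highest F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))   # true positives
        fp = np.sum(pred & (y_true == 0))   # false positives
        fn = np.sum(~pred & (y_true == 1))  # false negatives
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1

# Toy labels and scores (hypothetical data for illustration only).
y = np.array([0, 0, 1, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.9])
t, f1 = best_f1_threshold(y, s)
```

The same sweep works with any scalar metric; swapping F1 for a cost-weighted score changes only the inner computation, not the comparison logic.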
Related skills:
- training-machine-learning-models
- engineering-features-for-machine-learning
- ml-data-leakage-guard
- confusion-matrix-generator for class-level error breakdowns
- scientific-reporting when the evaluation must become a deliverable