Compare a trained model against the production baseline for accuracy and F1 score.
This skill allows you to compare a new candidate model against the currently deployed production model to verify improvements in accuracy, precision, recall, and F1 score.
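The four metrics being compared can be reproduced directly from raw predictions. A minimal sketch with numpy (the function name is illustrative, not the script's actual API; assumes binary labels with 1 = positive):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Comparing candidate vs. production then amounts to calling this on each model's predictions over the same evaluation set and diffing the results.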
Prerequisites:

- Production baseline model: threading_ws/src/cherry_detection/resource/classification-2_26_2025-iter5.pt
- Evaluation data: .../cherry_classification/data (or a specified path)
- Dependencies: torch, torchvision, numpy, tqdm

Run the compare_models.py script from the repository root:
```shell
python training/scripts/compare_models.py \
  --new-model <path/to/new_model.pt> \
  --prod-model <path/to/production_model.pt> \
  [--unnormalized] \
  [--architecture <resnet50|mobilenet_v3_large|efficientnet_b0>]
```
Options:

- `--new-model`: Path to the new candidate model.
- `--prod-model`: Path to the baseline model (usually threading_ws/src/cherry_detection/resource/classification-2_26_2025-iter5.pt).
- `--unnormalized`: Important! Use this flag if the new model was trained on 0-255 raw images (no ImageNet normalization). The production system typically uses unnormalized images.
- `--architecture`: Model architecture (default: resnet50).
- `--device`: cpu or cuda (default: auto-detect).

Example: evaluate a new unnormalized ResNet50 model:
```shell
python training/scripts/compare_models.py \
  --new-model training/experiments/resnet50_augmented_unnormalized/model_best_fixed.pt \
  --prod-model threading_ws/src/cherry_detection/resource/classification-2_26_2025-iter5.pt \
  --unnormalized
```
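The `--unnormalized` flag matters because the two preprocessing pipelines produce inputs on very different scales. A sketch of the difference (assuming the standard ImageNet channel statistics used by torchvision pretrained models; this is not the script's exact code):

```python
import numpy as np

# ImageNet channel statistics commonly used by torchvision pretrained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(image_u8, unnormalized):
    """image_u8: HxWx3 uint8 array with values in [0, 255]."""
    if unnormalized:
        # Production-style: feed raw 0-255 float values to the model.
        return image_u8.astype(np.float32)
    # Standard pipeline: scale to [0, 1], then normalize per channel.
    x = image_u8.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```

Feeding a model through the wrong pipeline shifts its inputs by roughly two orders of magnitude, so its metrics collapse; that is why passing the flag correctly is critical when comparing models.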
The script prints a side-by-side comparison table of accuracy, precision, recall, and F1 for both models.
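Such a table is straightforward to reproduce from the per-model metrics; a hypothetical formatter (the real script's layout may differ, and the delta column is an illustrative addition):

```python
def format_comparison(new_metrics, prod_metrics):
    """Build a side-by-side metric table with per-metric deltas."""
    lines = [f"{'Metric':<12}{'New':>10}{'Prod':>10}{'Delta':>10}"]
    for name in ("accuracy", "precision", "recall", "f1"):
        n, p = new_metrics[name], prod_metrics[name]
        lines.append(f"{name:<12}{n:>10.4f}{p:>10.4f}{n - p:>+10.4f}")
    return "\n".join(lines)

# Example with made-up metric values:
print(format_comparison(
    {"accuracy": 0.95, "precision": 0.94, "recall": 0.93, "f1": 0.935},
    {"accuracy": 0.93, "precision": 0.92, "recall": 0.94, "f1": 0.930},
))
```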