Run or continue model benchmarks, collect measured results, and refresh README/docs benchmark sections from generated artifacts. Use when benchmark tables in model docs need to be created, updated, or corrected.
Use this skill to update benchmark sections in model documentation from real benchmark outputs.
This skill focuses on running or continuing benchmarks, collecting measured results under `results/`, and refreshing benchmark tables from those artifacts.

It does not own sample image export. Use `model-sample-image-export` for that.
Always prefer:

- `tools/experimental/benchmarking/benchmark.py` with an appropriate config file.
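Assuming the stock script accepts a `--config` flag (check its `--help`; both the flag name and the config path below are illustrative, not verified), the invocation can be sketched as:

```python
import subprocess  # only needed if you actually launch the run

# Hypothetical invocation of the stock benchmark entry point.
# The --config flag and the config file path are assumptions.
cmd = [
    "python",
    "tools/experimental/benchmarking/benchmark.py",
    "--config", "configs/my_model_benchmark.yaml",  # hypothetical config
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment inside the repo to run
```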
If the stock benchmark path is insufficient for a specific model, keep any custom outputs under `results/` as well.

Only publish benchmark values when they come from actual artifacts, for example:

- `results/<model>_benchmark.csv`
- files under `runs/` or `results/`

Never infer missing values.
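A minimal sketch of pulling measured values out of a results CSV such as `results/<model>_benchmark.csv` (the column names here are assumptions; match them to the actual artifact):

```python
import csv
import io

# Stand-in for the contents of results/<model>_benchmark.csv;
# the column names are assumptions, not a verified schema.
sample = io.StringIO(
    "category,image_AUROC,pixel_AUROC\n"
    "bottle,0.994,0.981\n"
    "cable,0.962,0.957\n"
)

rows = list(csv.DictReader(sample))
# Keep only values that were actually measured; never infer missing ones.
measured = {r["category"]: float(r["image_AUROC"]) for r in rows if r["image_AUROC"]}
print(measured)
```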
When refreshing benchmark tables, these are the common sections to refresh:
- `### Image-Level AUC`
- `### Pixel-Level AUC`
- `### Image F1 Score`
- `### Pixel F1 Score`

If a README only contains placeholders, replace only the rows supported by measured results.
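The "replace only supported rows" rule can be sketched like this; the table layout and the `TBD` placeholder convention are assumptions about what the README contains:

```python
# Hypothetical README table under a "### Image-Level AUC" heading.
table = [
    "| Category | AUC  |",
    "|----------|------|",
    "| bottle   | TBD  |",
    "| cable    | TBD  |",
]
measured = {"bottle": 0.994}  # only bottle has an artifact-backed value

updated = []
for line in table:
    cells = [c.strip() for c in line.strip("|").split("|")]
    if cells and cells[0] in measured:
        updated.append(f"| {cells[0]:<8} | {measured[cells[0]]:.3f} |")
    else:
        updated.append(line)  # unmeasured rows keep their placeholder
print("\n".join(updated))
```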
If the README benchmark state changes, update the matching docs page under:
- `docs/source/markdown/guides/reference/models/image/<model>.md`
- `docs/source/markdown/guides/reference/models/video/<model>.md`

The docs page may stay shorter than the README, but it must not contradict it.
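A rough consistency check between the README and the docs page might look like this; the extraction is deliberately naive and the two-column row format is an assumption:

```python
import re

# Stand-ins for the README and docs-page benchmark sections.
readme = "| bottle | 0.994 |\n| cable | 0.962 |"
docs_page = "| bottle | 0.994 |"  # docs may be shorter than the README

def extract(text):
    """Map row label -> value for simple two-column markdown rows."""
    pat = re.compile(r"\|\s*(\w+)\s*\|\s*([0-9.]+)\s*\|")
    return {m.group(1): float(m.group(2)) for m in pat.finditer(text)}

readme_vals, docs_vals = extract(readme), extract(docs_page)
# Docs may omit rows, but any row it does keep must match the README.
conflicts = {k for k, v in docs_vals.items() if readme_vals.get(k) != v}
print(sorted(conflicts))
```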
Before finishing:

- confirm every published value traces back to a measured artifact
- confirm the README and the matching docs page do not contradict each other