# Evaluate Generated Video Quality

Evaluate generated video quality using the available metrics (SSIM, loss trajectory, caption consistency). This skill assesses videos produced by a training run, combining multiple signals into a holistic quality assessment. The skill is evolving; new metrics will be added as they are developed.
| Parameter | Required | Description |
|---|---|---|
| video_paths | Yes | List of paths to generated videos |
| reference_paths | No | Paths to reference videos (for SSIM) |
| prompts | No | Prompts used to generate the videos (for the caption check) |
| loss_summary | No | Path to a W&B summary JSON (for the loss trajectory) |
| metrics | No | Which metrics to run (default: all available) |
Check .agents/memory/evaluation-registry/README.md for the current catalog.
Leverages the existing infrastructure in `fastvideo/tests/ssim/`:

```bash
pytest fastvideo/tests/ssim/ -vs --video-path <generated> --reference-path <reference>
```

Or use the SSIM utility directly:

```python
from fastvideo.tests.ssim.ssim_utils import compute_ssim

score = compute_ssim(generated_video, reference_video)
# score > 0.85 is typically "acceptable"
```
Interpretation:
| SSIM Range | Quality |
|---|---|
| > 0.90 | Excellent — very close to reference |
| 0.80–0.90 | Good — acceptable for most uses |
| 0.70–0.80 | Fair — noticeable differences |
| < 0.70 | Poor — significant quality issues |
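The bands in the table above can be encoded as a small helper for use in reports. This is a sketch; `ssim_quality_label` is a hypothetical name, not part of the existing test infrastructure:

```python
def ssim_quality_label(score: float) -> str:
    """Map an SSIM score to the quality bands in the interpretation table.

    Hypothetical helper: thresholds mirror the table
    (>0.90 excellent, 0.80-0.90 good, 0.70-0.80 fair, <0.70 poor).
    """
    if score > 0.90:
        return "excellent"
    if score >= 0.80:
        return "good"
    if score >= 0.70:
        return "fair"
    return "poor"
```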
Analyze the loss curve shape from the W&B summary:

```python
import json

with open(loss_summary_path) as f:
    summary = json.load(f)

final_loss = summary["train_loss"]
runtime = summary["_runtime"]
steps = summary["_step"]
```
Early-stage heuristics (first 500 steps):
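One such heuristic can be sketched as a trend check over sampled loss values (assumed to be exported separately, e.g. from the run's history; the function name and tolerance are illustrative, not an existing API):

```python
def loss_is_decreasing(losses: list[float], tol: float = 1e-6) -> bool:
    """Check whether sampled losses trend downward overall.

    Sketch: fits a least-squares slope over the samples; a slope more
    negative than -tol counts as "decreasing". Assumes evenly spaced samples.
    """
    n = len(losses)
    if n < 2:
        return False
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    den = sum((x - mean_x) ** 2 for x in xs)
    return (num / den) < -tol
```

A slope-based check is more robust to step-to-step noise than comparing consecutive values directly.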
Use an LLM to evaluate whether the video content matches the input prompt.
Prompt: "A golden retriever playing in the snow"
Video: <path>
Score the video on:
1. Object presence (is there a golden retriever?)
2. Action accuracy (is it playing?)
3. Environment match (is there snow?)
4. Overall coherence (does it look natural?)
Score each criterion from 1 to 5, for a total out of 20.
⚠️ This metric is in draft status. Results should not be treated as ground truth until calibrated against human judgments.
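Since the LLM's response is free text, the four sub-scores have to be parsed out before they can be summed. A minimal sketch, assuming each criterion line ends in a `Score: N` marker (this response format is an assumption, not something the metric currently enforces):

```python
import re

def parse_caption_score(llm_response: str) -> "int | None":
    """Extract the four 1-5 sub-scores and sum them to a /20 total.

    Hypothetical parser: assumes each criterion appears as 'Score: N'.
    Returns None when the response does not match the expected format,
    so callers can retry or flag the video for manual review.
    """
    scores = [int(m) for m in re.findall(r"[Ss]core:\s*([1-5])\b", llm_response)]
    if len(scores) != 4:
        return None
    return sum(scores)
```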
See `.agents/memory/evaluation-registry/README.md`.

## Video Quality Report: <experiment_name>
| Metric | Score | Threshold | Status |
|--------|-------|-----------|--------|
| SSIM (avg) | 0.87 | > 0.80 | ✅ Pass |
| Loss trajectory | decreasing | decreasing | ✅ Pass |
| Caption consistency | 16/20 | > 14/20 | ✅ Pass |
### Per-Video Scores
| Video | SSIM | Caption |
|-------|------|---------|
| video_001.mp4 | 0.89 | 17/20 |
| video_002.mp4 | 0.85 | 15/20 |
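The report format above can be assembled programmatically from per-video results. A sketch (the `render_report` name and the `(filename, ssim, caption_score)` row shape are assumptions for illustration):

```python
def render_report(experiment: str, rows: "list[tuple[str, float, int]]") -> str:
    """Render the per-video section of the quality report as markdown.

    Hypothetical helper: each row is (filename, ssim, caption_score_out_of_20).
    """
    lines = [
        f"## Video Quality Report: {experiment}",
        "",
        "### Per-Video Scores",
        "| Video | SSIM | Caption |",
        "|-------|------|---------|",
    ]
    for name, ssim, caption in rows:
        lines.append(f"| {name} | {ssim:.2f} | {caption}/20 |")
    return "\n".join(lines)
```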
## Related Files

- `fastvideo/tests/ssim/` — SSIM test infrastructure
- `fastvideo/tests/training/Vanilla/test_training_loss.py` — loss comparison
- `.agents/memory/evaluation-registry/README.md` — metric catalog

## Changelog

| Date | Change |
|---|---|
| 2026-03-02 | Initial version with SSIM, loss trajectory, caption consistency stub |