Side-by-side comparison of image generation outputs across different AI providers. Use after rendering the same prompt through multiple providers to evaluate quality, style accuracy, and cost.
Analyze and compare interior design renders from different AI providers to help the user choose the best output or provider for their needs.
Before comparing, check what renders and data are available.
!ls projects/${PROJECT_NAME}/renders/*.png 2>/dev/null | head -20; ls projects/${PROJECT_NAME}/renders/*.png >/dev/null 2>&1 && echo "HAS_RENDERS=true" || echo "HAS_RENDERS=false"
!ls projects/${PROJECT_NAME}/renders/ 2>/dev/null | grep -c '\.png$'; echo "RENDER_COUNT above"
!ls projects/${PROJECT_NAME}/references/preprocessed/depth_map.png 2>/dev/null && echo "HAS_DEPTH=true" || echo "HAS_DEPTH=false"
!ls projects/${PROJECT_NAME}/references/*.{jpg,jpeg,png,webp} 2>/dev/null | grep -q . && echo "HAS_REFERENCE=true" || echo "HAS_REFERENCE=false"
!ls projects/${PROJECT_NAME}/style-config.yaml 2>/dev/null && echo "HAS_STYLE_CONFIG=true" || echo "HAS_STYLE_CONFIG=false"
!ls projects/${PROJECT_NAME}/notes.md 2>/dev/null && echo "HAS_NOTES=true" || echo "HAS_NOTES=false"
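The checks above could also be collected into one helper that does not depend on pipeline exit statuses. This is a minimal sketch, not part of the original command: the function names `flag` and `preflight` are hypothetical, and the folder layout is assumed to match the paths used in the checks.

```shell
# Hypothetical helpers (not part of the original command): print the same
# flags as the checks above for a single project directory.
flag() {
  # flag NAME PATH: print NAME=true if PATH exists, else NAME=false.
  if [ -e "$2" ]; then echo "$1=true"; else echo "$1=false"; fi
}

preflight() {
  local dir="$1"
  local count=0
  local has_ref=false
  local f

  # Count PNG renders one file at a time; an unmatched glob stays literal
  # and fails the -e test, so an empty folder counts as 0.
  for f in "$dir"/renders/*.png; do
    if [ -e "$f" ]; then count=$((count + 1)); fi
  done

  # One reference photo in any accepted format is enough.
  for f in "$dir"/references/*.jpg "$dir"/references/*.jpeg \
           "$dir"/references/*.png "$dir"/references/*.webp; do
    if [ -e "$f" ]; then has_ref=true; fi
  done

  if [ "$count" -gt 0 ]; then echo "HAS_RENDERS=true"; else echo "HAS_RENDERS=false"; fi
  echo "RENDER_COUNT=$count"
  flag HAS_DEPTH "$dir/references/preprocessed/depth_map.png"
  echo "HAS_REFERENCE=$has_ref"
  flag HAS_STYLE_CONFIG "$dir/style-config.yaml"
  flag HAS_NOTES "$dir/notes.md"
}
```

Calling `preflight "projects/${PROJECT_NAME}"` would print one `KEY=value` line per check, which can then be read against the decision rules.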
HAS_RENDERS=false OR RENDER_COUNT < 2?
└── STOP. Need at least 2 renders to compare. Run /render with multiple providers first.
HAS_DEPTH=true AND HAS_REFERENCE=true?
└── Include the "Layout Fidelity" criterion in the comparison.
    Auto-run /validate-layout for each render and include SSIM scores in the table.
HAS_DEPTH=false AND HAS_REFERENCE=true?
└── Layout comparison is possible, but the depth map is missing.
    Suggest running /preprocess-room first for quantitative layout scoring,
    or skip the layout criterion if the user doesn't care about layout fidelity.
HAS_STYLE_CONFIG=true?
└── Read style-config.yaml and use the configured colors, materials, and style as ground truth for "Style Accuracy" scoring.
    Don't judge style accuracy from memory; check against the config.
HAS_NOTES=true?
└── Read notes.md and check for previous comparison results and feedback trends.
    Append new comparison results to notes.md.
HAS_REFERENCE=false (text-to-image only)?
└── Skip the "Layout Fidelity" criterion entirely; it is not applicable.
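The layout-related branches of these rules can be sketched as a small function. This is an illustration under stated assumptions: the name `layout_mode` and the four result labels are hypothetical, and the inputs are the render count plus the `HAS_REFERENCE` and `HAS_DEPTH` flags.

```shell
# Hypothetical helper: map the preflight results to the layout action
# that the decision rules prescribe. The output labels are illustrative.
layout_mode() {
  local count="$1"      # number of renders found
  local has_ref="$2"    # HAS_REFERENCE: true or false
  local has_depth="$3"  # HAS_DEPTH: true or false

  if [ "$count" -lt 2 ]; then
    echo "stop"                # fewer than 2 renders: nothing to compare
  elif [ "$has_ref" != "true" ]; then
    echo "skip-layout"         # text-to-image only: criterion not applicable
  elif [ "$has_depth" = "true" ]; then
    echo "layout-ssim"         # run /validate-layout and report SSIM scores
  else
    echo "layout-needs-depth"  # suggest /preprocess-room, or skip on request
  fi
}
```

For example, `layout_mode 3 true false` prints `layout-needs-depth`, the case where a reference photo exists but the depth map has not been generated yet.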
Data-driven comparison > subjective comparison. Use project config as ground truth wherever possible.
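Renders from /render (or image paths provided by user)
Evaluate each provider's output on:
- Style Accuracy
- Photorealism
- Composition
- Detail Quality
- Color Fidelity
- Layout Fidelity (via /validate-layout): quantitative layout fidelity metric
## Model Comparison: [Style] [Room]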
| Criteria | Gemini | OpenAI | Stability | Grok |
|-----------------|--------|--------|-----------|------|
| Style Accuracy | ★★★★ | ★★★★ | ★★★★ | ★★★ |
| Photorealism | ... | ... | ... | ... |
| Composition | ... | ... | ... | ... |
| Detail Quality | ... | ... | ... | ... |
| Color Fidelity | ... | ... | ... | ... |
| Layout Fidelity | ... | ... | ... | ... |
### Best For:
- **Client presentation**: [provider] — [reason]
- **Design exploration**: [provider] — [reason]
- **Quick iteration**: [provider] — [reason]
- **Budget-conscious**: [provider] — [reason]
### Recommendations:
- [Specific refinement suggestions per provider]
Suggest /refine for the most promising output to iterate further.
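If the ratings are first recorded as numbers, the rows of the comparison table can be produced mechanically. The sketch below assumes whole-number scores with one star per point; the helper names `stars` and `table_row` are hypothetical, and the rating scale itself is an assumption, since the template only shows example star counts.

```shell
# Hypothetical helpers: turn numeric scores into star-rated table rows.
stars() {
  # stars N: print N filled stars (assumes N is a non-negative integer).
  local n="$1"
  local out=""
  local i=0
  while [ "$i" -lt "$n" ]; do
    out="${out}★"
    i=$((i + 1))
  done
  printf '%s' "$out"
}

table_row() {
  # table_row CRITERION SCORE...: one markdown row, one cell per provider,
  # in the same provider order as the table header.
  local row="| $1 |"
  local s
  shift
  for s in "$@"; do
    row="$row $(stars "$s") |"
  done
  printf '%s\n' "$row"
}

table_row "Style Accuracy" 4 4 4 3
# prints: | Style Accuracy | ★★★★ | ★★★★ | ★★★★ | ★★★ |
```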
- /refine: after identifying the best provider output, use /refine to iterate on that specific prompt+provider combination. Pass the original prompt and specific feedback from the comparison.
- /render: if the comparison reveals all outputs are poor, consider re-rendering with adjusted prompts. Go back to /generate-prompt if the prompts themselves need rework.
- /generate-prompt: if the comparison shows systematic prompt issues (wrong style keywords, bad composition across all providers), the prompts need regeneration, not refinement.
- /style-guide: reference when evaluating style accuracy. Use the style's specific keywords/materials/colors as the ground truth for scoring.
- /edit-design: if one render is close but needs specific changes (swap a piece of furniture, adjust lighting), /edit-design can target those changes rather than re-rendering from scratch.
- /validate-layout: provides quantitative SSIM scores for layout comparison. Run on each provider's output to get objective layout fidelity metrics.
- /preprocess-room: the source of truth for the original room's depth map used in layout comparison.