Build Stable Diffusion pipelines with HuggingFace Diffusers. Use when generating images from text, performing img2img, inpainting, using ControlNet, loading LoRAs, fine-tuning with DreamBooth, or deploying diffusion models. Triggers: 'Stable Diffusion', 'SDXL', 'SD 3', 'Flux', 'Diffusers', 'text-to-image', 'inpainting', 'ControlNet', 'LoRA training', 'DreamBooth'.
Claude already knows Diffusers API basics. This skill adds expert judgment for non-obvious choices.
| Need | Model | Why |
|---|---|---|
| General purpose, fast | SD 1.5 | Smallest VRAM, largest LoRA ecosystem |
| High quality, 1024px native | SDXL | Best quality/speed tradeoff today |
| Best text rendering | SD 3.0 / SD 3.5 | T5 encoder understands text layout |
| Fastest single-step | SDXL + LCM LoRA | 4 steps, guidance_scale=1.0 — MUST use LCM scheduler |
| Best open-source overall | Flux.1 [dev] | Superior prompt following, but ~24GB VRAM |
Before choosing, ask:
- How much VRAM do you have? (use enable_model_cpu_offload() to roughly halve the requirement)

Claude knows the scheduler list. Here's what it doesn't know:
- Use LCMScheduler with LCM LoRAs.

guidance_scale calibration:

| Model | Sweet spot | Too low | Too high |
|---|---|---|---|
| SD 1.5 | 7-9 | <5: ignores prompt | >12: oversaturated, artifacts |
| SDXL | 5-8 | <3: unfocused | >10: harsh contrast |
| SD 3.x | 4-7 | <2: random | >8: burnt highlights |
| Flux | 3.5 (fixed) | N/A | N/A — Flux uses guidance embedding, not CFG |
| LCM | 1.0 (fixed) | N/A | >2: destroys output completely |
Gotchas:

- Never use guidance_scale > 2 with LCM — it will produce noise/artifacts. LCM was trained with guidance_scale=1.0.
- Never use height/width that aren't multiples of 8 — silent misalignment causes border artifacts.
- Set pipe.vae = pipe.vae.to(dtype=torch.float32) when getting black images — the VAE is numerically unstable in fp16 on many models (especially SDXL).
- Never use num_inference_steps < 15 without LCM/Turbo — standard schedulers need a minimum of ~20 steps for coherent output.
- Call .fuse_lora() or set the scale after loading a LoRA — without this, the LoRA has zero effect and you'll debug for hours.
- Avoid enable_sequential_cpu_offload() in production — it's ~3x slower than enable_model_cpu_offload(). Use it only when the latter still OOMs.
- For prompt emphasis, use (word:1.5) instead of repetition.

ControlNet selection:

| I want to preserve... | ControlNet | Preprocessor | Key gotcha |
|---|---|---|---|
| Overall structure | canny | Canny edge detection | Low/high thresholds matter hugely — default (100,200) often too aggressive |
| Human pose only | openpose | OpenPose | Fails silently on non-human subjects |
| 3D spatial layout | depth | MiDaS / Zoe | conditioning_scale > 1.0 causes depth map to override prompt entirely |
| Architectural lines | mlsd | M-LSD | Only detects straight lines — useless for organic subjects |
| Rough concept | scribble | HED / pidinet | Most forgiving — good starting point when unsure |
strength calibration (img2img):

- 0.3-0.4: Color/style shift, structure preserved (touch-up)
- 0.5-0.7: Significant transformation, composition kept (style transfer)
- 0.8-1.0: Near-complete regeneration (only vague shapes kept)

Apply in this order, stop when VRAM fits:
1. torch_dtype=torch.float16 (always do this — 50% savings, no quality loss)
2. pipe.enable_model_cpu_offload() (moderate slowdown, large savings)
3. pipe.enable_vae_slicing() + pipe.enable_vae_tiling() (needed for >1024px)
4. pipe.enable_attention_slicing() (small extra savings)
5. pipe.enable_xformers_memory_efficient_attention() (if xformers available)
6. pipe.enable_sequential_cpu_offload() (last resort — very slow)

Load these only when needed for the specific task:
- Advanced Usage — MANDATORY before: building custom pipelines from components, writing custom denoising loops, fine-tuning (DreamBooth/LoRA/Textual Inversion), IP-Adapter, SDXL Refiner, T2I-Adapter, quantization, production deployment, callbacks, multi-GPU. Do NOT load for standard generation, img2img, inpainting, or ControlNet — the SKILL.md body is sufficient.
- Troubleshooting — MANDATORY when: encountering errors (CUDA OOM, black images, package conflicts, LoRA loading failures, ControlNet not conditioning, hub download issues). Do NOT load proactively — only when a specific error needs debugging.