Stable Diffusion uses a process called denoising diffusion — it learns to generate images by learning how to reverse the process of adding noise to images.
The Two-Phase Process
- Forward process (training): Take real images, gradually add random noise until they become pure static. Train the model to predict what was removed at each step.
- Reverse process (inference): Start with random noise. Repeatedly apply the learned denoiser, guided by your text prompt, until a coherent image emerges.
The Latent Space Trick
SD doesn't work on full-resolution pixels — it works in a compressed "latent space" 8x smaller. This is what makes it fast enough to run on consumer GPUs.
Reference: