Stable Diffusion Image Generation

name: stable-diffusion-image-generation

description: Generate and transform images using Stable Diffusion via the ITI multi-backend image generation API. Covers text-to-image, image-to-image, prompt engineering for diffusion models, model selection, and parameter tuning. For FLUX models (recommended for photorealism, text rendering, and color control), see the flux-image-generation skill instead. Use when generating images with community fine-tunes, LoRAs, or the Stable Diffusion ecosystem specifically.

Instructions

Generate images using the Stable Diffusion backend of the ITI image generation service at http://localhost:7860 (native) or http://stable-diffusion:7860 (Docker network).

Note: The image generation service now supports three backends: sd, flux-local, and flux-api. Select one with the backend parameter, which is a JSON field for txt2img and a query parameter for img2img. For FLUX-based generation (better photorealism, text rendering, hex color control), see the flux-image-generation skill. This skill covers the legacy Stable Diffusion backend.

Text-to-image generation:

```bash
curl -X POST http://localhost:7860/api/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a professional product photo of a laptop on a minimalist desk, soft lighting, 4k",
    "negative_prompt": "blurry, low quality, distorted, watermark",
    "backend": "sd",
    "width": 768,
    "height": 512,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 42
  }'
```
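
The request above can be wrapped in a small bash sketch so that the JSON body is built the same way every time. The function names (json_escape, sd_txt2img_payload, sd_txt2img) and the SD_URL variable are hypothetical. This skill does not document the response format, so the sketch saves the raw response body to a file for you to inspect.

```bash
# Requires bash. SD_URL is a hypothetical variable; set it to
# http://stable-diffusion:7860 when calling from inside the Docker network.
SD_URL="${SD_URL:-http://localhost:7860}"

# Escape backslashes and double quotes so a prompt can sit inside a JSON
# string. Newlines and other control characters are not handled.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Print a txt2img JSON body using the field names from the example above.
# Usage: sd_txt2img_payload PROMPT NEGATIVE WIDTH HEIGHT STEPS GUIDANCE SEED
sd_txt2img_payload() {
  printf '{"prompt": "%s", "negative_prompt": "%s", "backend": "sd", "width": %d, "height": %d, "num_inference_steps": %d, "guidance_scale": %s, "seed": %d}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$3" "$4" "$5" "$6" "$7"
}

# POST the body read from stdin and save the raw response to OUT_FILE.
# --fail makes curl exit non-zero on HTTP errors.
# Usage: sd_txt2img OUT_FILE PROMPT NEGATIVE WIDTH HEIGHT STEPS GUIDANCE SEED
sd_txt2img() {
  local out="$1"; shift
  sd_txt2img_payload "$@" |
    curl -sS --fail -X POST "$SD_URL/api/txt2img" \
      -H "Content-Type: application/json" \
      --data-binary @- -o "$out"
}
```

Calling sd_txt2img with an output file name, the prompt and negative prompt from the example, and then 768, 512, 50, 7.5 and 42 reproduces the curl request above. Escaping the prompt keeps quotes in user text from breaking the JSON.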

Image-to-image transformation:

```bash
curl -X POST "http://localhost:7860/api/img2img?prompt=oil+painting+style&strength=0.7&backend=sd" \
  -F "image=@input.png"
```
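
In img2img the prompt travels in the query string, so spaces become "+" and punctuation must be percent-encoded. The bash sketch below builds that URL from free text. The helper names (urlencode, sd_img2img_url, sd_img2img) and the SD_URL variable are hypothetical, and the encoder assumes ASCII prompts.

```bash
# Requires bash. SD_URL is a hypothetical variable; point it at
# http://stable-diffusion:7860 from inside the Docker network.
SD_URL="${SD_URL:-http://localhost:7860}"

# Form-encode a query value: keep unreserved characters, turn spaces into "+",
# and percent-encode everything else. ASCII only: multibyte UTF-8 characters
# would need byte-wise encoding.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9._~-]) out+="$c" ;;
      " ") out+="+" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
  done
  printf '%s' "$out"
}

# Build the img2img URL with prompt, strength and backend as query parameters.
# Usage: sd_img2img_url PROMPT STRENGTH
sd_img2img_url() {
  printf '%s/api/img2img?prompt=%s&strength=%s&backend=sd' \
    "$SD_URL" "$(urlencode "$1")" "$2"
}

# Upload INPUT as the "image" form field and save the raw response to OUT.
# Usage: sd_img2img INPUT PROMPT STRENGTH OUT
sd_img2img() {
  curl -sS --fail -X POST "$(sd_img2img_url "$2" "$3")" \
    -F "image=@$1" -o "$4"
}
```

With the default SD_URL, sd_img2img_url "oil painting style" 0.7 yields exactly the URL used in the curl example above.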

Prompt engineering for Stable Diffusion:

  • Front-load the subject: “A golden retriever puppy, sitting in a garden”
  • Add style qualifiers: “digital art”, “oil painting”, “photorealistic”, “watercolor”, “3D render”
  • Add quality boosters: “highly detailed”, “4k”, “sharp focus”, “professional lighting”
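  • Use negative prompts to exclude unwanted traits: “blurry, low quality, distorted, bad anatomy, watermark, text”
  • Weighting syntax such as “(subject:1.3)” increases emphasis, but only where the backend’s prompt parser supports it; otherwise it is likely treated as ordinary text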
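
The guidelines above amount to a fixed ordering: subject first, then style, then quality boosters. A minimal bash sketch of that ordering follows. The function build_prompt and the variable DEFAULT_NEGATIVE are hypothetical names. The optional weight wraps the subject in “(subject:weight)”, which only helps if the backend parses that syntax.

```bash
# Requires bash. Assemble a positive prompt: subject first, then style
# qualifiers, then quality boosters. Empty parts are skipped.
# Usage: build_prompt SUBJECT STYLE QUALITY [WEIGHT]
build_prompt() {
  local subject="$1" style="$2" quality="$3" weight="${4:-}"
  local prompt
  if [ -n "$weight" ]; then
    # Emphasis syntax; only effective if the backend supports it.
    prompt="($subject:$weight)"
  else
    prompt="$subject"
  fi
  if [ -n "$style" ]; then prompt+=", $style"; fi
  if [ -n "$quality" ]; then prompt+=", $quality"; fi
  printf '%s\n' "$prompt"
}

# A reusable negative prompt built from the exclusions listed above.
DEFAULT_NEGATIVE="blurry, low quality, distorted, bad anatomy, watermark, text"
```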

Parameter tuning guide:

Parameter | Range | Effect
--- | --- | ---
num_inference_steps | 20-100 | More steps = higher quality but slower; 30-50 is the sweet spot
guidance_scale | 1-30 | Higher = more prompt adherence; 7-12 is typical; >15 can oversaturate
width / height | 128-1024 | Must be multiples of 8; 512×512 is default; 768×512 for landscape
strength (img2img) | 0.0-1.0 | Higher = more deviation from input image; 0.5-0.8 is typical
seed | any integer | Fixed seed = reproducible results; omit for random
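
The ranges in this table can be checked before a request is sent, so that bad values fail fast on the client side. This is a bash sketch with hypothetical helper names (in_range, check_sd_params). It uses awk so that decimal values such as 7.5 compare correctly, and it assumes width and height are passed as integers.

```bash
# Requires bash and awk. Succeeds if MIN <= VALUE <= MAX (decimals allowed).
# Usage: in_range VALUE MIN MAX
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}

# Check parameters against the table above. Prints one line per problem and
# returns non-zero if any check fails.
# Usage: check_sd_params STEPS GUIDANCE WIDTH HEIGHT [STRENGTH]
check_sd_params() {
  local steps="$1" guidance="$2" width="$3" height="$4" strength="${5:-}"
  local errors=0 dim
  in_range "$steps" 20 100 || { echo "num_inference_steps $steps outside 20-100"; errors=1; }
  in_range "$guidance" 1 30 || { echo "guidance_scale $guidance outside 1-30"; errors=1; }
  for dim in "$width" "$height"; do
    # Each dimension must be within 128-1024 and a multiple of 8.
    if ! in_range "$dim" 128 1024 || (( dim % 8 != 0 )); then
      echo "dimension $dim must be a multiple of 8 in 128-1024"; errors=1
    fi
  done
  if [ -n "$strength" ]; then
    in_range "$strength" 0 1 || { echo "strength $strength outside 0.0-1.0"; errors=1; }
  fi
  return "$errors"
}
```

For example, 768×512 with 50 steps and guidance 7.5 passes. A width of 770 is rejected because it is not a multiple of 8.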

Model selection:

Model | Best For | Size
--- | --- | ---
stabilityai/stable-diffusion-2-1 | General purpose, good default | ~5 GB
runwayml/stable-diffusion-v1-5 | Largest ecosystem of community fine-tunes and LoRAs | ~4 GB
stabilityai/stable-diffusion-xl-base-1.0 | Highest quality output, native 1024×1024 | ~7 GB
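
The table maps directly onto a lookup from need to model ID. The bash sketch below shows this. The keyword names and the function pick_sd_model are hypothetical. This skill does not show how the service is told which model to load, so the sketch only returns the ID.

```bash
# Requires bash. Map a need from the model table to a model ID.
# Usage: pick_sd_model NEED   (NEED: lora | quality | anything else)
pick_sd_model() {
  case "$1" in
    lora|finetune|community) echo "runwayml/stable-diffusion-v1-5" ;;          # largest fine-tune/LoRA ecosystem
    quality|1024)            echo "stabilityai/stable-diffusion-xl-base-1.0" ;; # highest quality, native 1024×1024
    *)                       echo "stabilityai/stable-diffusion-2-1" ;;         # general-purpose default
  esac
}
```

If SDXL is selected, pair it with 1024×1024 requests, since that is its native resolution.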

When to choose SD over FLUX:

  • You need community fine-tunes or LoRAs (SD has the largest ecosystem)
  • You’re working with an existing SD-based workflow
  • VRAM is limited (~5 GB vs ~13 GB for FLUX Klein)
  • Specific SD model fine-tunes match your use case
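
The checklist above can be encoded as a small decision helper. In this bash sketch the function choose_backend and its yes/no flags are hypothetical. The 13 GB threshold is the FLUX Klein figure quoted above. A "flux" result only means the flux-image-generation skill is the better fit; that skill covers the choice between flux-local and flux-api.

```bash
# Requires bash. Return "sd" if any reason to prefer SD applies, else "flux".
# Usage: choose_backend NEEDS_FINETUNE_OR_LORA EXISTING_SD_WORKFLOW VRAM_GB
#   (first two arguments are "yes" or "no"; VRAM_GB is a whole number)
choose_backend() {
  local needs_finetune="$1" existing_sd="$2" vram_gb="$3"
  local flux_klein_vram=13   # approximate VRAM for local FLUX Klein, per this skill
  if [ "$needs_finetune" = yes ] || [ "$existing_sd" = yes ] || (( vram_gb < flux_klein_vram )); then
    echo "sd"
  else
    echo "flux"
  fi
}
```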

Resource awareness:

  • Docker CPU mode: ~120s per 512×512 image, ~4 GB RAM
  • Native MPS (M3 Max): ~8-12s per 512×512 image, ~5 GB RAM
  • For batch generation, prefer native mode: Docker CPU mode is roughly 10-15× slower per image than native MPS
  • First generation after startup is slower due to model loading (~30-60s additional)
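
These timings give a quick estimate of how long a batch will take. The bash sketch below uses a hypothetical helper, estimate_batch_seconds, with made-up mode names docker-cpu and native-mps. It multiplies the per-image figures for 512×512 images by the image count, and for a cold start it adds the 30-60 s model-load range. The results are only as rough and hardware-specific as the original numbers.

```bash
# Requires bash. Print a "min-max" range in seconds for a batch of
# 512×512 images, using the approximate timings listed above.
# Usage: estimate_batch_seconds MODE COUNT [COLD]
#   MODE: docker-cpu | native-mps   COLD: 1 (default) adds model-load time, 0 skips it
estimate_batch_seconds() {
  local mode="$1" count="$2" cold="${3:-1}" lo hi
  case "$mode" in
    docker-cpu) lo=120; hi=120 ;;   # ~120 s per image
    native-mps) lo=8;   hi=12 ;;    # ~8-12 s per image on an M3 Max
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  lo=$(( lo * count )); hi=$(( hi * count ))
  if [ "$cold" = 1 ]; then
    lo=$(( lo + 30 )); hi=$(( hi + 60 ))   # first-generation model loading
  fi
  echo "${lo}-${hi}"
}
```

For 10 images from a cold start, this gives 110-180 s natively versus 1230-1260 s (about 20 minutes) in Docker CPU mode.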

Related skills:

  • flux-image-generation — FLUX models for superior photorealism, text rendering, and color control
  • flux-operations — FLUX backend deployment, BFL API credentials, Klein local setup
  • stable-diffusion-operations — SD service deployment and management