Experimenting with Stable Diffusion XL

A detailed walkthrough of my experience setting up and using Stable Diffusion XL (SDXL) to create AI-generated images

Installing Required Packages

First, I installed all the necessary Python packages:

# Install core dependencies
pip install diffusers
pip install torch
pip install transformers

# Install accelerate for faster, less memory-intensive model loading (I added this only after seeing the warning below)
pip install accelerate
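To confirm the installs succeeded, and to check whether accelerate is present before loading the model, you can query the installed package metadata. This is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is my own and is not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages used in this walkthrough; accelerate is optional but avoids a loading warning.
REQUIRED = ["diffusers", "torch", "transformers", "accelerate"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:<13} {ver or 'NOT INSTALLED'}")
```

A `None` entry for accelerate is the situation that triggers the loading warning described below.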

Initial Setup

After installing the packages, I imported the necessary libraries and set up the Stable Diffusion XL pipeline:

from diffusers import StableDiffusionXLPipeline
import torch

# Load the SDXL base model in half precision (float16) so it fits in 16 GB of VRAM
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
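Why float16 matters on a 16 GB card comes down to bytes per parameter: float32 uses 4 bytes per weight, float16 uses 2. The sketch below estimates weight memory only. It assumes roughly 3.5 billion parameters for the SDXL base pipeline (an approximate figure), and it ignores activations, the VAE decode, and CUDA overhead, so real usage is higher.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: ~3.5 billion parameters for the whole SDXL base pipeline (approximate).
# Activations, the VAE decode, and CUDA overhead are NOT included.

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Return the size of num_params weights stored in dtype, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


SDXL_PARAMS = 3.5e9  # approximate, see note above

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(SDXL_PARAMS, dtype):.1f} GiB")
```

This prints about 13.0 GiB for float32 and 6.5 GiB for float16. On a GPU with roughly 16 GiB, float32 weights alone would leave little room for generation, while float16 leaves a comfortable margin.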

The first time I ran this setup, loading the pipeline printed a warning about accelerate:

Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading.

Following this warning, I installed accelerate:

pip install accelerate

After installing accelerate, I ran the pipeline setup again with no warnings.

Moving to GPU

Hardware Configuration

GPU: NVIDIA GeForce RTX 4080 SUPER

Memory Total: 16376.0 MB

Memory Free: 14628.0 MB

Memory Used: 1420.0 MB
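Similar figures can be read from PyTorch itself. This is a minimal sketch, assuming PyTorch with CUDA support; `report_gpu` and `mib` are hypothetical helper names. I don't know which tool produced the numbers above, and this version computes "used" as total minus free, so its output may not match them exactly.

```python
import importlib.util


def mib(num_bytes):
    """Convert a byte count to MiB, rounded to one decimal place."""
    return round(num_bytes / 1024**2, 1)


def report_gpu(device_index=0):
    """Print the GPU name and its free/used/total memory, if PyTorch sees a GPU."""
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed.")
        return
    import torch  # imported lazily so this sketch also runs without PyTorch

    if not torch.cuda.is_available():
        print("No CUDA device available.")
        return
    # mem_get_info returns (free_bytes, total_bytes) as reported by the CUDA driver.
    free, total = torch.cuda.mem_get_info(device_index)
    print(f"GPU: {torch.cuda.get_device_name(device_index)}")
    print(f"Memory Total: {mib(total)} MiB")
    print(f"Memory Free: {mib(free)} MiB")
    print(f"Memory Used: {mib(total - free)} MiB")


if __name__ == "__main__":
    report_gpu()
```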

With a 16 GB GPU available, I moved the pipeline to it:

pipeline.to("cuda")

Prompt Engineering and Token Limits

My first prompt was long and detailed. However, SDXL's CLIP text encoders accept at most 77 tokens per prompt (including the start and end tokens), and anything beyond that is truncated. Here's how I adapted my prompt:

# Initial Prompt
prompt = """
orange-themed outfit. The kid has a bright, playful demeanor 
with a beaming smile. The outfit features an orange-like hat in 
shades of purple and green, and a dress designed to resemble the 
interior and peel of an orange, complete with lush green foliage and pink rose decorations. 
Surrounding the character are real oranges and leaves, blending realism with fantasy. 
The scene is painted on a white background, with the real orange and leaves displayed beside 
the painted character. Vibrant and colorful, the artwork evokes a garden 
fairy-tale aesthetic
"""

The initial prompt was too long; running the pipeline with it produced this warning:

Token indices sequence length is longer than the specified maximum sequence length for this model (110 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['is painted on a white background, with the real orange and leaves displayed beside the painted character. vibrant and colorful, the artwork evokes a garden fairy - tale aesthetic']
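To catch an over-long prompt before generating, you can count tokens with the same tokenizer the pipeline uses. A minimal sketch, assuming a Hugging Face tokenizer whose call returns `input_ids` (such as `pipeline.tokenizer`); `clip_token_count` and `check_prompt` are hypothetical helper names, and the 77-token default matches the limit in the warning above.

```python
def clip_token_count(prompt, tokenizer):
    """Count the tokens a Hugging Face tokenizer produces for prompt.

    CLIP tokenizers add a start and an end token, and both count toward
    the 77-token limit, so they are included here.
    """
    return len(tokenizer(prompt)["input_ids"])


def check_prompt(prompt, tokenizer, max_tokens=77):
    """Return (token_count, fits) for prompt against a max_tokens budget."""
    count = clip_token_count(prompt, tokenizer)
    return count, count <= max_tokens
```

With the pipeline loaded, `check_prompt(prompt, pipeline.tokenizer)` should report a count above 77 for the initial prompt. SDXL also has a second tokenizer, `pipeline.tokenizer_2`, which has the same 77-token limit.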

# Optimized Prompt
prompt = """
A kid in an orange-themed outfit with a playful smile, 
wearing a purple-green orange hat and a dress resembling an orange's interior,
adorned with green foliage and pink roses. Real oranges and leaves surround 
the character, blending fantasy and realism on a white background. 
Vibrant and fairy-tale-like.
"""

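Negative Prompt

To steer the model away from common anatomical and framing defects, I also defined a negative prompt: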

negative_prompt = """
bad anatomy, bad proportions, blurry, cropped, deformed,
disfigured, distorted features, extra limbs, extra fingers,
gross proportions, incorrect eyes, long neck, malformed limbs,
missing arms, missing legs, morbid, mutated hands, mutation,
out of frame, poorly drawn face, poorly drawn hands, mutated fingers
"""

Generating the Image

Finally, I generated the image using my optimized prompt:

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    guidance_scale=7.5,
    width=1024,
    height=1024
).images[0]

# Save the generated image
image.save("orange-theme.png")
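The `guidance_scale=7.5` argument and the negative prompt work together through classifier-free guidance: at each denoising step the model predicts noise once for the prompt and once for the negative prompt (or an unconditional embedding when no negative prompt is given), and the two predictions are combined as below. This is a simplified sketch on plain Python lists rather than the tensor code diffusers actually runs; `apply_guidance` is a hypothetical name.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance on flat lists of noise predictions.

    noise_uncond: prediction for the negative (or unconditional) prompt
    noise_cond:   prediction for the positive prompt
    The result starts at the negative prediction and moves toward the
    positive one, scaled by guidance_scale.
    """
    if len(noise_uncond) != len(noise_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]
```

With a scale of 1.0 the formula reduces to the positive prediction alone; larger values such as 7.5 push the result further away from whatever the negative prompt describes.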

Generated Image

Here's the AI-generated image based on my optimized prompt:

An AI-generated illustration of a kid in an orange-themed outfit, created using Stable Diffusion XL

Key Learnings

Through this experiment, I learned several important aspects of working with Stable Diffusion XL:

Loading the model in torch.float16 halves weight memory compared with float32, which lets SDXL run on a 16 GB GPU
Installing accelerate enables low-CPU-memory loading, which makes model loading faster and less memory-intensive
Prompts are limited to 77 CLIP tokens and anything beyond that is truncated, so the most important details belong early in the prompt
Well-crafted negative prompts help improve output quality