Experimenting with StableDiffusion XL
A detailed walkthrough of my experience setting up and using StableDiffusion XL for creating AI-generated images
Installing Required Packages
First, I installed all the necessary Python packages:
# Install core dependencies
pip install diffusers
pip install torch
pip install transformers
# Install accelerate for better performance
pip install accelerate
Initial Setup
After installing the packages, I imported the necessary libraries and set up the StableDiffusion XL pipeline:
from diffusers import StableDiffusionXLPipeline
import torch
# Load SDXL in half precision (torch.float16) so the weights fit within 16GB of VRAM
pipeline = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
)
The first time I ran this, accelerate was not yet installed, and the output showed a warning:
Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading.
Following this warning, I installed accelerate, which is why it also appears in the install list above:
pip install accelerate
After installing accelerate, I ran the pipeline setup again with no warnings.
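A quick way to catch a missing package before loading the pipeline is to ask Python which of the four packages are installed. The sketch below uses only the standard library; the helper names `installed_versions` and `missing_packages` are made up for illustration and are not part of diffusers.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed with pip earlier in this walkthrough.
REQUIRED = ["diffusers", "torch", "transformers", "accelerate"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def missing_packages(packages):
    """Return the names that are not installed in the current environment."""
    versions = installed_versions(packages)
    return [name for name, ver in versions.items() if ver is None]


# Print a short report, one package per line.
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If `missing_packages(REQUIRED)` returns a non-empty list, those are the names to pass to pip before running the setup again.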
Moving to GPU
Hardware Configuration
With my hardware configuration in place, I moved the model to GPU:
pipeline.to("cuda")
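Hard-coding "cuda" fails on a machine where PyTorch cannot see a GPU, so it can help to check first. The following is a minimal sketch, not part of my original run; `pick_device` and `describe_device` are hypothetical helper names, and the torch import is done lazily so the check itself still works when torch is missing.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without torch there is no way to use a GPU from this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def describe_device(device):
    """Build a one-line description; reports total VRAM for CUDA devices."""
    if device != "cuda":
        return "No CUDA GPU detected, running on CPU"
    import torch

    props = torch.cuda.get_device_properties(0)  # first visible GPU
    total_gib = props.total_memory / 1024**3
    return f"{props.name}: {total_gib:.1f} GiB VRAM"


device = pick_device()
print(describe_device(device))
```

The result could then be passed as `pipeline.to(device)`. If it comes back as "cpu", the float16 setup from this post would likely need rethinking, since half precision is generally meant for GPU inference.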
Prompt Engineering and Token Limits
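Initially, I wrote a detailed prompt to create a specific image. However, I discovered that the CLIP text encoders used by StableDiffusion XL accept at most 77 tokens per prompt (a count that includes the start and end markers), and anything beyond that is truncated. Here's how I adapted my prompt: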
# Initial Prompt
prompt = """
orange-themed outfit. The kid has a bright, playful demeanor
with a beaming smile. The outfit features an orange-like hat in
shades of purple and green, and a dress designed to resemble the
interior and peel of an orange, complete with lush green foliage and pink rose decorations.
Surrounding the character are real oranges and leaves, blending realism with fantasy.
The scene is painted on a white background, with the real orange and leaves displayed beside
the painted character. Vibrant and colorful, the artwork evokes a garden
fairy-tale aesthetic
"""
Initial Prompt (Too Long)
Token indices sequence length is longer than the specified maximum sequence length for this model (110 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['is painted on a white background, with the real orange and leaves displayed beside the painted character. vibrant and colorful, the artwork evokes a garden fairy - tale aesthetic']
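Rather than waiting for this warning, the token count can be checked before calling the pipeline. The sketch below assumes the tokenizer files live in the `tokenizer` subfolder of the model repository (SDXL also ships a second CLIP tokenizer in `tokenizer_2`); `load_sdxl_tokenizer`, `count_tokens`, and `check_prompt` are illustrative names of my own, not diffusers functions.

```python
CLIP_MAX_TOKENS = 77  # limit quoted in the warning, start/end markers included


def load_sdxl_tokenizer():
    """Load the first SDXL tokenizer (downloads its files on first use)."""
    from transformers import CLIPTokenizer

    return CLIPTokenizer.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer"
    )


def count_tokens(prompt, tokenizer):
    """Count the token ids for a prompt, including start and end markers."""
    return len(tokenizer(prompt)["input_ids"])


def check_prompt(prompt, tokenizer, limit=CLIP_MAX_TOKENS):
    """Return (token_count, fits) so a too-long prompt is caught up front."""
    n = count_tokens(prompt, tokenizer)
    return n, n <= limit


# Usage (needs transformers installed and network access on the first run):
#   tokenizer = load_sdxl_tokenizer()
#   count, fits = check_prompt(prompt, tokenizer)
#   print(f"{count} tokens, fits: {fits}")
```

The same check applies to the negative prompt, which passes through the same text encoders.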
# Optimized Prompt
prompt = """
A kid in an orange-themed outfit with a playful smile,
wearing a purple-green orange hat and a dress resembling an orange's interior,
adorned with green foliage and pink roses. Real oranges and leaves surround
the character, blending fantasy and realism on a white background.
Vibrant and fairy-tale-like.
"""
Negative Prompt
negative_prompt = """
bad anatomy, bad proportions, blurry, cropped, deformed,
disfigured, distorted features, extra limbs, extra fingers,
gross proportions, incorrect eyes, long neck, malformed limbs,
missing arms, missing legs, morbid, mutated hands, mutation,
out of frame, poorly drawn face, poorly drawn hands, mutated fingers
"""
Generating the Image
Finally, I generated the image using my optimized prompt:
image = pipeline(
prompt=prompt,
negative_prompt=negative_prompt,
num_inference_steps=40,
guidance_scale=7.5,
width=1024,
height=1024
).images[0]
# Save the generated image
image.save("orange-theme.png")
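When trying several variations, it can be convenient to keep the generation settings in one place and validate them before spending GPU time. This is only a sketch: `GenerationSettings` is a hypothetical helper of my own, and the multiple-of-8 rule reflects my understanding of what the SDXL pipeline requires for width and height.

```python
from dataclasses import asdict, dataclass


@dataclass
class GenerationSettings:
    """Keyword arguments for the pipeline call, defaulting to the values above."""

    num_inference_steps: int = 40
    guidance_scale: float = 7.5
    width: int = 1024
    height: int = 1024

    def validate(self):
        """Raise ValueError for settings that are known to be rejected."""
        if self.width % 8 or self.height % 8:
            raise ValueError("width and height must be multiples of 8")
        if self.num_inference_steps < 1:
            raise ValueError("num_inference_steps must be at least 1")
        return self

    def as_kwargs(self):
        """Return the validated settings as a plain dict."""
        return asdict(self.validate())


# Usage with the pipeline from this post:
#   settings = GenerationSettings(num_inference_steps=30)
#   image = pipeline(prompt=prompt, negative_prompt=negative_prompt,
#                    **settings.as_kwargs()).images[0]
```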
Generated Image
Here's the AI-generated image based on my optimized prompt:

An AI-generated illustration of a kid in an orange-themed outfit, created using StableDiffusion XL
Key Learnings
Through this experiment, I learned several important aspects of working with StableDiffusion XL:
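- The importance of torch.float16, which halves the memory needed for model weights compared with float32, when working with 16GB VRAM
- The benefit of installing accelerate for faster, less memory-intensive model loading
- The critical role of prompt length: CLIP truncates anything past 77 tokens, so the end of a long prompt never reaches the model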
- The effectiveness of well-crafted negative prompts in improving output quality
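The float16 point can be made concrete with some back-of-the-envelope arithmetic. The parameter count below is an assumption (roughly 3.5 billion for the SDXL base pipeline, as commonly quoted), and the estimate covers weights only, not activations or other buffers.

```python
def weights_size_gib(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


APPROX_PARAMS = 3.5e9  # assumed figure, not measured in this post

fp32 = weights_size_gib(APPROX_PARAMS, 4)  # float32 uses 4 bytes per value
fp16 = weights_size_gib(APPROX_PARAMS, 2)  # float16 uses 2 bytes per value

print(f"float32 weights: ~{fp32:.1f} GiB")
print(f"float16 weights: ~{fp16:.1f} GiB")
```

Under that assumption the weights take roughly 13 GiB in float32 versus about 6.5 GiB in float16, which is the difference between barely fitting on a 16GB card and leaving headroom for the actual image generation.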