* This blog post is a summary of this video.

12 Clever Stable Diffusion Tips and Tricks

Author: marat_ai
Time: 2024-03-23 14:20:00


Leverage Name Associations for Better Character Generation

The old trick of using specific names in prompts to generate archetypal character images still works very effectively with Stable Diffusion. For those unfamiliar: the model associates certain common names with typical physical appearances, so including a name like 'John' or 'Mary' in your prompt can steer generation towards more stereotypically Caucasian faces.

If you are generating a large batch of character images and want to avoid too many repetitive Asian faces, try mixing in some Western names to increase diversity. The name associations can conjure new faces rather than drawing from the same facial feature patterns.

Use Stereotypical Names for Archetypal Characters

When you need to generate very archetypal character images - like a wizard, warrior, elder, or child - use names that trigger the common physical tropes: Merlin for a wizard, Achilles for a warrior, Agnes for an old lady. The name cues shortcut to the archetypal faces and looks you are likely aiming for.
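To make the trick concrete, here is a minimal sketch of a prompt-building helper. The archetype-to-name mapping below is assembled from the examples in this post; it is a hypothetical lookup for illustration, not part of any Stable Diffusion API.

```python
# Hypothetical mapping from archetype to a trope-triggering name,
# based on the examples above (Merlin, Achilles, Agnes).
ARCHETYPE_NAMES = {
    "wizard": "Merlin",
    "warrior": "Achilles",
    "old lady": "Agnes",
}

def archetype_prompt(archetype: str, extra: str = "") -> str:
    """Prepend the trope-triggering name for an archetype, if one is known."""
    name = ARCHETYPE_NAMES.get(archetype)
    base = f"{name}, a {archetype}" if name else f"a {archetype}"
    return f"{base}, {extra}".rstrip(", ")

print(archetype_prompt("wizard", "casting a spell, detailed fantasy art"))
# Merlin, a wizard, casting a spell, detailed fantasy art
```

Unknown archetypes simply fall back to a plain description, so the helper never blocks a prompt.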

Avoid Seeing the Same Faces by Trying New Names

After generating multiple images, you may start to notice repetitive elements across various faces as the model repeats effective patterns. Introduce some new names to help break out of ruts and increase variability of the generated faces and characters. The fresh name associations can prompt new inspiration.

Replicate Film Styles with Color Grading

By providing cinema style prompts, Stable Diffusion is capable of closely emulating the color grading and mood of particular movies. According to many users, it can nail the atmospheric look when provided the right descriptive context.

In my own testing, it certainly works very well, though results still depend somewhat on getting an initial facial generation that suits the actor and film described. I've found that prompts in the following format generate great results:

Superman Henry Cavill in Man of Steel cinematic style

My favorite result placed Jim Carrey within a Sin City stylization, perfectly capturing the high contrast black and white inked look along with his facial structure.

Use Cinema Style Prompts

Leverage prompts that specify a particular actor within the cinematic context of a famous movie they have starred in. This gives the AI the exact creative framing to replicate - matching that actor's face with the color and atmosphere of the desired film.
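The prompt pattern above can be captured in a tiny template function. This is just an illustrative helper mirroring the phrasing from my example, not part of any Stable Diffusion toolchain.

```python
def film_style_prompt(actor: str, film: str, subject: str = "") -> str:
    """Build an 'actor in film cinematic style' prompt, optionally
    prefixed with the character or subject (e.g. 'Superman')."""
    subject_part = f"{subject} " if subject else ""
    return f"{subject_part}{actor} in {film} cinematic style"

print(film_style_prompt("Henry Cavill", "Man of Steel", "Superman"))
# Superman Henry Cavill in Man of Steel cinematic style
print(film_style_prompt("Jim Carrey", "Sin City"))
# Jim Carrey in Sin City cinematic style
```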

Sample Results

I generated a grid of celebrity faces matched to different stylized movie appearances. The color grading and mood are adapted straight from the described cinematic work with impressive accuracy - pure dramatic Batman, bright Superman, dark brooding Bond, etc. Check the samples linked here to see some standout examples of film replication.

Utilize Newly Supported Resolutions

Stable Diffusion XL was pre-trained on much higher-resolution image data than previous iterations. As such, it reliably supports a wider range of output resolutions without the weird artifacts or distortions that used to occur when stepping beyond 512x512.

According to the official Stability website for SDXL, the following resolutions have been validated as stable:

List of Recommended Resolutions

512x512, 640x512, 768x512, 512x640, 640x640, 768x768, 1024x1024
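If you script your generations, a quick guard against unsupported sizes can save wasted runs. The set below simply copies the resolutions listed above; this is a sketch of a sanity check, not an official validation routine.

```python
# Resolutions the post lists as validated for SDXL (width, height).
VALIDATED_SDXL_RESOLUTIONS = {
    (512, 512), (640, 512), (768, 512), (512, 640),
    (640, 640), (768, 768), (1024, 1024),
}

def is_validated(width: int, height: int) -> bool:
    """Return True if (width, height) is one of the listed stable sizes."""
    return (width, height) in VALIDATED_SDXL_RESOLUTIONS

print(is_validated(768, 768))   # True
print(is_validated(1024, 768))  # False - not in the validated list
```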

Other Stable Diffusion Parameters

When generating these larger images, pay attention to Steps and CFG Scale settings as defined on the Stability site as well. Using values outside the given ranges can still introduce odd distortions or repetitions even at the supported resolutions.

Specify Styles Without Complex Syntax

Some posts describe methods for inputting preset styles through cryptic prompt formatting. However, with Stable Diffusion XL's more robust conditioning, simply stating the desired style directly seems to work perfectly well without memorizing any special syntax.

After testing in various UIs, I've found that simply prompting with a plain-text description of the desired style consistently generates great results. Extensive walls of negative prompts also appear unnecessary now, as the model sticks closely to the specified creative direction.

Simple Style Prompts Work Well

Rather than try to learn a complex style encoding scheme with special characters, simply describe the style directly such as 'Impressionist oil painting stylization of a frog'. SDXL understands these plain language cues very effectively to adapt the output creative form.

Extensive Negative Prompts Not Needed

With the model's improved conditioning, providing a detailed style description alone is enough to steer the image generation. Lengthy lists of negative prompts to restrict unwanted elements are no longer as necessary to achieve focused coherent results.

Fix Distorted Faces in Wide Shots

If you are generating an image with small figures and notice poor facial quality, the likely culprit is the VAE model being used. Switching to another available VAE can dramatically improve faces at a distance.

Currently there is still only one public pre-trained VAE for SDXL available. However more options tailored to handle tricky wide portrait shots will hopefully emerge soon.

Likely a VAE Issue

When generating a wide scenic image, facial distortions on background figures likely stem from limitations in the latent compression performed by the active VAE model. The underlying diffusion model can handle far shots well, but fine facial detail gets lost in the compression.

Try Different VAE Models

Your best bet is trying a different VAE designed to better preserve fine facial features when decoding small-scale regions. As more options emerge, pick ones explicitly targeting improved facial reconstruction from tiny inputs.

Avoid Child-Like Faces

In some cases Stable Diffusion may erroneously generate youthful, baby-faced appearances for characters described as adults. Rather than providing exact ages, which can confuse the model, use broader age-range descriptions like "middle aged" instead.

The model understands categorical age buckets better than specific numbers. Letting it infer age appropriate looks from context works better than overriding with inaccurate numerical age labels.

Use Age Categories Instead of Numbers

Prompts like "25 year old" can confuse SDXL and yield inappropriate child-like faces. Use fuzzy ranges like "young adult" or "around 40 years old" to provide age context while giving flexibility.
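An easy way to apply this advice in a prompt pipeline is to translate numbers into fuzzy categories before they ever reach the prompt. The bucket boundaries below are my own assumptions for illustration, not anything defined by the model.

```python
# Illustrative sketch of the "age category instead of number" advice.
# The cutoffs are assumed buckets, chosen only for demonstration.
def age_phrase(age: int) -> str:
    """Map a numeric age to a fuzzy category SDXL tends to handle better."""
    if age < 13:
        return "child"
    if age < 20:
        return "teenager"
    if age < 35:
        return "young adult"
    if age < 55:
        return "middle aged"
    return "elderly"

print(age_phrase(25))  # young adult
print(age_phrase(42))  # middle aged
```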


Q: How can I reduce high RAM usage in Automatic1111?
A: Disable model caching in settings. You can also load checkpoint weights into VRAM instead of RAM with the --lowram command-line flag.
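For reference, command-line flags for Automatic1111 are usually set via the launcher script. This is a sketch of where the flag goes on Linux/macOS; check the project's wiki for the full, current list of options.

```shell
# webui-user.sh - pass flags to Automatic1111 at startup.
# --lowram loads checkpoint weights into VRAM instead of system RAM.
export COMMANDLINE_ARGS="--lowram"
```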

Q: Why does Automatic1111 perform worse than ComfyUI?
A: Confirmed by Reddit users through testing. ComfyUI utilizes SD capabilities better.

Q: What causes VAE artifacts in images?
A: The original SDXL VAE model. Switching to the VAE from SDXL 0.9 resolves them.

Q: How can I recreate a painter's style?
A: Specify the artist name and related tags in prompt. Sample styles first.

Q: What's the best prompt structure for text generation?
A: Use quotes for text and brief description of object. Keep under 20 words.

Q: Do I need specialized models like GALGOGOT?
A: Try base SD first since trained on more data. Good chance of decent results.

Q: How can I reverse engineer an image into a prompt?
A: Use Bing or ChatGPT to auto-generate prompt. Refine as needed for desired outcome.

Q: Why does the same seed keep getting used in ComfyUI?
A: Seed controls reproducibility. Change last number or use 0 for variation.

Q: How can I easily recreate similar images?
A: Add comma to prompt or change seed number. Seed 0 yields different results.
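The seed behavior in the two answers above boils down to deterministic sampling: the same seed replays the same noise, a new seed gives variation. Here is a plain-Python analogy for how a seed field behaves - not ComfyUI code, just the underlying idea.

```python
import random

def fake_sample(seed: int) -> list:
    """Stand-in for an image sampler: the seed fully determines the output."""
    rng = random.Random(seed)
    return [rng.randint(0, 255) for _ in range(4)]

# Same seed -> identical "image"; changing the seed varies the result.
assert fake_sample(42) == fake_sample(42)
assert fake_sample(42) != fake_sample(43)
```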

Q: Did you find this video format helpful?
A: Let me know by liking the video and sharing other tips in the comments!