The Truth About Consistent Characters In Stable Diffusion

Monzon Media
3 Sept 2023 · 06:59

TLDR: The video script discusses achieving high consistency in AI-generated images using stable diffusion models. It suggests starting with a good model and giving characters distinct names to ensure consistent facial features. The use of ControlNet and reference images is highlighted for maintaining clothing and style consistency. The video also demonstrates how to change backgrounds and outfits with minimal effort, and how the technique can be applied to real photos for various creative purposes.


  • 🎨 Achieving 100% consistency in stable diffusion is not entirely possible, but getting 80-90% there is achievable.
  • 🔍 Start with a good model, such as Realistic Vision, Photon, or Absolute Reality, for consistent facial features.
  • 💁‍♀️ Give your character a name or use two names to combine desired characteristics for more personalized results.
  • 📈 Use random name generators for character naming if creativity is a challenge.
  • 🛠️ ControlNet is a valuable tool for maintaining consistency in generated images, especially in terms of clothing and ethnicity.
  • 📸 Choose a full-body or knee-up image for better reference in ControlNet, focusing on specific clothing details.
  • 🎨 Style Fidelity option in ControlNet helps with maintaining the consistency of the image style.
  • 🌆 Changing the background and surroundings can create diverse scenes while keeping the character and outfit consistent.
  • 🔧 Roop is an extension that can be used with real photos, preserving the subject's face while the environment and outfit change.
  • 📊 Style Fidelity slider can be adjusted (0.75 to 1) to improve consistency in details like clothing and accessories.
  • 📚 Creating a story with generated characters involves piecing together different poses and environments over time.
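The naming-plus-prompt recipe from the points above can be sketched as a request payload for the AUTOMATIC1111 web UI's `/sdapi/v1/txt2img` endpoint. This is a minimal sketch: the character name "Amelia Voss", the helper function, and all parameter values are illustrative assumptions, not settings taken from the video.

```python
# Sketch of a txt2img payload that bakes in a named character and a fixed
# outfit, so every generation repeats the same identity cues.
# Character name, helper name, and parameter values are assumptions.

def build_character_payload(name: str, outfit: str, scene: str) -> dict:
    """Assemble a txt2img request; repeating the same invented name and
    outfit description in every prompt is what keeps the look consistent."""
    return {
        "prompt": f"photo of {name}, {outfit}, {scene}, detailed face",
        "negative_prompt": "blurry, deformed, extra limbs",
        "steps": 25,
        "cfg_scale": 7,
        "width": 512,
        "height": 768,
    }

payload = build_character_payload(
    "Amelia Voss",                      # invented name, reused everywhere
    "simple black sweater and jeans",   # the fixed outfit stays constant
    "city street at dusk",              # only the scene changes per image
)
print(payload["prompt"])
```

Only the scene argument would change between generations; the name and outfit strings stay byte-for-byte identical, which is the whole trick.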

Q & A

  • What is the main concept discussed in the video?

    -The video discusses achieving a high level of consistency in stable diffusion for AI-generated images, specifically focusing on maintaining consistent facial features and clothing.

  • What percentage of consistency is considered achievable in stable diffusion according to the video?

    -The video suggests that it is not exactly possible to achieve 100% consistency, but one can get 80 to 90% of the way there with the right techniques.

  • What type of model is recommended for consistent facial features in the video?

    -The video recommends using models like 'Realistic Vision', 'Photon', or 'Absolute Reality' for maintaining consistent facial features in AI-generated images.

  • How does the video suggest achieving consistency in character names?

    -The video suggests combining two or more names to merge desired characteristics from different people. It also mentions random name generators for those who aren't good at making up names.
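The two-name trick can be expressed as a tiny helper — a hypothetical sketch, not from the video — where a first name from one source and a last name from another yield a new, stable identity token for the prompt. Both example names are made up.

```python
def combine_names(first_source: str, last_source: str) -> str:
    """Take the first name from one full name and the last name from
    another to coin a character name the model has no single strong
    association with."""
    first = first_source.split()[0]
    last = last_source.split()[-1]
    return f"{first} {last}"

# Both input names are made-up examples.
character = combine_names("Amelia Clarke", "Jonas Voss")
print(character)  # Amelia Voss
```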

  • What tool is mentioned for maintaining consistency in clothing and other elements?

    -The video mentions using 'ControlNet' as a tool to maintain consistency in clothing and other elements of the AI-generated images.

  • How does the video demonstrate the use of a reference image in ControlNet?

    -The video demonstrates by importing a reference image of a character wearing a black sweater and jeans into ControlNet, and then adjusting the settings to generate images with similar styles and features.
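When scripting this instead of clicking through the UI, the reference image goes to the ControlNet extension as a base64 string inside one "unit" of settings. The sketch below follows the sd-webui-controlnet API's general shape, but treat the exact keys — in particular `threshold_a` carrying the Style Fidelity value for the reference preprocessor — as assumptions to verify against your installed version.

```python
import base64

def reference_unit(image_bytes: bytes,
                   weight: float = 0.8,
                   style_fidelity: float = 0.75) -> dict:
    """Build one ControlNet unit that uses the reference image directly
    (no separate preprocessor model) to steer clothing and style.
    Field names are assumptions based on the extension's API."""
    return {
        "module": "reference_only",                   # reference preprocessor
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "weight": weight,                             # control weight, ~0.7 to 1
        "threshold_a": style_fidelity,                # Style Fidelity slider
    }

# In practice image_bytes would be the black-sweater reference photo.
unit = reference_unit(b"\x89PNG...", weight=0.8, style_fidelity=0.75)
print(unit["module"])
```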

  • What is the role of the 'Style Fidelity' option in ControlNet?

    -The 'Style Fidelity' option in ControlNet helps with maintaining consistency in the style of the generated images, which can be adjusted to achieve better results.

  • Can the techniques discussed in the video be applied to real photos?

    -Yes, the video explains that the same techniques can be applied to real photos by using the 'Roop' extension alongside ControlNet to maintain the facial features of the subject.

  • What is the significance of changing the background and surroundings in the AI-generated images?

    -Changing the background and surroundings allows for the creation of diverse scenes and stories with the same character, enhancing the versatility of the generated images.

  • How can one address minor inconsistencies in the generated images?

    -Minor inconsistencies can be addressed by adjusting the 'Style Fidelity' slider to a higher value, or by manually editing the images to correct details such as clothing elements or accessories.
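One systematic way to work that slider — a hypothetical helper, not something shown in the video — is to sweep Style Fidelity through the 0.75-to-1 range and compare the outputs side by side:

```python
def fidelity_sweep(start: float = 0.75, stop: float = 1.0,
                   steps: int = 6) -> list:
    """Return evenly spaced Style Fidelity values to try when small
    details (buttons, accessories) drift between generations."""
    if steps < 2:
        return [stop]
    inc = (stop - start) / (steps - 1)
    return [round(start + i * inc, 2) for i in range(steps)]

print(fidelity_sweep())  # [0.75, 0.8, 0.85, 0.9, 0.95, 1.0]
```

Generating one image per value and picking the best keeps the tuning mechanical instead of guesswork.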

  • What future content is hinted at in the video?

    -The video hints at future content that will delve deeper into aesthetics like the hands and faces, and incorporating multiple characters into the same scene.



🎨 Achieving Consistency in AI Image Generation

This paragraph discusses the process of achieving a high level of consistency in AI-generated images with stable diffusion. It emphasizes that while 100% consistency may not be achievable, getting 80 to 90% of the way there is possible. The speaker starts with a good model and suggests giving characters names to keep facial features consistent, mentioning random name generators for those who struggle to invent names, and notes that ControlNet must be installed. The speaker then builds a prompt and selects a look, focusing on clothing consistency and facial recognition in the generated images, and highlights the role of ControlNet's Style Fidelity setting in maintaining consistency.


🌟 Utilizing AI for Real Photo Editing and Storytelling

The second paragraph delves into applying AI image generation to editing real photos and building a cohesive story. The speaker demonstrates how ControlNet's reference feature keeps the character's appearance consistent across different scenes and outfits. They also discuss potential imperfections in the generated images, such as inconsistent details like buttons on jeans, and how to address them by raising the Style Fidelity slider. The paragraph concludes with a mention of future videos that will explore more aesthetics and character interactions, as well as tips for running stable diffusion on devices with lower specifications.




Keywords

💡Consistency

Consistency in the context of the video refers to the ability to produce images with uniform and predictable characteristics, such as facial features and clothing. The video emphasizes that achieving 100% consistency is not always possible, but one can reach 80 to 90% by using specific techniques and tools. This is crucial for creating a cohesive visual narrative or maintaining a particular style throughout a series of images.


💡Model

In the video, a 'model' refers to the underlying structure or algorithm used to generate images. A good model is essential for creating realistic and consistent images. The video mentions 'Realistic Vision', 'Photon', and 'Absolute Reality' as examples of models that are good for generating consistent facial features.

💡Character Naming

Character naming is a technique used in the video to enhance consistency by assigning names to the characters in the images. This helps the AI recognize and reproduce specific characteristics associated with those names, making it easier to create a series of images with the same character.


💡ControlNet

ControlNet is a tool or feature mentioned in the video that allows users to maintain consistency in their generated images. It is used to import and reference specific images, ensuring that subsequent images adhere to the style and appearance of the original.

💡Style Fidelity

Style Fidelity is a term used in the video to describe the faithfulness or accuracy with which the style of the reference image is maintained in the generated images. Adjusting the Style Fidelity slider can help improve the consistency of the images, making them more closely resemble the reference.

💡Reference Image

A reference image is a specific example used as a guide or template for the AI to generate new images. It provides a visual standard for the character's appearance, clothing, and style, which the AI aims to replicate across multiple images.

💡AI Generated Images

AI Generated Images are photographs or visuals created by artificial intelligence algorithms based on certain inputs or parameters. These images can mimic real-life scenarios, people, or objects, and are used in the video to demonstrate how to achieve consistency in image generation.

💡Background and Surroundings

Background and surroundings refer to the environment and context in which the characters or subjects of the images are placed. In the video, changing the background and surroundings allows for the creation of diverse scenes while maintaining the consistency of the characters.

💡Real Photos

Real Photos refer to images captured by a camera, as opposed to those generated by AI. The video suggests that the techniques discussed for AI-generated images can also be applied to real photos, allowing for the manipulation of environments and outfits in a consistent manner.


💡Roop

In the context of the video, 'Roop' refers to an extension that preserves a subject's facial features, allowing the techniques shown for AI-generated characters to be applied to real photos while the environment and outfit are changed.


Highlights

Achieving 80 to 90 percent consistency in stable diffusion is possible, but not 100%.

Starting with a good model, like Realistic Vision, Photon, or Absolute Reality, is crucial for consistent facial features.

Naming the character can help combine desired characteristics, like using two names to merge traits.

Random name generators can be used for character naming if creativity is challenging.

ControlNet is a necessary tool for maintaining consistency in generated images.

Creating a prompt with a specific look, such as a simple black sweater and jeans, helps establish a style.

Importing the look into ControlNet with a reference image aids in maintaining consistency.

The control weight setting in ControlNet, typically between 0.7 to 1, affects the consistency of the output.

Style Fidelity option in ControlNet can be adjusted for better consistency in style.

Changing the background and surroundings in the generated images is straightforward with ControlNet.

The method can be applied to real photos with the use of the Roop extension alongside ControlNet.

Using ControlNet with real photos allows for changing the environment, location, and even outfits.

Small variances in generated images, like details on clothing, are normal and can be managed.

The Style Fidelity slider can be increased up to 1 for better consistency in such cases.

Creating a story with the generated characters and images is a potential application.

Optimization techniques for AI-generated images with limited graphics card memory will be discussed in future content.