The Best New Way to Create Consistent Characters In Stable Diffusion

12 Jan 2024 03:12

TLDR: The video gives a step-by-step guide to creating consistent character images in Stable Diffusion using ControlNet and the Face ID IP-Adapters. It begins with updating the ControlNet extension and downloading the Face ID models, then uses the Realistic Vision checkpoint with a simple prompt to generate an image of a girl. The process involves adjusting the ControlNet settings, matching the pre-processor to the model, and fine-tuning the output. The video also shows how to change the character's clothing and background while keeping the face consistent. The presenter closes by asking viewers to like and subscribe.


  • 🎨 Updating the ControlNet extension to the latest version is the first preparation step for creating consistent characters in AUTOMATIC1111.
  • 🔗 The Face ID IP-Adapter models must be downloaded and placed in the web UI's extensions ControlNet models folder.
  • 📂 Some of the downloaded files are LoRAs and belong in the LoRA folder rather than the ControlNet models folder.
  • 🔄 After restarting Stable Diffusion, the video selects the 'Realistic Vision' checkpoint for realistic output.
  • 📝 A simple prompt such as 'a girl, yellow shirt, smiling, masterpiece, best quality' is enough when combined with the chosen ControlNet unit.
  • 🔍 The pre-processor and the model must match, for example the 'Face ID Plus' pre-processor with the 'Face ID Plus SD 1.5' model.
  • 👀 Lowering the control value, for example to 0.5, softens the effect and gives a more refined, less intense face.
  • 👗 Changing the character's clothing, like dressing her in armor or a blue long dress, can be done while maintaining the consistency of the character's face.
  • 🏰 Setting the scene, such as in front of a castle or in a forest, adds depth and context to the character's portrayal.
  • 💃 The character's gesture can be controlled with a second ControlNet unit that uses an OpenPose pre-processor such as 'DW OpenPose'.
  • 👍 Engaging with the content by liking and subscribing supports the creator and ensures updates on future tutorials and videos.
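
The file-placement steps above can be sketched as a small helper. This is only an illustration: the folder names assume a default AUTOMATIC1111 install with the ControlNet extension, and the rule "files with 'lora' in the name go to the LoRA folder" is an assumption, not something the video states.

```python
from pathlib import Path
import shutil

# Assumed folder layout of an AUTOMATIC1111 install with the ControlNet extension.
CONTROLNET_MODELS = Path("extensions") / "sd-webui-controlnet" / "models"
LORA_MODELS = Path("models") / "Lora"


def destination_for(filename: str, webui_root: Path) -> Path:
    """Pick the target path for one downloaded Face ID file."""
    # Assumption: LoRA companion files have "lora" in their name;
    # everything else is treated as an IP-Adapter model for ControlNet.
    if "lora" in filename.lower():
        return webui_root / LORA_MODELS / filename
    return webui_root / CONTROLNET_MODELS / filename


def place_downloads(download_dir: Path, webui_root: Path) -> list[Path]:
    """Move Face ID downloads into place and return their new paths."""
    moved = []
    for src in sorted(download_dir.glob("ip-adapter-faceid*")):
        dst = destination_for(src.name, webui_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

After moving the files, the web UI still has to be restarted (or its model lists refreshed) before the new entries appear.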

Q & A

  • What is the main topic of the video?

    -The main topic of the video is creating consistent characters in Stable Diffusion, specifically using ControlNet with the Face ID IP-Adapters and their LoRA models.

  • What is the first step the video recommends for preparation?

    -The first step is to update the ControlNet to the latest version and download the IP adapters called Face ID.

  • Where should the downloaded Face ID adapters be placed?

    -The Face ID adapters should be placed in the Web UI extensions ControlNet models folder.

  • What additional step is suggested for the LoRA models?

    -For the LoRA models, the video suggests downloading them, placing them in the LoRA folder, restarting Stable Diffusion, and then selecting the checkpoint.

  • Which checkpoint does the video creator use for realistic image generation?

    -The video creator uses the 'Realistic Vision' checkpoint for image generation.

  • How does the video creator describe the process of generating the character's face?

    -The video creator describes the process as simple, using the prompt 'a girl, yellow shirt, smiling, masterpiece, best quality' and adjusting the ControlNet settings to achieve the desired result.

  • What happens if the pre-processor and the model do not match in the control net?

    -If the pre-processor and the model do not match, they won't work correctly, so it's important to ensure they are compatible.

  • How can the character's clothing be changed in the generated image?

    -The character's clothing can be changed by enabling a second control net and using a different image with the desired clothing, such as armor or a blue long dress.

  • Is it possible to control the gesture of the character in the generated image?

    -Yes, it is possible to control the gesture by enabling a second ControlNet unit, choosing an image with the desired gesture, and using an OpenPose pre-processor.

  • What does the video creator suggest at the end of the video?

    -The video creator suggests that if the viewers like the video, they should give it a like and subscribe for more content.

  • What is the significance of the 'Automatic 1111' mentioned in the video?

    -AUTOMATIC1111 is the Stable Diffusion web UI used in the video. According to the video, one of the newer variants (rendered in the transcript as 'Laurer plus V2') could not be used in it at the time of recording.
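
The Q & A point about matching the pre-processor to the model can be illustrated with a small lookup. This is a minimal sketch: the table and the identifier strings are assumptions modelled on the 'Face ID Plus' and 'Face ID Plus SD 1.5' pairing named in the video, not a list taken from the ControlNet extension.

```python
# Hypothetical pairing table. The identifier strings are assumptions and may
# differ from the names shown in a given ControlNet version.
COMPATIBLE_MODELS = {
    "ip-adapter_face_id": ["ip-adapter-faceid_sd15"],
    "ip-adapter_face_id_plus": ["ip-adapter-faceid-plus_sd15"],
}


def is_compatible(preprocessor: str, model: str) -> bool:
    """Return True when the model is listed for the given pre-processor."""
    return model in COMPATIBLE_MODELS.get(preprocessor, [])


def check_pair(preprocessor: str, model: str) -> str:
    """Describe whether a pre-processor/model pair should be used together."""
    if is_compatible(preprocessor, model):
        return f"OK: {preprocessor} works with {model}"
    return f"Mismatch: {preprocessor} is not paired with {model}"
```

For example, `check_pair("ip-adapter_face_id_plus", "ip-adapter-faceid_sd15")` reports a mismatch, which corresponds to the case the video warns about.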



🎨 Character Creation with Automatic 1111

The paragraph introduces a method for creating consistent characters using AUTOMATIC1111. It begins with a greeting and an overview of the goal: showing the same character in different outfits while keeping the face recognizable. The speaker instructs the audience to update ControlNet to the latest version and download the IP adapters named 'Face ID' from a provided link. These adapters are placed in the web UI extensions ControlNet models folder. The speaker also mentions downloading the LoRA files, placing them in the LoRA folder, restarting Stable Diffusion, and selecting a checkpoint called 'Realistic Vision'. The prompt used for the character generation is simple: a girl with a yellow shirt, smiling. The speaker explains how to use ControlNet with an IP adapter, why the pre-processor must match the model, and how to adjust the strength of the effect. The original and final results of the character's face are compared, and the speaker then demonstrates changing the character's clothes and background while keeping the appearance consistent. The video concludes with a call to like and subscribe for more content.
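
For readers who drive the web UI through its HTTP API instead of the browser, the setup described above could be expressed as a request body along these lines. This is a hedged sketch, not the presenter's method: it assumes the web UI was started with its API enabled, that ControlNet units are passed under `alwayson_scripts`, and that the unit field names and model identifiers below match the installed extension version. The code only builds the dictionary and does not send anything.

```python
import base64


def build_payload(face_image_bytes: bytes, prompt: str, weight: float = 0.5) -> dict:
    """Build a txt2img request body with one Face ID ControlNet unit."""
    unit = {
        "enabled": True,
        # Assumed identifiers for the 'Face ID Plus' pre-processor and the
        # 'Face ID Plus SD 1.5' model mentioned in the video.
        "module": "ip-adapter_face_id_plus",
        "model": "ip-adapter-faceid-plus_sd15",
        # 0.5 mirrors the lowered control value shown in the video.
        "weight": weight,
        # The reference face is sent as a base64 string (assumed field name).
        "image": base64.b64encode(face_image_bytes).decode("ascii"),
    }
    return {
        "prompt": prompt,
        "alwayson_scripts": {"controlnet": {"args": [unit]}},
    }


payload = build_payload(
    b"placeholder-bytes",  # stand-in for the real reference image file contents
    "a girl, yellow shirt, smiling, masterpiece, best quality",
)
```

The checkpoint itself ('Realistic Vision' in the video) is selected separately in the web UI and is not part of this sketch.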



💡Control Net

ControlNet is the extension used in the video to condition image generation on a reference image. The user supplies an image, selects a pre-processor and a matching model, and sets how strongly the result should follow that input. In the video, ControlNet keeps the character's face consistent across different scenes and outfits, as shown when the creator puts the character in armor in front of a castle.

💡Face ID

Face ID is the name of the IP-Adapter models used in the video to carry a specific face from a reference image into newly generated images. It is crucial for creating consistent characters, as it keeps the character's face the same despite changes in clothing or background. The video instructs the viewer to download the Face ID adapters and use them within ControlNet to achieve this consistency.

💡Web UI Extensions

Web UI Extensions refer to the additional features or tools that are integrated into a web-based user interface to enhance its functionality and user experience. In the context of the video, these extensions are used to manage and control the image generation process, allowing the user to upload and manipulate images, select different adapters, and generate the desired outputs.

💡Restarting Stable Diffusion

'Resetting to Stable' is a garbled rendering of the step in which the presenter restarts Stable Diffusion after adding the new files. The restart comes after the Face ID adapters and LoRA files have been placed in their folders and before the checkpoint is selected, so that the newly added files can be chosen in the interface.

💡Realistic Vision

Realistic Vision is the Stable Diffusion checkpoint the creator selects for image generation. It is chosen to produce realistic, lifelike images and is used together with the prompt 'a girl, yellow shirt, smiling, masterpiece, best quality'.

💡The Prompt

The Prompt in the context of the video is the set of instructions or descriptions provided to the image generation system to guide the creation of the desired image. It is a crucial element in the process, as it communicates the user's vision to the system and influences the final output. The prompt is used to specify the characteristics of the image, such as the subject, clothing, and desired expression.
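
How the prompt changes between outfits and scenes can be sketched as a small builder. This assumes the clothing and background are varied through the prompt text while the reference face stays fixed; the function and constant names are illustrative only.

```python
SUBJECT = "a girl"
EXPRESSION = "smiling"
QUALITY_TAGS = "masterpiece, best quality"


def build_prompt(clothing: str, scene: str = "") -> str:
    """Compose a prompt that keeps the subject fixed and varies outfit and scene."""
    parts = [SUBJECT, clothing, EXPRESSION]
    if scene:
        parts.append(scene)
    parts.append(QUALITY_TAGS)
    return ", ".join(parts)


# Variations named in the video: the original shirt, armor at a castle,
# and a blue long dress in a forest.
prompts = [
    build_prompt("yellow shirt"),
    build_prompt("armor", "in front of a castle"),
    build_prompt("blue long dress", "in a forest"),
]
```

Only the clothing and scene fragments differ between the three prompts, which makes it easier to see that any change in the face comes from the ControlNet settings rather than from the text.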

💡Automatic 1111

AUTOMATIC1111 is the Stable Diffusion web UI in which the whole workflow of the video takes place. The ControlNet extension, the Face ID adapters, and the checkpoint are all installed and selected inside this interface, and it is where the creator changes the character's clothing and background without altering the facial features.

💡Config UI

'Config UI' is most likely a mis-transcription of ComfyUI, a separate node-based interface for Stable Diffusion, rather than a configuration screen. The summary gives no further detail on how the video refers to it.

💡Gesture Control

Gesture Control in the video refers to the ability to manipulate and adjust the posture, movement, or expression of a character within the generated image. This is an important aspect of the image generation process, as it allows for the creation of dynamic and varied images that convey different emotions or actions.
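
The two-unit setup for gesture control can be sketched as a list of ControlNet unit settings. The identifiers are assumptions: `dw_openpose_full` and `control_v11p_sd15_openpose` are the names commonly shown for the DW OpenPose pre-processor and the SD 1.5 OpenPose model, and may differ in a given install. Reference images are left out to keep the sketch short.

```python
def face_unit(weight: float = 0.5) -> dict:
    """First unit: carries the reference face (assumed Face ID Plus names)."""
    return {
        "enabled": True,
        "module": "ip-adapter_face_id_plus",
        "model": "ip-adapter-faceid-plus_sd15",
        "weight": weight,
    }


def pose_unit(weight: float = 1.0) -> dict:
    """Second unit: copies the gesture from a pose reference image."""
    return {
        "enabled": True,
        "module": "dw_openpose_full",           # assumed pre-processor id
        "model": "control_v11p_sd15_openpose",  # assumed model id
        "weight": weight,
    }


def controlnet_units(with_pose: bool) -> list[dict]:
    """Return the face unit alone, or the face unit followed by the pose unit."""
    units = [face_unit()]
    if with_pose:
        units.append(pose_unit())
    return units
```

Keeping the face unit first and adding the pose unit only when needed mirrors the order in the video, where the second ControlNet is enabled after the face is already consistent.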


💡Consistency

Consistency in the video context refers to the maintenance of a uniform and recognizable appearance of the character across different images. This is achieved through the use of specific tools and settings within the image generation system, such as ControlNet and the Face ID adapters, which ensure that the character's facial features remain the same despite changes in clothing or background.

💡Image Generation

Image Generation is the process of creating new images using computational methods, as demonstrated in the video. It involves the use of various parameters, models, and tools to produce visual content that meets the user's specifications. The video focuses on generating images of a character with consistent facial features and changing outfits or backgrounds to create a variety of scenes.


Introduction to creating consistent characters using AUTOMATIC1111

Preparation steps for using ControlNet and IP adapters

Updating ControlNet to the latest version for optimal performance

Downloading Face ID IP adapters for character creation

Instructions for organizing downloaded IP adapters in the correct folders

Restarting Stable Diffusion and selecting the 'Realistic Vision' checkpoint

The simplicity of the prompt 'a girl, yellow shirt, smiling' for generating images

Explanation of the compatibility between pre-processor and model in ControlNet

Demonstration of adjusting the strength of the generated face with a control value

Showcasing the original and final result of the character's face

Changing the character's clothing to armor and setting the scene in front of a castle

Maintaining character consistency while altering clothing and background

Controlling gesture with a second ControlNet unit and an OpenPose pre-processor

Experimenting with different clothing and gestures for the character

Conclusion and call to action for likes and subscriptions