Consistent Cartoon Character in Stable Diffusion | LoRA Training

Luminous Initiative
15 Sept 2023 · 13:28

TL;DR: The video walks through creating a consistent cartoon character by training a LoRA with kohya_ss. It begins by finding a character sheet on Pinterest that shows the character in various poses, then uses the Image to Image tab to upscale each pose and enhance its detail. It details the ControlNet models and specific settings used for good results. After the images are saved, kohya_ss is used to caption them and to train a LoRA with a set number of repeats per image. Finally, the trained LoRA is tested with the Stable Diffusion model to generate a character that matches the original character sheet, and prompt adjustments are made to correct imperfections.


  • 🎨 The video outlines a process for creating a consistent cartoon character by training a LoRA.
  • 📄 A character sheet from Pinterest is used as a reference for the character's different poses.
  • 🖌️ ControlNet is enabled with the OpenPose control type for the initial character generation.
  • 📝 A simple prompt is written for the character, and the face prompt is left at its default.
  • 🖼️ The image is upscaled in the Image to Image tab with a specific denoising strength and ControlNet settings.
  • 🔄 Images are saved individually for the different poses and then upscaled again in a batch process.
  • 👁️ Eye color imperfections in some images are fixed with an additional upscaling pass.
  • 📚 kohya_ss is used to train the character, with installation instructions provided in the video description.
  • 🏷️ Images are captioned with the WD14 tagger, which suits cartoon characters.
  • 📂 A structured folder system is created for training, with image, log, and model folders.
  • 🔢 A minimum of about 1,500 steps is recommended for training a LoRA, with the repeats per image adjusted to the number of images.
  • 🚀 After training, the LoRA file is copied into the Stable Diffusion web UI's models/Lora folder for final character generation.

Q & A

  • What is the main objective of the video?

    -The main objective of the video is to demonstrate how to create a consistent cartoon character by training a LoRA (Low-Rank Adaptation) model, and to explain the steps involved in the training process.

  • Where can one find character sheets for different poses of a character?

    -Character sheets for different poses can be found on various platforms, with Pinterest being a recommended source in the video.

  • Why is ControlNet enabled, and which control type is selected first?

    -ControlNet is enabled to guide the AI in reproducing the poses from the character sheet. The control type selected first is OpenPose, which establishes the character's basic pose structure.

  • How is the detail of the character's face handled in the process?

    -The face is fixed with the After Detailer (ADetailer) extension. Its prompt field is left blank, so it uses the main prompt; a face-specific prompt can be entered there if more control over facial details is needed.

  • What is the purpose of upscaling the images and what techniques are used?

    -Upscaling increases the images' resolution and sharpens their detail. The Ultimate SD Upscale script is used together with the 4x-UltraSharp upscaler model.

  • How are the images prepared for LoRA training?

    -The images are saved separately for each pose, upscaled again where necessary, and checked for imperfections. Image captioning then creates a text file of keywords for each image.

  • Why does the number of steps matter when training a LoRA?

    -The number of steps determines how thoroughly the model is trained. A minimum of about 1,500 steps is recommended for a typical LoRA, so the number of repeats per image is chosen from the number of images to approach that total.

  • How is the LoRA training carried out?

    -Training uses the kohya_ss GUI: the user loads a configuration file suited to their hardware, selects the appropriate Stable Diffusion base model, and sets up the required folders and training parameters.

  • What role do prompts play in the LoRA training process?

    -During training, a sample prompt is used to generate preview images so progress can be checked. After training, prompts are used together with the LoRA to generate the character, which helps ensure the final character matches the desired outcome.

  • How long does LoRA training with 1,500 steps typically take?

    -Training with 1,500 steps can take almost half an hour, depending on the user's hardware.

  • What is done with the trained LoRA file after training?

    -The trained LoRA file is copied into the models/Lora folder of the Stable Diffusion web UI, where it can be used to generate new images of the cartoon character.
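Copying the trained file into the web UI can be scripted. The sketch below assumes the kohya_ss output is a `.safetensors` file named after the chosen output name and that the web UI lives in a folder such as `stable-diffusion-webui`; `install_lora` and all example paths are hypothetical.

```python
import shutil
from pathlib import Path


def install_lora(model_dir: Path, webui_dir: Path, output_name: str) -> Path:
    """Copy <output_name>.safetensors from the training model folder
    into the web UI's models/Lora folder and return the new path."""
    src = model_dir / f"{output_name}.safetensors"
    if not src.is_file():
        raise FileNotFoundError(f"no trained LoRA at {src}")
    dest_dir = webui_dir / "models" / "Lora"
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # copy2 also preserves file timestamps
    return Path(shutil.copy2(src, dest_dir / src.name))
```

For example, `install_lora(Path("train/model"), Path("stable-diffusion-webui"), "mychar")` would place `mychar.safetensors` alongside any other LoRA files the web UI lists.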



🎨 Creating a Cartoon Character with AI

This section outlines how to create consistent images of a cartoon character for LoRA training. It begins with gathering a character sheet from Pinterest that shows the character in various poses and enabling ControlNet with the OpenPose control type. The creator writes a simple prompt for the character and enables the After Detailer (ADetailer) extension to fix the face. Each image is then upscaled in the Image to Image tab with specific settings, and the process is repeated for the other poses. The images are saved and upscaled again in a batch process, with attention to details such as eye color; any remaining imperfections are fixed with an additional upscaling pass.
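The upscaling pass can also be driven through the web UI's HTTP API instead of the browser. This is a minimal sketch assuming the AUTOMATIC1111 Stable Diffusion web UI was started with the `--api` flag on its default port; it sets only the core img2img fields (the 0.12 denoising strength comes from the video) and leaves out the ControlNet Tile unit and the Ultimate SD Upscale script, whose API arguments depend on the installed extension versions. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_img2img_payload(image_bytes: bytes, prompt: str,
                          denoising_strength: float = 0.12,
                          steps: int = 20) -> dict:
    """Build a JSON body for the web UI's /sdapi/v1/img2img endpoint."""
    return {
        # the API expects input images as base64 strings
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        # a low value keeps the pose and design, adding detail only
        "denoising_strength": denoising_strength,
        "steps": steps,
    }


def send_img2img(payload: dict,
                 url: str = "http://127.0.0.1:7860/sdapi/v1/img2img") -> list[bytes]:
    """POST the payload and decode the base64 images in the response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [base64.b64decode(img) for img in result["images"]]
```

A call such as `send_img2img(build_img2img_payload(Path("pose_01.png").read_bytes(), "cartoon girl"))` (file name and prompt hypothetical) returns the image bytes, which can be written back to disk with `Path.write_bytes`.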


📚 Preparing for LoRA Training with kohya_ss

This section covers preparing for LoRA training with kohya_ss. It starts with installing kohya_ss and then captions the images with either BLIP or WD14 captioning, depending on how realistic the images are (WD14 suits cartoon characters). The creator stresses choosing accurate keywords and creating dedicated folders for training. The training structure sets a fixed number of repeats per image (150) for the small set of 9 images, aiming at the recommended minimum of about 1,500 training steps. All images and caption files are then placed in the designated folders for training.
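The folder layout can be created with a short script. This sketch assumes the layout that kohya_ss's dataset preparation produces: an `img` folder whose subfolder name starts with the repeat count (here `150_<name>`), plus `log` and `model` folders. The function names and the folder name `mychar` are hypothetical, and only `.png` images are copied.

```python
import shutil
from pathlib import Path


def make_lora_dirs(root: Path, name: str, repeats: int) -> Path:
    """Create img/<repeats>_<name>, log, and model under root.

    kohya's scripts read the number before the underscore as how many
    times each image is repeated per epoch. Returns the image subfolder.
    """
    img_dir = root / "img" / f"{repeats}_{name}"
    for d in (img_dir, root / "log", root / "model"):
        d.mkdir(parents=True, exist_ok=True)
    return img_dir


def copy_dataset(src: Path, img_dir: Path) -> int:
    """Copy each .png with its same-named .txt caption; return the image count."""
    copied = 0
    for image in sorted(src.glob("*.png")):
        shutil.copy2(image, img_dir / image.name)
        caption = image.with_suffix(".txt")
        if caption.exists():
            shutil.copy2(caption, img_dir / caption.name)
        copied += 1
    return copied
```

Keeping the repeat count in the folder name means changing the number of repeats only requires renaming one folder.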


🚀 Launching LoRA Training and Evaluation

This section describes the LoRA training run itself, starting with loading a configuration file suited to low-VRAM hardware. The creator selects the same Stable Diffusion model used earlier and sets up the training folders. Training parameters are adjusted, including batch size, mixed precision, and image resolution. Sample images are generated every 400 steps from a specific sample prompt. Training takes around half an hour, depending on hardware, and the resulting LoRA file is saved in the model folder. The creator then copies this file into the models/Lora folder of the Stable Diffusion web UI for evaluation. The character's consistency is checked, and specific prompts are added to correct details such as eye color, demonstrating the iterative process of refining the AI-generated cartoon character.
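The kohya_ss GUI runs Kohya's sd-scripts under the hood, so the same run can be expressed as a command line. This sketch assumes `train_network.py` from sd-scripts, launched through Hugging Face `accelerate`; only the 1,500-step target and the 400-step sample interval come from the video, while the resolution, fp16 precision, and all paths are placeholder assumptions. Check the flags against the sd-scripts version you have installed.

```python
from pathlib import Path


def build_train_command(base_model: str, root: Path, output_name: str,
                        max_steps: int = 1500, batch_size: int = 1,
                        resolution: int = 512, sample_every: int = 400) -> list[str]:
    """Assemble an argument list for sd-scripts' train_network.py.

    Expects root to contain img/, log/, model/ and sample_prompt.txt.
    Learning rate and network size are left at the script's defaults.
    """
    return [
        "accelerate", "launch", "train_network.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--train_data_dir={root / 'img'}",
        f"--logging_dir={root / 'log'}",
        f"--output_dir={root / 'model'}",
        f"--output_name={output_name}",
        "--network_module=networks.lora",        # train a LoRA network
        f"--resolution={resolution},{resolution}",
        f"--train_batch_size={batch_size}",
        f"--max_train_steps={max_steps}",
        "--mixed_precision=fp16",                # assumed; suits low-VRAM GPUs
        "--save_model_as=safetensors",
        "--caption_extension=.txt",              # WD14 captions are .txt files
        f"--sample_every_n_steps={sample_every}",
        f"--sample_prompts={root / 'sample_prompt.txt'}",
    ]
```

Running it would look like `subprocess.run(cmd, cwd=<sd-scripts folder>, check=True)`, with `sample_prompt.txt` holding one sample prompt per line.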



💡Cartoon Character

A cartoon character is a graphical representation used in animation, comics, or digital art, often designed with exaggerated features and expressions for storytelling. In the video, the creator develops a consistent cartoon character by training a LoRA on images generated from a character sheet and refined with several image manipulation techniques to achieve a specific look and style.

💡Character Sheet

A character sheet is a visual reference guide that contains detailed information and poses of a fictional character, used by artists and designers to maintain consistency in the character's appearance and attributes. In the context of the video, the character sheet is a crucial element for training the AI to recognize and replicate the desired features of the cartoon character across different images.

💡ControlNet

ControlNet is a neural network add-on for Stable Diffusion, available as a web UI extension, that guides the output with an extra input image such as a pose skeleton or a tile of an existing image. It gives more precise control over composition and detail than a text prompt alone. In the video, ControlNet with the OpenPose control type reproduces the character sheet's poses, and the Tile control type preserves the character's structure and style during upscaling.

💡Upscaling

Upscaling refers to the process of increasing the resolution of an image while maintaining or improving its quality. This is often done using various algorithms and techniques to enhance the details and sharpness of the image. In the video, upscaling is an essential step to prepare the images for training the AI model, ensuring that the cartoon character is clear and consistent at higher resolutions.
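Tile-based upscalers such as Ultimate SD Upscale enlarge the image first and then redraw it piece by piece, so each img2img pass stays at a size the model handles well. The sketch below only computes such a tile grid with overlapping padding; it is a generic illustration, not the extension's code, and the 512-pixel tile and 32-pixel padding are assumed values.

```python
import math


def tile_grid(width: int, height: int, scale: float = 2.0,
              tile: int = 512, padding: int = 32) -> list[tuple[int, int, int, int]]:
    """Split the upscaled canvas into overlapping (left, top, right, bottom) boxes.

    Each box is extended by `padding` pixels (clamped to the canvas) so
    that neighbouring tiles overlap and seams can be blended.
    """
    out_w, out_h = round(width * scale), round(height * scale)
    cols, rows = math.ceil(out_w / tile), math.ceil(out_h / tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = max(c * tile - padding, 0)
            top = max(r * tile - padding, 0)
            right = min((c + 1) * tile + padding, out_w)
            bottom = min((r + 1) * tile + padding, out_h)
            boxes.append((left, top, right, bottom))
    return boxes
```

For a 512×768 image at 2× scale, the upscaled canvas is 1024×1536, which splits into a 2 × 3 grid of six overlapping tiles.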

💡After Detailer

After Detailer (ADetailer) is a Stable Diffusion web UI extension that automatically detects regions such as faces in a generated image and inpaints them at higher detail. It is used to fix imperfections or to add details that the initial generation missed. In the video, After Detailer fixes the cartoon character's face; its prompt is left blank, so it follows the main prompt.

💡Image Captioning

Image captioning is the process of generating descriptive text, or captions, for images based on their visual content. In LoRA training, each image is paired with a caption, and the model learns from these image-caption pairs. In the video, image captioning creates a text file of WD14 tags for each image, and these files are used together with the images to train the LoRA on the cartoon character.
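Cleaning up WD14 captions by hand is repetitive, so it can be scripted. This sketch assumes each caption is a comma-separated tag list in a `.txt` file beside its image; putting a trigger word first is a common LoRA convention rather than a step confirmed in the video, and the trigger `mychar` and the removed tags are hypothetical.

```python
from pathlib import Path


def clean_caption(text: str, trigger: str, remove: set[str]) -> str:
    """Drop unwanted and duplicate tags, keeping order, and put the trigger first."""
    tags = [t.strip() for t in text.split(",") if t.strip()]
    kept: list[str] = []
    for tag in tags:
        if tag not in remove and tag != trigger and tag not in kept:
            kept.append(tag)
    return ", ".join([trigger] + kept)


def clean_folder(folder: Path, trigger: str, remove: set[str]) -> None:
    """Rewrite every .txt caption file in the folder in place."""
    for path in folder.glob("*.txt"):
        cleaned = clean_caption(path.read_text(encoding="utf-8"), trigger, remove)
        path.write_text(cleaned, encoding="utf-8")
```

For example, `clean_caption("1girl, solo, simple background, solo", "mychar", {"simple background"})` returns `"mychar, 1girl, solo"`.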

💡Training Steps

Training steps are the iterations an AI model goes through during learning; each step updates the model on one batch of images. More steps give the model more opportunities to learn from the data, although too many can lead to overfitting. In the video, the creator targets a minimum of about 1,500 steps for training the LoRA so that it generates the cartoon character consistently.
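The step count follows from simple arithmetic: each image is seen once per repeat, and each step processes one batch. A minimal sketch, assuming a single epoch and ignoring the few extra steps that aspect-ratio bucketing can add with larger batches. With the video's 9 images, 150 repeats give 9 × 150 = 1,350 steps, so reaching 1,500 in one epoch would take 167 repeats (or a tenth image).

```python
import math


def total_steps(images: int, repeats: int, epochs: int = 1, batch_size: int = 1) -> int:
    """Training steps when every image is repeated `repeats` times per epoch
    and each step processes `batch_size` images."""
    return math.ceil(images * repeats / batch_size) * epochs


def repeats_for_target(images: int, target: int = 1500, batch_size: int = 1) -> int:
    """Smallest repeat count whose single-epoch step total reaches `target`."""
    return math.ceil(target * batch_size / images)
```

Raising the batch size divides the step count, so the repeats (or epochs) must grow to keep the same amount of training.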

💡kohya_ss

kohya_ss is a graphical user interface (GUI) for Kohya's sd-scripts, a set of training scripts for Stable Diffusion. It gives users an accessible way to train LoRA models and to prepare datasets, for example with its captioning utilities. In the video, kohya_ss is used to caption the upscaled images and to train the LoRA on them and their caption files, so the model learns to replicate the cartoon character's features.

💡Stable Diffusion Model

Stable Diffusion is a latent diffusion model that generates images from text prompts. Trained on a large dataset of image-text pairs, it can produce high-quality, detailed images. In the video, a Stable Diffusion checkpoint is the base model on which the cartoon-character LoRA is trained, and the same model is used afterwards to generate the character.

💡Prompts

Prompts are text inputs that guide an AI model toward a specific output. In text-to-image models, a prompt describes the desired image, and the model generates an image that matches the description. In the video, a sample prompt is used to generate preview images during LoRA training, and prompts are used after training to generate the character and correct details, so the results align with the creator's vision.
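In the Stable Diffusion web UI, a LoRA from the `models/Lora` folder is applied by adding a `<lora:filename:weight>` tag to the prompt. This small helper builds such a prompt; the file name `mychar`, the weight, and the `blue eyes` correction are hypothetical examples of the kind of eye-color fix described in the video.

```python
def lora_prompt(lora_name: str, weight: float = 1.0, extra: tuple[str, ...] = ()) -> str:
    """Build a web UI prompt that activates a LoRA file by name.

    Extra terms (for example an eye-colour correction) follow the
    <lora:...> tag as ordinary comma-separated prompt text.
    """
    tag = f"<lora:{lora_name}:{weight:g}>"  # :g prints 1.0 as "1"
    return ", ".join([tag, *extra])
```

Lowering the weight below 1 weakens the LoRA's influence, which can help when it overrides details that the prompt is trying to change.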

💡Upscaling Issues

Upscaling issues refer to the problems that may arise when increasing the resolution of an image, such as loss of detail, pixelation, or color distortion. These issues can affect the quality and consistency of the final output. In the video, the creator addresses upscaling issues by fixing problematic images, particularly those with eye color issues, to ensure that the cartoon character remains consistent and visually appealing.

💡Training Process

The training process is the series of steps an AI model goes through to learn from a dataset and improve its performance. It involves adjusting parameters, providing input data, and evaluating the model's output to reach the desired outcome. In the video, the training process for the cartoon-character LoRA covers image preparation, captioning, and the actual training with kohya_ss.


The video outlines a process for creating a consistent cartoon character by training a LoRA.

A character sheet from Pinterest is utilized to depict different poses of a character.

ControlNet is enabled with the OpenPose control type, which detects the pose in the reference image automatically.

The character's face is fixed using the After Detailer (ADetailer) extension with its default prompt.

Images are upscaled using the image to image tab with a denoising strength of 0.12.

ControlNet with the Tile control type is used together with the Ultimate SD Upscale script and the 4x-UltraSharp upscaler.

Images are saved individually for different poses and then upscaled again in a batch process.

A new folder is created for output images to organize the upscaled images.

Some images with eye color issues are upscaled again to correct these imperfections.

kohya_ss is used for LoRA training, with installation instructions provided in the video description.

Images are captioned from the Utilities tab, with WD14 captioning chosen for cartoon characters.

Keywords are carefully selected and unnecessary ones are removed for each image.

A specific training process for the LoRA is described, including folder organization and step count.

Training targets about 1,500 steps, with each image repeated 150 times.

After training, the LoRA file is saved and copied into the models/Lora folder of the Stable Diffusion web UI.

The final step checks the LoRA by generating an image with no descriptive prompt, only the LoRA itself, to verify character consistency.

The video concludes with a prompt for viewers to like, comment, and subscribe for more content.