LORA + Checkpoint Model Training GUIDE - Get the BEST RESULTS super easy

Olivio Sarikas
10 May 2023 · 34:38

TLDR: The video provides a comprehensive guide on training LoRAs and checkpoint models to achieve high-quality results in AI image generation. It emphasizes the importance of understanding the training process, selecting diverse and high-quality images, and using descriptive keywords for effective training. The host recommends starting with training on celebrity images for ease and legality, and discusses the differences between LoRAs and full models, suggesting LoRAs for faces and full models for more complex subjects. The video also covers the technical aspects, including the use of tools like Kohya SS for training, and offers tips on folder organization, image resizing, and the significance of steps and epochs in training. Finally, it introduces a merging trick that improves a trained model by combining it with a better model, and highlights the benefits of higher-resolution images for training outcomes.


  • 🌟 **Community Support**: Use the Discord channel for model training to connect with helpful people and get support.
  • 📚 **Understanding the Process**: Grasp how the training process works to select appropriate images and understand how the model interprets them.
  • 🖼️ **Image Selection**: Choose images that represent a variety of expressions, fashion styles, and lighting situations to enhance the AI's learning.
  • 🔍 **Image Quality**: Use high-quality, sharp images so the AI can accurately interpret details during the training process.
  • 🔑 **Keyword Importance**: Use descriptive keywords to enable variability and allow the AI to understand and react to different styles and features.
  • ⚙️ **Choosing Between LoRA and Model**: Decide whether to use a LoRA (smaller, versatile) or a full model (larger, more consistent) based on the complexity of the subject.
  • 🎭 **Training on Star Portraits**: For beginners, training on star portraits is advantageous because images are abundant and, according to the video, private use is legally less problematic.
  • 📈 **Image Quantity and Quality**: The number of images needed depends on the subject's complexity; a smaller number of high-quality images can suffice for less complex subjects like faces.
  • 🔄 **Training Steps and Epochs**: Use an appropriate number of steps per image and epochs based on the number of images available and the complexity of the training subject.
  • 🖥️ **Software and Tools**: Use tools like Kohya SS for model training and captioning tools for image file keywording.
  • 🔧 **Merging Models**: Improve model quality by merging it with a better model, even if the initial training isn't perfect, to achieve desired results.

Q & A

  • What is the main focus of the video guide?

    -The main focus of the video guide is to provide an easy-to-follow process for training LoRA and checkpoint models to achieve the best results in AI image generation.

  • Why is it important to understand the training process of LoRAs and models?

    -Understanding the training process is important because it helps you select the right kind of images for training and understand how the model interprets these images, which in turn improves the output quality.

  • What role does the size of objects in the image play during the training process?

    -The size of objects, especially faces, is crucial because a small object occupies only a small part of the noise the image is dissolved into, which makes it difficult for the model to reconstruct that object accurately when it later has to fill a larger part of the image.

  • What are the different types of images needed for training a model on a person?

    -For training a model on a person, you need images that capture different emotions, facial expressions, fashion styles, hairstyles, head rotations, and lighting situations to help the AI learn the face and body in various contexts.

  • Why is image quality important for training models?

    -High-quality, sharp, and well-defined images are important because they allow the AI to better distinguish individual elements, such as eyelashes, which are crucial for the model to learn and reproduce details accurately.

  • How do keywords in text files affect the training process?

    -Keywords act as variables that the AI uses to learn the differences between various styles, colors, and features. Accurate and specific keywords enable the AI to understand and reproduce the desired variations in the output images.

  • What are the differences between training with a LoRA and a full model?

    -A LoRA is a smaller, more versatile add-on that can be applied to various models and is great for faces. A full model, or checkpoint, is larger and more consistent, making it easier to train and suitable for themes like architecture.

  • Why is it suggested to train on images of a star for beginners?

    -Training on images of a star is suggested for beginners because there is a vast array of images available, covering different expressions, clothing, and lighting styles, making it easier to spot and correct problems in the training process.

  • How many images are typically needed for training a model?

    -The number of images needed depends on the complexity of the subject. For a face, as few as 15 high-quality images might suffice, while more complex subjects like architectural styles may require more images for the AI to learn effectively.

  • What is the significance of steps and epochs in the training process?

    -Steps refer to the number of iterations in the training process per image, while epochs represent the number of times the entire training set is run through. More epochs with fewer steps can often lead to better results, as it allows for more iterations and refinement.

  • What is the recommended image size for training models?

    -The minimum recommended image size is 512x512 pixels. Larger images are better as they provide more detail for the AI to learn from, but they may slow down the training process due to increased GPU power requirements.
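The 512x512 minimum from the last answer can be checked before training starts. The following is a minimal sketch and not something shown in the video: the function name `plan_resize` and the decision to scale by the shorter side are assumptions made for illustration.

```python
MIN_SIDE = 512  # minimum image size named in the video


def plan_resize(width, height, target=MIN_SIDE):
    """Return (new_width, new_height) with the shorter side equal to `target`.

    Returns None when the image is smaller than `target` on its shorter side,
    because upscaling adds pixels but no real detail.
    """
    shorter = min(width, height)
    if shorter < target:
        return None  # too small: better to find a larger source image
    scale = target / shorter
    # The aspect ratio is kept; only the overall size changes.
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    for size in [(1024, 2048), (1920, 1080), (400, 800)]:
        print(size, "->", plan_resize(*size))
```

Passing a larger `target` (for example 768) follows the video's point that bigger images carry more detail, at the cost of slower training.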



🚀 Introduction to Training AI Models for Exceptional Results

The video begins with a greeting and an introduction to the topic of training AI models, specifically LoRAs (Low-Rank Adaptations) and models, to achieve impressive results. The speaker emphasizes the ease of obtaining good results and outlines the structure of the video: discussing why and how the process works, showcasing the best tools, and revealing a merging trick for enhanced outcomes. A Discord channel is mentioned as a resource for additional help and community interaction.


🎯 Understanding the Training Process and Image Selection

The paragraph delves into the mechanics of AI training, explaining how input photos are translated into noise and then reconstructed into an output image. It highlights the importance of image selection, emphasizing the need for a variety of expressions, fashion styles, hairstyles, head rotations, and lighting situations to help the AI learn the intricacies of the subject. The paragraph also touches on the significance of image size and quality, noting that higher resolution and sharpness facilitate better AI comprehension and training results.


๐Ÿ–Œ๏ธ Keyword Importance and Choosing Between LoRAs and Models

This section underscores the role of keywords in training, describing them as variables that allow the AI to understand and vary aspects like hair style and color. The difference between LoRAs and models is explored, with LoRAs presented as smaller, versatile add-ons suitable for faces and various styles, while models are larger, more consistent, and better for themes like architecture. The video suggests starting with training on images of a celebrity for beginners due to the abundance and variety of images available.


🌟 Training Details: Image Quantity, Quality, and Training Parameters

The paragraph discusses the number of images needed for training, suggesting that more complex subjects require a larger dataset, while simpler ones, like a single face, need fewer images. It explains the concepts of steps and epochs in the training process and provides guidelines on how to determine the appropriate number of steps per image and epochs for effective training. The importance of image size is reiterated, with a recommendation for a minimum size of 512x512 pixels for better AI training.
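The relationship between steps per image, epochs, and the total amount of training can be written as simple arithmetic. This is a sketch under assumptions: the function name and the example numbers (20 and 100 steps per image, 5 epochs) are illustrative and not taken from the video. Only the figure of 15 images for a face comes from the summary above.

```python
import math


def total_training_steps(num_images, steps_per_image, epochs, batch_size=1):
    """Total optimizer steps for a run.

    One epoch processes every image `steps_per_image` times; a batch of
    `batch_size` images counts as a single step.
    """
    steps_per_epoch = math.ceil(num_images * steps_per_image / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # Same total work, split differently across epochs.
    print(total_training_steps(15, 100, 1))  # one long epoch
    print(total_training_steps(15, 20, 5))   # five shorter epochs
    print(total_training_steps(15, 20, 5, batch_size=2))
```

Both of the first two configurations run the same number of steps. Splitting the run into several epochs gives natural points at which an intermediate model can be saved and compared, which is one plausible reading of the video's preference for more epochs with fewer steps.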


📚 Organizing Training Materials and Software Setup

The speaker provides a detailed guide on organizing training materials, suggesting a folder structure for source images, logs, models, and other resources. The video then outlines the software setup process for Kohya SS, including the installation of Python, Git, and Visual Studio, along with specific terminal commands for installation. It also covers the process of captioning image files with keywords using the WD14 captioning tool and the importance of reviewing and editing these keywords for accuracy.
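The folder layout described above can be created with a few lines of Python. This is a hedged sketch: the folder names (`source`, `img`, `log`, `model`) and the `<steps>_<subject>` subfolder name are assumptions based on a convention commonly used with Kohya-based trainers, so they should be checked against the version actually installed. The subject name `examplesubject` is a placeholder.

```python
from pathlib import Path
import tempfile


def create_project_folders(root, subject, steps_per_image):
    """Create one project folder with separate places for each kind of file."""
    root = Path(root)
    folders = {
        "source": root / "source",  # original, unedited images
        # Training images and their caption .txt files; the numeric prefix is
        # assumed to tell the trainer how many steps to run per image.
        "img": root / "img" / f"{steps_per_image}_{subject}",
        "log": root / "log",        # training logs
        "model": root / "model",    # trained output files
    }
    for path in folders.values():
        path.mkdir(parents=True, exist_ok=True)
    return folders


if __name__ == "__main__":
    # Demonstrated in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_project_folders(tmp, "examplesubject", 20)
        for name, path in created.items():
            print(name, "->", path.relative_to(tmp))
```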


๐Ÿ› ๏ธ Finalizing Training Setup and Starting the Training Process

The final paragraph covers the final steps in setting up the training environment, including selecting a model for training, organizing folders for images, logs, and models, and defining training parameters such as batch size and the number of epochs. It also addresses common issues like running out of VRAM and suggests adjustments to resolve them. The video concludes with the actual training process, emphasizing the need for patience and providing tips for troubleshooting and improving the training outcomes.




💡LoRA

LoRA, which stands for 'Low-Rank Adaptation', is a technique used in AI image generation to modify existing models without the need to retrain them from scratch. In the context of the video, a LoRA is used to adapt models to specific styles or subjects, such as a particular face or fashion style. A LoRA file is much smaller than a full model and can be applied to various models, making it versatile and efficient for training purposes.
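Why a low-rank adaptation is so much smaller than a full model can be shown with a parameter count. The sketch below uses the standard low-rank formulation (a weight matrix plus the product of two thin matrices), which the video does not spell out. The layer size of 768 and the rank of 8 are illustrative assumptions, as are all function names.

```python
def full_param_count(d_out, d_in):
    """Parameters in one full weight matrix of shape d_out x d_in."""
    return d_out * d_in


def lora_param_count(d_out, d_in, rank):
    """Parameters in the two thin matrices (d_out x rank and rank x d_in)."""
    return rank * (d_out + d_in)


def apply_lora(weight, down, up, scale=1.0):
    """Return weight + scale * (up @ down) using plain nested lists.

    weight: d_out x d_in, up: d_out x rank, down: rank x d_in.
    The base weight is not modified; a new matrix is returned.
    """
    rank = len(down)
    return [
        [
            weight[i][j] + scale * sum(up[i][k] * down[k][j] for k in range(rank))
            for j in range(len(weight[0]))
        ]
        for i in range(len(weight))
    ]


if __name__ == "__main__":
    print(full_param_count(768, 768), "parameters in the full matrix")
    print(lora_param_count(768, 768, 8), "parameters in the rank-8 update")
    print(apply_lora([[1, 0], [0, 1]], down=[[1, 2]], up=[[1], [0]], scale=0.5))
```

Because the update is stored separately from the base weights, the same small file can be applied on top of different compatible base models, which matches the versatility described above.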

💡Model Training

Model training refers to the process of teaching an AI system to perform a specific task, in this case, generating images that match certain criteria. The video discusses how to achieve good results by selecting appropriate images, understanding the training process, and using the right tools. It is a crucial part of creating AI models that can produce high-quality outputs.


💡Discord

Discord is a communication platform that the video's creator uses to interact with the community and provide support for LoRA and model training. It is mentioned as a resource where viewers can find helpful people and the creator himself for assistance with their training endeavors.

💡AI Image Generation

AI image generation is the process of creating images using artificial intelligence. The video script explains how an input photo is dissolved into noise and then reconstructed into a new image through the learning process. This technology is central to the theme of the video, as it forms the basis for the training and application of LoRAs and models.

💡Training Method

The training method is the approach used to teach the AI how to interpret and recreate images. The video explains that the method involves dissolving an input image into noise and then attempting to reconstruct it, with the goal of producing an output image that closely resembles the input but in a desired style or with certain features.
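The "dissolving into noise" step can be illustrated with the usual diffusion forward process, in which an image is blended with Gaussian noise. The video describes this only informally, so the formula, the function name `add_noise`, and the parameter `keep` (the fraction of the original signal that survives) are assumptions for illustration and not the trainer's actual code.

```python
import math
import random


def add_noise(pixels, keep, rng):
    """Blend pixel values with Gaussian noise.

    keep = 1.0 returns the image unchanged; keep = 0.0 returns pure noise.
    The square roots keep the overall variance stable as the mix changes.
    """
    signal = math.sqrt(keep)
    noise = math.sqrt(1.0 - keep)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same noise
    image = [0.5, -0.5, 0.25, 0.0]  # a tiny stand-in for real pixel data
    for keep in (1.0, 0.5, 0.0):
        print(keep, add_noise(image, keep, rng))
```

During training the model sees such noisy versions and learns to undo the noise. This is why small or blurry details in the input, which vanish quickly as noise is added, are hard for it to learn.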

💡Image Quality

Image quality is a critical factor in training AI models. High-quality images that are sharp and well-defined are preferred because they allow the AI to better understand and recreate the details of the image. The video emphasizes the importance of using high-resolution, non-blurry, and uncompressed images for optimal training results.


💡Keywords

Keywords are descriptive terms used in the training process to help the AI understand the characteristics of the images it is learning from. The video script discusses how using specific keywords can influence the variability in the AI's output, allowing it to learn different styles, colors, and features that can be adjusted later.
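Reviewing and editing keywords often comes down to cleaning a comma-separated list of tags stored in a text file next to each image. The sketch below works on the caption string only. The tag values, the placeholder trigger word `examplesubject`, and the practice of putting that word first are assumptions for illustration and not steps confirmed by the video.

```python
def parse_tags(caption):
    """Split a comma-separated caption into a list of trimmed, non-empty tags."""
    return [tag.strip() for tag in caption.split(",") if tag.strip()]


def clean_caption(caption, trigger, remove=()):
    """Return a caption with the trigger word first, duplicates and unwanted tags dropped."""
    drop = {tag.lower() for tag in remove}
    kept = []
    for tag in parse_tags(caption):
        lowered = tag.lower()
        if lowered in drop or lowered == trigger.lower() or tag in kept:
            continue
        kept.append(tag)
    return ", ".join([trigger] + kept)


if __name__ == "__main__":
    raw = "1girl, blonde hair,  smile, blonde hair, watermark"
    print(clean_caption(raw, "examplesubject", remove=["watermark"]))
```

Tags that stay in the caption, such as the hair color here, are the ones that remain adjustable later, in line with the video's description of keywords as variables.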


💡Epochs

In the context of the video, epochs refer to the number of times the entire training set is run through the learning algorithm. The video suggests that using multiple epochs with fewer steps per image can lead to better training results, as it allows for iterative improvements over the training process.

💡Merging Trick

The merging trick is a technique described in the video to improve the quality of an AI model by combining it with another, more refined model. This can be particularly useful when the initial model training does not yield perfect results, allowing the creator to achieve the desired outcome with fewer steps.
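The summary does not say which merge method the video uses. One common way to combine two models is a weighted sum of their matching weights, sketched below with plain dictionaries standing in for real model files. The function name, the `ratio` parameter, and the toy values are all assumptions for illustration.

```python
def merge_weights(model_a, model_b, ratio):
    """Weighted-sum merge: (1 - ratio) * A + ratio * B for every matching weight.

    ratio = 0.0 returns model A's values, ratio = 1.0 returns model B's.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if model_a.keys() != model_b.keys():
        raise ValueError("models must contain the same weight names")
    merged = {}
    for name, values_a in model_a.items():
        values_b = model_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            (1.0 - ratio) * a + ratio * b for a, b in zip(values_a, values_b)
        ]
    return merged


if __name__ == "__main__":
    trained = {"layer.weight": [0.0, 2.0]}  # stand-in for the self-trained model
    better = {"layer.weight": [4.0, 2.0]}   # stand-in for the higher-quality model
    print(merge_weights(trained, better, 0.25))
```

A low ratio keeps most of the trained subject while borrowing some of the better model's quality. The right value is found by trying several and comparing the generated images.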

💡GPU Power

GPU (Graphics Processing Unit) power is an important consideration when training AI models, as it affects the speed and efficiency of the training process. The video mentions that higher resolution images can slow down training due to increased demand on GPU resources.

💡Kohya SS

Kohya SS is the software mentioned in the video for training AI models. It is described as user-friendly and having a large community for support. The video provides a detailed guide on how to install and use Kohya SS for the training process, including setting up the environment and running the training.


The presenter shares a guide on training LoRAs and models for achieving amazing results in AI image generation.

Joining a specific Discord channel can provide helpful resources and community support for LoRA and model training.

Understanding the training process is crucial for selecting the right images and enabling the model to comprehend them effectively.

The importance of image size, especially for faces in the image, is emphasized for accurate reconstruction by the model.

Diverse images are necessary for training, including different emotions, fashion styles, and lighting situations.

High-quality, non-blurry images are recommended for better definition by the AI during the training process.

The use of specific keywords in text files is vital as they act as variables for the AI to learn and adapt.

The difference between a LoRA and a full model is explained, with the LoRA being a smaller, versatile add-on.

The presenter suggests training on images of a star for private research purposes due to the abundance of varied images available.

The number of images needed for training depends on the complexity of the subject; faces require fewer images compared to styles with more variation.

Training steps and epochs are detailed, explaining the difference and their significance in the training process.

The benefits of using higher resolution images for training are discussed, including better quality and more details for the AI to learn from.

A merge trick is introduced to improve model quality by combining it with a better model, even if the initial training isn't fully optimized.

The use of uncropped images is suggested to avoid losing important training data and to allow the software to determine the best resolution and ratio.

Tools for finding and resizing images, such as Google Images and Bulk Resize, are recommended for preparing training data.

A folder structure is suggested for organizing project files, including separate folders for images, logs, models, and source images.

Kohya SS is introduced as the software of choice for training models, with a guide on how to install and set it up.

Captioning of image files is emphasized for creating keyword text files that the AI uses to understand and train on the images.

The use of a tool like BooruDatasetTagManager is suggested for efficient keyword management and refinement.

The presenter demonstrates how to use the trained model in combination with another model to achieve desired results through a merging process.