how to train any face for any model with embeddings | automatic1111

Robert Jene
16 May 2023 | 43:19

TLDR: In this video, the creator explains how to train an embedding for Stable Diffusion with Automatic1111. The process involves gathering high-quality images of the subject, preprocessing them, creating an embedding, and training it over a number of steps while adjusting the learning rate. The video also shares tips for optimizing training so that the embedding produces realistic images of the chosen subject.


  • 🌟 The video provides a tutorial on training an embedding in stable diffusion using Automatic1111 for the purpose of applying a person's face onto various models.
  • 🎥 The creator demonstrates the process using images of celebrities like Charlize Theron, Zooey Deschanel, and Katee Sackhoff, and shows examples of AI-generated images.
  • 📸 The importance of selecting high-quality images for training is emphasized, with tips on finding and filtering suitable images from various sources like Google Images and IMDb.
  • 🖼️ The process of converting WebP files to PNG for better compatibility and handling in the training software is explained.
  • 👁️‍🗨️ The video highlights the significance of image resolution, recommending at least 512x512 pixels for effective training of the embedding.
  • 🎨 Tips on using image editing software to crop and upscale images are provided, ensuring that the images are properly prepared for the training process.
  • 🔍 The creator explains how to create and name an embedding file, and how to determine the number of vectors per token based on the number of input images.
  • 📂 The organization of training data and embedding files is discussed, with advice on maintaining a structured folder system for clarity and ease of use.
  • 🛠️ The video outlines the pre-processing of images and the role of captions in refining the AI's understanding of each image before training.
  • 🔧 The process of training the embedding is detailed, including setting up the learning rate, batch size, and gradient accumulation steps for optimal results.
  • 📊 The use of monitoring tools to track the training progress and loss values is mentioned, with guidance on interpreting these metrics to evaluate the success of the training.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is training embeddings in Stable Diffusion with Automatic1111 so that a specific person's face can be generated with various models.

  • What are some of the actresses mentioned in the video?

    -Some of the actresses mentioned in the video include Charlize Theron, Zooey Deschanel, and Katee Sackhoff.

  • How does the video demonstrate the AI-generated images?

    -The video demonstrates AI-generated images by showing examples of different actresses in various settings, comparing them to their real-life photos and highlighting the quality and accuracy of the generated images.

  • What is the importance of selecting the right images for training the AI?

    -Selecting the right images for training is crucial because it ensures that the AI can accurately learn and reproduce the desired facial features and expressions, minimizing errors and enhancing the quality of the final output.

  • What are some tips for gathering images for training?

    -Some tips for gathering images include using search engines like Google, looking at websites like IMDb and Pinterest, avoiding images with obstructions, watermarks, or poor resolution, and ensuring the images are at least 512x512 pixels.

  • How does the video address the issue of image file formats?

    -The video addresses the issue of image file formats by discussing the conversion of WebP files to PNG using specific software and shortcuts, as PNG files are more compatible with the AI training process.

  • What is the purpose of upscaling images in the training process?

    -Upscaling images is done to improve the quality of the images and remove graininess or artifacts, ensuring that the AI can better recognize and reproduce the details of the face being trained.

  • How does the video handle the issue of overfitting in AI training?

    -The video discusses the importance of adjusting the number of training steps and the learning rate to prevent overfitting, where the AI becomes too specialized and loses its ability to generalize to new data.

  • What is the role of the embedding file in the AI training process?

    -The embedding file is a crucial part of the AI training process as it contains the learned features and characteristics of the face being trained. This file is used to guide the AI in generating accurate and realistic images of the subject.

  • What are some of the challenges faced during the AI training process as highlighted in the video?

    -Some challenges faced during the AI training process include dealing with image file formats, selecting high-quality images, avoiding overfitting, and ensuring that the training steps and learning rates are properly adjusted for optimal results.



🎥 Introduction to AI Image Generation and Embedding Training

The paragraph introduces the concept of using AI for image generation, specifically with stable diffusion. The speaker shares their experience in generating images of celebrities like Charlize Theron and Zooey Deschanel. They discuss their process of training an 'embedding' in stable diffusion to apply a person's face to various models. The speaker also mentions their intention to share time-saving tricks and quality-enhancing techniques, and encourages viewers to contribute their own tips in the comments section. They express their determination to deliver a comprehensive guide without mispronouncing technical terms, acknowledging the importance of clear communication in the learning process.


🔍 Gathering and Preparing Images for Embedding Training

This paragraph delves into the specifics of gathering images for embedding training. The speaker guides viewers on how to source high-quality images of the person whose face they wish to train, using Google Images and IMDb as examples. They emphasize the importance of selecting images that clearly show the person's face, avoiding images with obstructions, other people, watermarks, or extreme lighting conditions. The speaker also discusses the use of Pinterest and Flickr as alternative sources for images, and provides tips on how to handle images that don't display correctly due to file format or resolution issues.
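The file format issue mentioned above usually comes down to WebP downloads. A minimal sketch of a batch conversion to PNG, assuming the Pillow library is installed and that the downloaded images sit in one folder. The function names and the folder name are hypothetical and do not come from the video:

```python
from pathlib import Path


def png_path_for(source: Path) -> Path:
    """Return the PNG path that sits next to a source image."""
    return source.with_suffix(".png")


def convert_webp_folder(folder: Path) -> list[Path]:
    """Convert every .webp file in `folder` to PNG and return the new paths."""
    # Imported here so the path helper above works even without Pillow.
    from PIL import Image

    written = []
    for source in sorted(folder.glob("*.webp")):
        target = png_path_for(source)
        with Image.open(source) as image:
            image.save(target, format="PNG")
        written.append(target)
    return written


# Example call ("training_images" is a placeholder folder name):
# convert_webp_folder(Path("training_images"))
```

The original WebP files are left in place, so they can be deleted once the PNG copies have been checked.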


🖼️ Upscaling and Enhancing Images for AI Training

The speaker continues with the process of enhancing and upscaling images for AI training. They discuss the importance of having images that are at least 512 by 512 pixels and provide tips on how to upscale lower resolution images using various tools and techniques. The speaker also covers how to handle images with artifacts or graininess, and shares their experience with different upscaling methods. They mention the use of IrfanView for upscaling and confirm the improved quality of the upscaled images through the resolution displayed in the program.
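The 512 by 512 minimum can also be checked in bulk rather than by opening each file in a viewer. The sketch below reads the width and height directly from a PNG header using only the standard library. The helper names are illustrative, and the check only mirrors the resolution rule stated in the video:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 512  # minimum side length recommended in the video


def png_size(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are two big-endian 32-bit integers in the IHDR chunk.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def needs_upscaling(width: int, height: int, min_side: int = MIN_SIDE) -> bool:
    """True when either side is below the minimum training resolution."""
    return min(width, height) < min_side


# Example call for a file on disk:
# with open("photo.png", "rb") as handle:
#     print(needs_upscaling(*png_size(handle.read(24))))
```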


✂️ Cropping Images and Creating a Training Set

In this paragraph, the speaker explains the next steps in preparing the images for training, which involves cropping the images to focus solely on the person's face. They detail the process of using an image editing tool to crop out unnecessary elements and ensure that the person's face is clearly visible. The speaker also talks about the need for a variety of images, including portraits, mid-frame shots, and full-body frames, and provides instructions on how to resize these images to the required dimensions. They discuss the use of a specific online tool, Birme, for cropping and exporting the images in the correct format for training.
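The arithmetic behind a square crop followed by a resize to 512 pixels can be sketched as below. This is an illustration of the geometry only, not the tool used in the video, and it assumes a centered crop, whereas an interactive cropping tool lets you move the box onto the face:

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def scale_to_target(side: int, target: int = 512) -> float:
    """Scale factor that brings a square of `side` pixels to `target` pixels."""
    return target / side


# An 800x600 photo is cropped to a 600x600 square, then scaled to 512x512.
box = center_square_box(800, 600)
factor = scale_to_target(box[2] - box[0])
```

A factor above 1.0 means the crop is smaller than the target and would have to be upscaled, which is a sign the source image may be too small.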


📝 Creating and Configuring the Embedding File

The speaker moves on to the creation of the embedding file, which is crucial for training the AI model. They explain the process of naming the embedding file after the person whose face is being trained and discuss the significance of the number of vectors per token, which determines the size of the embedding file and the amount of information it can hold. The speaker references a Reddit article and a GitHub article for further information on this topic. They also provide guidance on how to create the embedding file using the training tab in the AI software, emphasizing the importance of using the correct image dimensions and the right number of vectors per token.
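To make the trade-off behind vectors per token concrete, here is a small sketch of the arithmetic. It assumes a Stable Diffusion 1.x model, where each token vector has 768 values, and a prompt chunk of 75 tokens. Both constants are assumptions about the model family rather than figures from the video, and the function names are illustrative:

```python
EMBEDDING_DIM_SD1 = 768   # values per token vector in SD 1.x (assumed model family)
PROMPT_CHUNK_TOKENS = 75  # usable tokens per prompt chunk (assumption)


def embedding_parameter_count(vectors_per_token: int, dim: int = EMBEDDING_DIM_SD1) -> int:
    """Number of trainable values stored in the embedding file."""
    return vectors_per_token * dim


def remaining_prompt_tokens(vectors_per_token: int, chunk: int = PROMPT_CHUNK_TOKENS) -> int:
    """Tokens left in one prompt chunk after the embedding is used once."""
    return max(chunk - vectors_per_token, 0)


for vectors in (1, 4, 8, 16):
    print(vectors, embedding_parameter_count(vectors), remaining_prompt_tokens(vectors))
```

More vectors give the embedding more capacity, but each vector also takes up one token of the prompt, so a larger embedding leaves less room for the rest of the description.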


🖱️ Pre-Processing Images and Initiating Training

The paragraph describes the pre-processing of images as a necessary step before training the AI model. The speaker explains how to extract the images to a specific folder and verifies the quality of each image. They discuss the use of the pre-process images tab in the software to copy and paste the path of the image folder. The speaker also talks about the importance of checking the auto-generated captions to ensure they accurately describe the images without including unnecessary details. They mention the need to edit the captions to avoid over-training the AI on incorrect or irrelevant information.
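Editing captions by hand gets tedious when the same unwanted phrase shows up in many of them. A rough sketch of a bulk cleanup, assuming the captions are stored as .txt files next to the images. The helper names and the example phrases are hypothetical, and the output is still worth reviewing by eye:

```python
import re
from pathlib import Path


def strip_phrases(caption: str, unwanted: list[str]) -> str:
    """Remove unwanted phrases from a caption and tidy the punctuation."""
    cleaned = caption
    for phrase in unwanted:
        cleaned = re.sub(re.escape(phrase), "", cleaned, flags=re.IGNORECASE)
    cleaned = re.sub(r"\s+", " ", cleaned)              # collapse whitespace
    cleaned = re.sub(r"\s*,\s*(,\s*)*", ", ", cleaned)  # merge doubled commas
    return cleaned.strip(" ,")


def clean_caption_files(folder: Path, unwanted: list[str]) -> None:
    """Rewrite every .txt caption in `folder` with the phrases removed."""
    for caption_file in sorted(folder.glob("*.txt")):
        text = caption_file.read_text(encoding="utf-8")
        caption_file.write_text(strip_phrases(text, unwanted), encoding="utf-8")


print(strip_phrases("a woman with long hair, holding a microphone, smiling",
                    ["holding a microphone"]))
```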


🚀 Fine-Tuning the Training Process and Evaluating Results

The speaker discusses the fine-tuning of the training process, focusing on the learning rate and gradient accumulation steps. They share their experiences with different configurations and how these affected the quality of the training outcome. The speaker provides a detailed explanation of the training tab settings, including the data set directory, prompt template, max steps, and other parameters. They also talk about the importance of monitoring the training process through saved images and embedding files, and using a script to analyze the training progress and loss points. The speaker emphasizes the need to adjust the training based on the observed results and the potential need to retrain with different settings.
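Automatic1111 accepts a stepped learning rate written as comma-separated `rate:step` pairs, for example `0.005:100, 1e-3:1000, 1e-5`. The sketch below parses such a string and looks up the rate at a given step. It illustrates the syntax and is not the web UI's own parser: the exact handling of boundary steps is an assumption, and the example values are not the settings used in the video. It also shows how batch size and gradient accumulation combine:

```python
from typing import Optional


def parse_schedule(text: str) -> list[tuple[float, Optional[int]]]:
    """Parse 'rate:step, rate:step, rate' into (rate, last_step) pairs."""
    schedule = []
    for part in text.split(","):
        part = part.strip()
        if ":" in part:
            rate, until = part.split(":")
            schedule.append((float(rate), int(until)))
        else:
            schedule.append((float(part), None))  # applies until training ends
    return schedule


def rate_at(schedule: list[tuple[float, Optional[int]]], step: int) -> float:
    """Learning rate in effect at `step` (boundary handling is an assumption)."""
    for rate, until in schedule:
        if until is None or step <= until:
            return rate
    return schedule[-1][0]


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Images that contribute to each weight update."""
    return batch_size * gradient_accumulation_steps


schedule = parse_schedule("0.005:100, 1e-3:1000, 1e-5")
print(rate_at(schedule, 50), rate_at(schedule, 500), rate_at(schedule, 5000))
```

Starting with a higher rate and lowering it later is a common way to learn quickly at first and then refine without overshooting.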


🔄 Adjusting Training Parameters and Testing Different Prompts

In this paragraph, the speaker continues to experiment with different training parameters, aiming to improve the AI-generated images. They discuss the process of adjusting the training rate and steps, and the impact of these changes on the final output. The speaker also talks about testing different prompts and settings to achieve the desired result, including the use of various models and the importance of prompt engineering. They share their findings from testing different embeddings and steps, and the decision-making process behind selecting the best result for further refinement.
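Loss values from individual steps are noisy, so a smoothed curve is easier to compare across runs with different settings. The sketch below is not the script used in the video. It assumes the training log is a CSV file with `step` and `loss` columns, and both the column names and the sample numbers are made up for illustration:

```python
import csv
import io


def read_losses(csv_text: str) -> list[tuple[int, float]]:
    """Extract (step, loss) pairs from CSV text with 'step' and 'loss' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row["step"]), float(row["loss"])) for row in reader]


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; early entries average over what is available."""
    averaged = []
    running = 0.0
    for index, value in enumerate(values):
        running += value
        if index >= window:
            running -= values[index - window]
        averaged.append(running / min(index + 1, window))
    return averaged


sample = "step,loss\n10,0.25\n20,0.125\n30,0.5\n40,0.25\n"  # made-up numbers
losses = [loss for _, loss in read_losses(sample)]
print(moving_average(losses, window=2))
```

A low loss does not guarantee a good likeness, so the smoothed curve works best as a guide for which saved checkpoints to test with real prompts.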


🎨 Finalizing the Training and Presenting the Outcomes

The speaker concludes the video by presenting the final outcomes of their AI training efforts. They discuss the results obtained from different training steps and the selection of the most satisfactory image. The speaker also talks about testing the trained embedding with various AI models and shares the different visual outcomes. They reflect on the overall process, the lessons learned, and the potential for future improvements. The speaker encourages viewer engagement through comments and subscriptions, and teases the topic of the next video, which involves training LoRA models, highlighting the distinction between embeddings and models.



💡stable diffusion

Stable Diffusion is a text-to-image diffusion model that generates new images from a text prompt. In the video, the creator uses it to generate images of faces by training an embedding on a specific person's facial features, which a Stable Diffusion model then uses at generation time. This allows for realistic images of that person, as in the examples shown with the actresses mentioned in the video.


💡embedding

In the context of the video, an embedding is a representation of data that allows similar data points to be near each other in a space, often used in machine learning and AI models like stable diffusion. The creator trains an embedding by inputting images of a specific face, which the AI then uses to learn and generate new images with similar features. The embedding helps the AI model understand the unique characteristics of the face and reproduce them accurately in the generated images.

💡AI-generated images

AI-generated images refer to the process of creating visual content using artificial intelligence. In the video, the creator demonstrates this by using stable diffusion to generate images of faces based on real actors and characters. The AI takes input images and learns to replicate and modify the facial features to produce new, realistic images. This technology has various applications, from entertainment to design and beyond.

💡training data

Training data consists of the input, usually images or other types of data, used to teach a machine learning model how to perform a specific task. In the video, the creator uses images of faces as training data to train the stable diffusion model. By inputting these images, the model learns to recognize and reproduce the facial features, ultimately generating new images with those learned characteristics.

💡Charlize Theron

Charlize Theron is an actress mentioned as an example in the video. Images of her character from the movie Mad Max: Fury Road are used to demonstrate the process of training an AI model on specific facial features. The creator uses these images as part of the training data to generate new images with similar features using stable diffusion.

💡Zooey Deschanel

Zooey Deschanel is an actress who is mentioned in the video as an example of a person whose face the creator has generated using the stable diffusion model. The creator uses images of Zooey Deschanel from different movies to illustrate the process of training an AI model to generate images with specific facial features and expressions.

💡Katee Sackhoff

Katee Sackhoff is mentioned as one of the favorite sci-fi actresses of the video creator. The script refers to images from the show 'The Mandalorian,' in which Sackhoff appeared, and how the AI model can be trained to generate images with her facial features. This serves as an example of how the stable diffusion model can be used to create AI-generated images of specific individuals.

💡Amber Midthunder

Amber Midthunder is the actress whose images are gathered for training in the video. The video creator uses images of Amber Midthunder from different sources, such as Google Images and IMDb, to train the embedding. This demonstrates the process of selecting and preparing training data for the AI to learn and generate images with similar facial characteristics.

💡HD wallpapers

HD wallpapers refer to high-definition desktop background images. In the video, the creator mentions looking for HD wallpapers as a source of images for training the AI model. These images are typically high-resolution and clear, making them suitable for training the stable diffusion model to generate high-quality AI-generated images.


💡upscaling

Upscaling is the process of increasing the resolution of an image while maintaining or improving its quality. In the context of the video, the creator discusses upscaling images to meet the requirements for training the stable diffusion model. This is done to ensure that the images are clear and detailed enough for the AI to learn from and generate high-quality facial features in the AI-generated images.


💡cropping

Cropping in the context of the video refers to the editing process of selecting and cutting out a portion of an image, usually to focus on a specific subject like a face. The creator discusses cropping images of Amber Midthunder to ensure that only the face is included in the training data for the stable diffusion model. This helps the AI model to focus on learning and reproducing the facial features accurately.


The video presents a method for training embeddings in stable diffusion using Automatic1111.

Charlize Theron in Mad Max: Fury Road and Æon Flux is used as an example of AI-generated images.

The process involves gathering images of the person whose face you want to train, using websites like Google Images and IMDb.

Images should be at least 512 by 512 pixels, with no other people, watermarks, or extreme brightness/darkness.

The video demonstrates how to upscale low-resolution images and remove unwanted elements like microphones using image editing software.

The importance of using diverse angles and expressions in the training images is emphasized for better AI learning.

The process of cropping images to focus solely on the face or full body is detailed for optimal training results.

Creating an embedding file is explained, which is essential for training the AI model to recognize the specific face.

The video provides insights on determining the number of vectors per token for the embedding based on the number of input images.

The pre-processing of images is discussed, including how to check and edit the AI-generated captions for accuracy.

Training the embedding with different learning rates and gradient accumulation steps is explored to find the optimal settings.

The video highlights the importance of monitoring training progress and adjusting parameters to prevent over-training.

The use of scripts to analyze and graph the training data is introduced for a more informed decision-making process.

The practical application of the trained embedding is demonstrated by generating images of Amber Midthunder using various models.

The video discusses the concept of over-training and how it affects the flexibility of the embedding in producing varied outputs.

The process of re-training the embedding with adjusted parameters to improve results is outlined.

The video concludes with a demonstration of the final results, showcasing the effectiveness of the training process.