TEXTUAL INVERSION - How To Do It In Stable Diffusion (It's Easier Than You Think) + Files!!!

Olivio Sarikas
15 Oct 2022 · 16:20

TL;DR: Textual inversion is a technique for teaching a Stable Diffusion model new styles or subjects without altering the base model. It works by training a small embedding for a new keyword from a set of example images; the keyword can then be used in prompts like any other word. The tutorial walks through the steps from installation to training, emphasizing the importance of image selection and prompt construction, and shows how textual inversion can personalize AI-generated art and encourage creative experimentation.


  • 📌 Textual inversion is a technique that can be used with Stable Diffusion to create AI-generated images based on specific styles or subjects.
  • 🔍 The process begins by understanding what textual inversion is and how it can be applied within the Stable Diffusion framework.
  • 📋 To get started, one needs a local installation of Stable Diffusion and should follow the provided installation guide.
  • 🔗 Downloading pre-trained styles and subjects from the Stable Diffusion Conceptualizer is a quick way to start experimenting with textual inversion.
  • 📂 Saving the downloaded style files in the correct place inside the Stable Diffusion folder is crucial for the process to work.
  • 🎨 Choosing a unique name for your textual inversion project avoids accidentally triggering the style in unrelated prompts.
  • 🖼️ Selecting a good number of input images (15-100) that are similar but not identical makes for a more accurate training process.
  • 📏 Setting the correct image resolution is important for maintaining the aspect ratio and reducing training time.
  • 🚀 Training can take a significant amount of time, depending on the complexity of the subject and the number of steps set.
  • 🔄 Training can be resumed after an initial session, allowing the results to be refined further.
  • 🎭 Experimenting with different animals or subjects can lead to unique AI-generated images, showcasing the creative potential of textual inversion.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is textual inversion in Stable Diffusion and how to perform it using one's own images.

  • What is textual inversion in the context of Stable Diffusion?

    -Textual inversion in Stable Diffusion is a technique where the AI is trained with a specific style or subject using a set of images to generate new images that match the input style or subject.

  • How can one acquire the pre-trained styles for Stable Diffusion?

    -Pre-trained styles for Stable Diffusion can be found on the Stable Diffusion Conceptualizer, where users can download and use them immediately.

  • What is the recommended value for the number of vectors per token in textual inversion?

    -A value between 8 and 10 is recommended for the number of vectors per token in textual inversion.

  • What is the significance of the number of vectors per token in the process?

    -The number of vectors per token determines the size of the embedding. A larger value means more information about the subject can fit into the embedding, but it also reduces the prompt allowance.
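
The trade-off described above can be sketched with simple arithmetic. This is an illustrative calculation, not the trainer's actual code; it assumes Stable Diffusion 1.x, where each token embedding has 768 dimensions and a prompt holds roughly 75 usable tokens:

```python
# Illustrative sketch: how "vectors per token" affects both the size of
# the trained embedding and the remaining prompt budget.
EMBED_DIM = 768        # token embedding width in SD 1.x (CLIP text encoder)
PROMPT_TOKENS = 75     # usable tokens per prompt

def embedding_stats(vectors_per_token: int) -> dict:
    """Parameter count of the embedding, and the prompt tokens left over
    once the placeholder keyword expands to its vectors."""
    return {
        "parameters": vectors_per_token * EMBED_DIM,
        "tokens_remaining": PROMPT_TOKENS - vectors_per_token,
    }

for n in (1, 8, 10):
    s = embedding_stats(n)
    print(n, s["parameters"], s["tokens_remaining"])
```

With the recommended 8-10 vectors, the keyword consumes 8-10 of the prompt's 75 tokens, which is why very large values shrink how much else you can say in the prompt.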

  • How many input images are suggested for the textual inversion process?

    -At least 15 input images are suggested, though one can also use 50 or 100 pictures, depending on the project.

  • What should be the characteristics of the input images for textual inversion?

    -The input images should resemble each other in subject and style, but include enough variation (pose, angle, background) that the training learns the concept rather than memorizing a single picture.

  • What is the recommended resolution for the input images in the textual inversion process?

    -The recommended resolution for the input images is 512 by 768 pixels.

  • How long does the training process typically take?

    -The training process can take a considerable amount of time. For instance, on a powerful computer with an RTX 3080 Ti, 20,000 steps took two and a half hours.
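
The quoted figures imply a throughput you can use to estimate your own training runs (a rough calculation, assuming similar hardware and settings):

```python
# Back-of-the-envelope check of the figures quoted above:
# 20,000 steps in 2.5 hours on an RTX 3080 Ti.
steps = 20_000
hours = 2.5
steps_per_second = steps / (hours * 3600)
print(round(steps_per_second, 2))   # ≈ 2.22 steps per second
```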

  • What is the purpose of the 'Create Flipped Copies' option in the process?

    -The 'Create Flipped Copies' option doubles the amount of training images by flipping each image to the other side, enhancing the training data variety.
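
The flipping step can be sketched in a few lines with Pillow. This is a minimal stand-in for the webui's "Create Flipped Copies" option, not its actual implementation:

```python
# Each training image is mirrored horizontally, doubling the dataset.
from PIL import Image, ImageOps

def with_flipped_copies(images):
    """Return the original images plus a horizontally mirrored copy of each."""
    return list(images) + [ImageOps.mirror(img) for img in images]

# Demo on a tiny 2x1 grayscale image: white pixel left, black pixel right.
img = Image.new("L", (2, 1))
img.putpixel((0, 0), 255)
augmented = with_flipped_copies([img])
print(len(augmented))                  # 2: original plus its mirror
print(augmented[1].getpixel((1, 0)))   # 255: the white pixel is now on the right
```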

  • How can one use the trained textual inversion in Stable Diffusion?

    -After training, the textual inversion can be used in the Stable Diffusion 'Text to Image' or 'Image to Image' features by entering the prompt and appending the project name at the end.
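
Using the embedding at generation time amounts to string construction: the project name becomes a keyword appended to an ordinary prompt. A trivial sketch (the project name "bunny-style-v1" is hypothetical, not from the video):

```python
# Sketch of how a trained textual inversion is invoked in a prompt.
def build_prompt(description: str, project_name: str) -> str:
    """Append the textual-inversion keyword to an ordinary prompt."""
    return f"{description}, {project_name}"

prompt = build_prompt("a rabbit sitting in a meadow, highly detailed",
                      "bunny-style-v1")
print(prompt)
```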



🎨 Introduction to Textual Inversion with Stable Diffusion

This paragraph introduces the concept of textual inversion using Stable Diffusion, a method that may seem complex but is quite straightforward. The speaker uses Stable Diffusion's local installation to demonstrate textual inversion, guiding viewers on how to install and use it. The explanation includes what textual inversion is and how it can be utilized with pre-trained styles and subjects available on the Stable Diffusion Conceptualizer. It also provides instructions on downloading and using these styles, emphasizing the importance of saving the style files within the Stable Diffusion folder for proper functionality.


๐Ÿ–ผ๏ธ Preparing Images for Textual Inversion

The second paragraph delves into the specifics of preparing images for textual inversion. It emphasizes the need for a sufficient number of similar yet distinct images, such as multiple bunny pictures created with Midjourney. The speaker discusses the importance of image resolution, suggesting a reduction to 512 by 768 pixels to keep training time manageable. The paragraph outlines setting the source and destination directories, choosing the correct image size, and using options like creating flipped copies and BLIP captions for file names. It also touches on Stable Diffusion's prompt allowance and how token usage affects it.
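
The resizing step described above can be sketched with Pillow. This is a minimal preprocessing sketch, assuming the 512x768 portrait target from the video: center-crop each image to a 2:3 ratio first so the resize does not distort it:

```python
from PIL import Image

TARGET = (512, 768)  # width x height, the portrait size used in the video

def preprocess(img: Image.Image) -> Image.Image:
    """Center-crop to the target aspect ratio, then resize to TARGET."""
    target_ratio = TARGET[0] / TARGET[1]
    w, h = img.size
    if w / h > target_ratio:           # too wide: trim the sides
        new_w = int(h * target_ratio)
        left = (w - new_w) // 2
        img = img.crop((left, 0, left + new_w, h))
    else:                              # too tall: trim top and bottom
        new_h = int(w / target_ratio)
        top = (h - new_h) // 2
        img = img.crop((0, top, w, top + new_h))
    return img.resize(TARGET, Image.LANCZOS)

out = preprocess(Image.new("RGB", (1024, 1024)))
print(out.size)   # (512, 768)
```

The webui's "Preprocess images" tab performs a similar crop-and-resize for you; this sketch just shows what that step does.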


๐Ÿ› ๏ธ Training Textual Inversion with Stable Diffusion

This paragraph explains the training process of textual inversion in Stable Diffusion. It covers the steps to create a file for textual inversion, including setting up the number of vectors per token, which affects the size of the embedding and the prompt allowance. The speaker provides guidance on selecting the right prompts, using the textual inversion template, and setting the correct resolution for training. It also discusses the significance of the number of training steps, offering advice on how to balance the quality of results with the duration of the training process. The paragraph further explains how to process the images, set up the training parameters, and the importance of creating a backup of the embedding during training.
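
The core idea of the training described above can be shown with a toy example. This is pure Python and deliberately simplified: the real trainer optimizes the embedding against a denoising loss inside the diffusion model, but the key point, that the model stays frozen and only the new embedding vector is updated, survives the simplification:

```python
# Toy illustration of textual-inversion training: gradient descent on a
# single embedding vector while everything else stays fixed.
def loss_and_grad(embedding, target):
    """Mean squared error between the embedding and a fixed target, plus
    its gradient. Stands in for the real (far more complex) denoising loss."""
    n = len(embedding)
    diffs = [e - t for e, t in zip(embedding, target)]
    loss = sum(d * d for d in diffs) / n
    grad = [2 * d / n for d in diffs]
    return loss, grad

target = [0.3, -0.7, 0.5, 0.1]        # stands in for "what the images imply"
embedding = [0.0, 0.0, 0.0, 0.0]      # the new keyword starts uninformed
lr = 0.1

initial_loss, _ = loss_and_grad(embedding, target)
for step in range(200):               # "training steps" from the UI
    loss, grad = loss_and_grad(embedding, target)
    embedding = [e - lr * g for e, g in zip(embedding, grad)]

final_loss, _ = loss_and_grad(embedding, target)
print(final_loss < initial_loss)      # the embedding has moved toward the target
```

Saving the embedding list at intervals during the loop is the analogue of the backup copies the video recommends creating during training.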


📸 Using and Evaluating Textual Inversion Results

The final paragraph discusses how to use the textual inversion results in Stable Diffusion. It explains the structure of the textual inversion folder and how to identify the project-specific embeddings and test images. The speaker shares insights on evaluating the training results, suggesting that sometimes less-trained versions can yield better outcomes. The paragraph also highlights the flexibility of using different animals and styles for creating unique AI-generated images. It concludes with the speaker's encouragement for viewers to experiment with textual inversion, emphasizing its potential for artistic exploration and creativity within AI art.



💡Textual Inversion

Textual inversion is a process used in AI art generation, particularly with Stable Diffusion, where the AI is trained to understand and replicate specific styles or subjects based on a set of input images. The video explains that it's easier than it sounds, and demonstrates how to use the Stable Diffusion software to achieve this. It relates to the main theme because it allows unique AI-generated art to be created by learning a new keyword embedding that captures the style or subject of those images.

💡Stable Diffusion

Stable Diffusion is an AI model used for generating images from textual descriptions. It is an open-source project that has gained popularity for its ability to create high-quality, diverse images. In the video, the creator discusses using Stable Diffusion for textual inversion, which is a technique to train the AI with custom images to produce specific styles or subjects in the generated images. This is central to the video's message as it showcases the capabilities of Stable Diffusion for personal art creation.

💡Embeddings Folder

The embeddings folder is a crucial part of the Stable Diffusion setup where various style and subject files, known as embeddings, are stored. These files are used to initialize the style or subject when generating images with Stable Diffusion. The video emphasizes the importance of this folder in the textual inversion process and instructs viewers on how to download and save new embeddings files within it.


💡Prompt

In the context of the video, a prompt is a textual description or a set of keywords that the AI uses to generate an image. The prompt is a critical aspect of controlling the output of the AI, as it provides the AI with the information needed to create an image that matches the desired style or subject. The video provides insights into how to craft effective prompts for textual inversion in Stable Diffusion.


💡Tokens

Tokens, in the context of the video, refer to the elements of the text that the AI uses to understand and generate images. The number of vectors per token determines the size of the embedding, which affects how much information about the subject can be included. The video suggests that a value between 8 and 10 is effective for this purpose, as it balances the amount of information with the prompt allowance.

💡Input and Output Folders

Input and output folders are used in the process of training Stable Diffusion for textual inversion. The input folder contains the original images that the AI will learn from, while the output folder is where the processed images generated by the AI will be saved. These folders are essential for organizing the training data and results of the textual inversion process.
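
Setting up this layout is a couple of lines of Python. A small sketch with hypothetical directory names (the video lets you pick your own paths):

```python
# Create a source folder for original images and a destination folder
# for the preprocessed copies that training will consume.
import os
import tempfile

root = tempfile.mkdtemp()
source = os.path.join(root, "bunny_input")           # originals to learn from
destination = os.path.join(root, "bunny_processed")  # preprocessed copies
os.makedirs(source, exist_ok=True)
os.makedirs(destination, exist_ok=True)

print(sorted(os.listdir(root)))   # ['bunny_input', 'bunny_processed']
```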


💡Training

Training in the context of the video refers to the process of teaching the Stable Diffusion AI to recognize and replicate specific styles or subjects based on the input images. This involves setting up various parameters, such as the number of steps, and running the AI to produce an embedding that captures the essence of the input images. The training process is central to the video's theme as it is the method used to achieve textual inversion.

💡Sample Images

Sample images in the video are the AI-generated images that are produced during the training process at regular intervals. These images serve as a visual representation of the AI's progress and help the user understand how well the AI is learning from the input images. They are crucial for monitoring the effectiveness of the training and making adjustments if necessary.

💡Textual Inversion Folder

The textual inversion folder is a directory within the Stable Diffusion setup that contains the results of the textual inversion training. It includes the embeddings and test images that showcase the AI's ability to replicate the trained style or subject. This folder is essential for accessing and using the trained AI models for image generation.

💡Artificial Intelligence (AI)

Artificial Intelligence, or AI, refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. In the context of the video, AI is used to generate images through the Stable Diffusion model. The AI learns from the input images to produce new images that match the desired style or subject, as defined by the textual inversion process.


Textual inversion in Stable Diffusion is easier than it sounds.

The AUTOMATIC1111 local install of Stable Diffusion is used for the demonstration.

Textual inversion allows you to train Stable Diffusion with your own images.

The Stable Diffusion Conceptualizer offers pre-trained styles and subjects.

The input images shown on the Conceptualizer are not Stable Diffusion outputs but the originals, provided for comparison.

To download styles, right-click the name and open the link in a new tab.

Save the downloaded style file inside the Stable Diffusion embeddings folder.

An outdated version of Stable Diffusion can be updated by downloading a new zip file from the website.

Textual inversion requires a unique name to avoid accidental reuse of styles.

The number of vectors per token affects the size of the embedding and the prompt allowance.

Images for training should be similar but not too similar, and at least 15 in number.

The image resolution matters because it defines the aspect ratio and affects training time.

Processing images involves a source and destination directory for input and output.

Flipped copies can be created to double the amount of training images.

The AI uses a prompt template file with descriptions for training images.

Training can be paused and resumed with different settings for further refinement.

Sample images are rendered every 150 steps during training.

Textual inversion enables the creation of unique styles and artistic AI-generated images.

Different versions of the trained model can be tested for varying results.