Stable Diffusion and better AI art - Textual Inversion, Embeddings, and Hasan

Frank The Tank
18 Oct 2022 08:20

TLDR: The video discusses alternative models to Stable Diffusion and introduces the concept of textual inversion. It explores the potential and limitations of models like Waifu Diffusion and the impact of training data on their output. The video also delves into embeddings and hyper networks, showcasing their role in creating stylized AI art. The creator's experiments with training embeddings on specific image sets are shared, highlighting the possibilities and current challenges in this AI art space.


  • 🎨 The video discusses alternative models to the Stable Diffusion model and their impact on AI-generated art.
  • 🔍 Textual inversion adds new elements for a model to draw on, not necessarily by changing the model itself; results are mixed, but they showcase the potential of Stable Diffusion.
  • 🖼️ An AI model is only as good as its training material; the regular Stable Diffusion model was trained on a vast number of images, and its outputs tend to look painterly.
  • 🌐 Waifu Diffusion, an alternative model trained on anime images, is introduced as a notable example of the different AI models available for use.
  • ⚠️ Users are cautioned about the potential for explicit content when using certain AI models, such as the Waifu Diffusion model.
  • 🔄 The video highlights the differences in stylistic output between various AI models, including the Novel AI model.
  • 📊 Hyper networks and embeddings are discussed as newer technologies in AI art generation, with the former being associated with a distinct, stylized look in images.
  • 🔧 Users can train their own embeddings and trade them with others; the video describes an embedding as a novel way of storing data in the form of a picture.
  • 🖼️ The effectiveness of embeddings is still a topic of exploration, with the video showcasing the process of training images and the resulting AI-generated outputs.
  • 🎭 The video concludes with a look forward to the potential of AI in art, emphasizing the importance of experimentation and community collaboration.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is Stable Diffusion and better AI art, focusing on alternative models, textual inversion, embeddings, and hyper networks.

  • What is textual inversion?

    -Textual inversion is the process of adding new elements for an AI model to draw on, not necessarily by changing the model itself. The video demonstrates it through examples.

  • How does the video address the Novel AI leak?

    -The video discusses the excitement around the Novel AI leak because people believed it to be a better model than the regular Stable Diffusion model. However, it also presents a counter-view that models are only as good as the training material they're based on.

  • What is Waifu Diffusion and how is it different from the regular Stable Diffusion model?

    -Waifu Diffusion is an alternative model trained on anime images from the Danbooru library. It differs from the regular Stable Diffusion model in that it produces more stylized, anime-like images.

  • What are the potential issues with using Waifu Diffusion?

    -Waifu Diffusion may produce explicit images; the creator feels that it was trained to create such content. Users should be careful with the prompts they use with this model.

  • What is an embedding in the context of AI and Stable Diffusion?

    -In the context of AI and Stable Diffusion, an embedding is a method of storing data in the form of a picture. Individuals can train their own embeddings and share them with others.

  • What are the requirements for creating an embedding?

    -To create an embedding, one needs a folder of training images that meet specific criteria: each image must be exactly 512 by 512 pixels and should not contain text. Once trained on these images, the embedding can be shared with others.

  • How can embeddings be shared and used?

    -Embeddings can be shared by generating an embedding token that others can use in their AI models. Users can create embeddings based on specific data and then distribute the tokens for others to utilize in their own projects.

  • What is the significance of the human form in training AI models?

    -The human form is significant because models like Stable Diffusion have seen more portraits of people than any other subject. As a result, training tends to work better on images of people, as seen in the video with the training on pictures of Hasan.

  • How can embeddings be mixed with other AI art techniques?

    -Embeddings can be mixed with other AI art techniques by using them as part of the prompt or input for the AI model. This allows for more creativity and control over the output, as demonstrated by the use of Victorian lace in the video.

  • What are the potential future developments in the world of AI based on the video?

    -The video suggests that the future of AI could involve more advanced uses of embeddings, hyper networks, and other technologies. As AI continues to develop, there may be new ways to create and share data, leading to even more powerful and diverse AI-generated art.



🌟 Introduction to Alternative Models and Textual Inversion

The video opens by introducing alternative models to Stable Diffusion, highlighting the excitement around the Novel AI leak and its potential advantages over the regular Stable Diffusion model. It sets out to explore alternatives such as the Waifu Diffusion model through examples. It also touches on textual inversion, which involves adding new elements to models, and on the power of Stable Diffusion as demonstrated by various programmers. The discussion emphasizes that models are only as good as the training material they are based on, and that the regular Stable Diffusion model tends to produce painterly outputs because of its training data.


📸 Exploring Waifu Diffusion and Stylistic Differences

This section covers the Waifu Diffusion model, which was trained on anime images from the Danbooru library. It explains how the model can be accessed through Hugging Face and used with the Stable Diffusion web UI. The video gives a practical demonstration by running the Waifu Diffusion model with a portrait prompt, cautioning viewers about the potential for explicit content due to the model's training. The discussion also contrasts the styles of the Waifu Diffusion model and the Novel AI model, highlighting the distinct visual outcomes that alternative models can achieve.

🔍 Discussing Embeddings and Hyper Networks

The video moves on to embeddings and hyper networks, explaining their role in the diffusion process and how they contribute to the stylized appearance of images. It mentions that Novel AI was the first to incorporate hyper networks into the diffusion process, which has influenced the look of the generated images. The video then explains how to create personal embeddings by training on a specific set of images. It provides a step-by-step guide to training embeddings, emphasizing the importance of high-quality images that contain no text and have the correct dimensions. The creator also shares personal experiences with training embeddings and the potential for future developments in this area of AI.



💡Stable Diffusion

Stable Diffusion is an AI model that generates images from textual descriptions. It is trained on a large dataset of images and uses a process called diffusion to create new images that emulate the styles it has seen in training. In the video, the creator discusses the capabilities of this model and its potential for creating art in various styles, highlighting that the quality of the output is heavily influenced by the training data used.
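
The summary does not describe how diffusion works internally. As a rough sketch of the standard idea, and not the actual Stable Diffusion code: training images are progressively mixed with Gaussian noise, and the model learns to undo that mixing. The toy below shows only the noising half, on a four-number stand-in for an image. The function name `add_noise` and the parameter `alpha_bar` are illustrative choices, not names taken from the video.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Mix a clean signal with Gaussian noise.

    alpha_bar is the share of signal kept: 1.0 returns the clean input,
    0.0 returns pure noise.
    """
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)            # seeded so repeated runs give the same noise
clean = [0.5, -1.0, 0.25, 0.0]    # hypothetical stand-in for a tiny "image"
for alpha_bar in (1.0, 0.5, 0.0):
    print(alpha_bar, add_noise(clean, alpha_bar, rng))
```

Generation runs this in reverse: it starts from noise and removes it step by step, guided by the text prompt.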

💡Textual Inversion

Textual Inversion refers to the process of adding new elements or content to AI models, which can enhance or alter their performance. In the context of the video, the creator discusses how this technique can be applied to AI art generation models like Stable Diffusion to produce varied and innovative results. However, the video also cautions that the effectiveness of textual inversion is still a subject of exploration and may yield mixed outcomes.
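
To make the idea more concrete, here is a toy sketch of the optimization behind textual inversion as it is commonly described: the model's weights stay frozen, and only one new token vector is adjusted until the frozen model's output matches the training material. Everything here is a simplified assumption for illustration. The "model" is a fixed 2x2 matrix, the "training images" are a two-number target, and the names `forward` and `train_embedding` are hypothetical.

```python
def forward(frozen_weights, embedding):
    """Stand-in for the frozen model: a fixed linear map applied to the token vector."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in frozen_weights]


def train_embedding(frozen_weights, target, steps=500, lr=0.05):
    """Fit one new token vector by gradient descent; frozen_weights is only read."""
    embedding = [0.0] * len(frozen_weights[0])
    for _ in range(steps):
        error = [o - t for o, t in zip(forward(frozen_weights, embedding), target)]
        # Gradient of the squared error with respect to the embedding: 2 * W^T @ error
        grad = [
            2.0 * sum(frozen_weights[i][j] * error[i] for i in range(len(error)))
            for j in range(len(embedding))
        ]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


# Hypothetical numbers: the frozen "model" and the features of the training images.
MODEL = [[1.0, 0.5], [0.0, 1.0]]
TARGET = [2.0, 1.0]

learned = train_embedding(MODEL, TARGET)
print([round(v, 3) for v in learned])
```

Because only the small learned vector changes, it is the part that can be saved and handed to someone who already has the same base model.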


💡Embeddings

Embeddings are a method of representing data in a form that can be easily processed by AI models. In the video, the creator introduces embeddings as a novel concept in the AI art space, explaining that they allow individuals to train their own data representations. These embeddings can then be shared and used to influence the output of AI models, such as Stable Diffusion, to produce images that reflect the characteristics of the trained data.

💡Hyper Networks

Hyper Networks are a concept in AI where a network is used to generate or modify other networks. In the context of the video, the creator discusses how Hyper Networks have been incorporated into the AI art generation process, particularly by Novel AI, to create very stylized and consistent outputs. The use of Hyper Networks is presented as a significant factor in the distinctive visual style of the images produced by certain AI models.
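
The general definition above, one network producing the weights of another, can be sketched in a few lines. This is a generic toy and not how Novel AI or the Stable Diffusion web UI implements hyper networks; the names `hypernetwork` and `target_layer`, the style vectors, and all numbers are hypothetical.

```python
def hypernetwork(style, hyper_weights, rows, cols):
    """Generate a rows x cols weight matrix for a target layer from a style vector."""
    flat = [sum(w * s for w, s in zip(row, style)) for row in hyper_weights]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def target_layer(x, weights):
    """The layer being controlled: a plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# One row per generated weight (2 x 2 = 4 rows), one column per style dimension.
HYPER_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

x = [3.0, 4.0]
for style in ([1.0, 0.0], [0.0, 1.0]):
    weights = hypernetwork(style, HYPER_WEIGHTS, rows=2, cols=2)
    print(style, weights, target_layer(x, weights))
```

The same input passes through a different layer depending on the style vector, which is one way to picture how a small add-on network can push a model's output toward a consistent look.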

💡Waifu Diffusion

Waifu Diffusion is an alternative model to the standard Stable Diffusion model, trained specifically on anime images from the Danbooru library. The video highlights that using Waifu Diffusion can result in outputs that have a more stylized, anime-like appearance. However, the creator also warns that this model may produce explicit content, advising viewers to exercise caution when using it.

💡Training Data

Training data refers to the collection of examples used to teach an AI model how to perform a specific task. In the video, the importance of training data is emphasized, with the creator explaining that the nature of the training data has a direct impact on the style and quality of the AI-generated images. For instance, the standard Stable Diffusion model is trained on a vast number of images, resulting in outputs that often resemble painterly or artistic works.

💡AI Art

AI Art is a form of digital art that is created with the assistance of artificial intelligence. In the video, the creator explores the potential of AI models like Stable Diffusion and Waifu Diffusion to generate art, discussing the different styles and levels of detail that can be achieved. The video also touches on the ethical considerations of creating AI art, such as avoiding the use of images that may infringe on likeness rights.

💡Hugging Face

Hugging Face is a platform that provides access to various AI models, including Stable Diffusion and Waifu Diffusion. In the video, the creator mentions Hugging Face as a resource for downloading different AI models and emphasizes its role in the AI art community. The platform is presented as a valuable tool for those interested in experimenting with AI-generated art.

💡High-Res Fix

High-Res Fix refers to a feature that allows for the generation of high-resolution images by AI models like Stable Diffusion. The video uses this feature to demonstrate the improved quality of AI-generated portraits, showcasing the model's ability to create detailed and realistic images. The High-Res Fix is presented as an advancement that enhances the visual appeal and usability of AI art.

💡Novel AI

Novel AI is a platform that was mentioned in the video as having a significant impact on the AI art community due to its use of Hyper Networks. The creator discusses how Novel AI's models produce very stylized images, which is attributed to the use of Hyper Networks in their diffusion process. The video also addresses the leak of Novel AI's code, which allowed for the creation and sharing of custom Hyper Networks by the community.

💡Image Resizing

Image resizing is the process of altering the dimensions of an image while maintaining its quality and detail. In the video, the creator uses a website called Birme to resize images to the 512 by 512 pixels required for training embeddings. This step is crucial for preparing images to be used in AI models and is highlighted as an efficient and user-friendly way to get the right image dimensions.
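
The arithmetic behind this kind of resize is simple enough to sketch. The snippet below is an assumption about one reasonable approach (crop the largest centered square, then scale it to 512 by 512); the video does not say how the website does it, and the names `square_crop_box` and `resize_plan` are hypothetical.

```python
def square_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def resize_plan(width, height, target=512):
    """Describe the crop and the scale factor needed to reach target x target pixels."""
    box = square_crop_box(width, height)
    side = box[2] - box[0]
    return {"crop_box": box, "scale": target / side, "output_size": (target, target)}


# Hypothetical source sizes: a landscape photo, a portrait photo, an exact fit.
for size in ((1920, 1080), (600, 800), (512, 512)):
    print(size, resize_plan(*size))
```

The crop box uses the (left, top, right, bottom) order that imaging libraries such as Pillow expect, so the plan could be handed to a real crop-and-resize call. Cropping to a square first avoids stretching the subject when the source image is not square.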


