Stable Diffusion 2.1 Released!

Nerdy Rodent
7 Dec 2022 04:30

TLDR: Stable Diffusion 2.1 introduces two new models, at 512 and 768 resolution, trained on a refreshed dataset. The NSFW filter is less aggressive than in 2.0 but still removes most adult content, and the dataset keeps its strength in architecture, design, and wildlife scenes. The result balances high-quality architectural concepts with more detailed images of people and pop culture. The release also claims better anatomy, particularly hands, and supports a wider variety of art styles. Users can download the model and configuration files from the Hugging Face site and adjust settings to avoid black images. Side-by-side comparisons of several prompts show 2.1 improving on 2.0, suggesting an overall preference for the updated version.


  • 🚀 Stable Diffusion 2.1 has been released, succeeding version 2.0.
  • 🎨 Two new models are introduced in 2.1, with 512 and 768 resolution.
  • 📊 The 2.1 version was trained on a refreshed dataset; the NSFW filter used for 2.0 was overly restrictive and has been relaxed.
  • 🏙️ The new dataset for 2.1 has a greater focus on architecture, design, wildlife, and landscape scenes, improving the quality in these areas.
  • 🖼️ Stable Diffusion 2.1 offers a balance between the capabilities of 2.0 and enhanced features.
  • 👤 The new release boasts improved anatomy, particularly in the depiction of hands.
  • 🎨 The 2.1 version is better at rendering a variety of art styles compared to 2.0.
  • 💻 Users can easily download and install the AUTOMATIC1111 web UI for Windows or Linux.
  • 📄 Instructions for installation and setup are provided, including downloading the 2.1 768 non-EMA pruned checkpoint and the configuration file.
  • 🔧 The 2.1 release expects full precision, and workarounds are suggested for users without xformers.
  • 🔎 A comparison of prompts between versions 2.0 and 2.1 showcases the improvements in detail and style in the latest release.

Q & A

  • What is the main improvement in Stable Diffusion 2.1 compared to version 2.0?

    -Stable Diffusion 2.1 introduces two new models with 512 and 768 resolution, and a new dataset that addresses the previous over-filtering issue of version 2.0, leading to better quality in architecture, design, wildlife, and landscape scenes.

  • How did the NSFW filters change from version 2.0 to 2.1?

    -In version 2.1, the NSFW filters are less sensitive, but they still manage to reduce the majority of adult content, striking a balance between content moderation and creative freedom.

  • What are the benefits of fine-tuning Stable Diffusion 2.1 off of version 2.0?

    -Fine-tuning Stable Diffusion 2.1 off of version 2.0 allows the new model to retain the positive aspects of the previous version, such as the ability to render beautiful architectural concepts and natural scenery, while also improving upon areas like anatomy and diverse art styles.

  • What are some of the specific improvements seen in Stable Diffusion 2.1?

    -Stable Diffusion 2.1 shows improvements in anatomy, particularly with hands, and the ability to produce images in a wider range of art styles. It also delivers better images of people and pop culture.

  • How can one obtain and install the Stable Diffusion 2.1 model?

    -To obtain and install the Stable Diffusion 2.1 model, one needs to download the 2.1 768 non-EMA pruned checkpoint from the Hugging Face site and place it in the Stable Diffusion models directory, along with the corresponding config file.

  • What should users do if they encounter black images when using Stable Diffusion 2.1?

    -If users see black images, the cause may be running without xformers. The suggested fixes are setting the environment variable ATTN_PRECISION=fp16 or, in the AUTOMATIC1111 web UI, using the --no-half option.

  • How does the Stable Diffusion 2.1 model handle different art styles compared to version 2.0?

    -Stable Diffusion 2.1 has been improved to handle a variety of art styles better than version 2.0, as demonstrated by its ability to create anime-style illustrations and surrealistic images.

  • What was the issue with hand anatomy in version 2.0, and how has it been addressed in 2.1?

    -In version 2.0, hand anatomy was often incorrect, with issues like extra fingers or unrealistic shapes. Version 2.1 has thoroughly redone and improved the hand anatomy, resulting in more realistic hand images.

  • What is the significance of the AUTOMATIC1111 web UI for Stable Diffusion 2.1?

    -The AUTOMATIC1111 web UI is easy to download and install on Windows or Linux, and it provides an accessible interface for running Stable Diffusion 2.1.

  • How does the script suggest users can share their preferences between Stable Diffusion 2.0 and 2.1?

    -The script encourages users to share their preferences between the two versions by commenting on which one they prefer, whether it's for the improved architecture of 2.0 or the diverse styles and better anatomy of 2.1.

  • What additional help is available for users who need assistance with prompting on Stable Diffusion 2.0?

    -For users who need help with prompting on Stable Diffusion 2.0, the script suggests looking at a referenced video that likely provides guidance and examples for effective use of the model.



🚀 Introduction to Stable Diffusion 2.1

This paragraph introduces the release of Stable Diffusion 2.1, highlighting the improvements and new features over the previous 2.0 version. Two new models with 512 and 768 resolutions are presented, along with a refreshed dataset that the 2.1 model was trained on. The previous release used a not-safe-for-work (NSFW) filter that was too restrictive, which reduced the number of people in the dataset. Version 2.1 addresses this with a less sensitive NSFW filter that still removes most adult content. The 2.1 version was fine-tuned from the 2.0 version, offering the best of both worlds with enhanced capabilities in rendering architectural concepts, natural scenery, and images of people and pop culture. The release also boasts improved anatomy and better handling of various art styles.



💡Stable Diffusion 2.1

Stable Diffusion 2.1 refers to the latest version of an AI model known for generating highly detailed images based on textual descriptions. This version builds upon the improvements and capabilities of its predecessor, Stable Diffusion 2.0, by offering enhancements in image resolution, the inclusivity of subjects in generated images, and overall image quality. Specifically, the script mentions the introduction of models with 512 and 768 resolutions and adjustments in the training dataset to balance content representation. This version aims to provide a more versatile tool for generating images of architecture, landscapes, people, and pop culture with improved anatomy and art styles.

💡NSFW filter

The NSFW (Not Safe For Work) filter is a mechanism used in AI models to prevent the generation of adult or inappropriate content. In the context of Stable Diffusion 2.1, the script highlights a modification in the sensitivity of the NSFW filter compared to version 2.0. While version 2.0 had a higher sensitivity leading to a reduction of people in the dataset, version 2.1 has adjusted the filter to be less sensitive. However, it still effectively reduces the majority of adult content. This adjustment allows for a broader representation of subjects while maintaining content appropriateness.


💡Fine-tuning

Fine-tuning in machine learning refers to the process of taking a pre-trained model and further training it on a new, typically smaller, dataset to adjust or improve its performance on specific tasks. Stable Diffusion 2.1 was fine-tuned from Stable Diffusion 2.0, leveraging the foundational capabilities of the original model while enhancing its performance and output quality. This process allows the new version to inherit the strengths of 2.0, such as rendering architectural concepts and natural scenery, and extend its capabilities to produce better images of people and pop culture with improved anatomy and a variety of art styles.

💡Automatic 1111 web UI

The Automatic 1111 web UI is mentioned as a user interface for easily downloading, installing, and running Stable Diffusion models on Windows or Linux systems. It simplifies the interaction with the AI model by providing a graphical interface where users can input prompts, configure settings, and generate images. This tool enhances accessibility to Stable Diffusion for users who may not be comfortable working directly with code or command-line interfaces.

💡Hugging Face

Hugging Face is a company known for its work in the field of natural language processing and machine learning. It provides a platform for hosting and sharing AI models, including Stable Diffusion versions. The script directs users to the Hugging Face site to download the Stable Diffusion 2.1 model and the necessary configuration file. This platform plays a crucial role in disseminating AI advancements by making them accessible to a broad audience of developers, researchers, and enthusiasts.

💡Full Precision

Full Precision, in the context of AI models, refers to the level of numerical precision used in calculations during model inference. For Stable Diffusion 2.1, the script mentions that the model expects full precision operations. This is important because lower precision (e.g., FP16) might be used to speed up computations but can lead to issues like the generation of black images if not supported. The script offers solutions to ensure that users can run the model without encountering these problems, indicating the model's requirements for computational accuracy.
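To make the precision point concrete, here is a toy sketch of how a half-precision overflow can poison an attention-style calculation. It is not the model's actual attention code: the vectors, the `to_half` helper, and the `attention_weights` function are all made up for illustration, and the idea that NaN values end up displayed as black pixels is an assumption about the typical failure mode.

```python
import math
import struct

def to_half(x: float) -> float:
    """Round x to IEEE 754 half precision; values too large to fit become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Half precision tops out at 65504, so larger magnitudes overflow.
        return math.copysign(math.inf, x)

def attention_weights(query, keys, cast=float):
    """Softmax over query-key dot products; `cast` sets the precision of the scores."""
    scores = [cast(sum(q * k for q, k in zip(query, key))) for key in keys]
    top = max(scores)
    # Subtracting the max is the usual stabilisation trick, but inf - inf is NaN.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical inputs chosen so one dot product (90000) exceeds the half-precision range.
query = [300.0, 300.0]
keys = [[150.0, 150.0], [1.0, 1.0]]

full = attention_weights(query, keys)                # finite weights
half = attention_weights(query, keys, cast=to_half)  # every weight becomes NaN
```

In full precision the weights stay finite, while the half-precision run turns every weight into NaN. That is the kind of instability the full-precision requirement is meant to avoid.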

💡Comparison prompts

Comparison prompts refer to the textual inputs used to generate images for comparing the capabilities of Stable Diffusion 2.0 and 2.1. The script provides examples of various prompts, such as creating images of a rat in armor, a space alien portrait, or a surreal scene of opera on the moon. These prompts are used to showcase the improvements in image quality, detail, and creativity between versions, illustrating how Stable Diffusion 2.1 enhances the generation of diverse and complex images.

💡Anatomy improvement

Anatomy improvement in the context of Stable Diffusion 2.1 highlights the enhanced ability of the model to generate images with accurate and realistic human and animal anatomy. The script specifically mentions that the representation of hands has been significantly improved, which is a common challenge in art and image generation. This enhancement allows for more realistic and believable images of people and creatures, contributing to the model's versatility and appeal.

💡Art styles

Art styles in the context of the script refer to the diverse range of visual art forms that Stable Diffusion 2.1 can emulate in its image generation, including anime, fantasy, and surrealism. The model's improved capacity to capture and replicate various art styles allows users to create images that are not only visually appealing but also stylistically varied, catering to a wide range of artistic preferences and inspirations.

💡Configuration file

A configuration file is a crucial component mentioned in the process of setting up Stable Diffusion 2.1. It contains settings and parameters that define how the model operates and generates images. The script instructs users to download this file along with the model checkpoint and place them in the specified directory. This file ensures that the model functions correctly under the user's intended specifications, enabling the customization of the image generation process.
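The setup step described above can be sketched as a small helper that copies the checkpoint and its config into the models directory. This is a minimal sketch, not an official installer: the `install_model` name is hypothetical, and it assumes the web UI pairs a checkpoint with a config by looking for a `.yaml` file that shares the checkpoint's base name.

```python
import shutil
from pathlib import Path

def install_model(checkpoint: Path, config: Path, models_dir: Path):
    """Copy a checkpoint and its config into models_dir.

    The config is renamed to the checkpoint's base name with a .yaml suffix
    (an assumption about how the web UI matches the two files).
    """
    models_dir.mkdir(parents=True, exist_ok=True)
    ckpt_dest = models_dir / checkpoint.name
    yaml_dest = ckpt_dest.with_suffix(".yaml")
    shutil.copy2(checkpoint, ckpt_dest)
    shutil.copy2(config, yaml_dest)
    return ckpt_dest, yaml_dest

# Example call; the file and directory names below are assumptions, adjust to your download:
# install_model(
#     Path("Downloads/v2-1_768-nonema-pruned.ckpt"),
#     Path("Downloads/v2-inference-v.yaml"),
#     Path("stable-diffusion-webui/models/Stable-diffusion"),
# )
```

Keeping the two files side by side under one base name makes it easy to see which config belongs to which checkpoint when several models are installed.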


Stable Diffusion 2.1 release introduces two new models with 512 and 768 resolution.

The 2.1 release was trained on a new dataset, addressing the overly aggressive NSFW (not safe for work) filter used for the 2.0 release.

The new dataset for 2.1 includes more architecture, design, wildlife, and landscape scenes, improving the quality in these areas.

NSFW filters in 2.1 are less sensitive but still reduce the majority of adult content.

Stable Diffusion 2.1 is fine-tuned based on Stable Diffusion 2.0, combining the best features of both versions.

The new release improves anatomy rendering, particularly hands, and enhances art styles.

The AUTOMATIC1111 web UI is easy to download and install for use on Windows or Linux.

Users need the new Stable Diffusion 2.1 model and the configuration file from the Hugging Face site for setup.

Full precision is expected in the 2.1 release; without xformers, black images may appear if the proper settings are not configured.

The GitHub page suggests using the environment variable ATTN_PRECISION=fp16, or the --no-half option for the AUTOMATIC1111 web UI.

Comparisons between Stable Diffusion 2.0 and 2.1 show improvements in various prompts, such as a rat in detailed plate armor.

The 2.1 version delivers better results for matte acrylic face portraits of space aliens with magnificent tiaras.

Anime style illustrations, such as a fantasy forest with mystical squirrels, show improvements in style rendering.

Surrealism tests, like a woman singing opera on the moon with a rodent chorus, demonstrate the model's capability.

Hand anatomy in 2.1 has been redone and shows noticeable improvements over the 2.0 version.

A test without any negative prompts highlights the differences between 2.0 and 2.1, with 2.1 showing more comprehensive style capabilities.