Run Stable Diffusion 3 Locally! | ComfyUI Tutorial

Markury AI
12 Jun 2024 | 03:48

TLDR: This tutorial demonstrates how to run Stable Diffusion 3 Medium locally using ComfyUI. The process involves downloading the necessary files from Hugging Face, updating ComfyUI, and placing the models in the correct folders. After setup, the user can generate images from natural language prompts; the example is a detailed, ethereal image of a female character with aurora-like hair. The video also touches on the need for community feedback regarding licensing issues.

Takeaways

  • The tutorial covers using Stable Diffusion 3 Medium with ComfyUI.
  • To access the model, you must fill out a form on Hugging Face and agree to the repository's access terms.
  • The files to download are 'sd3_medium.safetensors', the CLIP G, CLIP L, and T5 XXL (fp16) text encoders, and the ComfyUI workflows.
  • If ComfyUI is running, it must be closed before updating; the update is done with the 'update_comfyui.bat' script.
  • After updating, place the text encoders in the 'clip' folder and 'sd3_medium.safetensors' in the 'checkpoints' folder.
  • The generation process involves loading the checkpoint and using a natural language prompt for image creation.
  • The example prompt describes a female character with hair resembling the northern lights.
  • The model's weights have been released for free, which the host considers a significant development.
  • There are licensing issues that need to be addressed, and the community is encouraged to open issues or contact Stability AI for updates.
  • The video suggests that a natural-language prompt works better than a tag-style prompt for this model.
  • The host describes the generated image as 'amazing'.

Q & A

  • What is the main topic of the tutorial video?

    -The tutorial video is about how to use Stable Diffusion 3 medium and integrate it with ComfyUI.

  • Why is the model referred to as 'gated'?

    -The model is called 'gated' because access to it requires filling out a form on Hugging Face, indicating it is restricted and not freely available to everyone.

  • What files does the user need to download from Hugging Face for Stable Diffusion 3 medium?

    -The user needs to download 'sd3_medium.safetensors' and the CLIP G, CLIP L, and T5 XXL text encoders, with T5 XXL in fp16 format.

  • What is the purpose of the 'update_comfyui.bat' file?

    -The 'update_comfyui.bat' file updates the ComfyUI software to the latest version.

  • Why is it necessary to close ComfyUI before updating it?

    -Closing ComfyUI before updating ensures that the update process is not interrupted and that the software is not running while changes are being made.

  • What is the recommended way to organize the downloaded models in the ComfyUI directory?

    -The recommended way is to place the downloaded models in their respective folders: 'clip' for the text encoders and 'checkpoints' for the 'sd3_medium.safetensors' file.

  • What is the 'basic inference workflow' mentioned in the script?

    -The 'basic inference workflow' is a ComfyUI workflow that the user can download and use for generating images with Stable Diffusion 3 medium.

  • What does the user need to do after updating ComfyUI and organizing the models?

    -The user needs to start ComfyUI with 'run_nvidia_gpu.bat' and then load the 'sd3_medium.safetensors' checkpoint to begin generating images.

  • What is the example prompt provided in the video for generating an image?

    -The example prompt is 'a female character with long flowing hair that appears to be made of ethereal swirling patterns resembling the northern lights or Aurora Borealis'.

  • What issue is mentioned regarding the licensing of Stable Diffusion 3 medium?

    -The licensing is described as 'a little messed up', and the user is encouraged to open an issue or contact Stability AI to update the license.

  • How does the video suggest the community can help with the licensing issue?

    -The video suggests that it should be a community effort to let Stability AI know about the licensing issue so that it can be updated.

Outlines

00:00

Introduction to Using Stable Diffusion 3 Medium

The video begins with an introduction to the Stable Diffusion 3 Medium model, which has just been released. The host explains how to access the gated model by filling out a form on Hugging Face and agreeing to the repository's access terms. The viewer is then guided through downloading the necessary files: 'sd3_medium.safetensors', the CLIP G, CLIP L, and T5 XXL (fp16) text encoders, and the ComfyUI workflows for basic inference.
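The video downloads these files through the browser. As a rough illustration of what a scripted download would need, the sketch below builds (but does not send) authenticated requests for the model files. The repository ID, the file paths, and the `resolve/main` URL pattern are assumptions that do not come from the video, so check them against the repository page; the token is a placeholder for your own Hugging Face access token.

```python
from urllib.request import Request

# Assumptions (not stated in the video): repository ID and file paths.
REPO_ID = "stabilityai/stable-diffusion-3-medium"
FILES = [
    "sd3_medium.safetensors",
    "text_encoders/clip_g.safetensors",
    "text_encoders/clip_l.safetensors",
    "text_encoders/t5xxl_fp16.safetensors",
]


def build_download_request(filename: str, token: str) -> Request:
    """Build an authenticated GET request for one file in the gated repo."""
    url = f"https://huggingface.co/{REPO_ID}/resolve/main/{filename}"
    # A gated repository only serves files to accounts that were granted
    # access, so the access token is sent as a bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


if __name__ == "__main__":
    # Placeholder token; nothing is downloaded here, the URLs are only printed.
    for name in FILES:
        request = build_download_request(name, token="hf_your_token_here")
        print(request.full_url)
```

Passing one of these requests to `urllib.request.urlopen` would start the actual download, which only succeeds once the access form has been accepted for the account that owns the token.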

Updating ComfyUI and Installing Models

This section details the steps required to update ComfyUI: closing the running application, navigating to its directory, and running the 'update_comfyui.bat' file. The host emphasizes that the latest version is needed for compatibility with the new models. After the update, the text encoders are placed in the 'clip' folder and the 'sd3_medium.safetensors' file in the 'checkpoints' folder, so that ComfyUI is ready to use the new models.
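The file placement described here can be sketched as a small script. This is only an illustration: the folder names 'clip' and 'checkpoints' come from the video, while the exact file names and the `Downloads` and `ComfyUI/models` paths are assumptions to adjust for your own setup.

```python
import shutil
from pathlib import Path

# Assumed file names; the target folder names are the ones named in the video.
DESTINATIONS = {
    "clip_g.safetensors": "clip",
    "clip_l.safetensors": "clip",
    "t5xxl_fp16.safetensors": "clip",
    "sd3_medium.safetensors": "checkpoints",
}


def install_models(download_dir: Path, models_dir: Path) -> list[str]:
    """Move known model files into their ComfyUI folders; return names not found."""
    missing = []
    for filename, subfolder in DESTINATIONS.items():
        source = download_dir / filename
        if not source.is_file():
            missing.append(filename)
            continue
        target_dir = models_dir / subfolder
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target_dir / filename))
    return missing


if __name__ == "__main__":
    # Hypothetical paths: where the files were downloaded and where ComfyUI lives.
    left_over = install_models(Path("Downloads"), Path("ComfyUI/models"))
    print("Missing files:", left_over or "none")
```

The function returns the files it could not find instead of raising an error, so a partial download can be completed and the script run again.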

Starting ComfyUI and Testing the Model

The final part of the video shows how to start ComfyUI with 'run_nvidia_gpu.bat' and load the newly downloaded 'sd3_medium.safetensors' checkpoint. The host demonstrates the model by entering a descriptive prompt for a female character with hair resembling the northern lights. The resulting image shows that the model understands and responds to natural language prompts effectively. The host concludes by expressing satisfaction that the model's weights were released for free and encourages the community to help address the licensing issues by opening issues or contacting Stability AI.
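The video enters the prompt and starts generation entirely in the ComfyUI interface. For readers who would rather script that step, the sketch below shows one possible shape, under several assumptions that are not from the video: ComfyUI is listening at its default local address, the workflow has been exported in ComfyUI's API format (a dictionary of node IDs), and the node ID "6" is purely hypothetical. The request is built but never sent.

```python
import copy
import json
from urllib.request import Request

COMFYUI_URL = "http://127.0.0.1:8188/prompt"  # assumed default local address


def with_prompt(workflow: dict, node_id: str, text: str) -> dict:
    """Return a copy of an API-format workflow with one text node's prompt replaced."""
    updated = copy.deepcopy(workflow)
    node = updated[node_id]
    if node.get("class_type") != "CLIPTextEncode":
        raise ValueError(f"node {node_id} is not a CLIPTextEncode node")
    node["inputs"]["text"] = text
    return updated


def build_queue_request(workflow: dict) -> Request:
    """Build (but do not send) the POST request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return Request(COMFYUI_URL, data=body, headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    # Tiny stand-in workflow; a real exported workflow has many more nodes.
    demo = {"6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}}}
    prompt = (
        "a female character with long flowing hair that appears to be made of "
        "ethereal swirling patterns resembling the northern lights or Aurora Borealis"
    )
    request = build_queue_request(with_prompt(demo, "6", prompt))
    print(request.get_method(), request.full_url)
```

The node ID is passed explicitly because a workflow usually contains separate text nodes for the positive and negative prompts, and overwriting both would change the result.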

Keywords

Stable Diffusion 3

Stable Diffusion 3 is a type of AI model used for generating images from text descriptions. In the context of the video, it is the primary subject being demonstrated. The video aims to guide viewers through the process of downloading and using this AI model, emphasizing its recent release and the excitement surrounding it.

ComfyUI

ComfyUI is a user interface that simplifies the interaction with AI models like Stable Diffusion 3. The script mentions updating ComfyUI to ensure compatibility with the new model. It is a key component in the tutorial, facilitating the user's experience with the AI model.

Hugging Face

Hugging Face is a platform where developers and users can access various AI models, including Stable Diffusion 3. The script instructs viewers to visit Hugging Face, fill out a form to access the gated model, and download the necessary files, highlighting its role as a repository for AI models.

Gated Model

A gated model refers to a type of AI model that is not freely available to the public and requires some form of access control, such as filling out a form or agreeing to terms. In the video, Stable Diffusion 3 is described as a gated model, indicating that users must go through a process to gain access to it.

Tensors

In AI and machine learning, tensors are multi-dimensional arrays of numerical values used to represent data within models. The file the video downloads, 'sd3_medium.safetensors', stores the model's weight tensors in the safetensors file format and is required for Stable Diffusion 3 Medium to run.
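To make the idea of a file full of tensors concrete, the sketch below lists the tensors recorded in a `.safetensors` file without loading the weights. It relies on the published layout of the safetensors format (an 8-byte little-endian header length followed by a JSON header); the checkpoint path in the usage section is hypothetical, and none of this appears in the video.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without loading the weights."""
    with open(path, "rb") as handle:
        # The first 8 bytes hold the header length as a little-endian
        # unsigned 64-bit integer; the JSON header follows immediately.
        (header_length,) = struct.unpack("<Q", handle.read(8))
        return json.loads(handle.read(header_length).decode("utf-8"))


def summarize(header: dict) -> list[tuple[str, str, list[int]]]:
    """List (name, dtype, shape) per tensor, skipping the optional metadata entry."""
    return [
        (name, info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    ]


if __name__ == "__main__":
    # Hypothetical location of the checkpoint; adjust to your installation.
    checkpoint = Path("ComfyUI/models/checkpoints/sd3_medium.safetensors")
    if checkpoint.is_file():
        for name, dtype, shape in summarize(read_safetensors_header(checkpoint))[:10]:
            print(name, dtype, shape)
    else:
        print("Checkpoint not found at", checkpoint)
```

Because only the header is read, this runs quickly even on a multi-gigabyte checkpoint.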

Text Encoders

Text encoders are components of AI models that convert text into a numerical representation the model can use. The video downloads three of them, CLIP G, CLIP L, and T5 XXL, which Stable Diffusion 3 Medium uses to interpret text prompts.

Workflow

A workflow in the context of the video refers to a series of steps or procedures followed to achieve a specific outcome, such as generating an image using Stable Diffusion 3. The script mentions using a 'basic inference workflow' with ComfyUI to guide the user through the process.

Checkpoints

In machine learning, a checkpoint is a saved copy of a model's state, typically written during or after training so the model can be restored or evaluated later. The video loads the 'sd3_medium.safetensors' checkpoint to initialize Stable Diffusion 3 Medium for image generation.

Nvidia GPU

Nvidia GPUs are high-performance graphics processing units designed for handling complex computations, including those required for running AI models like Stable Diffusion 3. The video starts ComfyUI with the 'run_nvidia_gpu.bat' launcher, which runs the model on an Nvidia GPU.

Queue Prompt

A prompt is the text input given to an AI model to guide its output, and Queue Prompt is the ComfyUI button that submits the current workflow for generation. In the video, the prompt instructs the Stable Diffusion 3 model to generate an image of a 'female character with long flowing hair made of ethereal swirling patterns resembling the northern lights.'

Aurora Borealis

Aurora Borealis, also known as the northern lights, is a natural light display in the Earth's sky, seen predominantly in high-latitude regions. In the video, it appears in the prompt as a descriptive element that guides the AI toward a specific aesthetic.

Highlights

Introduction to using Stable Diffusion 3 Medium and ComfyUI.

Accessing the gated model on Hugging Face by filling out a form and agreeing to terms.

Downloading the necessary files: 'sd3_medium.safetensors', the text encoders, and the ComfyUI workflows.

Updating ComfyUI by running the 'update_comfyui.bat' script.

Installing CLIP models into the ComfyUI models directory.

Creating a new folder for 'sd3_medium.safetensors' and adding the file under 'checkpoints'.

Starting ComfyUI with 'run_nvidia_gpu.bat'.

Loading 'sd3_medium.safetensors' as the checkpoint in ComfyUI.

Using the example prompt to generate an image with a natural language description.

Observing the generated image with a female character and ethereal swirling patterns.

Noting that the prompt style resembles SDXL's but is closer to natural language.

Expressing excitement about the release of the model's weights for free.

Discussing the licensing issue and the need for community effort to update it.

Encouraging users to open an issue or contact Stability AI about the license.

Concluding the tutorial with a reminder to have a great day.