Counterfeit for SDXL and Negative Embeddings Are Out! [Stable Diffusion]

AI is in wonderland
29 Jul 2023 07:12

TLDR

Alice from AI is in wonderland introduces CounterfeitXL, a version of the Counterfeit model built for SDXL, along with three Negative Embeddings made specifically for SDXL. She compares the effects of the Standard, Realistic, and Anime-like embeddings on image quality and shares her experience using ComfyUI to generate cleaner images. Alice also shows the results of adding a negative prompt aimed at preventing deformities, illustrating how Negative Embeddings can improve image generation.

Takeaways

  • 🚀 Introduction of CounterfeitXL, a version of the Counterfeit model for SDXL.
  • 📈 The models are large, approximately 7GB, potentially filling up storage space.
  • 🌟 Exclusive to SDXL, three Negative Embeddings (A-Standard, B-Realistic, C-Anime-like) are discussed.
  • 🖼️ Civitai showcases prompts and images for reference, with a note that they may not match the original Counterfeit's quality.
  • 🎨 Plans to experiment with ComfyUI for quicker and cleaner image generation.
  • 🎥 A video on ComfyUI is planned, albeit delayed due to breaking news.
  • 👧 The demonstration begins with drawing a girl in a school uniform using Counterfeit XLα without LoRA.
  • 📊 The image settings include 1024x1024 size, 35 total steps, and a CFG scale of 7 with clip skip 2.
  • 🔍 Image generation uses the DPM++ 2M SDE Karras sampler.
  • 📈 Testing the Upscale Model with different styles, starting with Anime, using Real-ESRGAN 4x+ Anime6B.
  • 🌈 Exploration of the effects of Negative Embeddings on image quality and detail, with variations observed.

Q & A

  • What models are discussed in the video?

    -The models discussed in the video are CounterfeitXL and SDXL.

  • How large are the CounterfeitXL and SDXL models?

    -The CounterfeitXL and SDXL models are about 7GB each.

  • What are Negative Embeddings and how many are there for SDXL?

    -Negative Embeddings are a technique used to refine the output of the model, and there are three of them exclusively for SDXL.

  • What are the three types of Negative Embeddings mentioned in the script?

    -The three types of Negative Embeddings are A for Standard, B for Realistic, and C for Anime-like.

  • What is the purpose of the prompts and images on Civitai?

    -The prompts and images on the Civitai page are provided as references for comparing the generated images with the original Counterfeit.

  • What is the image size and total steps used in the demonstration?

    -The image size used is 1024 by 1024, and the total is 35 steps, of which the Base model handles the first 28.

  • What sampler and upscale model is Alice planning to use?

    -Alice plans to use the DPM++ 2M SDE Karras sampler for image generation and an Upscale Model for 1.5x upscale with denoising strength 0.3.

  • What is the effect of using Negative Embeddings from category A?

    -Using Negative Embeddings from category A makes the face more solid and the cherry blossoms more distinct.

  • What issue is observed with the Realistic Negative Embeddings (category B)?

    -With the Realistic Negative Embeddings (category B), the hand becomes a little distorted, and the image does not become noticeably more photorealistic.

  • What is the outcome of using Anime-style Negative Embeddings (category C)?

    -Using Anime-style Negative Embeddings (category C) doesn't result in significant changes, but there is some variation.

  • How does adding a negative prompt from the template affect the image?

    -Adding a negative prompt from the template results in a cleaner hand but changes the composition, making direct comparison difficult.

Outlines

00:00

🖌️ Introduction to Counterfeit XL and SDXL Models

Alice from AI is in wonderland introduces CounterfeitXL, the Counterfeit model for SDXL, noting that these models are large, at about 7GB each. She mentions the challenge of limited storage but commits to continuing their use. Alice also discusses the three Negative Embeddings exclusive to SDXL, which are Standard (A), Realistic (B), and Anime-like (C), expressing her intent to explore their effects. She encourages viewers to check out the prompts and images on Civitai, acknowledges the potential quality differences, and shares her plans to experiment with ComfyUI. Alice explains the model settings, including the Counterfeit XLα model without LoRA, a 1024x1024 image size, 35 total steps, and specific parameters for the sampler and Upscale Model. She plans to post a video on ComfyUI the following week, despite a delay due to breaking news, and provides a walkthrough of the ComfyUI screen settings.

Keywords

💡Counterfeit and SDXL

SDXL (Stable Diffusion XL) is a Stable Diffusion base model, and CounterfeitXL is the version of the Counterfeit model built for it. The video notes that these models are large, approximately 7GB each, which puts pressure on storage. Both are central to the video's theme of exploring image generation techniques.

💡Negative Embeddings

Negative Embeddings are embeddings referenced in the negative prompt to steer generation away from unwanted elements or characteristics. The three discussed in the video are made exclusively for SDXL: Standard (A), Realistic (B), and Anime-like (C). The video evaluates the effect of each on the final image, which ties into its theme of improving image quality and style.
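As a rough illustration of how such an embedding might be referenced, the sketch below assembles a negative prompt string. It assumes ComfyUI's `embedding:<file name>` prompt syntax; the file name `negativeXL_A` and the helper `build_negative_prompt` are hypothetical placeholders, since the video summary does not give the actual file names.

```python
from typing import Iterable


def build_negative_prompt(embedding_names: Iterable[str],
                          extra_terms: Iterable[str] = ()) -> str:
    """Join embedding references and plain-text terms into one negative prompt.

    Assumption: embeddings are referenced as "embedding:<file name>",
    which is the syntax ComfyUI uses in text prompts.
    """
    parts = [f"embedding:{name}" for name in embedding_names]
    parts.extend(extra_terms)
    return ", ".join(parts)


# Hypothetical file name for the Standard (A) embedding plus one plain-text term.
negative_prompt = build_negative_prompt(["negativeXL_A"], ["bad hands"])
print(negative_prompt)
```

Swapping the name in the list is enough to try the Realistic (B) or Anime-like (C) embedding with the same remaining settings, which keeps a comparison like the one in the video controlled.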

💡ComfyUI

ComfyUI is a node-based interface for Stable Diffusion that the speaker plans to use for generating images more quickly and cleanly. The video includes a walkthrough of its screen settings, and a dedicated video on ComfyUI is planned.

💡DPM++ 2M SDE Karras

DPM++ 2M SDE Karras refers to the sampler used in the image generation process: the DPM++ 2M SDE sampling algorithm combined with the Karras noise schedule. The sampler determines how noise is removed from the image step by step, and its use is part of the video's exploration of settings for high-quality image outputs.

💡Upscale Model

An Upscale Model increases the resolution of an image while maintaining or improving its quality. In the video, the speaker upscales images by 1.5x with a denoising strength of 0.3, which refines the images and makes them appear clearer and more detailed.
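The arithmetic behind the 1.5x setting can be sketched as follows. The helper `upscaled_size` is hypothetical, and snapping to a multiple of 8 is an assumption based on the 8x downsampling of the Stable Diffusion latent space, not something stated in the video.

```python
def upscaled_size(width: int, height: int, factor: float,
                  multiple: int = 8) -> tuple[int, int]:
    """Scale an image size and snap each side to the nearest multiple.

    Assumption: sides are kept at multiples of 8 so they map cleanly
    onto latent-space dimensions.
    """
    def snap(value: int) -> int:
        return int(round(value * factor / multiple)) * multiple

    return snap(width), snap(height)


# Settings from the video: 1024x1024 base image, 1.5x upscale.
print(upscaled_size(1024, 1024, 1.5))
```

With the video's settings, each side grows from 1024 to 1536 pixels. The denoising strength of 0.3 then controls how much of that upscaled image is redrawn, with lower values staying closer to the original.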

💡Refiner

In SDXL, the Refiner is a second stage that refines the image produced by the Base model. In the video, the Base model handles the first 28 of the 35 total steps, and the Refiner improves the result from there, in keeping with the video's theme of perfecting the image generation process.
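The step split mentioned in the video (28 of 35 steps on the Base model) works out as in the sketch below. It assumes the remaining steps go to the Refiner, which the summary implies but does not spell out; `split_steps` is a hypothetical helper, not part of any tool.

```python
def split_steps(total_steps: int, base_steps: int) -> dict:
    """Split a sampling run into a Base range and a Refiner range.

    Assumption: the Refiner takes over exactly where the Base model
    stops and runs until the final step.
    """
    if not 0 <= base_steps <= total_steps:
        raise ValueError("base_steps must be between 0 and total_steps")
    return {
        "base": (0, base_steps),                # (start step, end step)
        "refiner": (base_steps, total_steps),   # (start step, end step)
        "base_fraction": base_steps / total_steps,
    }


# Settings from the video: 35 total steps, the first 28 on the Base model.
plan = split_steps(35, 28)
print(plan)
```

Under that assumption, the Base model covers 80% of the schedule and the Refiner the final 7 steps.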

💡CFG scale

CFG scale is the classifier-free guidance scale, a parameter that controls how strongly the generated image follows the prompt. Higher values follow the prompt more closely, while lower values give the model more freedom. In the video, the speaker sets the CFG scale to 7 with clip skip 2.
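A minimal sketch of how a guidance scale is typically applied in classifier-free guidance, assuming plain Python lists stand in for the model's noise predictions. The function name `apply_cfg` and the sample numbers are illustrative and do not come from the video. When a negative prompt is supplied, its prediction commonly takes the place of the unconditional one, which is why negative prompts and Negative Embeddings push the image away from what they describe.

```python
def apply_cfg(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Combine two noise predictions with classifier-free guidance.

    cond:   prediction conditioned on the positive prompt
    uncond: prediction for the empty (or negative) prompt
    scale:  the CFG scale; 1.0 returns the conditional prediction unchanged
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy numbers only, using the CFG scale of 7 from the video.
guided = apply_cfg(cond=[1.0, 2.0], uncond=[0.0, 1.0], scale=7.0)
print(guided)
```

The larger the scale, the further the result moves from the unconditional (or negative) prediction in the direction of the positive prompt.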

💡No LoRA

No LoRA indicates that the model being used does not incorporate the LoRA (Low-Rank Adaptation) technique. LoRA is a method for efficiently adapting large neural networks for specific tasks without changing the entire model. In the context of the video, not using LoRA suggests a focus on the base model's capabilities without additional adaptations.

💡Prompt

A prompt in this context is an input or instruction given to the image generation model to produce a specific output. It is a crucial element in the video's theme of creating images, as it guides the model in generating the desired content, such as a girl in a school uniform.

💡Real ESRGAN 4x

Real-ESRGAN 4x is an upscaling model based on ESRGAN (Enhanced Super-Resolution Generative Adversarial Network). The '4x' denotes that it increases the resolution of an image by four times. In the video, it is used as the Upscale Model to improve the quality of the generated images.

💡Anime6B

Anime6B refers to the variant of the Real-ESRGAN 4x upscaler that is optimized for anime-style illustrations. It is mentioned in the context of the Upscale Model, where the speaker begins testing with the Anime style as part of the video's theme of experimenting with different styles and techniques in image creation.

Highlights

Introduction of CounterfeitXL, a version of the Counterfeit model for SDXL.

The CounterfeitXL and SDXL models are approximately 7GB in size.

Three Negative Embeddings are exclusively for SDXL: Standard (A), Realistic (B), and Anime-like (C).

Civitai features prompts and images for reference.

ComfyUI is considered for producing quicker and cleaner images.

The model used is Counterfeit XLα without LoRA.

The image size is set to 1024 by 1024 with a total of 35 steps.

Using the DPM++ 2M SDE Karras sampler for image generation.

Upscale Model is set to generate images at 1.5x upscale with denoising strength 0.3.

A video on ComfyUI is planned for the following week.

Base model leaves some noise in the image.

Negative Embeddings from A make the face more solid and distinct.

Negative Embeddings from B make the hand distorted, but the face becomes cute when upscaled.

Anime-style (C) doesn't show significant changes with Negative Embeddings.

Negative Embeddings work well for cherry blossom petals.

The addition of a negative prompt from a template improves the image significantly.

The hand becomes clean, but the composition changes with the negative prompt.

Various settings, including realistic ones, were tested for image generation.