Searching for Truly Cute Faces with Stable Diffusion

ダルトワ★TV
29 Dec 2023 · 11:53

TLDR: This video covers how to deal with Stable Diffusion's tendency to generate very similar faces. Because the usual settings produce the same kind of face over and over, it introduces the 'Low CFG Method': drawing at a low CFG scale and a low step count, then upscaling with Tile in ControlNet and the Ultimate SD Upscale extension to find distinctive, cute faces.

Takeaways

  • 😀 Drawing with Stable Diffusion often produces similar-looking faces.
  • 🔍 This time, the 'Low CFG Method' is used to search for unusual pretty faces.
  • 🎨 Adjusting the Steps and CFG Scale values makes it possible to create a wider variety of cute images.
  • 🤔 Commonly used checkpoints such as Chilled Remix, Bracing Evo Mix, and BRA tend to produce similar faces.
  • 🚫 Lowering the CFG Scale and Steps can leave images unfinished, but it can also create distinctive faces.
  • 🔧 To fix the roughness or blur caused by low CFG and Steps, upscale with Tile in ControlNet and Ultimate SD Upscale.
  • 📊 Use an X/Y/Z plot to pick an attractive face, generating different images by varying the Steps and CFG Scale values.
  • 📈 Using DPM++ 2S a Karras as the sampler gives better results at low CFG Scale and Steps settings.
  • 🖼️ When choosing an image to upscale, go by the overall atmosphere rather than the face alone.
  • 👍 With this method, diverse and appealing faces can be created even with lowered CFG and Steps.
  • ⚠️ Lowering CFG and Steps means the generated images may not fully follow the prompt, so the results involve some randomness.

Q & A

  • What is the 'Low CFG Method' mentioned in the transcript?

    -The 'Low CFG Method' refers to a technique used with Stable Diffusion to create unique and diverse faces by using a low number of steps and a low CFG (Classifier-Free Guidance) scale during the image generation process.

  • Why might using the standard settings in Stable Diffusion result in similar-looking faces?

    -Using the standard settings with a high number of steps and a high CFG scale can lead to images that converge towards a common, typical appearance, resulting in a lack of diversity in the generated faces.

  • What is the purpose of using a low number of steps and CFG scale in the 'Low CFG Method'?

    -Using a low number of steps and CFG scale allows for less convergence towards a single solution, which can result in more varied and unique facial features in the generated images.

  • What are the recommended settings for the 'Low CFG Method' according to the transcript?

    -The recommended settings for the 'Low CFG Method' are around 20 to 30 steps and a CFG scale of 7, but the transcript suggests experimenting with even lower values for both to achieve more diversity.
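
    In the AUTOMATIC1111 Stable Diffusion Web UI, these settings can also be supplied programmatically through its txt2img API. A minimal sketch of such a request payload, assuming the Web UI's `/sdapi/v1/txt2img` endpoint and its field names, with placeholder prompts and the low values the video experiments with:

    ```python
    # Sketch of a txt2img request payload for the AUTOMATIC1111 Web UI API,
    # assuming its /sdapi/v1/txt2img endpoint; prompts are placeholders.
    payload = {
        "prompt": "1girl, portrait",         # placeholder prompt
        "negative_prompt": "lowres",         # placeholder negative prompt
        "sampler_name": "DPM++ 2S a Karras", # sampler recommended in the video
        "steps": 10,                         # deliberately low (standard is ~20-30)
        "cfg_scale": 3,                      # deliberately low (standard is 7)
        "seed": -1,                          # random seed while hunting for a source image
        "batch_size": 1,
    }
    ```

    Posting this JSON to a running Web UI instance (for example with `requests`) returns base64-encoded images; lowering `steps` and `cfg_scale` further increases variety at the cost of coherence.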

  • How does the transcript suggest improving the quality of images generated with the 'Low CFG Method'?

    -The transcript suggests using the 'Tile' feature, 'ControlNet', and the 'Ultimate SD Upscale' extension to upscale and improve the quality of images generated with the 'Low CFG Method'.

  • What is the significance of the X/Y/Z plot in the 'Low CFG Method'?

    -The X/Y/Z plot is used to systematically explore different combinations of steps and CFG scales to find the most aesthetically pleasing facial features before upscaling the image.
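
    The sweep described above can be sketched as a plain parameter grid. The ranges below are the ones quoted later in the transcript (Steps 4 to 16 on X, CFG Scale 2 to 4 on Y), which yields the 39 images the video mentions:

    ```python
    from itertools import product

    # Axis values from the transcript: Steps 4..16, CFG Scale 2..4.
    steps_values = list(range(4, 17))  # 13 values on the X axis
    cfg_values = [2, 3, 4]             # 3 values on the Y axis

    # Each cell of the X/Y grid corresponds to one (steps, cfg_scale) generation.
    grid = [{"steps": s, "cfg_scale": c} for s, c in product(steps_values, cfg_values)]

    print(len(grid))  # 13 * 3 = 39 images, matching the count in the video
    ```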

  • Why is it important to choose the right sampler for the 'Low CFG Method'?

    -Selecting the appropriate sampler is crucial because not all samplers will produce the desired level of diversity and uniqueness in facial features when using the 'Low CFG Method'.

  • What is the role of the 'DPM++ 2S a Karras' sampler in the 'Low CFG Method'?

    -The 'DPM++ 2S a Karras' sampler is recommended for the 'Low CFG Method' because it is particularly effective at generating diverse and unique facial features under the method's low convergence conditions.

  • How does the transcript address the issue of images not following prompts when using the 'Low CFG Method'?

    -The transcript acknowledges that using the 'Low CFG Method' can lead to images that don't closely follow the prompts due to the lower convergence, which introduces an element of luck or randomness in the results.

  • What is the final step in the 'Low CFG Method' after generating and selecting a face from the X/Y/Z plot?

    -The final step is to upscale the selected image using the 'Tile' feature, 'ControlNet', and 'Ultimate SD Upscale' to improve its quality and resolution.
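
    Ultimate SD Upscale works by enlarging the image and then regenerating it in fixed-size tiles that are reassembled afterwards. A rough sketch of the tile arithmetic, assuming a 512 px tile size and ignoring tile overlap:

    ```python
    import math

    def tile_count(width, height, scale=2, tile=512):
        """Number of tiles an Ultimate-SD-Upscale-style pass would process
        for an upscaled image, ignoring tile overlap/padding for simplicity."""
        w, h = width * scale, height * scale
        return math.ceil(w / tile) * math.ceil(h / tile)

    # A 512x768 portrait upscaled 2x becomes 1024x1536, i.e. 2x3 = 6 tiles.
    n = tile_count(512, 768)
    ```

    Each tile is denoised separately at full resolution, which is why this approach can add detail that a plain single-pass upscale cannot.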

Outlines

00:00

🎨 Exploring the 'Low CFG Method' for Unique Art with Stable Diffusion

The paragraph discusses the challenge of creating unique and diverse faces using the AI art tool Stable Diffusion, which often produces similar results. The speaker introduces the 'Low CFG Method' as a technique to achieve more varied and interesting facial features. This method involves using a low number of steps and a low CFG (Classifier-Free Guidance) scale to allow for more creative freedom and less convergence to common facial patterns. The speaker also mentions the use of specific samplers like 'DPM++ 2S a Karras' and the importance of using tools like 'Tile and ControlNet' along with 'Ultimate SD Upscale' for upscaling the images. The process involves selecting a source image, using X/Y/Z plots to find a pretty face, and then upscaling the chosen image for a cleaner result.

05:00

🔍 Refining Art with X/Y/Z Plots and Upscaling Techniques

This section delves into the practical steps of using X/Y/Z plots to refine the art generated by Stable Diffusion. The speaker explains how to adjust the settings to generate a grid of images with varying steps and CFG scales, allowing for the exploration of different facial expressions and styles. The goal is to find a face that is not fully converged, which can lead to unique and appealing results. Once a preferred image is selected, the speaker describes the process of upscaling it using img2img with a chosen sampler, ensuring to include the number of steps and denoising strength for quality enhancement. The use of 'Tile' in ControlNet and 'Ultimate SD Upscale' with 'R-ESRGAN 4x+' upscaler is highlighted for achieving a 2x magnification of the image.

10:50

🎭 The Role of Samplers and the Impact of Luck in AI Art Generation

The final paragraph reflects on the variability and unpredictability inherent in AI art generation, particularly when using the 'Low CFG Method'. The speaker acknowledges that while the method can produce stunning results, it also means that the outcome is somewhat dependent on luck, as the lower CFGs and steps can lead to less adherence to the original prompts. The paragraph concludes with a humorous note on the role of luck in life and art, suggesting that even with advanced tools and techniques, there is an element of chance that adds to the excitement and creativity of the process. The speaker invites viewers to explore different models and samplers to continue the journey of AI art creation.

Keywords

💡Stable Diffusion

Stable Diffusion is an AI model that generates images from text prompts. It is a type of diffusion model used in deep learning for image generation. In the context of the video, the host discusses how using Stable Diffusion often results in similar-looking faces, indicating that the AI might be converging towards a limited set of styles or features when generating human faces.

💡Low CFG Method

CFG stands for Classifier-Free Guidance, the mechanism by which a diffusion model balances a prompt-conditioned prediction against an unconditional one when generating an image. The 'Low CFG Method' uses a lower CFG value to allow for more variability and less predictable outcomes in the generated images, aiming to create unique and diverse faces.
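
    The role of the scale is visible in the classifier-free guidance formula itself: at each denoising step the sampler blends an unconditional prediction with a prompt-conditioned one, and the CFG scale sets how far the result is pushed toward the prompt. A minimal numeric sketch with toy values:

    ```python
    import numpy as np

    def cfg_combine(uncond, cond, cfg_scale):
        """Classifier-free guidance: push the unconditional noise prediction
        toward the prompt-conditioned one by cfg_scale."""
        return uncond + cfg_scale * (cond - uncond)

    uncond = np.array([0.0, 0.0])  # toy stand-ins for the model's noise predictions
    cond = np.array([1.0, 1.0])

    low = cfg_combine(uncond, cond, 2.0)   # stays close to the unconditional result
    high = cfg_combine(uncond, cond, 7.0)  # strongly amplifies the prompt direction
    ```

    At scale 1 the formula returns the conditional prediction unchanged; below that the prompt's influence fades, which is why low-CFG images follow prompts only loosely.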

💡Sampler

In the context of AI image generation, a sampler refers to the algorithm used to generate the image based on the input prompts and parameters. Different samplers can produce different styles or qualities of images. The video discusses the importance of choosing the right sampler to achieve the desired outcome, such as generating a wider variety of faces.

💡Steps

Steps in AI image generation refer to the number of iterations the model goes through to refine the image. A lower number of steps might result in a less refined or 'rougher' image, but as discussed in the video, it can also lead to more unique and less common outcomes, such as different kinds of cute faces.

💡CFG Scale

CFG Scale is the parameter that adjusts the strength of the classifier-free guidance. A lower CFG Scale, as mentioned in the video, can lead to more diversity in the generated images but might also result in images that are less coherent or more abstract. The video suggests using a low CFG Scale to find unusual pretty faces.

💡Bracing Evo Mix

Bracing Evo Mix is mentioned as one of the checkpoints used in Stable Diffusion to generate images. A checkpoint is a set of trained model weights, and popular checkpoints tend to produce recognizably similar results. The video suggests that relying on such common checkpoints leads to overly similar faces, which is why the host is exploring alternatives.

💡X/Y/Z plot

An X/Y/Z plot in the context of the video is a method for visualizing and selecting different image outcomes based on varying parameters. The X, Y, and Z axes likely represent different settings or parameters that can be adjusted to explore a range of image variations. The video describes using an X/Y/Z plot to find a pretty face by adjusting the Steps and CFG Scale.

💡Denoising

Denoising is a process in image generation where the model attempts to remove noise or artifacts from the image to produce a cleaner, more refined result. In the video, the host discusses using denoising with a low CFG Scale and number of Steps to improve the quality of the generated images without losing the unique features.
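
    In AUTOMATIC1111's img2img, the denoising strength by default also scales how many of the configured sampler steps are actually executed. The sketch below is an approximation of that behavior (the exact count depends on the Web UI's settings):

    ```python
    def effective_steps(steps, denoising_strength):
        """Approximate number of img2img steps actually run when the Web UI
        scales the configured step count by denoising strength."""
        return max(1, int(steps * denoising_strength))

    # With 20 configured steps and denoising strength 0.5, about 10 steps run,
    # which is why the video stresses setting the step count explicitly.
    e = effective_steps(20, 0.5)
    ```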

💡Tile and ControlNet

Tile and ControlNet are tools used in the upscaling process of generated images. ControlNet is an auxiliary network that steers Stable Diffusion with an extra input image, and 'Tile' is the ControlNet model that conditions generation on the existing image so that detail can be regenerated section by section without changing the overall structure. The video uses them in conjunction with an extension called 'Ultimate SD Upscale'.

💡Ultimate SD Upscale

Ultimate SD Upscale is an extension or tool mentioned in the video for upscaling images generated by Stable Diffusion. It is used in conjunction with other techniques like Tile and ControlNet to improve the resolution and quality of the final image. The video suggests that this tool is part of the process for creating high-quality, unique images.

Highlights

When drawing with Stable Diffusion, you often get similar faces.

This time, the 'Low CFG Method' will be used to find unusual pretty faces.

Trying to find different kinds of cute faces.

When drawing girls or young women with Stable Diffusion, you always get similar faces.

How does denoising work? The goal is convergence to a single solution.

The steps that improve image quality also tend to make faces uniform.

Using common checkpoints like Chilled Remix, Bracing Evo Mix, and BRA yields common faces.

Anyone can draw good work with AI, but no matter who draws it, the results look similar.

I want to draw more beautiful works.

Trying to use the sampler's convergence to draw new faces.

When drawing beautiful images in Stable Diffusion, what should the step count and CFG scale be?

Choose a newer sampler, about 20 to 30 steps, and a CFG scale of 7; these are the standard settings.

But with these standard settings, everything ends up looking the same.

With an older-type sampler, low CFG, and a low step count, the image cannot be completed.

If done the normal way, the image will look strange.

This time, I will use the 'Low CFG Method', drawing with an extremely low CFG scale and step count.

When upscaling later, we will use Tile in ControlNet and the Ultimate SD Upscale extension.

Which version of the Stable Diffusion Web UI is best? One with many samplers, such as v1.7.0.

If you are not yet familiar with Stable Diffusion, please refer to past videos to learn.

Lower the CFG scale and step count and you can see the image converging, right?

But if you look closely at the converging images... when a face goes from broken to normal... sometimes a cute face comes out.

You need to choose a good sampler for this method.

DPM++ 2S a Karras is a good sampler for this method.

First, settle on a source image using a random seed value.

In the Settings tab you must change the image size limit, or you will only be able to draw small images.

Using BracingEvoMix with a prompt and a negative prompt, change the sampler and set the step count to 10. That's quite low, right?

Generate images; try drawing about 9 of them.

If you find a good one, pick one.

Pick one based on the overall atmosphere rather than the face.

Fix the seed value and use an X/Y/Z plot: set the X type to Steps with values 4 to 16, and the Y type to CFG Scale with values 2 to 4.

This generates 39 pictures. And don't forget to set the batch count back to 1.

At 4 steps the image has not converged, and at CFG scale 2 it is a bit blurry.

It is stable at CFG scale 3 and has already converged nicely at CFG scale 4.

Pick one of them; now we will generate only the one we chose.

Send the created picture to img2img by pressing this icon.

Choose the sampler used for cleanup.

With the Low CFG Method, the picture comes out too rough or blurry.

Make sure to include the number of steps here.

Enable Tile in ControlNet, and select Ultimate SD Upscale from Script.

For Target size type, choose 'Scale from image size'; the upscaler is R-ESRGAN 4x+ and the scale factor is 2x.

Hey sis, I've got it all figured out. Let's try other models and samplers.

Now we can draw pretty faces again!

But there is one weakness. Lowering the CFG and step count means it does not follow the prompt, which means I was drawing too freely.

Even in a field like this, everything comes down to luck. Everything in life is luck!