Making Images with AI! Basics for First-Time Users (follow along step by step, free, stable-diffusion)

뉴럴닌자 - AI공부 (Neural Ninja - AI Study)
23 Jul 2023 · 18:34

TLDR: This video script introduces first-time users to Stable Diffusion WebUI, a tool for creating images using AI. It covers model selection, Google Colab setup, and key features like VAE, prompts, sampling, and image quality enhancement. The guide also explains creating and managing batches, using CFG scale for prompt reflection, and adjusting seed values for unique images. Additionally, it touches on high-res fixes and facial enhancement techniques for improved image detail and clarity.


  • 📝 The video is a tutorial for first-time users of Stable Diffusion WebUI, guiding them through the process of creating images using various settings and options.
  • 💻 The process can be executed on Google Colab, eliminating the need for high computer specifications, or on a local machine with appropriate graphics card specifications.
  • 🔍 Users can select from a variety of models, each with different capacities, and even add new models via Google Drive for customized image creation.
  • 🔑 The model, or checkpoint, is crucial as it uses stored data to generate images, and users can choose from saved models or those included by default.
  • 🎨 VAE, the component that decodes the final image colors, can be included or excluded from the image generation process, affecting the color quality and overall visual outcome.
  • 📝 Prompts are essential as they describe the desired image to the AI, with positive and negative prompts dictating what should and should not be included in the final image.
  • 🔄 Sampling is the algorithm used to create images from noise, with different methods like Euler a, DPM++ Karras variants, and DDIM offering varying levels of detail and speed.
  • 📐 The size of the generated image is important, with SD1.5 models commonly used for their 512-pixel training, ensuring proper aspect ratios and image quality.
  • 🔢 The batch count and size determine how many images are generated at once, with considerations for VRAM usage and creation speed.
  • 🔍 CFG scale adjusts how strongly the prompt influences the image, with higher values increasing the likelihood of desired elements appearing, but also potentially distorting the image.
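The batch arithmetic from the points above can be summarized in a few lines (a sketch; `total_images` is a made-up helper, not part of the WebUI):

```python
def total_images(batch_count: int, batch_size: int) -> int:
    """Batch size = images generated in parallel per run (costs VRAM);
    batch count = number of sequential runs (costs only time)."""
    return batch_count * batch_size

print(total_images(4, 2))  # 8 images in total
```

Raising the batch size is faster overall but limited by VRAM, so on modest hardware it is safer to raise the batch count instead.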

Q & A

  • What is the main topic of the video?

    -The main topic of the video is to teach the basics for first-time users of Stable Diffusion WebUI.

  • Why does the computer specification not matter in this process?

    -The computer specification does not matter because the process will be executed using Google Colab, which allows users to run the process without needing powerful hardware on their local machines.

  • How can users install and use an alternative to the Colab environment?

    -Users whose computers have a suitable graphics card can install Stable Diffusion WebUI locally and use it instead of the Colab environment.

  • What is the purpose of selecting a model in Stable Diffusion WebUI?

    -The purpose of selecting a model is to choose the specific AI model that will be used to create the image. Different models have different capacities and can affect the overall image shape.

  • How can users add models through Google Drive?

    -Users can add models through Google Drive by saving the model to their Google Drive and then selecting it from there for use in the WebUI.

  • What is a VAE in the context of the video?

    -VAE stands for Variational Autoencoder, the component that decodes the final colors of the image. A VAE can be baked into a checkpoint, but checkpoints are usually distributed without one, so a matching VAE often needs to be selected separately.

  • What are positive and negative prompts in Stable Diffusion WebUI?

    -Positive prompts are descriptions of what should be in the image, while negative prompts are descriptions of what should not be in the image. They help guide the AI in creating the desired content.

  • What is the role of the sampling method in image creation?

    -The sampling method is an algorithm that creates an image from noise. It determines how the initial noise values are sampled step by step to complete the image, with different methods showing different levels of detail and speed.

  • Why is the step number important in the sampling process?

    -The step number refers to the number of times the model samples the image. More steps generally result in more detailed images, but setting too high a number can lead to a deterioration in quality.

  • What is the function of the CFG scale?

    -The CFG scale is a value that indicates how much to apply the prompt. A higher value strongly reflects the prompt content, while a lower value results in a weaker reflection, potentially ignoring or underestimating the prompt words.

  • What is the purpose of the seed value in image generation?

    -The seed value is used for generating the initial noise value. It determines the starting point for the image creation process. Entering the same seed value will consistently produce the same image, while using different values or random values can result in varied outcomes.

  • How can users enhance the quality of images using the high-res fix feature?

    -The high-res fix feature is an essential option for improving image quality. It allows users to increase the size of the image and enhance the detail, resulting in a higher resolution and more detailed output.

  • What is the difference between using Latent and ESRGAN series for upscaling images?

    -Latent and ESRGAN series are different upscaling methods. Latent requires a high denoising value to increase detail and can result in a more blurred image if the denoising is low. In contrast, ESRGAN series can be set to lower denoising values, such as 0.4, and still increase detail without significantly altering the original image.



🌟 Introduction to Stable Diffusion WebUI

This paragraph introduces viewers to the basics of using Stable Diffusion WebUI for first-time users. It emphasizes the importance of understanding the values set during image creation and provides a step-by-step guide on how to access and utilize the platform, which includes using Google Colab and the potential for installing alternative environments with specific graphics card specifications. The paragraph also covers model selection, the use of Google Drive for model storage, and the process of running Colab. It concludes with a brief explanation of the model, VAE, and the prompt system, highlighting the significance of these elements in creating images with desired qualities.


📸 Image Creation Process and Sampling

This segment delves into the technical aspects of image creation using Stable Diffusion WebUI. It explains the concept of sampling, an algorithm that transforms noise into an image, and the various sampling methods available, such as Euler a, DPM++ Karras variants, and DDIM. The importance of the number of steps in sampling for achieving detail is discussed, as well as the impact of model size on image quality. The paragraph also touches on the use of prompts in different forms, the significance of batch count and size, and the role of the CFG scale in reflecting prompt content. It concludes with a discussion on the effects of high CFG values and the use of specific prompts to enhance image details.


🔄 Understanding Seed Values and Variations

This paragraph focuses on the role of seed values in the image generation process. It explains how seed values determine the initial noise values from which the image is sampled. The paragraph discusses the implications of using default and custom seed values, including the consistency of image output and the use of the dice and recycling icons for randomization. Additionally, it introduces the concept of 'Extra' for creating slightly varied images and the 'high-res fix' feature for improving image quality. The paragraph also explores the impact of denoising strength and the Hi-Res Step setting on image detail and the importance of balancing these parameters for optimal results.


🎨 Enhancing Image Quality and Details

The final paragraph discusses advanced techniques for enhancing the quality and detail of images created with Stable Diffusion WebUI. It introduces the concept of Latent, an upscaler that requires a high denoising value for effective image enhancement. The paragraph compares the use of Latent with the ESRGAN series, highlighting the latter's ability to increase detail with lower denoising values. It also presents a method for improving facial clarity through the use of 'inpainting' and 'DDetailer', which are extensions designed to redraw faces at a higher resolution. The paragraph concludes with a brief overview of the key points covered in the video and an expression of hope that the information provided is helpful to viewers.



💡Stable Diffusion WebUI

Stable Diffusion WebUI is a user interface designed for utilizing the Stable Diffusion model, which is an AI-based system for image generation. The video script discusses the basics of using this interface for first-time users, including how to execute processes and create images. It is the central platform through which users interact with the AI to generate content, as mentioned in the script when it talks about selecting models and adjusting settings for image creation.

💡Google Colab

Google Colab is a cloud-based platform that allows users to run Python programs and access AI models without the need for high-end computer specifications. In the context of the video, it is mentioned as the environment where the Stable Diffusion WebUI process will be executed, emphasizing that the user's computer specifications are irrelevant as long as they can access Colab.

💡Model Selection

Model selection refers to the process of choosing a specific AI model or 'checkpoint' from a list to generate images. The video script explains that there are various models with different capacities, and users can select and execute one based on their preferences. Additionally, it mentions the possibility of adding models through Google Drive, which is crucial for users who want to experiment with different AI models to achieve desired image outcomes.
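Finding checkpoints on disk can be sketched like this (assumes the typical WebUI folder layout, `models/Stable-diffusion`, and `.safetensors` files — both are common conventions, not details stated in the video):

```python
from pathlib import Path
import tempfile

def list_checkpoints(webui_root: str) -> list[str]:
    """Return checkpoint filenames under the usual WebUI models folder."""
    model_dir = Path(webui_root) / "models" / "Stable-diffusion"
    return sorted(p.name for p in model_dir.glob("*.safetensors"))

# Example: scan a throwaway directory standing in for a WebUI install.
root = tempfile.mkdtemp()
(Path(root) / "models" / "Stable-diffusion").mkdir(parents=True)
(Path(root) / "models" / "Stable-diffusion" / "anything-v4.safetensors").touch()
print(list_checkpoints(root))  # ['anything-v4.safetensors']
```

Dropping a downloaded model file into this folder (or the linked Google Drive folder, in the Colab setup) is what makes it appear in the WebUI's checkpoint dropdown.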


💡VAE

VAE, or Variational Autoencoder, is described in the script as the component that decodes the final image colors and can be included in checkpoints. Checkpoints are usually distributed without a VAE, but its inclusion can affect the quality of the generated images. The VAE setting is important because it can influence whether the colors in the generated images appear faded or not, as mentioned in the context of the video when discussing the prompt and image quality.


💡Prompt

A prompt is a form of language input that describes the image a user wants the AI to create. It is a critical component in the image generation process as it guides the AI in understanding and producing the desired content. The script explains that prompts can be positive or negative, specifying what should or should not be included in the image. The effectiveness of a prompt is tied to its clarity and specificity, which helps the AI generate images that closely match the user's intent.
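As an illustration, a positive/negative prompt pair might look like the following (the tag strings here are made-up examples, not taken from the video; the WebUI exposes these as two text boxes rather than a dict):

```python
# Hypothetical settings, mirroring the WebUI's two prompt fields.
generation_settings = {
    # Positive prompt: what should appear, plus quality-boosting tags.
    "prompt": "1girl, park, sunny day, masterpiece, best quality",
    # Negative prompt: what should NOT appear.
    "negative_prompt": "lowres, bad anatomy, blurry, watermark",
}

print("blurry" in generation_settings["negative_prompt"])  # True
```

Quality-related tags in the positive prompt and defect-related tags in the negative prompt are the "prompts related to image quality" the video recommends entering.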


💡Sampling

Sampling is an algorithmic process mentioned in the script that creates an image from noise. It involves iteratively refining the image by removing noise, step by step, until the final image emerges. Different sampling methods like Euler a, DPM++ Karras variants, or DDIM are referenced, each offering varying speeds and levels of detail in the generated images. The number of steps or iterations in the sampling process can affect the quality and detail of the final image.
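The step-by-step denoising idea can be sketched with a toy loop (a deliberately simplified stand-in: real samplers use a learned noise predictor and scheduled noise levels, and `denoise` here is a made-up helper):

```python
def denoise(noise: list[float], steps: int) -> list[float]:
    """Shrink the noise a fixed fraction per step, as a crude stand-in
    for how a sampler refines the image iteratively."""
    image = list(noise)
    for _ in range(steps):
        image = [0.8 * x for x in image]  # each step removes part of the noise
    return image

start = [1.0, -2.0, 0.5]
few = denoise(start, 5)
many = denoise(start, 20)
# More steps leave less residual noise, mirroring how higher step
# counts add detail -- up to a point of diminishing returns.
print(max(abs(x) for x in many) < max(abs(x) for x in few))  # True
```

The toy model also hints at why very high step counts stop helping: past a certain point the remaining "noise" is negligible and extra steps only cost time.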

💡CFG Scale

CFG Scale, short for Classifier-Free Guidance scale, is a value that determines how strongly the AI's generated image is influenced by the input prompt. A higher CFG scale means the prompt's content will be more strongly reflected in the image, while a lower value results in a weaker reflection. The script advises that typical values range from 4 to 9 without problems, but extreme values can lead to distorted images or a lack of detail.
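The arithmetic behind classifier-free guidance can be illustrated in a few lines (a minimal sketch: `cfg_blend` and the scalar values are illustrative, since the real computation operates on noise-prediction tensors at every sampling step):

```python
def cfg_blend(uncond: float, cond: float, scale: float) -> float:
    """Classifier-free guidance: start from the unconditional prediction
    and push it toward the prompt-conditioned one by `scale`."""
    return uncond + scale * (cond - uncond)

# scale 1.0 simply returns the prompt-conditioned prediction;
# scale 7.0 amplifies whatever difference the prompt makes.
print(cfg_blend(0.0, 1.0, 1.0))  # 1.0
print(cfg_blend(0.0, 1.0, 7.0))  # 7.0
```

This is also why extreme scales distort images: the blend extrapolates far beyond either prediction instead of interpolating between them.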

💡Seed Value

The seed value is a starting point for the random number generation process used in creating images. It is crucial because it determines the initial noise value from which the image is sampled. The script explains that entering -1 allows the WebUI to automatically input a random seed value, while entering a specific number like 123 will produce a consistent image each time. The seed value is a key element in ensuring reproducibility or variability in image generation.
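Seed-controlled reproducibility can be demonstrated with ordinary pseudo-random noise (a sketch; `make_noise` is a made-up stand-in for the latent noise tensor the sampler actually starts from):

```python
import random

def make_noise(seed: int, n: int = 4) -> list[float]:
    """Generate 'initial noise' deterministically from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same seed -> identical starting noise -> identical image.
print(make_noise(123) == make_noise(123))  # True
# Different seeds -> different noise -> a different image.
print(make_noise(123) == make_noise(456))  # False
```

This is why re-entering a seed such as 123 reproduces an image exactly, while -1 (a fresh random seed per run) varies the output.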

💡High-Res Fix

High-Res Fix is a feature that enhances the quality and detail of generated images by enlarging them and re-sampling the result. The script describes it as an essential option for improving image quality, where the denoising strength controls how much the second pass redraws the enlarged image. The Hi-Res Steps value sets how many sampling steps the upscaled pass uses, and the upscale value sets the magnification factor applied to the image.
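The size arithmetic of the high-res fix is simple (a sketch; `hires_size` is a hypothetical helper, and the 512x512 base with a 2x upscale matches the common SD1.5 workflow described above):

```python
def hires_size(width: int, height: int, upscale: float) -> tuple[int, int]:
    """Final output size after the high-res fix pass."""
    return int(width * upscale), int(height * upscale)

# A 512x512 base image upscaled 2x yields a 1024x1024 result, which
# the second pass then re-details according to the denoising strength
# and Hi-Res Steps settings.
print(hires_size(512, 512, 2.0))  # (1024, 1024)
```

Generating at 512px first and enlarging afterward keeps the initial pass inside the resolution the SD1.5 models were trained on, avoiding the composition errors that direct high-resolution generation tends to produce.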

💡Latent Upscale

Latent Upscale is a method mentioned in the script for enlarging images. It is characterized by the need for a high denoising value to maintain image quality during the upscaling process. The script suggests that while this method can significantly increase detail, it may also result in a blurrier image if not properly adjusted. It is one of the options users have for refining the resolution and detail of their AI-generated images.
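The rule of thumb here — Latent needs a high denoising value, while ESRGAN-family upscalers work well around 0.4 — can be captured in a tiny helper (the 0.6 figure for Latent is an assumed "high" starting value, not a number stated in the video; 0.4 is the video's ESRGAN example):

```python
def suggested_denoising(upscaler: str) -> float:
    """Rough starting denoising strength per upscaler family (assumption)."""
    if upscaler.lower().startswith("latent"):
        return 0.6  # too low here tends to produce blurry output
    return 0.4      # ESRGAN-family keeps detail at lower denoising

print(suggested_denoising("Latent"))        # 0.6
print(suggested_denoising("R-ESRGAN 4x+"))  # 0.4
```

Lower denoising preserves more of the original composition, which is why ESRGAN-family upscalers are the safer default when the base image is already close to what you want.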


💡Inpaint

Inpaint is a technique used to modify or enhance specific parts of an image, as mentioned in the context of fixing facial features. The script explains that it can be used to redraw faces at a size of 512px, which helps to make the facial features clearer and more defined. This tool is particularly useful for addressing common issues where smaller facial elements might not be rendered accurately in the generated images.


💡DDetailer

DDetailer is an extension tool mentioned in the script that simplifies the process of enhancing image details. It is particularly useful for tasks like inpainting, where it can automatically redraw faces at a 512px size, improving clarity without requiring manual adjustments. The script suggests that DDetailer can make the process of refining images more efficient and effective for users.


Introduction to Stable Diffusion WebUI for first-time users.

Explaining the values set when creating an image one by one.

Execution of the process using Google Colab, eliminating the need for high computer specifications.

Option to install and run the software locally on a machine with a suitable graphics card instead of Colab.

Selecting a model and understanding the model capacity.

Adding models through Google Drive for convenience.

The impact of the chosen model on the overall image shape.

Running Colab by pressing the blue button and understanding the different versions available.

Google Drive integration for immediate saving of created images or using saved models.

Explanation of the model, also known as a checkpoint, used by AI to create images.

Understanding VAE, the final color-decoding component, and its inclusion in checkpoints.

Utilizing prompts to express the desired image for the AI to create.

The concept of positive and negative prompts and their effects on the final image.

Entering prompts related to image quality to improve the output.

Creating an image by pressing the Generate button and understanding the potential color issues.

Setting up VAE for improved image quality and understanding the role of sampling in image creation.

Explaining the importance of size in image creation and the common use of SD1.5 models.

Using word combinations in the prompt for efficient image creation.

Creating multiple images by setting the batch count and size, and understanding the limits of batch sizes.

The role of CFG scale in reflecting the prompt content and its impact on image clarity.

Understanding the seed value and its influence on generating unique images.

Utilizing extras for additional variation in image creation without significant changes.

High-res fix as an essential feature for improving image quality and detail.

The use of different upscaling values and their effect on the final image quality.

Enhancing face details using inpainting and DDetailer for improved facial clarity.

Conclusion and hope for the video's helpfulness for users of Stable Diffusion WebUI.