Kasucast #7 - Using stable diffusion and textual inversion to create stylized character concept art

kasukanra
8 Oct 2022 · 35:56

TLDR: In this Kasucast episode, the creator explores the use of stable diffusion and textual inversion for generating stylized character concept art. They demonstrate the process of training a new embedding with a blend of waifu diffusion 1.2 and stable diffusion 1.4, using a curated dataset of fantasy art influences. The video showcases live training, image pre-processing, and the application of various prompts and settings to achieve a desired character design. The creator also discusses the use of loopback to refine images and incorporates post-processing techniques in Photoshop to finalize the artwork, offering viewers a comprehensive guide to leveraging AI in character design.

Takeaways

  • 🎨 The video demonstrates how to use stable diffusion for creating stylized character concept art.
  • 🔍 The presenter has been experimenting with stable diffusion and shares finished pieces to gauge viewer interest.
  • 🆕 The AUTOMATIC1111 repository has been updated with a textual inversion tab, streamlining the process.
  • 🌐 The presenter trains a new embedding using a combination of waifu diffusion 1.2 and stable diffusion 1.4.
  • 👩‍🎨 The training data includes a curated set of images from various artists and styles that the presenter admires.
  • 📝 The presenter details the process of creating a custom checkpoint by merging different models in AUTOMATIC1111's web UI.
  • 🖌️ Textual inversion is used to train the model with specific style preferences, such as 'my favorite fantasy artists'.
  • 🎭 The video includes a live demonstration of the training process and the generation of character artwork.
  • 🖥️ Photoshop is utilized to refine the generated images, adjusting details like lighting and facial features.
  • 🔧 The presenter discusses the use of settings like loopback, CFG scale, and denoising strength to control image generation.
  • 🎨 Final image adjustments include using filters and plugins to add painterly effects and focus on key image areas.

Q & A

  • What is the main focus of Kasucast #7?

    -The main focus of Kasucast #7 is using stable diffusion and textual inversion to create stylized character concept art.

  • What update to the AUTOMATIC1111 repository is mentioned in the video?

    -The AUTOMATIC1111 repository was updated with the textual inversion tab, which previously was not integrated into the web UI.

  • Which models does the presenter train the embedding on?

    -The presenter trains the embedding on a combined version of waifu diffusion 1.2 and stable diffusion 1.4.

  • What is the purpose of creating a textual inversion embedding?

    -The purpose of creating a textual inversion embedding is to capture the style of favorite fantasy artists to use in generating character artwork.

  • What is the role of the initialization text in the process?

    -The initialization text, such as 'female character' and 'fantasy character design', seeds the new embedding's starting values, giving the training a relevant starting point before it learns the desired style.

  • What is the significance of the data set used for training?

    -The data set used for training includes art from various artists that the presenter admires, which influences the style of the generated character art.

  • How does the presenter create a custom checkpoint or model?

    -The presenter creates a custom checkpoint or model by using the checkpoint merger tool in AUTOMATIC1111's web UI, merging stable diffusion 1.4 and waifu diffusion 1.3 with an interpolation amount of 0.5.

  • What is the purpose of the prompt template file in the training phase?

    -The prompt template file supplies the phrasing used during the textual inversion training phase, such as 'a portrait in this style', which steers the model toward the desired artistic direction (a sketch of such a file appears after this Q&A section).

  • What is the significance of the loopback setting in the generation process?

    -The loopback setting takes the output of one image generation and feeds it back into the image-to-image process with the same prompts, allowing for iterative refinement of the image.

  • How does the presenter refine the generated images in Photoshop?

    -The presenter refines the generated images by using tools like the spot healing brush, liquify tool, and hue/saturation adjustments to correct imperfections and enhance details.

  • What final touches does the presenter apply to the image to complete the artwork?

    -The presenter applies final touches such as high pass filtering for sharpening, Gaussian blur for lighting effects, camera raw filter for color adjustments, and an artistic plugin to add painterliness to the image.
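
Following up on the prompt template question above, here is a hypothetical example of what such a template file might contain, written out from Python. The `[name]` placeholder follows the web UI's template convention and is replaced by the embedding's trigger word during training; the exact lines and file name are illustrative, not the presenter's actual file.

```python
# Illustrative prompt template lines based on "a portrait in this style".
template_lines = [
    "a portrait in the style of [name]",
    "a character design in the style of [name]",
    "an illustration in the style of [name]",
]

with open("fantasy_artists_style.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(template_lines) + "\n")
```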

Outlines

00:00

🎨 Introduction to Character Artwork with Stable Diffusion

The speaker begins by greeting the audience and referencing their previous video where they mentioned experimenting with stable diffusion for character artwork. They express their intention to showcase some finished pieces to gauge audience interest in the style. The speaker also notes an update to the AUTOMATIC1111 repository, specifically the addition of a textual inversion tab. They recount their experience with textual inversion using a different repository and explain their process of creating a new embedding within the AUTOMATIC1111 repository. The speaker details their training approach, combining waifu diffusion 1.2 with stable diffusion 1.4, and shares the initialization text used. They also discuss the data set they prepared, which includes art from various artists they admire, such as Krenz Cushart, Choco Fing R, and others, highlighting the diversity and inspiration behind their selection.

05:00

🖥️ Setting Up Textual Inversion Embedding

The speaker continues by demonstrating the process of setting up a textual inversion embedding. They guide the audience through creating a new embedding, selecting a learning rate, and specifying the data set directory. The speaker emphasizes the importance of the prompt template file in directing the style of the artwork during the training phase. They also discuss their choice to use a combined model of waifu diffusion and stable diffusion, explaining their reasoning and the process of creating this custom model. The speaker provides a real-time update on the training progress, sharing the loss values and their thoughts on the significance of these values. They also touch upon the debate around the number of training steps and images, referencing both academic recommendations and anecdotal evidence from other users.
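
For reference, the "weighted sum" style of checkpoint merge described here can be sketched in plain PyTorch, assuming both checkpoints are ordinary state dicts with matching keys. The file names are placeholders and only the 0.5 interpolation amount comes from the video; this is a conceptual sketch, not the web UI's checkpoint merger code.

```python
import torch

ALPHA = 0.5  # interpolation amount mentioned in the video

# Placeholder file names, not the presenter's actual checkpoints.
sd = torch.load("sd-v1-4.ckpt", map_location="cpu")["state_dict"]
wd = torch.load("waifu-diffusion.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, sd_tensor in sd.items():
    if key in wd and wd[key].shape == sd_tensor.shape:
        # Weighted sum: theta = (1 - alpha) * theta_A + alpha * theta_B
        merged[key] = (1.0 - ALPHA) * sd_tensor + ALPHA * wd[key]
    else:
        # Keep keys that exist in only one model unchanged.
        merged[key] = sd_tensor

torch.save({"state_dict": merged}, "merged-custom.ckpt")
```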

10:02

🎭 Applying the Trained Embedding to Artwork

The speaker moves on to the application of the trained embedding to their artwork. They describe the process of selecting the embedding and entering prompts to guide the style of the generated art. The speaker shares their generation settings, including sampling steps, denoising strength, and CFG scale. They also discuss the importance of the prompt structure, particularly the use of arrows and the inclusion of the artist's style in the prompt. The speaker provides examples of prompts they've used, such as 'oil painting of a beautiful female Sage in Alexander McQueen fashion,' and explains the reasoning behind the choice of words. They also mention the use of negative prompts to exclude unwanted elements from the generated images.
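
As a rough illustration of how the CFG scale and negative prompt interact at each denoising step, here is a sketch in the style of the diffusers library. The `unet`, `latents`, timestep, and prompt embeddings are assumed to come from an existing Stable Diffusion pipeline; this is not the web UI's actual code.

```python
def guided_noise(unet, latents, t, cond_emb, neg_emb, cfg_scale=7.0):
    """One classifier-free guidance step: steer toward the prompt, away from the negative prompt."""
    # Noise prediction conditioned on the prompt.
    noise_cond = unet(latents, t, encoder_hidden_states=cond_emb).sample
    # Noise prediction conditioned on the negative prompt (or an empty prompt).
    noise_neg = unet(latents, t, encoder_hidden_states=neg_emb).sample
    # A higher CFG scale pushes the result further toward the prompt
    # and further away from the negative prompt.
    return noise_neg + cfg_scale * (noise_cond - noise_neg)
```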

15:03

🖌️ Refining Generated Art with Photoshop

The speaker delves into the post-generation editing process using Photoshop. They discuss the use of the Hue/Saturation adjustment layer to correct color saturation and the spot healing brush to remove unwanted elements from the image. The speaker also demonstrates the use of the liquify tool to adjust facial features and the mixer brush for cleaning up areas of the image. They explain their approach to refining the artwork, focusing on details such as the character's ear and the clothing's texture. The speaker also shares their workflow for using the loopback feature, which allows them to iteratively improve the image by feeding the output back into the generation process.
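
The loopback workflow described here amounts to a simple loop around an image-to-image call. In the sketch below, `img2img` is a stand-in for whatever image-to-image function is available (for example a diffusers pipeline); it is not the web UI's API.

```python
from PIL import Image

def loopback(img2img, init_image: Image.Image, prompt: str,
             loops: int = 4, denoising_strength: float = 0.35) -> Image.Image:
    """Feed each result back in as the next input, using the same prompt."""
    image = init_image
    for i in range(loops):
        # A lower denoising strength keeps each pass close to the previous one,
        # so the image is refined gradually instead of being replaced outright.
        image = img2img(prompt=prompt, image=image, strength=denoising_strength)
        image.save(f"loopback_{i:02d}.png")
    return image
```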

20:05

🖼️ Final Touches and Adjustments in Photoshop

The speaker concludes the video by detailing the final stages of image editing. They discuss the use of the high pass filter to sharpen the image and the Gaussian blur to create atmospheric effects. The speaker also demonstrates the use of the camera raw filter for color adjustments and the curves tool for enhancing the image's lighting and focus. They explain their technique of painting in darkness on non-focal areas to draw attention to the character's face. Lastly, the speaker shares their method of adding painterly effects to the image using a plugin, ensuring that the face remains polished while other areas take on a more traditional art style. The speaker wraps up by expressing hope that the workflow shared will be helpful to the audience.
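
For anyone who prefers to script this step, the high pass sharpening done in Photoshop can be approximated with PIL and NumPy: subtract a blurred copy to isolate detail, then blend the detail back in. This is an illustration of the idea, not the presenter's method.

```python
import numpy as np
from PIL import Image, ImageFilter

def high_pass_sharpen(path: str, radius: float = 2.0, amount: float = 0.8) -> Image.Image:
    original = Image.open(path).convert("RGB")
    blurred = original.filter(ImageFilter.GaussianBlur(radius))
    img = np.asarray(original, dtype=np.float32)
    low = np.asarray(blurred, dtype=np.float32)
    high_pass = img - low                 # detail layer, like Photoshop's High Pass filter
    sharpened = img + amount * high_pass  # blend the detail back over the original
    return Image.fromarray(np.clip(sharpened, 0, 255).astype(np.uint8))
```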

Keywords

💡Stable Diffusion

Stable Diffusion is a type of deep learning model used in the field of AI-generated art. It is designed to create images from text descriptions. In the context of the video, the creator uses Stable Diffusion to generate character artwork with a specific style, indicating its capability to capture and reproduce artistic nuances.

💡Textual Inversion

Textual Inversion is a technique that teaches a model a specific style or subject by learning a new text embedding from a small dataset of example images, while leaving the model's weights unchanged. The video creator uses Textual Inversion in the updated repository to capture their favorite fantasy artists' styles, showing how the output of AI-generated art can be customized without retraining the whole model.

💡Embedding

In machine learning, an 'embedding' is a learned numerical representation of data, such as text or images, that captures relevant features. The video discusses creating a new embedding to represent the style of fantasy artists, which is then used to guide the AI in generating artwork that matches this style.
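
Concretely, a textual inversion embedding is just a small learned tensor. The sketch below shows the kind of object being trained and saved; the 768-dimensional width matches Stable Diffusion 1.x's text encoder, but the names and file layout are illustrative rather than the web UI's exact format.

```python
import torch

num_vectors = 8    # "vectors per token" chosen when creating the embedding
hidden_size = 768  # CLIP text encoder width in Stable Diffusion 1.x

# During training, only this tensor is optimized; the model's weights stay frozen.
embedding = torch.randn(num_vectors, hidden_size, requires_grad=True)

torch.save({"my-favorite-fantasy-artists": embedding.detach()}, "my-embedding.pt")
```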

💡Waifu Diffusion

Waifu Diffusion is a fine-tuned variant of the Stable Diffusion model trained on anime-style images. The script mentions combining Waifu Diffusion with Stable Diffusion to create a merged model that can produce character designs with a blend of both styles.

💡Prompt

A 'prompt' in AI art generation is the text description that guides the model to create a specific image. The video creator uses prompts like 'oil painting of a beautiful female Sage in Alexander McQueen fashion' to direct the AI to produce artwork with particular characteristics.

💡Sampling Steps

Sampling steps refer to the number of iterations the AI model goes through to refine the generated image. The creator sets a high number of sampling steps to ensure the AI has ample opportunity to process the prompt and produce a detailed image.

💡Denoising

In image-to-image generation, denoising strength controls how much noise is added to the starting image and therefore how far the result is allowed to drift from it: low values make small refinements, while high values effectively regenerate the image. The video mentions adjusting the denoising strength to control the extent of changes made to the image during generation.
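
One common (though implementation-dependent) way to picture denoising strength is that it scales how much of the sampling schedule image-to-image actually re-runs. The toy function below is only an approximation of that idea, not the web UI's behaviour.

```python
def effective_steps(sampling_steps: int, denoising_strength: float) -> int:
    # strength near 0.0 -> the image barely changes; near 1.0 -> a full re-generation
    return int(round(sampling_steps * denoising_strength))

print(effective_steps(50, 0.35))  # roughly 17 of 50 steps are re-noised and re-denoised
```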

💡Face Restoration

Face Restoration is a feature that improves the quality and realism of faces in AI-generated images. The script describes enabling face restoration to ensure the generated characters have more accurate and detailed facial features.

💡Loopback

Loopback is a process mentioned in the video where the output image from one generation is used as input for the next, creating a chain of iterative improvements. This technique helps in refining the artwork by building upon the previous generation's result.

💡Negative Prompt

A negative prompt is a feature in AI art generation that allows the creator to specify elements they wish to exclude from the generated image. The video creator uses negative prompts to avoid unwanted features in the final artwork, ensuring the output aligns with their vision.

💡Artistic Stylization

Artistic stylization refers to the process of giving the AI-generated images a specific artistic style. The video demonstrates how the creator uses various techniques and settings to achieve a stylized look that reflects their preferences and the styles of their favorite artists.

Highlights

Introduction to using stable diffusion for character artwork

Demonstration of finished character art pieces using stable diffusion

Update on the AUTOMATIC1111 repository adding the textual inversion tab

Explanation of the process to create a new embedding for fantasy artists

Training the embedding on a combined model of waifu diffusion 1.2 and stable diffusion 1.4

Selection of initialization text for character design

Preparation of a custom dataset with various fantasy art styles

Inclusion of art from favorite artists like Krenz Cushart and Choco Fing R

Description of the pre-processing steps for training images

Setting up the training and embedding parameters in the AUTOMATIC1111 repository

Creation of a merged model or checkpoint using the checkpoint merger tool

Progress update on training the textual inversion after 2 hours and 48 minutes

Utilization of the trained embedding to generate character art with specific styles

Adjustment of settings like sampling steps, CFG scale, and denoising strength for image generation

Explanation of the loopback feature and its impact on image generation

Post-processing of generated images in Photoshop for refinement

Final touches and adjustments to the character art using various Photoshop techniques

Completion of the character artwork and summary of the workflow