Kasucast #7 - Using stable diffusion and textual inversion to create stylized character concept art
TLDR
In this Kasucast episode, the creator explores the use of stable diffusion and textual inversion for generating stylized character concept art. They demonstrate training a new embedding on a blend of waifu diffusion 1.2 and stable diffusion 1.4, using a curated dataset of fantasy art influences. The video showcases live training, image pre-processing, and the application of various prompts and settings to achieve a desired character design. The creator also uses loopback to iteratively refine images and applies post-processing in Photoshop to finalize the artwork, offering viewers a comprehensive guide to leveraging AI in character design.
Takeaways
- 🎨 The video demonstrates how to use stable diffusion for creating stylized character concept art.
- 🔍 The presenter has been experimenting with stable diffusion and shares finished pieces to gauge viewer interest.
- 🆕 The Automatic1111 repository has been updated with a textual inversion tab, streamlining the process.
- 🌐 The presenter trains a new embedding using a combination of waifu diffusion 1.2 and stable diffusion 1.4.
- 👩‍🎨 The training data includes a curated set of images from various artists and styles that the presenter admires.
- 📝 The presenter details the process of creating a custom checkpoint by merging different models in the Automatic1111 web UI.
- 🖌️ Textual inversion is used to train the model with specific style preferences, such as 'my favorite fantasy artists'.
- 🎭 The video includes a live demonstration of the training process and the generation of character artwork.
- 🖥️ Photoshop is utilized to refine the generated images, adjusting details like lighting and facial features.
- 🔧 The presenter discusses the use of settings like loopback, CFG scale, and denoising strength to control image generation.
- 🎨 Final image adjustments include using filters and plugins to add painterly effects and focus on key image areas.
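Several of the settings mentioned above interact numerically. As a minimal sketch (not the web UI's actual code), the CFG scale implements classifier-free guidance: each denoising step blends an unconditional noise prediction with the prompt-conditioned one, and a higher scale pushes the result harder toward the prompt. The function and array values here are illustrative.

```python
import numpy as np

# Classifier-free guidance sketch: the guided prediction is the
# unconditional one pushed toward the prompt-conditioned one by
# cfg_scale. Real samplers apply this to full noise tensors.
def apply_cfg(noise_uncond, noise_cond, cfg_scale):
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

uncond = np.array([0.1, 0.2])          # toy unconditional prediction
cond = np.array([0.3, 0.1])            # toy prompt-conditioned prediction
guided = apply_cfg(uncond, cond, cfg_scale=7.0)
```

At `cfg_scale=1.0` the unconditional term cancels out entirely, which is why low CFG values produce images that drift away from the prompt.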
Q & A
What is the main focus of Kasucast #7?
-The main focus of Kasucast #7 is using stable diffusion and textual inversion to create stylized character concept art.
What update to the Automatic1111 repository is mentioned in the video?
-The Automatic1111 repository was updated with the textual inversion tab, which was previously not integrated.
Which models does the presenter train the embedding on?
-The presenter trains the embedding on a combined version of waifu diffusion 1.2 and stable diffusion 1.4.
What is the purpose of creating a textual inversion embedding?
-The purpose of creating a textual inversion embedding is to capture the style of favorite fantasy artists to use in generating character artwork.
What is the role of the initialization text in the process?
-The initialization text, such as 'female character' and 'fantasy character design', provides the starting vectors for the new embedding, steering training toward the desired style of artwork.
What is the significance of the data set used for training?
-The data set used for training includes art from various artists that the presenter admires, which influences the style of the generated character art.
How does the presenter create a custom checkpoint or model?
-The presenter creates a custom checkpoint or model by using the checkpoint merger tool in the Automatic1111 web UI, merging stable diffusion 1.4 and waifu diffusion 1.3 with an interpolation amount of 0.5.
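Conceptually, a weighted-sum merge at interpolation 0.5 is a per-tensor linear blend of the two checkpoints' state dicts. The sketch below uses plain floats in place of real torch tensors, and the layer names are made up for illustration.

```python
# Sketch of a weighted-sum checkpoint merge: for every parameter key,
# blend the two models as (1 - alpha) * A + alpha * B. At alpha=0.5
# this is a straight average of the weights.
def merge_checkpoints(state_a, state_b, alpha):
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k]
            for k in state_a}

sd14 = {"layer.weight": 1.0, "layer.bias": 0.5}   # toy "stable diffusion 1.4"
wd13 = {"layer.weight": 3.0, "layer.bias": 1.5}   # toy "waifu diffusion 1.3"
merged = merge_checkpoints(sd14, wd13, alpha=0.5)
```

An interpolation amount of 0.5 gives both models equal influence; values nearer 0 or 1 bias the merge toward one parent model.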
What is the purpose of the prompt template file in the training phase?
-The prompt template file assists the textual inversion training phase by directing the style, such as 'a portrait in this style', which helps the model understand the desired artistic direction.
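As an illustrative sketch, a minimal style-training template file might contain lines like the following, where `[name]` is the placeholder the web UI substitutes with the embedding's token during training (the exact lines here are examples, not the presenter's file):

```text
a portrait in the style of [name]
a painting in the style of [name]
fantasy character design in the style of [name]
```

Each training step picks one of these lines, so varied phrasings help the embedding generalize beyond a single prompt shape.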
What is the significance of the loopback setting in the generation process?
-The loopback setting takes the output of one image generation and feeds it back into the image-to-image process with the same prompts, allowing for iterative refinement of the image.
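The loopback loop can be sketched as a simple fixed-point iteration: each pass feeds the previous output back into image-to-image with the same prompt and denoising strength. The `img2img` function below is a hypothetical stand-in for the web UI's sampler, not its real API; a single number stands in for the image.

```python
# Placeholder for the real img2img call: nudges the "image" (a number)
# toward a prompt-determined target by denoising_strength per pass.
def img2img(image, prompt, denoising_strength):
    target = len(prompt)   # toy stand-in for "what the prompt wants"
    return image + denoising_strength * (target - image)

# Loopback: re-run img2img on its own output a fixed number of times.
def loopback(image, prompt, passes=4, denoising_strength=0.4):
    for _ in range(passes):
        image = img2img(image, prompt, denoising_strength)
    return image

result = loopback(0.0, "oil painting of a sage", passes=4)
```

The sketch shows why loopback converges toward the prompt: each pass closes a fraction (the denoising strength) of the remaining gap, so many low-strength passes refine gradually where one high-strength pass would overwrite the image.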
How does the presenter refine the generated images in Photoshop?
-The presenter refines the generated images by using tools like the spot healing brush, liquify tool, and hue/saturation adjustments to correct imperfections and enhance details.
What final touches does the presenter apply to the image to complete the artwork?
-The presenter applies final touches such as high pass filtering for sharpening, Gaussian blur for lighting effects, camera raw filter for color adjustments, and an artistic plugin to add painterliness to the image.
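The high pass sharpening step has a simple numeric analogue: subtract a blurred copy of the image to isolate fine detail, then add a fraction of that detail back. The numpy sketch below uses a 1-D "image" and a box blur to keep things short; it approximates the idea, not Photoshop's exact filter.

```python
import numpy as np

# Simple box blur via convolution with a uniform kernel.
def box_blur(signal, radius=1):
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    return np.convolve(signal, kernel, mode="same")

# High-pass sharpening: the "high pass" layer is the original minus
# its blur; adding a fraction of it back exaggerates edges.
def high_pass_sharpen(signal, amount=0.5, radius=1):
    detail = signal - box_blur(signal, radius)
    return signal + amount * detail

img = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # a single bright "edge"
sharp = high_pass_sharpen(img)
```

Note how the sharpened result overshoots on both sides of the edge; that halo is the same artifact over-aggressive high pass sharpening produces in Photoshop.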
Outlines
🎨 Introduction to Character Artwork with Stable Diffusion
The speaker begins by greeting the audience and referencing their previous video, where they mentioned experimenting with stable diffusion for character artwork. They express their intention to showcase some finished pieces to gauge audience interest in the style. The speaker also notes an update to the Automatic1111 repository, specifically the addition of a textual inversion tab. They recount their experience with textual inversion using a different repository and explain their process of creating a new embedding within the Automatic1111 repository. The speaker details their training approach, combining waifu diffusion 1.2 with stable diffusion 1.4, and shares the initialization text used. They also discuss the data set they prepared, which includes art from various artists they admire, such as Krenz Cushart, Choco Fing R, and others, highlighting the diversity and inspiration behind their selection.
🖥️ Setting Up Textual Inversion Embedding
The speaker continues by demonstrating the process of setting up a textual inversion embedding. They guide the audience through creating a new embedding, selecting a learning rate, and specifying the data set directory. The speaker emphasizes the importance of the prompt template file in directing the style of the artwork during the training phase. They also discuss their choice to use a combined model of waifu diffusion and stable diffusion, explaining their reasoning and the process of creating this custom model. The speaker provides a real-time update on the training progress, sharing the loss values and their thoughts on the significance of these values. They also touch upon the debate around the number of training steps and images, referencing both academic recommendations and anecdotal evidence from other users.
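Textual inversion's training loop has one defining property worth making concrete: the model's weights stay frozen, and gradient descent updates only the new token's embedding vector. The toy sketch below stands in for that idea with a quadratic loss; the names and the learning rate value are illustrative, not the web UI's internals.

```python
import numpy as np

# Conceptual sketch of textual inversion: optimize ONE embedding
# vector while everything else is frozen. The "model" here is a toy
# squared-error loss against a vector standing in for the target style.
rng = np.random.default_rng(0)
target_style = rng.normal(size=8)   # stand-in for the style to capture
embedding = np.zeros(8)             # the new token's vector

learning_rate = 0.005               # the web UI exposes this setting
for step in range(2000):
    grad = 2 * (embedding - target_style)   # gradient of ||e - t||^2
    embedding -= learning_rate * grad       # only the embedding moves

loss = float(np.sum((embedding - target_style) ** 2))
```

Because only a few hundred numbers are being optimized, the loss value is a noisy and indirect signal of quality, which matches the presenter's observation that loss alone does not settle the debate over how many steps or images are enough.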
🎭 Applying the Trained Embedding to Artwork
The speaker moves on to the application of the trained embedding to their artwork. They describe the process of selecting the embedding and entering prompts to guide the style of the generated art. The speaker shares their approach to image editing, including adjustments to sampling steps, denoising, and CFG settings. They also discuss the importance of the prompt structure, particularly the use of arrows and the inclusion of the artist's style in the prompt. The speaker provides examples of prompts they've used, such as 'oil painting of a beautiful female Sage in Alexander McQueen fashion,' and explains the reasoning behind the choice of words. They also mention the use of negative prompts to exclude unwanted elements from the generated images.
🖌️ Refining Generated Art with Photoshop
The speaker delves into the post-generation editing process using Photoshop. They discuss the use of the Hue/Saturation adjustment layer to correct color saturation and the spot healing brush to remove unwanted elements from the image. The speaker also demonstrates the use of the liquify tool to adjust facial features and the mixer brush for cleaning up areas of the image. They explain their approach to refining the artwork, focusing on details such as the character's ear and the clothing's texture. The speaker also shares their workflow for using the loopback feature, which allows them to iteratively improve the image by feeding the output back into the generation process.
🖼️ Final Touches and Adjustments in Photoshop
The speaker concludes the video by detailing the final stages of image editing. They discuss the use of the high pass filter to sharpen the image and the Gaussian blur to create atmospheric effects. The speaker also demonstrates the use of the camera raw filter for color adjustments and the curves tool for enhancing the image's lighting and focus. They explain their technique of painting in darkness on non-focal areas to draw attention to the character's face. Lastly, the speaker shares their method of adding painterly effects to the image using a plugin, ensuring that the face remains polished while other areas take on a more traditional art style. The speaker wraps up by expressing hope that the workflow shared will be helpful to the audience.
Keywords
💡Stable Diffusion
💡Textual Inversion
💡Embedding
💡Waifu Diffusion
💡Prompt
💡Sampling Steps
💡Denoising
💡Face Restoration
💡Loopback
💡Negative Prompt
💡Artistic Stylization
Highlights
Introduction to using stable diffusion for character artwork
Demonstration of finished character art pieces using stable diffusion
Update on the Automatic1111 repository with the textual inversion tab
Explanation of the process to create a new embedding for fantasy artists
Training the embedding on a combined model of waifu diffusion 1.2 and stable diffusion 1.4
Selection of initialization text for character design
Preparation of a custom dataset with various fantasy art styles
Inclusion of art from favorite artists like Krenz Cushart and Choco Fing R
Description of the pre-processing steps for training images
Setting up the training and embedding parameters in the Automatic1111 repository
Creation of a merged model or checkpoint using the checkpoint merger tool
Progress update on training the textual inversion after 2 hours and 48 minutes
Utilization of the trained embedding to generate character art with specific styles
Adjustment of settings like sampling steps, CFG scale, and denoising strength for image generation
Explanation of the loopback feature and its impact on image generation
Post-processing of generated images in Photoshop for refinement
Final touches and adjustments to the character art using various Photoshop techniques
Completion of the character artwork and summary of the workflow