【Stable Diffusion】The Best Way to Convert 2D Images into Realistic 3D Images【AI Cosplay】

30 Jun 2023 07:11

TLDR: This video script from the Diffusion Labo channel outlines a method for converting 2D images into realistic 3D images, a key technique for AI cosplay. It details a multi-stage process of original image generation, model transformation, and upscaling, and emphasizes extensions such as ADetailer and ControlNet for enhancing image quality and detail. The technique combines the strengths of 2D and realistic models to produce a high-quality, realistic rendering.


  • 🎨 The script outlines a method for converting 2D images into realistic 3D images, improving upon a previously demonstrated technique.
  • 👗 In AI cosplay, accurately reproducing character costumes is crucial, and the process aims to prevent the image breakdown that occurs when 2D LoRAs are applied directly to realistic models.
  • 🔍 LoRA (Low-Rank Adaptation) use is optimized by applying less of it on the realistic model, which preserves that model's original details.
  • 🖼️ The process involves generating an image with a 2D model, transferring it to a realistic model for processing, and enhancing image quality.
  • 📂 Necessary extensions for the process include ADetailer and an upscaler, which should be installed in advance, followed by a restart.
  • 🔧 The processing procedure consists of original image generation, model transformation, and a two-step upscaling process.
  • 🎲 For the initial stage, a 2D model is selected for its LoRA capabilities, with specific prompts and negative prompts, and a clip skip setting of 1.
  • 🌟 ADetailer is used to automatically recognize and enhance faces and hands, improving the beauty of the generated image.
  • 🔄 ControlNet is employed to control the model transformation, with settings adjusted for optimal results.
  • ✨ The final step involves upscaling the image with an extension such as Ultimate SD Upscale, using specific upscalers to achieve a more realistic human skin texture.
  • 📈 The technique can be applied to any object if the 2D model generates a subject with a human-like ratio, making it versatile for various AI cosplay scenarios.
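The stages listed above can be written down as a small plan. This is a minimal sketch, not the video's actual configuration: the `Stage` class, the checkpoint labels, and the assignment of extensions to each stage are illustrative assumptions. Only the stage order, the extension names, and the clip skip value of 1 come from the summary.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One step of the 2D-to-realistic workflow (hypothetical structure)."""
    name: str
    mode: str                     # "txt2img" or "img2img"
    checkpoint: str               # placeholder label, not a real file name
    extensions: list[str] = field(default_factory=list)
    notes: str = ""


def build_pipeline() -> list[Stage]:
    # Stage order follows the summary; per-stage extension choices are assumptions.
    return [
        Stage("original image generation", "txt2img", "2d-model",
              ["ADetailer"], "character LoRA applied here; clip skip 1"),
        Stage("model transformation", "img2img", "realistic-model",
              ["ControlNet", "ADetailer"], "little or no LoRA"),
        Stage("upscale pass 1", "img2img", "realistic-model",
              ["Ultimate SD Upscale"], "first enlargement"),
        Stage("upscale pass 2", "img2img", "realistic-model",
              ["Ultimate SD Upscale"], "second enlargement, skin texture"),
    ]


if __name__ == "__main__":
    for number, stage in enumerate(build_pipeline(), start=1):
        print(f"{number}. {stage.name} [{stage.mode}] on {stage.checkpoint}")
```

Writing the plan as data makes it easy to see that only the first stage generates from a prompt alone; every later stage reworks the previous stage's output.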

Q & A

  • What is the main topic of the video?

    -The main topic of the video is the process of converting a two-dimensional image into a realistic three-dimensional image using various AI tools and techniques.

  • Why is it important to optimize the processing procedure in AI cosplay?

    -Optimizing the processing procedure in AI cosplay is important to accurately reproduce the character's costume and other details, preventing the image breakdown that can occur when two-dimensional LoRAs are applied directly to realistic models.

  • What is LoRA and how does it affect the image details?

    -LoRA (Low-Rank Adaptation) is a lightweight add-on that adapts a base model, for example to reproduce a specific character or costume. Applying less LoRA retains more of the base model's own detail, while applying more can override that detail.

  • What are the necessary extensions and upscalers mentioned in the script?

    -The necessary extensions and upscalers mentioned include ADetailer, ControlNet, and 4x_NickelbackFS for the initial stages, and Ultimate SD Upscale and 4x-UltraSharp for upscaling.

  • How does the image-to-image process work in the described procedure?

    -The image-to-image process involves generating an image with a two-dimensional model that works well with the LoRA, then transferring it to a realistic model for final processing, which includes image quality enhancement.

  • What is the role of ADetailer in the process?

    -ADetailer plays a crucial role by automatically detecting faces and hands in the generated image and redrawing them, enhancing their detail and making them look more realistic.

  • How does the ControlNet extension contribute to the process?

    -ControlNet is used to control the model transformation and upscaling process, ensuring that the image maintains a realistic look and that the details from the original image are accurately rendered in three dimensions.

  • What is the significance of using different upscalers for the final image?

    -Different upscalers are used to achieve varying textures and levels of realism in the final image. They help convert doll-like textures to realistic human skin textures and enhance the overall quality of the image.

  • How does the script ensure the face looks realistic in the final image?

    -The script ensures the face looks realistic by using ADetailer to recognize and rewrite facial features, followed by upscaling with specific upscalers that are good at enhancing skin textures.

  • What is the advice given for using this technique for different objects?

    -The advice given is that if the two-dimensional model generates a subject with a ratio close to that of a real human being, this conversion technique can be used for any object, and it may sometimes be cleaner to apply ADetailer first before doing the model transformation.

  • How can viewers stay updated with the latest techniques and information?

    -Viewers are encouraged to subscribe to the channel, click the like button, and leave comments with any information they have. The channel will introduce any changes in information or processing techniques in future videos.



🎨 Optimizing 2D to 3D Image Conversion Process

This paragraph introduces the speaker from the Diffusion Labo channel, who explains an optimized process for converting two-dimensional images into realistic three-dimensional images. The process matters in AI cosplay, where accurately reproducing a character's costume is crucial. Applying a two-dimensional LoRA directly to a realistic model risks image breakdown, so the speaker reduces LoRA application to retain the realistic model's original details. An image is first generated with a two-dimensional model, then transferred to a realistic model for final processing and image quality enhancement. The speaker also covers preparing the necessary extensions, such as ADetailer and an upscaler, and walks step by step through the procedure, including model transformation and upscaling, with the specific settings used.


🖌️ Enhancing Realistic Texture in AI-Generated Skin

The second paragraph delves into the specifics of making the skin texture in AI-generated images more realistic. It discusses the use of different upscalers to achieve a more human-like texture and the importance of the initial model choice for the quality of the final output. The paragraph explains the process of applying ADetailer first for cleaner results, especially when dealing with early LoRA creations or recently created ones. The speaker shares their experience with the base processing content and how the available extensions at the time of conversion can significantly improve the quality. The paragraph concludes with a call to action for viewers to share information about other upscalers with different textures and to stay updated with the channel for the latest information and techniques.



Keywords

💡Diffusion Labo channel

The Diffusion Labo channel is the source of the video script, a channel covering Stable Diffusion image-generation techniques. It sets the context for the video's discussion of converting 2D images to 3D.

💡Realistic three-dimensional image

A realistic three-dimensional image refers to a computer-generated image that appears lifelike and three-dimensional, mimicking the depth and textures of real-world objects or scenes. The video's main theme revolves around the process of transforming 2D images into such 3D images, which is crucial in various applications like AI cosplay.

💡Image-to-image processing

Image-to-image processing is a technique used in AI and machine learning to transform input images into different forms or styles, often to enhance or alter their appearance. In the context of the video, it is a key step in the conversion process from 2D to 3D images, where the initial 2D model is used to generate an image that is then further processed.


💡LoRA

LoRA, or Low-Rank Adaptation, is a method used in AI models to modify and adapt the model's behavior without significantly changing its underlying structure. In the video, reducing the amount of LoRA is discussed as a way to preserve the original details of the model, which is essential when converting 2D images into detailed 3D images.
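Why a lower LoRA weight preserves the base model can be seen from the usual LoRA formulation, in which a low-rank product is added to a weight matrix: W' = W + s · (B · A). The sketch below is a simplified, pure-Python illustration with made-up 3×3 numbers, not code from the video. The function names are hypothetical, and real implementations also fold a rank-dependent scaling factor into s.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def apply_lora(base, up, down, weight):
    """Return base + weight * (up @ down), the merged weight matrix."""
    delta = matmul(up, down)
    return [[w + weight * d for w, d in zip(base_row, delta_row)]
            for base_row, delta_row in zip(base, delta)]


def max_shift(base, merged):
    """Largest absolute change of any single weight."""
    return max(abs(m - w)
               for base_row, merged_row in zip(base, merged)
               for w, m in zip(base_row, merged_row))


# Toy base weights (3x3) and a rank-1 LoRA update (3x1 times 1x3).
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
up = [[1.0], [2.0], [0.5]]
down = [[0.2, -0.4, 0.6]]

if __name__ == "__main__":
    for weight in (1.0, 0.6, 0.3):
        shift = max_shift(base, apply_lora(base, up, down, weight))
        print(f"weight {weight}: largest weight change {shift:.2f}")
```

The change to the base weights grows linearly with the LoRA weight, so a smaller weight leaves more of the realistic model's own behavior in place.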

💡AI cosplay

AI cosplay refers to the use of artificial intelligence to create or enhance costumes and characters from various media, such as anime, video games, or movies. The video focuses on the technical aspects of generating realistic AI cosplay images, which is important for accurately representing the character's appearance.

💡Extensions and upscalers

Extensions and upscalers are additional software tools or modules that enhance the capabilities of a primary software or AI model. In the context of the video, these tools are used to improve the quality and detail of the 3D images generated from 2D images, optimizing the conversion process.


💡ADetailer

ADetailer is a specific extension mentioned in the script that is used to enhance the details in generated images, particularly focusing on facial and hand recognition and improvement. It is an essential tool in the process of converting 2D images to 3D by automatically refining the details of faces and hands.


💡ControlNet

ControlNet is an extension used in the process of converting 2D images to 3D, which allows for more precise control over the generation process. It is used to specify certain parameters and ensure that the final image aligns with the desired output, such as the model and tiling settings.
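As a rough illustration of the model transformation step, the sketch below builds a request body for an image-to-image call with one ControlNet unit enabled. It assumes the AUTOMATIC1111 web UI's `/sdapi/v1/img2img` endpoint and the ControlNet extension's tile preprocessor, neither of which the summary names explicitly. The function name, the checkpoint label, the ControlNet model name, and the denoising strength of 0.5 are placeholders, not the video's settings.

```python
import base64
import json


def build_transform_payload(image_bytes, prompt, negative_prompt,
                            realistic_checkpoint,
                            denoising_strength=0.5,
                            controlnet_model="control_v11f1e_sd15_tile"):
    """Build an img2img request body (assumed web UI API layout)."""
    return {
        # The image generated by the 2D model, base64-encoded.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        # Placeholder value; the summary does not state the strength used.
        "denoising_strength": denoising_strength,
        # Switch to the realistic checkpoint for this request only.
        "override_settings": {"sd_model_checkpoint": realistic_checkpoint},
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "enabled": True,
                    "module": "tile_resample",   # assumed preprocessor
                    "model": controlnet_model,   # assumed model name
                    "weight": 1.0,
                }]
            }
        },
    }


if __name__ == "__main__":
    payload = build_transform_payload(
        b"<png bytes here>", "1girl, cosplay", "lowres", "realistic-model")
    print(json.dumps(payload, indent=2)[:300])
```

The point of the ControlNet unit is to hold the composition and costume of the 2D image in place while the realistic checkpoint redraws textures and shading.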

💡Ultimate SD Upscale and 4x_NickelbackFS

Ultimate SD Upscale is an upscaling extension and 4x_NickelbackFS is an upscaler model; both are mentioned in the script as tools for increasing the resolution and quality of images. The upscalers are particularly adept at converting textures from a doll-like appearance to a more realistic human skin texture, enhancing the realism of the 3D images.
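Tiled upscaling of this kind enlarges the image and then redraws it one tile at a time, so the number of tiles determines how much work each pass does. The sketch below estimates that count for a two-step upscale. The 512×768 starting size, the 2× scale per step, and the 512-pixel tile are illustrative assumptions rather than the video's settings, and padding and seam-fix passes are ignored.

```python
import math


def tile_grid(width, height, scale, tile=512):
    """Return (out_width, out_height, columns, rows) for one upscale pass."""
    out_w = round(width * scale)
    out_h = round(height * scale)
    cols = math.ceil(out_w / tile)
    rows = math.ceil(out_h / tile)
    return out_w, out_h, cols, rows


if __name__ == "__main__":
    size = (512, 768)  # assumed size of the converted image
    for step in (1, 2):
        out_w, out_h, cols, rows = tile_grid(size[0], size[1], scale=2)
        print(f"pass {step}: {out_w}x{out_h}, {cols * rows} tiles")
        size = (out_w, out_h)
```

Under these assumptions the first pass redraws 6 tiles and the second 24, which is why the second pass takes noticeably longer.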

💡Noise reduction

Noise reduction is a process in image processing that aims to minimize or eliminate unwanted visual artifacts or 'noise' in an image. In the context of the video, noise reduction is a critical step in refining the 3D image, ensuring that the final output is clear and free from distortions.

💡Model transformation

Model transformation refers to the process of altering or adapting a base model to achieve a specific outcome. In the video, this involves changing the 2D model to a realistic 3D model, which is a central part of the conversion process from two-dimensional images to three-dimensional ones.


Highlights

Introduction to converting a two-dimensional image into a realistic three-dimensional image.

Optimization of the processing procedure with the use of new extensions and upscalers.

Importance of accurate costume reproduction in AI cosplay using 3D images.

Use of LoRA to preserve details while transforming images.

Combining two-dimensional modeling with realistic model processing for quality enhancement.

Overview of necessary tools including ADetailer and upscaler extensions.

Procedure for model transformation and two-step upscaling.

Use of ADetailer for enhanced detail in facial and hand regions.

Application of ControlNet for more precise control over image conversion.

The process of transforming a two-dimensional image into a three-dimensional one.

Integration of Ultimate SD Upscale for upscaling and texture improvement.

Adjustment of noise levels and scale settings for optimal image quality.

Generation of high-quality images with realistic texture.

Potential application of this conversion technique to any object resembling human proportions.

Exploration of different upscalers for varying textures in future work.

Invitation for feedback and updates on new processing techniques.