Stable Diffusion Fix Hands Without ControlNet and Inpainting (Easy) | SDXL FREE! (Automatic1111)

Xclbr Xtra
23 Apr 2024, 06:50

TLDR: In this video, the presenter demonstrates a simple method for generating realistic hands with the Real V SDXL model without resorting to more involved techniques such as ControlNet or inpainting. The process uses two models: one for the initial generation and another for upscaling and enhancement. The video gives detailed settings for both, including the Midjourney Mimic strength, sampling steps, and CFG scale. The presenter also uses negative prompts to avoid common issues such as blurry images or incorrect hand anatomy. The result is a more aesthetic and realistic image, suitable for most use cases that do not require professional-level detail.


  • 🎨 **Using SDXL Model**: The Real V SDXL model is capable of generating decent hands, but they may not look as realistic as desired.
  • 🚫 **No ControlNet or Inpainting**: The process does not require ControlNet or inpainting.
  • 👌 **Simple Hand Creation**: The method is designed for simple hand creation, not for complex hand poses.
  • 📈 **Upscaling Process**: The generated image is upscaled later, and the detailer is used during this stage.
  • 🌟 **Midjourney Mimic**: Midjourney Mimic is used at 0.5 to give an aesthetic feel without being too strong.
  • 🔍 **Negative Prompting**: Basic negative prompting is used to avoid NSFW content, blurriness, and bad hands.
  • 🔢 **Sampling Steps and Settings**: 50 sampling steps with the DPM++ 3M SDE Exponential sampler are used for the initial model.
  • 🔧 **Batch Count and CFG Scale**: A batch count of two and a CFG scale between 6 and 7 are found to be effective settings.
  • 🔄 **Image to Image Transfer**: The generated image is then sent to image-to-image with a second model for further enhancement.
  • 🌈 **DreamShaper Turbo Model**: The DreamShaper Turbo model is used for the second stage, with settings adjusted for better realism.
  • ✅ **Realistic Results**: The final images are more realistic, with properly formed hands and fewer imperfections.

Q & A

  • What is the main focus of the video?

    -The main focus of the video is to demonstrate a method for generating realistic hands in images using the Stable Diffusion model without relying on ControlNet or inpainting techniques.

  • Which model is recommended for generating decent hands?

    -The Real V SDXL model is recommended for generating decent hands, as mentioned in the video.

  • What is the purpose of using two models in the process?

    -The purpose of using two models is to first generate an image with decent hand poses and then refine the image to make it more realistic by upscaling and denoising it with a different model.

  • What is the significance of the 'Midjourney Mimic' setting?

    -The 'Midjourney Mimic' setting gives the image an aesthetic feel. Its strength is set to 0.5 so that the effect is not too strong.

  • What are the steps used for sampling in the first model?

    -The video mentions using 50 sampling steps with the DPM++ 3M SDE Exponential sampler for the first model.

  • What is the role of the 'clip skip' setting?

    -The 'clip skip' setting is used to get slightly better detail in the generated image. The presenter personally prefers a value of two.

  • How does the presenter suggest improving the realism of the generated hands?

    -The presenter suggests using the 'DreamShaper Turbo' model with what is most likely the DPM++ SDE Karras sampler (garbled in the transcript as 'S3 caras'), an increased scale, and a batch count of four to improve the realism of the generated hands.

  • What is the recommended CFG scale for the turbo model?

    -The recommended CFG scale for the turbo model is one, as it is found to be good for the desired outcome.

  • What additional feature is enabled to enhance the image?

    -The 'ADetailer' feature is enabled to further enhance the image, and the 'FreeU Integrated' and 'SelfAttentionGuidance Integrated' settings are also enabled for better results.

  • What is the presenter's opinion on the final outcome of the process?

    -The presenter believes that the final outcome is quite realistic and should cover almost all use cases, especially for non-professional use.

  • Is there a suggestion for further improvement of the generated images?

    -The presenter suggests that for further improvement, one could try inpainting on the generated images.

  • What is the presenter's final verdict on the process?

    -The presenter concludes that the process is fairly simple and effective for generating images with proper hands, suitable for a wide range of applications.



🎨 Creating Realistic Hands in Art with SDXL Model

The first paragraph introduces the topic of generating realistic hands without disfigurements using a simple process that does not require ControlNet or inpainting. The speaker uses the Real V SDXL model, which produces decent hands that may not look realistic enough. To improve realism, the process uses two models: the first for the initial generation and the second for upscaling and detail enhancement. The speaker also stresses prompting for a full-body view so that the hands are captured in the desired poses. The settings for the initial model are 50 sampling steps, the DPM++ 3M SDE Exponential sampler, a batch count of two, and a CFG scale of 6 to 7. Self-Attention Guidance is enabled as an integrated feature of the Forge web UI, which streamlines the workflow because no additional extension is needed. The paragraph concludes with a note on setting CLIP skip to two for slightly better detail, although this does not significantly affect the hands' appearance.
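The first-stage settings described above can be written down as a request body for the Automatic1111/Forge web API. This is a minimal sketch under stated assumptions, not the presenter's workflow, since the video works in the UI directly. It assumes the UI was launched with the `--api` flag and that the body would be posted to `/sdapi/v1/txt2img`. The checkpoint name, the example prompt, the 1024x1024 resolution, and the CFG value of 6.5 (the midpoint of the 6 to 7 range) are illustrative assumptions. Newer UI versions may expect the schedule separately from the sampler name. The Forge Self-Attention Guidance toggle is left out because its API arguments are not covered in the source.

```python
import json


def build_txt2img_payload(prompt, negative_prompt, checkpoint,
                          cfg_scale=6.5, width=1024, height=1024):
    """Assemble a stage-one request body for POST /sdapi/v1/txt2img.

    Values taken from the video: 50 steps, DPM++ 3M SDE Exponential,
    batch count 2, CFG between 6 and 7, CLIP skip 2.
    Assumptions: checkpoint name, resolution, and the exact CFG value.
    """
    if not 6 <= cfg_scale <= 7:
        raise ValueError("the video reports good results with CFG between 6 and 7")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 50,
        "sampler_name": "DPM++ 3M SDE Exponential",
        "cfg_scale": cfg_scale,
        "n_iter": 2,  # "batch count" in the UI
        "width": width,    # assumption: not stated in the source
        "height": height,  # assumption: not stated in the source
        "override_settings": {
            "sd_model_checkpoint": checkpoint,  # hypothetical file name
            "CLIP_stop_at_last_layers": 2,      # "clip skip" in the UI
        },
    }


if __name__ == "__main__":
    payload = build_txt2img_payload(
        prompt="full body photo of a woman waving",  # illustrative prompt
        negative_prompt="nsfw, blurry, bad hands",
        checkpoint="realv_sdxl.safetensors",         # hypothetical file name
    )
    print(json.dumps(payload, indent=2))
```

Keeping the payload in a function makes it easy to rerun the same settings, which matters here because the video notes that the image may need to be regenerated until the hands look right.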


🚀 Enhancing Image Realism with Turbo Model Upscaling

The second paragraph details how a turbo model is used to upscale the generated images and make them more realistic. The speaker begins with the initial outcome of the hand generation, noting that while the hands have the correct number of fingers, they look plasticky and lack realism. The solution is to send the image to image-to-image, adjust the denoising strength, and let the turbo model refine the result. The speaker reports that, in their experience, turbo models tend to produce more realistic results. The speaker adds that inpainting could be used to refine the image further for professional purposes, but the described method should suffice for most use cases. The speaker concludes by expressing hope that viewers found the process helpful and thanks them for watching.
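The second stage can be sketched the same way as a request body for the web UI's `/sdapi/v1/img2img` endpoint, again assuming the UI runs with `--api`. This is only an illustration of the idea. The CFG scale of 1 and the batch count of four come from the video; the denoising strength and step count are not stated in this summary, so the hypothetical helper below requires the caller to supply them rather than guessing. The checkpoint file name is also an assumption, and the ADetailer and FreeU toggles are omitted because their API arguments are not covered in the source.

```python
import base64


def build_img2img_payload(image_bytes, prompt, negative_prompt, checkpoint,
                          denoising_strength, steps):
    """Assemble a stage-two request body for POST /sdapi/v1/img2img.

    From the video: CFG scale 1 and batch count 4 for the turbo model.
    Not given in the source: denoising_strength and steps, so both are
    required arguments here.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        # The API expects the input image as a base64-encoded string.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "denoising_strength": denoising_strength,
        "steps": steps,
        "cfg_scale": 1,
        "n_iter": 4,  # "batch count" in the UI
        "override_settings": {
            "sd_model_checkpoint": checkpoint,  # hypothetical file name
        },
    }


if __name__ == "__main__":
    fake_png = b"not a real image"  # placeholder bytes for the demo only
    payload = build_img2img_payload(
        fake_png, "full body photo of a woman waving",
        "nsfw, blurry, bad hands", "dreamshaper_turbo.safetensors",
        denoising_strength=0.4,  # illustrative value, not from the video
        steps=8,                 # illustrative value, not from the video
    )
    print(sorted(payload.keys()))
```

A lower denoising strength keeps the pose and finger count from stage one, while a higher one gives the turbo model more freedom to change skin texture, at the risk of changing the hands as well.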



💡Stable Diffusion

Stable Diffusion is a machine learning model that generates images from text descriptions. It is the core technology the tutorial is built on, and the video aims to make it generate hands without distortion.


💡Hands

Hands are the key focus of the video: the creator demonstrates how to generate realistic, properly formed hands with Stable Diffusion. Getting hands to look natural and undistorted is a common challenge in AI image generation.


💡ControlNet

ControlNet is a technique used in AI image generation to control the output more precisely. The process shown in the video does not use ControlNet, in favor of a simpler method for achieving realistic hands.


💡Inpainting

Inpainting is a process in which selected, missing, or damaged parts of an image are regenerated. The video does not use this technique, emphasizing a more straightforward approach to generating hands.

💡Real V SDXL Model

The 'Real V SDXL Model' is an SDXL-based model that the video describes as capable of generating decent hands. It is the first-stage model in the process outlined in the tutorial.

💡Midjourney Mimic

Midjourney Mimic is a setting applied during image generation that gives the output an aesthetic feel. The video uses it at 0.5 to avoid an overly strong effect.

💡Negative Prompting

Negative prompting tells the model what to avoid, such as NSFW (Not Safe For Work) content, blurry images, or bad hands. The video uses a basic negative prompt to improve the quality of the generated hands.
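The prompt side of this setup can be sketched as plain string handling. The example below assumes that Midjourney Mimic is loaded as a LoRA, in which case the web UI's `<lora:name:weight>` prompt tag would carry the 0.5 strength mentioned in the video. That is an assumption, since the summary only calls it a setting, and the file name `mj_mimic` is hypothetical. The negative terms are the three categories the video names; the exact wording used by the presenter is not given.

```python
def with_lora(prompt, lora_name, weight):
    """Append a web UI LoRA tag to a prompt.

    Assumption: the aesthetic add-on is a LoRA; lora_name is the
    (hypothetical) file name without its extension.
    """
    return f"{prompt}, <lora:{lora_name}:{weight}>"


def build_negative_prompt(terms):
    """Join negative terms with commas, dropping blanks and duplicates."""
    seen = []
    for term in terms:
        cleaned = term.strip()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)


if __name__ == "__main__":
    # Full-body framing, as the video recommends, so the hands are visible.
    positive = with_lora("full body photo of a woman waving", "mj_mimic", 0.5)
    negative = build_negative_prompt(["nsfw", "blurry", "bad hands", "blurry"])
    print(positive)
    print(negative)
```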

💡Sampling Steps

'Sampling steps' is the number of denoising iterations the sampler runs while generating an image. The video uses 50 sampling steps for the first model.

💡CFG Scale

CFG Scale stands for 'Classifier-Free Guidance scale', a parameter that controls how strongly the generated image follows the prompt. The video uses a value between 6 and 7 for the first model and 1 for the turbo model.
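The standard classifier-free guidance formula makes the role of this parameter concrete. At each sampling step the model predicts noise twice, once with the prompt and once without, and the scale extrapolates from the unconditional prediction toward the conditional one. The sketch below uses plain Python lists with made-up numbers in place of real model outputs; it illustrates the formula and is not code from the web UI.

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), per element.

    uncond and cond stand in for the model's noise predictions without and
    with the prompt; here they are plain lists of floats for illustration.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # made-up values
    cond = [1.0, 3.0]    # made-up values
    print(apply_cfg(uncond, cond, 1))  # scale 1: equals the conditional prediction
    print(apply_cfg(uncond, cond, 7))  # scale 7: pushed further along the prompt direction
```

With a scale of 1 the result is exactly the prompt-conditioned prediction, so no extra guidance is applied, which is consistent with the video's choice of CFG 1 for the turbo model. Values around 6 to 7 push the first model more firmly toward the prompt.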

💡Image to Image

Image to image is a process in which an existing image is used as the starting point for generating a new image with different characteristics or improvements. The video uses it to refine the generated hands.

💡Turbo Model

A 'Turbo Model' is a model variant designed to produce images in only a few sampling steps, typically at a low CFG scale. In the video it is used as the second-stage model and is credited with making the skin look more realistic in the final output.


💡Denoising

Denoising is the process by which the model removes noise from an image. In image-to-image mode, the denoising strength controls how much of the input image the second model is allowed to change. The video adjusts this strength so that the turbo model refines the image, contributing to more realistic hands.


The video demonstrates a method for generating proper, non-disfigured hands with Stable Diffusion without more involved tools such as ControlNet or inpainting.

The process is simple and suitable for creating normal poses with decent hands, avoiding extra fingers or other anomalies.

The Real V SDXL model is used for its ability to generate decent hands, but with some lack of realism.

Two models are utilized in the process to enhance the realism of the generated hands.

The importance of a full-body prompt is emphasized to ensure hands are visible and not just a close-up shot.

Midjourney Mimic is used at 0.5 for an aesthetic feel, avoiding an overly strong effect.

Negative prompting includes avoiding NSFW content, blurriness, and bad hands.

The video does not use a detailer initially as upscaling with a different model is planned.

50 sampling steps and the DPM++ 3M SDE Exponential sampler are used with a batch count of two and a CFG scale between 6 and 7.

Self-attention guidance integration is enabled in the Forge web UI for better results.

Clip skip is set to two for slightly better detail, though it does not affect the hands significantly.

The generated image may require rerunning to achieve the desired hand appearance.

The DreamShaper Turbo model is used for upscaling, with settings adjusted for optimal image quality.

The CFG scale is set to one for the turbo model, which is found to be sufficient for realistic results.

ADetailer is enabled for additional adjustments, and Self-Attention Guidance is kept enabled for quality.

The turbo model helps in making the skin appear more realistic and less plasticky.

The final images show a significant improvement in hand realism compared to the initial generation.

Inpainting can be used for further improvement, but the method as shown covers most non-professional use cases.