ComfyUI: Face Detailer (Workflow Tutorial)

7 Jan 2024 · 27:16

TLDR: In this tutorial, Mali demonstrates ComfyUI workflows for enhancing AI-generated images using the ComfyUI Manager and a set of custom nodes. The process covers facial feature editing, adding realistic detail, and replacing a face with a graphical one. Mali moves from basic to advanced techniques, including bounding boxes and segmentation for accurate detection, and explains how prompts and checkpoints shape the output. The tutorial shows how to fix low-resolution faces, adjust facial features, and change hairstyles through automated masking and fine-tuning.


  • 🎨 The video tutorial focuses on using AI and image segmentation to edit and enhance facial features in images without manual painting.
  • 🔗 Introduction of the ComfyUI Manager and its significance in streamlining the workflow for image editing.
  • 📦 The necessity of installing specific custom nodes and models, such as the Impact Pack and Inspire Pack, for the workflows to function properly.
  • 🤖 Utilization of detection models, YOLOv8s for bounding boxes and YOLOv8n for segmentation, to detect and refine facial elements.
  • 🔍 The process of fine-tuning the detection system by adjusting the bbox threshold and dilation values for more accurate facial feature selection.
  • 🖌️ The use of the detailer pipe and refiner nodes to add or enhance details in specific areas of the image, such as the face or hair.
  • 📸 Employing prompts and conditioning to guide the AI in achieving desired visual outcomes, like realistic or illustrative styles.
  • 🔄 The concept of running the image through multiple passes for further refinement and detail enhancement.
  • 💻 The importance of selecting appropriate checkpoints for achieving intricate details based on the desired output.
  • 🎭 Demonstration of the capability to change facial features, such as eyes and hair, using a combination of nodes and prompts.
  • 🚀 Finalizing the workflow with the Ultimate SD Upscale node to upscale the image without adding further details.

Q & A

  • What is the main purpose of the tutorial?

    -The main purpose of the tutorial is to demonstrate workflows and techniques for editing and enhancing AI-generated images, particularly facial elements, using various tools and nodes within ComfyUI.

  • What are the key tools mentioned for image segmentation and facial feature editing?

    -The key tools mentioned include CLIPSeg image segmentation, the ComfyUI Manager, and quality-of-life nodes by Dr.Lt.Data, as well as the Impact Pack and Inspire Pack for additional functionality.

  • What models are used for bounding boxes and segmentation in the tutorial?

    -The tutorial uses the YOLOv8s face model for bounding boxes (bbox) and the YOLOv8n seg2 model for segmentation. Additionally, the Impact Pack can use the more sophisticated Segment Anything Model (SAM).

  • How does the 'sdxl' node simplify the workflow?

    -The 'sdxl' node simplifies the workflow by creating a pipe that carries shared information such as the conditioning, VAE, and models, which reduces clutter. It eliminates the need to reconnect inputs for multiple nodes that require the same information, making long workflows less messy.

  • What is the role of the 'bbox detector' in the process?

    -The 'bbox detector' is used to detect the subject as a rectangle, which is then used for further processing like detailing and segmentation. It helps in identifying the area of interest for enhancement or editing.

  • How can the 'refiner' inputs in the 'sdxl' node be adjusted?

    -The 'refiner' inputs in the 'sdxl' node can be adjusted by manually connecting the appropriate models for segmentation and other refinement tasks. This allows for more precise control over the editing process.

  • What is the significance of the 'guide size' value in the tutorial?

    -The 'guide size' value sets the target size for the detected (masked) area. If the area is smaller than this value, it is upscaled toward the guide size, capped by the maximum size, and details are then added. This controls how much detail is added relative to the size of the detected area.

  • How does the 'force inpaint' option work?

    -The 'force inpaint' option forces regeneration of the detected area regardless of how its size compares with the guide size, so areas that the size check would otherwise skip are still processed. It ensures that the detailing effect is applied to every detected subject, keeping the editing process consistent.

  • What is the purpose of the 'BLIP analyzer node' and when is it used?

    -The 'BLIP analyzer node' is used to improve results by adding a prompt that describes the image, which gives the detailer more to work with. It is particularly useful for images that are difficult to restore or enhance using the basic settings.

  • How can the hair selection process be automated?

    -The hair selection process can be automated using the 'MediaPipe FaceMesh detector' and 'CLIPSeg' nodes. These nodes, combined with appropriate prompts and mask manipulation, allow for accurate and consistent hair selection and enhancement.

  • What are the steps to fine-tune the hair selection and style?

    -To fine-tune the hair selection and style, one needs to adjust the mask threshold, binary value, and mask dilation. Additionally, creating a clothing mask and subtracting it from the overall mask can improve results when dealing with hair overlapping on clothes.

  • How does the tutorial address the issue of artifacts in the output?

    -The tutorial addresses the issue of artifacts by suggesting adjustments to the cropping factor, mask dilation, and threshold values. It also recommends using multiple passes with different checkpoints and fine-tuning prompts to correct or minimize artifacts.
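The guide size and force inpaint answers above can be illustrated with a short Python sketch. It is a simplified model of the decision, not the Impact Pack source: the function name `plan_detail_pass`, the example values (384 and 1024), and the rule that regions already at or above the guide size are the ones skipped are all assumptions made for illustration.

```python
def plan_detail_pass(bbox_w, bbox_h, guide_size, max_size, force_inpaint=False):
    """Decide whether a detected region is regenerated, and at what scale.

    Hypothetical helper for illustration only; returns (should_process, scale).
    """
    shortest = min(bbox_w, bbox_h)
    longest = max(bbox_w, bbox_h)
    if shortest <= 0:
        raise ValueError("bounding box must have positive width and height")

    # Scale the region so that its shorter side reaches guide_size.
    scale = guide_size / shortest

    # Cap the scale so the longer side never exceeds max_size.
    if longest * scale > max_size:
        scale = max_size / longest

    if scale <= 1.0:
        # No upscaling is needed: the region is skipped unless forced.
        return force_inpaint, 1.0
    return True, scale


small_face = plan_detail_pass(100, 120, guide_size=384, max_size=1024)
large_face = plan_detail_pass(500, 600, guide_size=384, max_size=1024)
forced = plan_detail_pass(500, 600, guide_size=384, max_size=1024, force_inpaint=True)
print(small_face, large_face, forced)
```

In this sketch the small face is enlarged before detailing, the large face is skipped, and the same large face is processed at its original size once forcing is switched on.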



🎨 Introduction to AI Image Editing Workflow

This paragraph introduces the speaker, Mali, and the topic of AI-generated image editing. Mali explains that fixing distorted faces in AI images through manual inpainting is tedious, but can be streamlined using CLIPSeg image segmentation. The video will demonstrate how to edit facial elements and even transform realistic images into graphical ones or vice versa, all without manual painting. Mali acknowledges the support of channel members and outlines the tutorial's structure, which includes four workflows of increasing complexity and the custom nodes those workflows require. A special thanks is given to Dr.Lt.Data for his contributions to the ComfyUI Manager and quality-of-life nodes.


🛠️ Setting Up the Image Editing Pipeline

In this paragraph, Mali delves into the technical setup required for the image editing pipeline. The guide size value and its impact on scaling are discussed, as well as the noise mask's function. Mali explains the importance of the bbox threshold and its limitations, and how to refine the detection system. The paragraph also covers the use of the 'adder' and 'detailer' nodes for image enhancement, the significance of the crop factor, and the resolution's effect on the editing process. Mali shares practical examples to illustrate how to achieve desired results and the role of prompts in refining image details.
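As a rough illustration of the bbox threshold and crop factor discussed in this section, the Python sketch below filters detections by confidence and then widens a bounding box around its center. The function names, the detection dictionary layout, and the sample numbers are hypothetical; the actual nodes may compute these regions differently.

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence score meets the threshold."""
    return [d for d in detections if d["score"] >= threshold]


def crop_region(bbox, crop_factor, image_w, image_h):
    """Expand a bbox (x1, y1, x2, y2) around its center by crop_factor.

    The result is clamped to the image so the crop never leaves the canvas.
    """
    x1, y1, x2, y2 = bbox
    center_x = (x1 + x2) / 2
    center_y = (y1 + y2) / 2
    half_w = (x2 - x1) * crop_factor / 2
    half_h = (y2 - y1) * crop_factor / 2
    return (
        max(0, int(center_x - half_w)),
        max(0, int(center_y - half_h)),
        min(image_w, int(center_x + half_w)),
        min(image_h, int(center_y + half_h)),
    )


# Hypothetical detector output: one confident face, one weak false positive.
detections = [
    {"bbox": (100, 100, 200, 200), "score": 0.90},
    {"bbox": (300, 50, 330, 80), "score": 0.40},
]
faces = filter_detections(detections, threshold=0.5)
regions = [crop_region(d["bbox"], 3.0, 512, 512) for d in faces]
print(regions)
```

A higher threshold drops weak detections at the risk of missing small faces, while a larger crop factor gives the detailer more surrounding context to blend the regenerated area into.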


🔄 Advanced Workflows and Robustness Testing

Mali discusses advanced workflows and the importance of testing the robustness of the editing process. The paragraph explains how to handle images that cannot be fully restored with a single pass and the use of different checkpoints for varying results. Mali demonstrates how to use the BLIP analyzer node and manual prompting, and compares the outcomes of different configurations. The paragraph also covers choosing checkpoints for intricate details and the impact of checkpoint selection on the final image, including examples of changing facial features and styles using various checkpoints.


💇‍♀️ Automating Facial Feature Selection and Hair Editing

This paragraph focuses on the automation of facial feature selection and hair editing. Mali introduces the MediaPipe FaceMesh detector and its use in creating masks for specific facial details. The process of enhancing eyes using an SDXL-compatible LoRA is discussed, as well as the method for changing eye color. Mali explains how to use the CLIPSeg node for automated hair selection and the importance of adjusting the mask threshold and dilation for optimal results. The paragraph also covers the creation of a clothing mask and the use of the switch node for streamlined mask management.
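The mask steps described here (threshold, dilation, and subtracting a clothing mask) can be sketched in plain Python on a tiny grid. This is only an illustration of the idea under assumed values: the helper names and the 3x3 sample masks are hypothetical, and the real nodes operate on full-resolution image tensors.

```python
def binarize(mask, threshold):
    """Turn a soft mask (values 0.0 to 1.0) into a hard 0/1 mask."""
    return [[1 if value >= threshold else 0 for value in row] for row in mask]


def dilate(mask, radius):
    """Grow every selected pixel outward by `radius` pixels (square kernel)."""
    height, width = len(mask), len(mask[0])
    grown = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        grown[ny][nx] = 1
    return grown


def subtract(mask, other):
    """Remove from `mask` every pixel that is also selected in `other`."""
    return [
        [1 if a and not b else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask, other)
    ]


# Hypothetical soft hair mask and a clothing mask covering the bottom row.
hair_soft = [[0.1, 0.8, 0.2], [0.0, 0.9, 0.1], [0.0, 0.3, 0.0]]
clothing = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]

hair = dilate(binarize(hair_soft, 0.5), radius=1)
hair_only = subtract(hair, clothing)
print(hair_only)
```

Dilation lets the hair mask cover stray strands at its edge, and the subtraction keeps that growth from spilling onto clothing where hair overlaps it.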


🚀 Final Touches and Upscaling Images

In the final paragraph, Mali covers the last steps of the image editing process, including fine-tuning the hair mask and the use of the Ultimate SD Upscale node for upscaling images. The importance of adjusting the mode type for upscaling is emphasized to prevent additional details from being added. Mali provides a summary of the basic pipe connection and the addition of a note for reference. The tutorial concludes with a recap of the key points and a sign-off until the next video, accompanied by music.



💡AI generated images

AI generated images refer to visual content that is created using artificial intelligence algorithms. In the context of the video, these images often have distorted faces which the creator aims to fix using various software tools and techniques. This process is part of the broader theme of enhancing and editing digital media with AI.

💡Image segmentation

Image segmentation is a process in computer vision and image editing where an image is divided into multiple segments or sets of pixels, often to simplify or change the image's representation. In the video, the creator uses image segmentation to edit facial elements in AI generated images, which is crucial for improving the visual quality and realism of the images.


💡Workflows

Workflows refer to the sequence of steps or processes involved in completing a task or achieving a goal. In the video, the creator outlines various workflows for editing AI generated images, ranging from basic to advanced techniques. These workflows are designed to efficiently address different levels of image distortion and enhance the images' realism.

💡ComfyUI Manager

ComfyUI Manager is an extension for ComfyUI, mentioned in the video, that helps install and manage custom nodes and models for image editing. It is a crucial part of the workflow because it simplifies setting up the node packs the tutorial depends on, making the editing process more efficient and user-friendly.

💡GitHub page

GitHub is a web-based hosting service for version control and collaboration that allows developers to share and manage code. In the context of the video, the GitHub page is where the creator provides access to additional nodes and tools required for the image editing workflows. It serves as a resource for viewers to find and download necessary components for their projects.

💡Hugging Face models

Hugging Face models refer to a collection of pre-trained machine learning models available on the Hugging Face platform, which are widely used in natural language processing and other AI applications. In the video, models from this platform are used for bounding-box detection and segmentation, which are essential steps in editing AI generated images.

💡YOLO models

YOLO, an acronym for 'You Only Look Once', is a popular real-time object detection system used in computer vision. In the video, YOLO models are used for both bounding box detection, which identifies the subject as a rectangle, and segmentation, which uses a silhouette as a mask. These models are crucial for accurately selecting and editing facial features in AI generated images.
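The difference between a bounding-box mask and a segmentation mask can be shown on a small grid. The Python sketch below is illustrative only: `bbox_mask`, `mask_area`, and the sample silhouette are hypothetical and do not come from any YOLO library.

```python
def bbox_mask(width, height, bbox):
    """Build a rectangular 0/1 mask from a bbox (x1, y1, x2, y2), x2/y2 exclusive."""
    x1, y1, x2, y2 = bbox
    return [
        [1 if x1 <= x < x2 and y1 <= y < y2 else 0 for x in range(width)]
        for y in range(height)
    ]


def mask_area(mask):
    """Count the selected pixels in a 0/1 mask."""
    return sum(sum(row) for row in mask)


# A bbox selects the whole rectangle around the subject...
rectangle = bbox_mask(4, 3, (1, 0, 3, 2))

# ...while a segmentation mask (hypothetical values) follows the silhouette.
silhouette = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

print(mask_area(rectangle), mask_area(silhouette))
```

The rectangle always covers at least as many pixels as the silhouette inside it, which is why a bbox mask also regenerates background near the subject while a segmentation mask stays closer to its outline.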

💡Detailer pipe

The Detailer pipe is a component in the image editing process that adds details or enhances specific areas of an image based on the input it receives. In the video, it is used in conjunction with other nodes to refine and improve the quality of the facial features in AI generated images, particularly after the initial bounding box and segmentation steps.


💡Checkpoints

Checkpoints in the context of machine learning and AI refer to saved states of a model's training process. These checkpoints can be used to resume training or to apply the model's current state to new data. In the video, checkpoints are used to apply specific details or styles to the AI generated images, such as realistic or illustrative styles.


💡Prompts

In the context of AI and machine learning, prompts are inputs or instructions given to the model to guide its output. In the video, prompts are used to fine-tune the editing process, such as specifying the type of hair or facial features to be added or modified in the AI generated images.


💡Upscaling

Upscaling refers to the process of increasing the resolution of an image, typically to enhance its quality and detail. In the video, upscaling is used after the editing process to improve the overall appearance of the AI generated images, making them more detailed and visually appealing.


AI-generated images with distorted faces can be fixed using image segmentation techniques.

The tutorial introduces a workflow that automates facial element editing in ComfyUI, reducing the need for manual painting.

The use of CLIPSeg image segmentation allows for the editing of facial features in a batch process.

Realistic images can be given a graphical face or AI-generated images can be enhanced with facial realism using the same tools.

The tutorial covers four workflows, starting with basic and progressing to advanced techniques.

Custom nodes from Dr.Lt.Data on GitHub are essential for some workflows, and his quality-of-life nodes and the ComfyUI Manager make the setup easier to work with.

The tutorial requires the installation of specific packs and models for different stages of the workflow.

The use of bounding boxes and segmentation models like YOLOv8s and YOLOv8n seg2 improves accuracy in facial feature detection.

The refiner inputs within the node are crucial for the final stages of the workflow.

The addition of the detailer pipe and load image nodes is key for refining and enhancing the facial features.

The guide size value and crop area play a significant role in determining the level of detail added to the image.

The tutorial demonstrates how to adjust settings like bbox threshold and dilation for optimizing detection and detailing.

The use of prompts and the BLIP analyzer node can significantly improve the performance and outcome of the image processing.

The tutorial shows how to fine-tune the settings for different facial features like eyes and hair using various nodes and prompts.

The process of creating a clothing mask and subtracting it from the overall mask is explained for dealing with artifacts caused by overlapping elements.

The switch node setup allows for streamlined switching between different masks for focused detailing.

The tutorial concludes with a demonstration of how to upscale the image without adding further details, using the Ultimate SD Upscale node.