Stable Cascade img2img in ComfyUI

23 Feb 2024 · 05:07

TLDR: In this video, the creator extends an existing Stable Cascade workflow with image-to-image functionality using the EfficientNet encoder model. The tutorial walks through modifying the settings, removing the Stage C empty latent image, and substituting a custom latent image. Using a VAE Encode node and adjusting the denoise setting, the creator demonstrates how to transform a base image into a new version with desired elements, such as futuristic armor. The video stresses keeping ComfyUI up to date and experimenting with settings to achieve the desired outcome.


  • 🚀 The video introduces an update to the previous workflow by integrating image-to-image functionality.
  • 📚 The EfficientNet encoder model is required for this new functionality; the EfficientNet paper is linked in the description.
  • 🌐 The model can be found in the same Hugging Face repository as previous models used in the tutorials.
  • 🎨 Modifications are made to the workflow, specifically deleting the Stage C empty latent image and replacing it with a custom latent image.
  • 🖼️ A 'load image' node is created to load a base image onto which a new image will be superimposed.
  • 🔄 A VAE Encode node encodes the loaded image into a latent representation.
  • 🔧 The workflow involves connecting the latent output to the first sampler's latent input port.
  • 📝 The script provides an example of changing a dress to an armor and helmet using a specific prompt.
  • 🔄 Adjusting the 'denoise' setting on the first KSampler controls how strongly the source image is transformed.
  • 💻 It is recommended to update ComfyUI to the latest version to avoid errors when implementing the workflow.
  • 📢 The video creator encourages viewers to ask questions in the comments for further clarification.
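The rewiring described in the bullets above can be sketched in ComfyUI's API-format workflow JSON. This is a minimal, hypothetical fragment: the node IDs, the checkpoint filename, and the sampler parameters are assumptions to adapt to your own graph, not values taken from the video.

```python
# Hypothetical sketch of the img2img rewiring in ComfyUI API-format JSON
# (node IDs and filenames are assumptions; adapt them to your own setup).
# Instead of Stage C's empty latent image, the source image is loaded,
# encoded with the EfficientNet encoder, and fed to the first KSampler.
img2img_nodes = {
    "1": {  # load the base image to transform
        "class_type": "LoadImage",
        "inputs": {"image": "base_image.png"},
    },
    "2": {  # the EfficientNet encoder checkpoint is loaded like a VAE
        "class_type": "VAELoader",
        "inputs": {"vae_name": "effnet_encoder.safetensors"},
    },
    "3": {  # encode the image into a latent for the first sampler
        "class_type": "VAEEncode",
        "inputs": {"pixels": ["1", 0], "vae": ["2", 0]},
    },
    "4": {  # first (Stage C) KSampler; denoise < 1.0 keeps source structure
        "class_type": "KSampler",
        "inputs": {
            "latent_image": ["3", 0],  # replaces the empty latent image
            "denoise": 0.5,
            "seed": 0, "steps": 20, "cfg": 4.0,
            "sampler_name": "euler_ancestral", "scheduler": "normal",
            # model/positive/negative come from the rest of the workflow
            "model": ["5", 0], "positive": ["6", 0], "negative": ["7", 0],
        },
    },
}
```

The key change is the `latent_image` input of node "4": it now receives the encoded source image instead of an empty latent.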

Q & A

  • What is the main topic of the video?

    -The main topic of the video is the integration of image-to-image functionality into a Stable Cascade workflow, using the EfficientNet encoder model to create a new image based on a base image.

  • What changes need to be made to the previous workflow?

    -The changes include removing the Stage C empty latent image and replacing it with a custom latent image, created by loading an image and encoding it with the EfficientNet encoder model.

  • What is the role of the EfficientNet encoder model in this process?

    -The EfficientNet encoder encodes the base image into a latent image, which is then fed into the first KSampler.

  • Where can viewers find the EfficientNet encoder model?

    -The EfficientNet encoder model can be found in the same Hugging Face repository mentioned in the video description, alongside the other models used previously.

  • How does the VAE Encode node fit into the workflow?

    -The VAE Encode node encodes the loaded image into a latent representation that the first KSampler can use.

  • What is the significance of the denoise setting in the first KSampler?

    -The denoise setting determines how much of the source image is preserved in the generated image. Adjusting it controls how strongly desired features like the armor and helmet appear in the output.

  • What was the initial issue faced when trying to run the workflow with an old configuration?

    -The video creator hit an error when running the workflow with their old configuration; it was resolved by updating ComfyUI to the latest version.

  • What is the recommended denoise setting to start with?

    -The video creator suggests starting with a denoise setting of 0.5 and adjusting it according to the desired outcome.

  • How does the video demonstrate the iterative process of refining the image?

    -The video shows multiple attempts with different denoise settings (0.5, 0.7, and 0.8) to refine the image, illustrating the trial-and-error process needed to achieve the desired result.
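That trial-and-error sweep can be automated against ComfyUI's local HTTP API. A minimal sketch, under assumptions: the server runs at the default `127.0.0.1:8188`, `workflow` is an API-format graph exported from the UI, and the KSampler node ID `"4"` is hypothetical.

```python
import copy
import json
import urllib.request

def build_denoise_sweep(workflow: dict, sampler_id: str, values):
    """Build one /prompt payload per denoise value, leaving the input graph untouched."""
    payloads = []
    for d in values:
        wf = copy.deepcopy(workflow)
        wf[sampler_id]["inputs"]["denoise"] = d
        payloads.append({"prompt": wf})
    return payloads

def queue(payload, host="127.0.0.1:8188"):
    # POST to ComfyUI's /prompt endpoint to queue one generation
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

# Example: queue the three settings tried in the video
# for p in build_denoise_sweep(my_workflow, "4", [0.5, 0.7, 0.8]):
#     queue(p)
```

Keeping the graph construction separate from the HTTP call makes the sweep easy to inspect before sending anything to the server.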

  • What advice does the video creator give for users who encounter issues?

    -The video creator advises users to ensure their ComfyUI is updated and to seek help by commenting below the video if they encounter any issues.



🚀 Introducing the Image-to-Image Workflow with the EfficientNet Encoder

This paragraph introduces a new video tutorial focused on extending an existing Stable Cascade workflow with image-to-image functionality. The host explains how to use the EfficientNet encoder model, a recently released encoder linked in the description, to transform a base image into a new one. They walk through the necessary changes step by step: replacing the Stage C empty latent image with a custom latent, adding a Load Image node, and encoding the image with a VAE Encode node. The host emphasizes the role of the denoise setting in controlling the transformation and suggests experimenting with different values to get the best results.


👋 Closing and Future Engagement

In this closing paragraph, the host expresses hope to see the viewers again soon and encourages them to leave comments if they have any questions. The paragraph serves as a warm and engaging conclusion to the video, inviting viewer interaction and building a sense of community. The host's aim is to make viewers feel supported and to encourage their continued participation in future tutorials.



💡Stable Cascade

Stable Cascade is a text-to-image diffusion model from Stability AI, built on the Würstchen architecture, that generates images in a highly compressed latent space through three stages (A, B, and C). In the context of the video, it is the model whose workflow is being extended with image-to-image functionality, which is essential for creating a new image based on a source image with added features like armor and a helmet for a futuristic soldier.

💡EfficientNet Encoder

The EfficientNet encoder is a model, based on the EfficientNet neural network architecture, that Stable Cascade uses to compress images into its latent space. In the video, the presenter notes that this model is quite new and is required for encoding the base image into a latent representation, a crucial step in the image-to-image process. The model is linked in the description for viewers to access, indicating its importance in achieving the desired visual effects.

💡Image-to-Image Functionality

Image-to-Image Functionality refers to the process of converting one image into another by adding, modifying, or transforming visual elements. In the video, this concept is central to the workflow being demonstrated, where the goal is to take a base image and create a new image on top of it. The presenter guides the viewers through the process of integrating this functionality into their workflow, which involves using the Efficiency Net Model and adjusting settings to achieve the desired outcome.

💡Custom Latent Image

A Custom Latent Image is a representation of an image encoded into a lower-dimensional space, capturing the essential features and structure of the original. In the context of the video, the presenter replaces the Stage C empty latent image with a custom latent to enable image-to-image translation. This involves loading an image and encoding it with the EfficientNet encoder, which then allows the creation of a new image with added elements such as futuristic armor and a helmet.

💡VAE Encode Node

VAE Encode Node refers to a specific component in the workflow used for encoding an image with a Variational Autoencoder (VAE). In the video, the presenter instructs viewers to create a VAE Encode node to transform the loaded image into a latent representation. This encoded latent is then used as input for the first KSampler in the Stable Cascade process, which is crucial for generating the final image with the desired modifications.

💡EffNet Encoder

The EffNet encoder (effnet_encoder) is the EfficientNet-based encoder model used in the video for encoding images into latent representations. It is distributed as a safetensors file and can be found in the same Hugging Face repository as the other models used in the workflow. The encoder is essential for the image-to-image process, as it produces the custom latent image used to modify the base image with new visual elements.

💡Denoise Setting

The Denoise Setting is a parameter in the image processing workflow that affects how the new image is generated from the latent representation. In the video, the presenter emphasizes the importance of adjusting the denoise setting to achieve a realistic and desired outcome. By changing the denoise setting, the presenter is able to control the presence and clarity of the added elements, such as the helmet and armor, on the source image.
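As a rough mental model (a conceptual sketch, not the sampler's actual scheduler arithmetic), denoise can be read as the fraction of the noise schedule that is re-run on top of the encoded source latent:

```python
def effective_steps(steps: int, denoise: float) -> int:
    """Rough intuition for the KSampler denoise parameter (conceptual
    sketch only, not ComfyUI's exact scheduler math).

    denoise=1.0 behaves like pure text-to-image; denoise=0.0 returns the
    source essentially unchanged; 0.5 transforms it about half-way.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0.0 and 1.0")
    return round(steps * denoise)

# With 20 steps, denoise=0.5 re-runs roughly 10 denoising steps.
```

This is why lower values keep more of the base image's pose and composition, while higher values give the prompt more room to impose the armor and helmet.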


💡Prompt

In the context of the video, a Prompt refers to a description or set of instructions that guides the image generation process. The presenter provides an example prompt: a futuristic soldier wearing a face-covering helmet with high-tech, colorful LED armor. The prompt serves as creative input for the image-to-image process, helping to shape the final visual outcome.


💡ComfyUI

ComfyUI is the node-based user interface used in the video for building the image-processing workflow. The presenter mentions updating ComfyUI to ensure compatibility with the new workflow, indicating that it is an essential tool for implementing the Stable Cascade image-to-image process. The interface allows users to load images, adjust settings, and manage the workflow for creating custom visual content.


💡Update

In the context of the video, an Update refers to the process of upgrading software to a newer version in order to access new features or fix potential issues. The presenter advises viewers to update their ComfyUI to avoid errors when attempting the workflow. This highlights the importance of keeping software up-to-date to ensure the smooth functioning of the image processing tasks and to take advantage of the latest improvements and capabilities.


Introducing an addition to the previous workflow for image processing.

Integration of image-to-image functionality to create new images based on a base image.

Utilization of the EfficientNet encoder model for encoding images.

Link to the EfficientNet paper provided in the description for further reading.

Changes required in the settings of the existing workflow.

Deletion of the Stage C empty latent image and modification of Stage B.

Creation of a custom latent image using a load image node.

Encoding of the image with a VAE Encode node using the new EfficientNet encoder.

Connection of the latent output to the first KSampler.

Selection of a new prompt for image transformation.

Adjustment of the denoise setting to control how much of the source image is preserved.

Demonstration of the updated image with the desired armor and helmet.

Explanation of the importance of the denoise setting in achieving the desired output.

Guidance on experimenting with settings to create the desired image.

Emphasis on updating the ComfyUI for successful workflow execution.

Recommendation to use a manager for a smoother workflow experience.

Invitation for questions and future tutorials.