Realistic Face Swap with Stable Diffusion | EasyPhoto sd webui A1111

Nerdy Rodent
20 Oct 2023 | 15:46

TLDR: The video is an in-depth guide to using the EasyPhoto extension for the Stable Diffusion web UI to perform realistic face swaps in photographs. The host walks through installation, including downloads that can take up to 60 GB of disk space, and warns about possible conflicts with other extensions. After training the face model on a set of personal photos, the host demonstrates face transfer onto photographs, paintings, and statues. Results are mixed, with the best outcomes on photographic images. The video also explores face swapping in video, which is still experimental and somewhat flickery. The host concludes that EasyPhoto is user-friendly and effective for photo face swapping, and compares it with the Reposer workflow, which generates full-body images from a single photo.


Takeaways

  • 📷 **EasyPhoto Extension**: A tool for swapping faces in photographs using Stable Diffusion.
  • 🔧 **Installation**: Install EasyPhoto and ControlNet from the Extensions tab, and be prepared for large downloads that may need up to 60 GB of disk space.
  • ⚠️ **Compatibility Warning**: EasyPhoto may clash with other extensions, and some of them may need to be disabled for it to work.
  • 💻 **System Requirements**: Training needs at least 10 GB of VRAM, and inference needs at least three ControlNets.
  • 📚 **Training Process**: Upload 5 to 20 half-body or head-and-shoulder photos; training then takes about 25 minutes.
  • 📈 **Advanced Options**: Users can adjust settings like resolution, model, and steps for training, though the defaults are generally good.
  • 🖼️ **Inference Tab**: Offers options for using a template, uploading a batch, and using SDXL beta for more complex tasks.
  • 🎨 **Creative Limitations**: The tool works better with photographic images and has limitations with non-photorealistic inputs like paintings or statues.
  • 🕒 **Processing Time**: Be prepared for longer wait times, especially when using advanced features like super resolution.
  • 📹 **Video Capability**: The batch upload feature can be used to create videos, but the results may be a bit flickery and require fine-tuning.
  • 🧪 **Experimental Features**: The SDXL beta tab is marked as experimental and may require more VRAM (at least 16GB) for stable operation.
  • 🔄 **Makeup Transfer Issue**: The makeup transfer feature fails with an error about deprecated NumPy aliases; the code relies on an older NumPy version that, per the video, is not compatible with Python 3.10.
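
The figures in the list above can be turned into a quick pre-flight check before installing or training. This is only a sketch: the thresholds come from the summary, while the function names and the idea of passing VRAM in by hand (rather than detecting it) are assumptions made for illustration.

```python
import shutil

# Figures quoted in the summary; treat them as rough guidance, not hard limits.
REQUIRED_DISK_GB = 60       # downloads "may require up to 60GB"
MIN_TRAIN_VRAM_GB = 10      # quoted minimum for training
MIN_SDXL_VRAM_GB = 16       # quoted minimum for the SDXL beta tab
MIN_PHOTOS, MAX_PHOTOS = 5, 20


def free_disk_gb(path: str = ".") -> float:
    """Free space on the drive that holds `path`, in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def preflight(disk_gb: float, vram_gb: float, num_photos: int) -> list:
    """Return a list of warnings; an empty list means nothing was flagged."""
    notes = []
    if disk_gb < REQUIRED_DISK_GB:
        notes.append(
            f"Disk: {disk_gb:.0f} GB free, downloads may need up to {REQUIRED_DISK_GB} GB."
        )
    if vram_gb < MIN_TRAIN_VRAM_GB:
        notes.append(
            f"VRAM: {vram_gb:.0f} GB is below the {MIN_TRAIN_VRAM_GB} GB quoted for training."
        )
    elif vram_gb < MIN_SDXL_VRAM_GB:
        notes.append(
            f"VRAM: enough for training, but below the {MIN_SDXL_VRAM_GB} GB quoted for SDXL beta."
        )
    if not MIN_PHOTOS <= num_photos <= MAX_PHOTOS:
        notes.append(
            f"Photos: {num_photos} supplied, {MIN_PHOTOS} to {MAX_PHOTOS} are recommended."
        )
    return notes
```

A call such as `preflight(free_disk_gb("."), vram_gb=12, num_photos=10)` would flag only the SDXL beta limitation on a machine with enough free disk space.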

Q & A

  • What is the name of the extension explored in the video?

    -The extension explored in the video is called 'EasyPhoto'.

  • Which two extensions must be installed for EasyPhoto to work?

    -EasyPhoto itself and the ControlNet extension must both be installed.

  • How much disk space is needed for the EasyPhoto extension to download all its files?

    -The downloads for EasyPhoto can take up to 60 gigabytes.

  • What is the minimum amount of VRAM required for training with EasyPhoto?

    -At least 10 gigabytes of VRAM are required for training with EasyPhoto.

  • How many ControlNets are needed for inference in EasyPhoto?

    -At least three ControlNets are needed for inference in EasyPhoto.

  • What kind of photos are recommended for training EasyPhoto?

    -5 to 20 half-body or head-and-shoulder photos are recommended for training EasyPhoto.

  • How long does the training process take in EasyPhoto?

    -The training process in EasyPhoto takes about 25 minutes.

  • What is the purpose of the 'User ID' in EasyPhoto?

    -The 'User ID' in EasyPhoto is a name given to the training subject; the trained model is saved under that name.

  • What is the 'Max steps per photo' setting in EasyPhoto?

    -The 'Max steps per photo' setting in EasyPhoto sets the number of training steps per photo, with a default of 200.

  • How does the EasyPhoto extension handle face swapping when the target wears glasses?

    -The EasyPhoto extension tends to remove the glasses, as they were not present in the original training photos.

  • What is the 'Makeup Transfer' option in the EasyPhoto extension?

    -The 'Makeup Transfer' option in the EasyPhoto extension is intended to transfer makeup from one face to another, but it may fail with errors related to deprecated NumPy aliases.

  • Can the EasyPhoto extension be used to create videos?

    -The EasyPhoto extension has a batch upload feature that can be used to create videos, although the results may be a bit flickery and processing can take a long time.
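
The makeup transfer error mentioned above concerns deprecated NumPy aliases. NumPy deprecated plain aliases such as `np.float` and `np.int` in version 1.20 and removed them in 1.24, so code that still uses them raises an `AttributeError` on newer releases. The video does not show which aliases are involved, so the sketch below is a generic scanner for the common ones; the function name and the sample source are hypothetical.

```python
import re

# Common aliases deprecated in NumPy 1.20 and removed in NumPy 1.24.
DEPRECATED_ALIASES = ("bool", "int", "float", "complex", "object", "str")

# The trailing \b keeps sized or underscored types such as np.float32,
# np.int64, and np.bool_ from matching.
_PATTERN = re.compile(
    r"\b(?:np|numpy)\.(?:" + "|".join(DEPRECATED_ALIASES) + r")\b"
)


def find_deprecated_aliases(source: str) -> list:
    """Return (line_number, matched_text) pairs for each deprecated alias use."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "a = np.float(1)\nb = np.float32(2)\nc = x.astype(np.int)\n"
    for lineno, text in find_deprecated_aliases(sample):
        print(f"line {lineno}: {text}")
```

Replacing a flagged alias with the builtin (`float`, `int`, `bool`) or an explicit sized type (`np.float64`, `np.int64`) is the usual fix, though whether that is the right repair for this extension is not covered in the video.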



Outlines

😀 Introduction to EasyPhoto Extension

The first paragraph introduces the EasyPhoto extension, a tool for swapping faces in photographs. It explains how to install the extension and covers the initial setup, which downloads the necessary files, such as the ChilloutMix Stable Diffusion 1.5 checkpoint and ControlNet models. The user is warned about the disk space required and the possibility of extension conflicts. The paragraph also covers the training process, including uploading photos, selecting a model, and adjusting advanced options like resolution and validation steps.


📈 Training and Advanced Options

This paragraph delves into the training process for the EasyPhoto extension. It recommends using the same Stable Diffusion checkpoint for training and inference for better results, and setting a unique user ID for the training subject. The paragraph also outlines the advanced options available during training, such as enabling reinforcement learning and skin retouching. It emphasizes that the default settings are generally sufficient, but experienced users can tweak parameters like the base model, resolution, and training steps.
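
To get a feel for how the training-steps setting interacts with the number of photos, here is a rough estimate. The summary only states a default of 200 for 'Max steps per photo'; the assumption that total steps scale linearly with photo count, and the optional `overall_cap` parameter, are illustrative and not confirmed by the video.

```python
from typing import Optional


def estimated_training_steps(
    num_photos: int,
    max_steps_per_photo: int = 200,      # default quoted in the summary
    overall_cap: Optional[int] = None,   # hypothetical upper bound, off by default
) -> int:
    """Estimate total training steps, assuming steps scale with photo count."""
    if num_photos < 1:
        raise ValueError("at least one training photo is required")
    total = num_photos * max_steps_per_photo
    return total if overall_cap is None else min(total, overall_cap)


if __name__ == "__main__":
    for photos in (5, 10, 20):
        print(photos, "photos ->", estimated_training_steps(photos), "steps")
```

Under these assumptions, moving from 5 to 20 photos quadruples the step count, which is one reason to expect training time to grow with larger photo sets.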


🖼️ Testing with Various Images

The third paragraph focuses on testing the EasyPhoto extension with different types of images, including photographs with glasses, paintings, and statues. It discusses how the extension performs on each image type and what it produces, such as removing glasses or blending the face into a statue. The paragraph also touches on cartoon faces and the challenges of converting them into realistic photos. It concludes with a brief mention of the batch upload feature and its potential for creating videos.


🎥 Video Capabilities and SDXL Beta

This paragraph explores the video capabilities of the EasyPhoto extension and the SDXL Beta tab. It mentions the flickering observed in videos and the option to turn off super resolution to speed up processing. The SDXL Beta tab is described as experimental and requires at least 16 gigabytes of VRAM. The paragraph discusses the options available for generating new images, such as selecting body types and clothing colors, and the potential for creating a LoRA file for further use.
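
One way to run a video through a batch of images is to split it into frames, process the frames, and stitch the results back together. The sketch below only builds the two command lines and does not run them. The video does not say which tool was used, so the use of ffmpeg, the `%05d.png` naming pattern, the helper names, and the 24 fps default are all assumptions; it also assumes the processed frames keep the same numbered file names.

```python
def extract_frames_cmd(video: str, frames_dir: str) -> list:
    """ffmpeg command that writes one numbered PNG per frame into frames_dir."""
    return ["ffmpeg", "-i", video, f"{frames_dir}/%05d.png"]


def assemble_video_cmd(frames_dir: str, output: str, fps: int = 24) -> list:
    """ffmpeg command that joins numbered PNG frames into an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),           # input frame rate for the image sequence
        "-i", f"{frames_dir}/%05d.png",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",            # widely supported pixel format
        output,
    ]


if __name__ == "__main__":
    # Print the commands; pass them to subprocess.run(cmd, check=True) to execute.
    print(" ".join(extract_frames_cmd("input.mp4", "frames")))
    print(" ".join(assemble_video_cmd("swapped", "output.mp4")))
```

Because each frame is processed independently, small differences between neighbouring frames show up as flicker, which matches the behaviour described in the video.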

🔄 Conclusion and Comparison with Reposer

The final paragraph summarizes the capabilities of the EasyPhoto extension, emphasizing its ease of use and effectiveness in swapping faces in photos. It compares EasyPhoto with the Reposer workflow, which can generate a complete version of a face, including hair, body, and clothing, from a single image without any training. The paragraph notes that EasyPhoto performs better with photographic-style images and encourages users to experiment with the tool.



Keywords

💡Face Swap

Face Swap refers to the process of replacing one person's face with another's in a photograph or video. In the context of the video, it is the primary function of the 'EasyPhoto' extension for the Stable Diffusion web interface, which allows users to train a model on their own face and then apply it to other images or videos.

💡Stable Diffusion

Stable Diffusion is a machine learning model used for generating images from textual descriptions. The video discusses using Stable Diffusion in conjunction with the EasyPhoto extension to perform realistic face swaps, indicating its role in the broader field of AI-generated content.

💡EasyPhoto Extension

The EasyPhoto extension is a tool that integrates with the Stable Diffusion web interface to facilitate face swapping. It is highlighted in the video as a user-friendly method to train a face model and subsequently apply it to various images, showcasing its utility in the process of face manipulation.

💡Training a Model

Training a model in the context of the video means providing the AI with a set of images (a dataset) of a specific face. The model learns from these images to replicate the face accurately during the face swap process. It is a crucial step before performing face swaps, as it ensures the AI understands the features of the face it is to replicate.

💡VRAM (Video RAM)

VRAM, or Video RAM, is the memory used by graphics processing units (GPUs) for rendering images, videos, and 3D animations. The video mentions the requirement of at least 10 gigabytes of VRAM for training, emphasizing the resource-intensive nature of the face swapping process.

💡ControlNet

ControlNet is a neural network add-on that conditions Stable Diffusion on additional inputs such as edges or poses. The video lists the ControlNet extension as a prerequisite for EasyPhoto, which needs at least three ControlNets for the inference stage of the face swapping process.


💡Inference

Inference, in the context of AI, is the process of applying a trained model to new data. Once the face model is trained, the video uses inference to apply it to different images, which is how the face swap is actually performed.


💡Checkpoint

A checkpoint, in machine learning, is a saved state of a model at a particular point in training. The video mentions the ChilloutMix Stable Diffusion 1.5 checkpoint and an SDXL checkpoint, which are used as starting points for the face swapping process.

💡Reposer Workflow

The Reposer workflow is referenced at the end of the video as a point of comparison with EasyPhoto. According to the video, it can generate a complete version of a face, including hair, body, and clothing, from a single image without any training.


💡Resolution

Resolution, in digital imaging, is the number of pixels in an image, which determines its clarity and detail. The video discusses setting the resolution for the face swapping process, with higher resolutions requiring more VRAM and potentially leading to more realistic results.

💡Super Resolution

Super Resolution is a technique used to increase the resolution of an image beyond its original size, enhancing its detail. The video mentions the option to use super resolution during the face swapping process, but notes that it can result in very large image sizes and longer processing times.
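
A little arithmetic shows why super resolution produces large images and longer waits: scaling each side by a factor multiplies the pixel count by the square of that factor. The 2x factor and the 1024x1024 starting size below are illustrative assumptions, not settings taken from the video.

```python
def upscaled_size(width: int, height: int, scale: int = 2) -> tuple:
    """Output dimensions after scaling each side by `scale`."""
    return width * scale, height * scale


def pixel_growth(scale: int) -> int:
    """How many times more pixels the upscaled image contains."""
    return scale ** 2


def raw_rgb_megabytes(width: int, height: int) -> float:
    """Uncompressed 8-bit RGB size in megabytes (3 bytes per pixel)."""
    return width * height * 3 / 1e6


if __name__ == "__main__":
    w, h = upscaled_size(1024, 1024, scale=2)
    print(f"{w}x{h}, {pixel_growth(2)}x the pixels, "
          f"{raw_rgb_megabytes(w, h):.1f} MB uncompressed")
```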


Highlights

EasyPhoto is a user-friendly extension for face swapping in photographs using the Stable Diffusion model.

The extension can be installed via the Extensions tab, and its downloads can take up to 60 GB of disk space.

Training the face-swapping model involves uploading 5 to 20 half-body or head and shoulder photos.

The training process takes approximately 25 minutes and requires at least 10 GB of VRAM.

Advanced options are available for customization, but the default settings are suitable for most users.

The face-swapping model accepts various input types, including photographs, paintings, and statues, though results are mixed outside photographs.

Cartoon faces can also be trained and swapped with the extension, although results may be strange.

The extension supports batch uploads and can attempt to create videos, although the results may be flickery.

The SDXL beta tab allows for the generation of new images with the swapped face, requiring at least 16 GB of VRAM.

The extension provides a variety of settings for customization, including resolution, model selection, and reinforcement learning.

The face-swapping model can struggle with certain elements like glasses, but generally performs well with faces.

The training process is straightforward, with a clear guide on uploading photos and naming the model.

The inference tab offers multiple options for applying the trained face, including templates, batch uploads, and SDXL beta.

The extension is capable of handling a variety of tasks, from simple face swaps to more complex image manipulations.

EasyPhoto's face-swapping capabilities are most effective with photographic style images.

Some features, such as the SDXL beta tab and video face swapping, are experimental and may change as the extension is updated.

By contrast, the Reposer workflow can generate a complete version of a face, including hair, body, and clothing, from a single image without any training.

The extension is easy to use and offers a quick way to swap faces in photos; the video compares it with the Reposer workflow.