This new Open Source Model is better than Midjourney or SD3?! | Flux local ComfyUI Install Guide

Endangered AI
3 Aug 2024, 16:30

TLDR: The video discusses the emergence of new open-source image generation models, particularly Black Forest Labs' Flux model, which it presents as superior to Stable Diffusion 3. It covers installing Flux in ComfyUI, the three released versions of the model, and their capabilities. It also compares Flux with AuraFlow and other models, highlighting Flux's detailed images, correct human proportions, and text rendering.

Takeaways

  • 🌐 The open-source image generation landscape is changing fast with the release of new models such as AuraFlow and Black Forest Labs' FLUX.1, which some consider superior to Midjourney or SD3.
  • 🆕 Black Forest Labs, the team behind FLUX.1, has released three versions of the model: a non-commercial Dev model, a commercial-ready Schnell model, and a closed-source version available via API.
  • 📝 The Dev model, despite its non-commercial license, is highly impressive and has drawn attention for its capabilities.
  • 🔍 The Schnell model is licensed for commercial use, so it can be used in projects under its stated terms.
  • 🛠️ To use the Flux models in ComfyUI, download the model files and place them in the designated folders within ComfyUI's models directory.
  • 📚 The ComfyUI team provides an example page on GitHub to help set up the models, including the required T5-XXL text encoder.
  • 🔄 The ComfyUI workflow for the new models uses separate nodes for the noise, guider, and sigmas, which differs from a traditional SDXL workflow.
  • 🎨 Flux models show significant improvements in image generation, especially in finger detail and overall image quality.
  • 🔍 Comparisons between Flux, AuraFlow, and other models show Flux producing more detailed and aesthetically pleasing images.
  • 👀 Flux renders text well, working legible words into images in a visually compelling way.
  • ⚙️ Changing the number of steps in Flux can produce substantially different images, whereas in other models extra steps mostly refine the same image.

Q & A

  • What is the title of the video guide and what is it about?

    -The title of the video guide is 'This new Open Source Model is better than Midjourney or SD3?! | Flux local ComfyUI Install Guide'. It covers installing Flux, a new open-source image generation model, and compares it with models such as Midjourney and Stable Diffusion 3.

  • What is the significance of the AA flow model in the context of the video?

    -AuraFlow is significant as one of the open-source models released after Stable Diffusion 3. Some consider it an improvement over Stable Diffusion 3, and it sets the stage for the introduction of Flux.

  • Who is Black Forest and what is their contribution to the open-source image generation models?

    -Black Forest Labs is a company made up of the former SDXL team. They have contributed to the field by releasing FLUX.1, an open-source model positioned as a superior alternative to Stable Diffusion 3.

  • What are the three versions of the Flux model released by Black Forest?

    -Black Forest Labs has released three versions of the Flux model: the Dev model (non-commercial, with the possibility of obtaining a license), the Schnell model (ready for commercial use), and a closed-source version available only through their API.

  • What is the issue with Stable Diffusion 3 that the new models, including Flux, are trying to solve?

    -The problem is Stable Diffusion 3's widely mocked failure on prompts such as a woman lying on grass, which produced badly deformed bodies and exposed the model's weakness with human anatomy. The new models, including Flux, handle such prompts far better.

  • What is the process of installing the Flux model on ComfyUI?

    -Download the model files from Black Forest Labs' Hugging Face page, place them in the matching folders under ComfyUI's models directory, and download the T5-XXL text encoder into the models/clip folder so the workflow can load it.

  • Why is the T5 XXL clip important in the installation process?

    -T5-XXL is the text encoder that turns the prompt into conditioning for Flux, which pairs it with a smaller CLIP-L encoder. Despite the "clip" label used in the video, it is not a CLIP model; it is called that because ComfyUI loads it from the models/clip folder. Stable Diffusion 3 uses the same T5-XXL encoder, and the workflow will not run until the file is in that folder.

  • What is the difference between the SamplerCustomAdvanced node and the traditional sampler in the ComfyUI workflow?

    -The traditional KSampler bundles everything into one node, whereas SamplerCustomAdvanced takes its parts as separate inputs: noise (from a RandomNoise node), a guider (BasicGuider), a sampler (KSamplerSelect), sigmas (BasicScheduler), and the latent image. Building each part in its own node makes the generation process more flexible and easier to customize.

  • How does the Flux model perform in comparison to other models like AuraFlow and Kolors?

    -Flux performs very well, with clear improvements in human proportions, hands, and text rendering. In particular, it gets fingers right where AuraFlow and Kolors often struggle.

  • What is the significance of the number of steps in the image generation process for the Flux model?

    -The number of steps in the Flux model's image generation process can result in substantial differences in the output, unlike other models where the number of steps typically refines the image without significant changes. This feature allows for more variation and control over the final image.

  • What is the current status of the Dev model in terms of commercial use?

    -The Dev model, while impressive, is currently non-commercial. However, there is a possibility to request a license for its use, which could be a point of interest for community members looking to build on top of the model.
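
The installation answers above come down to putting a few files in the right folders. The Python sketch below checks a ComfyUI install for them. The file names and folders it expects (flux1-schnell.safetensors in models/unet, t5xxl_fp16.safetensors and clip_l.safetensors in models/clip, ae.safetensors in models/vae) are assumptions based on ComfyUI's Flux example page around the time of the video; newer ComfyUI builds also accept models/diffusion_models, and you may have downloaded the Dev weights or an fp8 encoder instead, so adjust the hypothetical EXPECTED_FILES mapping to match your files.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping of models/ subfolder -> file names, based on ComfyUI's
# Flux example page at the time; edit it to match the files you downloaded.
EXPECTED_FILES = {
    "unet": ["flux1-schnell.safetensors"],  # Flux weights (newer builds: models/diffusion_models)
    "clip": ["t5xxl_fp16.safetensors", "clip_l.safetensors"],  # text encoders
    "vae": ["ae.safetensors"],  # autoencoder used to decode latents into images
}


def missing_flux_files(comfyui_root: str | Path) -> list[Path]:
    """Return every expected model file that is not present under <root>/models."""
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for subfolder, filenames in EXPECTED_FILES.items():
        for name in filenames:
            path = models_dir / subfolder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_flux.py /path/to/ComfyUI
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    missing = missing_flux_files(root)
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected Flux files found.")
```

An empty result only means the files exist where this mapping expects them; it does not validate their contents.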

Outlines

00:00

🤖 Emergence of Open-Source Image Generation Models

The script discusses the unexpected surge of open-source image generation models following the release of Stable Diffusion 3. It highlights AuraFlow as a notable contender and introduces Black Forest Labs, a company that has released FLUX.1, which the speaker considers superior. The company offers three versions of the model: a non-commercial Dev model, a commercial-ready Schnell model, and a closed-source version via API. The script emphasizes Black Forest Labs' respect for the open-source community and gives a step-by-step guide to setting up the Schnell model in ComfyUI, including downloading the model files and the required text encoder.

05:00

🔧 Setting Up and Comparing Flux Models in Comfy UI

This paragraph walks through configuring the Flux model in ComfyUI, explaining how to download and set up the necessary components. It then examines the structure of the workflow, compares it to traditional SDXL workflows, and explains the function of each node. Finally, it compares images generated by Flux, AuraFlow, and other models, highlighting Flux's better finger detail and overall aesthetics.

10:02

🎨 Exploring Realism and Text Encoding with Flux

The script moves on to test how well Flux generates realistic images and renders text. It describes tweaking prompts to generate a Victorian entrance hall and a female knight, noting the model's strong human proportions, facial detail, and text rendering. It also discusses an unusual behavior of the Schnell (lightning-style, few-step) model: changing the number of steps produces visibly different images rather than refinements of the same one.
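
To observe the step-count effect yourself, you can queue the same prompt and seed at several step counts. The sketch below assumes a workflow saved in ComfyUI's API format (a dict mapping node IDs to {"class_type", "inputs"}) that contains BasicScheduler and RandomNoise nodes, as in ComfyUI's Flux example, and a local ComfyUI server at the default http://127.0.0.1:8188 that accepts POST requests on /prompt. The helper names step_sweep and queue_prompt are hypothetical.

```python
from __future__ import annotations

import copy
import json
import urllib.request


def step_sweep(workflow: dict, step_counts, seed: int) -> list[dict]:
    """Return one copy of an API-format workflow per step count, all sharing one seed.

    Assumes a BasicScheduler node with a "steps" input and a RandomNoise node
    with a "noise_seed" input; the original workflow is left unchanged.
    """
    variants = []
    for steps in step_counts:
        graph = copy.deepcopy(workflow)
        for node in graph.values():
            if node.get("class_type") == "BasicScheduler":
                node["inputs"]["steps"] = steps
            elif node.get("class_type") == "RandomNoise":
                node["inputs"]["noise_seed"] = seed  # fixed seed isolates the effect of steps
        variants.append(graph)
    return variants


def queue_prompt(graph: dict, server: str = "http://127.0.0.1:8188") -> bytes:
    """POST one workflow to a running ComfyUI server and return the raw response."""
    data = json.dumps({"prompt": graph}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `for graph in step_sweep(workflow, [1, 2, 4, 8], seed=42): queue_prompt(graph)` queues four renders that differ only in step count, so any change between them comes from the steps alone.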

15:02

🏴‍☠️ Pirate-Themed Test and the Future of Open-Source Models

The final paragraph presents an experiment using the Flux model to create an image of a female pirate with the word 'flux' on the bow of a ship. It reflects on the model's performance and the potential for upscaling to improve results. The script concludes with thoughts on the future of open-source image generation models, expressing excitement about the rapid development in the field and the hope that competition will foster innovation similar to the progress seen in large language models.


Keywords

💡Open Source Model

An open-source model is one whose weights are publicly released for others to use, modify, and build on, subject to its license. The video discusses new open-source image generation models released by various groups, which challenge established models such as Midjourney (a proprietary service) and SD3. It highlights AuraFlow and Black Forest Labs' Flux as significant contributions to this space.

💡Stable Diffusion 3

Stable Diffusion 3 is Stability AI's latest image generation model and a major player in AI art generation. The video uses it as the point of comparison for the new open-source models, suggesting that they fix some of its problems, such as the infamous 'woman on grass' failure, where SD3 produced badly deformed bodies.

💡AuraFlow Model

AuraFlow is one of the open-source image generation models mentioned in the video. It is presented as an impressive alternative to Stable Diffusion 3, following the release of version 0.2. The video suggests that AuraFlow has been well received and is a significant development for the open-source AI art community.

💡Black Forest

Black Forest is the team behind the Flux model, which is another open source image generation model discussed in the video. The team is described as being 'for-profit' but still supportive of the open source community. They have released three versions of their model, indicating a commitment to different use cases and commercial possibilities.

💡Dev Model

The Dev model is one of the versions of Flux released by Black Forest Labs. It is licensed for non-commercial use, and the video suggests commercial use requires a separate license. The video praises its capabilities and, given its quality, expresses disappointment that it is not available under commercial terms.

💡Schnell Model

The Schnell model is another version of Flux, described as 'lightning based' (distilled to produce images in very few steps) and ready for commercial use. Black Forest Labs released it under the Apache 2.0 license, so it can be used in commercial projects.

💡Comfy UI

ComfyUI is a node-based graphical interface for building and running image generation workflows. The video uses it to run the Flux model and gives a guide to installing and running the model there.

💡T5 XXL Text Encoder

T5-XXL is the text encoder used with the Flux model. The video calls it the 'T5 XXL clip' because ComfyUI loads it from the models/clip folder, but it is not a CLIP model. Stable Diffusion 3 uses the same encoder, suggesting a similarity in how the two models handle text prompts.

💡Workflow

In the video, a workflow is the sequence of connected nodes used to generate images with Flux in ComfyUI. The script describes a workflow with components such as the sampler, guider, text encoder, and scheduler, each handling one part of the image generation process.
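
A workflow like this can be written down in ComfyUI's API format, where each node is keyed by an ID and a link is a [source_node_id, output_index] pair. The sketch below is a reconstruction of a Schnell text-to-image graph, not an export from the video: node class names (UNETLoader, DualCLIPLoader, RandomNoise, BasicGuider, KSamplerSelect, BasicScheduler, EmptySD3LatentImage, SamplerCustomAdvanced) follow ComfyUI's Flux example as we understand it, while the file names, the euler sampler, the 4-step default, and both helper functions are assumptions. It also includes a small check for links that point at missing nodes.

```python
from __future__ import annotations


def build_flux_schnell_workflow(prompt: str, seed: int = 0, steps: int = 4,
                                width: int = 1024, height: int = 1024) -> dict:
    """Return a hypothetical API-format graph for Flux Schnell text-to-image."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-schnell.safetensors", "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",  # loads T5-XXL and CLIP-L together
              "inputs": {"clip_name1": "t5xxl_fp16.safetensors",
                         "clip_name2": "clip_l.safetensors", "type": "flux"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["2", 0]}},
        # The four inputs SamplerCustomAdvanced needs, each built by its own node:
        "5": {"class_type": "RandomNoise", "inputs": {"noise_seed": seed}},
        "6": {"class_type": "BasicGuider",
              "inputs": {"model": ["1", 0], "conditioning": ["4", 0]}},
        "7": {"class_type": "KSamplerSelect", "inputs": {"sampler_name": "euler"}},
        "8": {"class_type": "BasicScheduler",
              "inputs": {"model": ["1", 0], "scheduler": "simple",
                         "steps": steps, "denoise": 1.0}},
        "9": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "10": {"class_type": "SamplerCustomAdvanced",
               "inputs": {"noise": ["5", 0], "guider": ["6", 0], "sampler": ["7", 0],
                          "sigmas": ["8", 0], "latent_image": ["9", 0]}},
        "11": {"class_type": "VAEDecode", "inputs": {"samples": ["10", 0], "vae": ["3", 0]}},
        "12": {"class_type": "SaveImage",
               "inputs": {"images": ["11", 0], "filename_prefix": "flux"}},
    }


def dangling_links(graph: dict) -> list[tuple[str, str]]:
    """List (node_id, input_name) pairs whose link points at a node ID not in the graph."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            # In API format a link is a two-element list: [source_node_id, output_index].
            if isinstance(value, list) and len(value) == 2 and value[0] not in graph:
                bad.append((node_id, name))
    return bad
```

Compared with a traditional SDXL graph, the single KSampler is replaced by nodes 5 through 8 feeding node 10, which is the split the video describes.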

💡Scheduler

A scheduler is the workflow component that, given a number of steps, produces the sequence of noise levels (sigmas) the sampler works through, which sets how quickly noise is removed. In ComfyUI it controls how generation progresses, affecting the quality and character of the final image.
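
To make the scheduler's job concrete: for a flow-matching model like Flux, the sigmas run from 1.0 (pure noise) down to 0.0 (the finished image), and the step count decides how finely that range is cut. The sketch below is illustrative only and is not ComfyUI's scheduler code. The optional shift applies the time-shift formula from the Stable Diffusion 3 paper, σ' = s·σ / (1 + (s − 1)·σ), which keeps more of the steps at high noise. The function name flow_sigmas is hypothetical.

```python
from __future__ import annotations


def flow_sigmas(steps: int, shift: float = 1.0) -> list[float]:
    """Noise levels from 1.0 (pure noise) to 0.0 (clean image), one more than steps.

    Illustrative sketch: a linear rectified-flow schedule with an optional
    SD3-style time shift; not the exact code of any ComfyUI scheduler.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    sigmas = []
    for i in range(steps + 1):
        t = 1.0 - i / steps  # evenly spaced from 1.0 down to 0.0
        # shift > 1 bends the curve so sigmas stay high for longer; shift == 1 keeps it linear.
        sigmas.append(shift * t / (1.0 + (shift - 1.0) * t))
    return sigmas
```

With `flow_sigmas(4)` the schedule is [1.0, 0.75, 0.5, 0.25, 0.0], so each of the four steps removes a quarter of the noise. When a model like Schnell runs in so few steps, every step makes a large jump, which may help picture why changing the step count can visibly change the image.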

💡ControlNet

ControlNet is mentioned in the video as a possible way to improve Flux results. ControlNet is an established technique that adds extra conditioning inputs, such as edge maps, depth maps, or poses, to guide a diffusion model's composition; the video does not explain how it would be applied to Flux.

Highlights

AuraFlow is seen as a superior alternative to Stable Diffusion 3.

Black Forest Labs, made up of the former SDXL team, has released FLUX.1, which is considered next level among image generation models.

Black Forest Labs has released three versions of the model: Dev, Schnell, and a closed-source version available via its API.

The Dev model is non-commercial but can be licensed for use, while the Schnell model is commercial-ready.

The Dev and Schnell models are impressive and handle the 'woman on grass' prompt that Stable Diffusion 3 failed at.

A guide is provided on how to install and run the Flux models in ComfyUI.

Instructions on downloading the necessary files from Black Forest Labs' Hugging Face page are given.

The process of placing the downloaded model files in the correct folders within Comfy UI is explained.

A recommendation is made to download the T5-XXL text encoder for use with the models.

A workflow example is provided to understand the components and setup for using the models in Comfy UI.

The Sampler Custom Advanced node setup and its parameters are detailed.

The importance of the text encoders (CLIP and T5-XXL) and their integration with the models is discussed.

Examples of generated images using the new models are shown, demonstrating the models' capabilities.

Comparisons between Flux, AuraFlow, and other models are made, highlighting the improvements in finger rendering.

The Flux model is praised for its consistent high quality in generating images with correct human proportions.

The flexibility of the Schnell model in generating different outcomes based on the number of steps is noted.

The potential for the open-source model development to accelerate, similar to the large language model space, is discussed.

The impact of multiple open-source model creators on the speed and diversity of development is considered.