SD3 Medium Base Model in ComfyUI: Not as Good as Expected – Better to Wait for Fine-Tuned Versions
TLDR: Stability AI's SD3, initially met with high expectations, faced setbacks including leadership changes and financial struggles. Despite these, SD3 was released on June 12th, featuring improved photorealism, prompt adherence, and text generation. The release includes three model files with varying CLIP inclusions, with ComfyUI officially recommended for running them. While SD3 shows promise, it has flaws, particularly in generating human figures. The community awaits fine-tuned versions for better performance.
Takeaways
- 😀 Stability AI announced the release of SD3, a major version following SD 1.5 and SDXL, which was highly anticipated.
- 😔 The company faced leadership changes and financial difficulties, leading to concerns about the future of SD3.
- 📅 Despite setbacks, SD3 was officially open-sourced and released on June 12th as scheduled.
- 🖼️ SD3 showcases excellent photorealistic effects and the ability to understand complex prompts involving spatial relationships and compositional elements.
- 📝 Improvements in text generation are evident, with no artifacts or spelling errors in the generated images.
- 🔧 SD3's advantages stem from its new architecture, the Multimodal Diffusion Transformer (MMDiT).
- 📚 The official recommendation for using SD3 is through ComfyUI, which was recently updated to support SD3.
- 📁 Three model files were released, with the smallest being SD3 Medium at 4.34 GB, requiring separate CLIP downloads for use in ComfyUI.
- 💻 Users need to upgrade ComfyUI to the latest version to ensure compatibility with SD3.
- 🐑 SD3 has shown the ability to correctly interpret and generate text on images, such as writing a nickname on a hat.
- 🙁 However, SD3 has flaws, particularly in generating human figures, which has been a point of complaint among users.
- 🔮 The future of SD3 depends on the adoption and development of third-party models, such as control nets and other enhancements.
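Since SD3 Medium ships without text encoders, the CLIP files must be downloaded separately and placed in the right ComfyUI folders. A minimal sketch of a layout check follows; the install path is an assumption, and the encoder filenames are taken from the SD3 Medium release (verify them against your download):

```python
from pathlib import Path

# Hypothetical ComfyUI install location; adjust to your setup.
COMFYUI = Path("ComfyUI")

# SD3 Medium ships without text encoders, so the CLIP files
# must be downloaded separately into models/clip.
required = {
    "models/checkpoints": ["sd3_medium.safetensors"],
    "models/clip": [
        "clip_l.safetensors",
        "clip_g.safetensors",
        "t5xxl_fp8_e4m3fn.safetensors",  # T5; optional on low-VRAM cards
    ],
}

def missing_files(root: Path, layout: dict) -> list:
    """Return the paths from `layout` that do not exist under `root`."""
    return [
        root / folder / name
        for folder, names in layout.items()
        for name in names
        if not (root / folder / name).is_file()
    ]

for path in missing_files(COMFYUI, required):
    print(f"missing: {path}")
```

Running this before launching ComfyUI surfaces any file that still needs to be downloaded or moved.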
Q & A
What was the initial expectation for SD3 based on previous versions?
-The initial expectation for SD3 was that it would be another major version like SD 1.5 and SDXL, expected to be widely used and highly anticipated.
What challenges did Stability AI face before the release of SD3?
-Stability AI faced several challenges, including the resignation of founder and CEO Emad Mostaque, the departure of the core research team, and funding difficulties stemming from its free open-source business model, which put the company's financial situation in jeopardy.
When was SD3 officially released by Stability AI?
-SD3 was officially released by Stability AI on June 12th, as announced at AMD's launch event.
What are the notable capabilities of SD3 as showcased in the initial images?
-The initial images showcased SD3's excellent photorealistic effect, adherence to complex prompts involving spatial relationships, compositional elements, actions, and styles, and an evident improvement in text generation without artifacts or spelling errors.
What is the Multimodal Diffusion Transformer (MMDiT) and how does it contribute to SD3's advantages?
-The Multimodal Diffusion Transformer (MMDiT) is the new architecture used by SD3, and it underpins the model's photorealistic output, prompt adherence, and improved text generation.
How many model files were released for SD3 and what are their sizes?
-Three model files were released for SD3: the smallest, SD3 Medium, is 4.34 GB and contains only the model weights; the 5.97 GB version additionally includes the CLIP text encoders; and the largest is the full package with everything included, bundling the T5 encoder as well.
What is the recommended software to use with SD3 according to the official recommendation?
-The official recommendation is to use ComfyUI with SD3.
What are the hardware requirements for using SD3 in ComfyUI?
-The hardware requirements for using SD3 in ComfyUI include a graphics card with enough VRAM to hold the model plus the additional CLIP text encoders; it's advised not to enable the T5 encoder if the graphics card has low VRAM.
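The rule of thumb above can be sketched as a small helper: the checkpoint plus the CLIP encoders must fit in VRAM, and T5 is only worth enabling when there is headroom left for it. The function name and the size figures are illustrative assumptions, not official numbers:

```python
def should_enable_t5(vram_gb: float,
                     model_gb: float = 4.34,
                     clip_gb: float = 1.6,
                     t5_gb: float = 4.9) -> bool:
    """Enable the T5 encoder only if VRAM can hold the SD3 Medium
    weights, the two CLIP encoders, and T5 together.
    Sizes are rough assumptions, not official figures."""
    return vram_gb >= model_gb + clip_gb + t5_gb

# e.g. an 8 GB card: load SD3 Medium with the two CLIPs, skip T5
print(should_enable_t5(8.0))   # False
print(should_enable_t5(16.0))  # True
```

In practice some extra headroom is needed for activations and the VAE, so treat the threshold as optimistic.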
What was the result of testing SD3's text generation ability with a prompt for a sheep with a hat?
-SD3 correctly generated an image of a sheep wearing a yellow hat with 'Mimi' written on it, demonstrating its text generation ability.
What are some of the flaws observed in SD3's performance?
-Some flaws observed in SD3's performance include poor generation of human figures and issues with certain prompts leading to broken or scary images.
What is the future outlook for SD3 and what factors will influence it?
-The future of SD3 depends on how quickly third-party models, fine-tunes, and control mechanisms such as LoRA and ControlNet are adopted and developed.
Why can't the Pony series model author adapt to SD3?
-Due to license issues, the Pony series model author, AstraliteHeart, confirmed they cannot adapt their model to SD3.
Outlines
🚀 Launch of Stability AI's SD3
Stability AI's much-anticipated SD3 was expected to follow the success of its predecessors, SD 1.5 and SDXL. However, the company faced challenges: its founder and CEO, Emad Mostaque, stepped down, the core research team departed, and the free open-source business model led to funding difficulties. Despite these setbacks, SD3 was officially released on June 12th as promised. The video explores the new features of SD3, how to download and install it, and compares its image quality and usage to Midjourney (MJ). It also covers the hardware requirements and provides a hands-on demonstration using ComfyUI.
🔍 SD3's Features and Performance Review
The video reviews the features and performance of Stability AI's newly released SD3 model. It highlights SD3's photorealistic output, prompt adherence, and improved text generation, and attributes these to the new Multimodal Diffusion Transformer (MMDiT) architecture. The video demonstrates the model's capabilities through hands-on testing in ComfyUI, comparing the different model files and their requirements. It notes that while SD3 performs well in areas such as spatial relationships and text generation, it struggles with human figures. The video concludes with a discussion of workflow customization, the potential of future fine-tuned versions of SD3, and the licensing issues preventing the Pony series model from being adapted to SD3.
Keywords
💡SD3
💡ComfyUI
💡Photorealistic effect
💡Prompt adherence
💡Text generation
💡Multimodal Diffusion Transformer (MMDiT)
💡Checkpoints
💡Hardware requirements
💡Fine-tuned versions
💡Third-party models
💡License issues
Highlights
Stability AI announced the upcoming release of SD3 in February, expected to be widely used and highly anticipated.
SD3 faced challenges with the company's founder and CEO stepping down and the core research team resigning.
Funding difficulties due to the free open-source business model put Stability AI's financial situation in jeopardy.
SD3 was officially released on June 12th as scheduled, despite the company's internal issues.
The SD3 Medium model showcases excellent photorealistic effects, producing convincingly photo-realistic real-world images.
The model demonstrates prompt adherence, understanding complex prompts involving spatial relationships and compositional elements.
SD3 shows an evident improvement in text generation with no artifacts or spelling errors.
SD3's advantages come from its new architecture, the Multimodal Diffusion Transformer (MMDiT).
The official recommendation for using SD3 is through ComfyUI.
Three checkpoints of the SD3 model were released, with the smallest being 4.34 GB and requiring separate CLIP downloads.
An extra TripleCLIPLoader node is needed to use the SD3 Medium model in ComfyUI, offering flexibility.
The largest SD3 model includes all necessary components, making it a one-stop solution for users.
ComfyUI was updated to support SD3, enhancing user experience.
The SD3 Medium model with all three CLIP files is the recommended setup for flexibility and functionality.
SD3 correctly generated text on a sheep's hat, demonstrating its text generation ability.
SD3 has been criticized for poor performance in generating human figures.
Despite flaws, SD3's understanding of spatial relationships and prompts is accurate.
The future of SD3 depends on how quickly third-party models and control mechanisms are adopted and developed.
Due to license issues, the Pony series model author cannot adapt their model to SD3.