Stable Diffusion 3 HANDS ON! How Good Is It Really?

All Your Tech AI
18 Apr 2024 08:51

TLDR

Stability AI has just released Stable Diffusion 3 and its Turbo variant, accessible only via API through a partnership with Fireworks AI. Although the API pricing is high, the models' prompt adherence and image quality appear to live up to expectations, with the standard model outperforming the Turbo model in detail and resolution. The release maintains Stability AI's commitment to open generative AI, promising model weights for self-hosting with a membership in the near future.

Takeaways

  • 🚀 Stability AI has released Stable Diffusion 3 and Stable Diffusion 3 Turbo, both available via API.
  • 🤝 Stability AI has partnered with Fireworks AI, an API platform for hosting and fast access to models like Stable Diffusion.
  • 💡 The model weights for self-hosting will be made available with a Stability AI membership in the near future.
  • 🌐 The host set up the Stable Diffusion 3 beta on Pixel Dojo within 3 hours of its release.
  • 💰 The API pricing is relatively high, at about $10 per thousand credits, with Stable Diffusion 3 costing 6-12 credits per image.
  • 📸 Image generation with Stable Diffusion 3 is approximately 32 times more expensive than with Stable Diffusion XL 1.0.
  • 📈 A Pro Plan starting at $9.95 per month offers unlimited usage of Pixel Dojo and image generation.
  • 🎨 The quality of images generated by Stable Diffusion 3 is consistent with the examples displayed on Stability AI's website.
  • 🧐 The text coherence in images generated by AI has been a challenge, but Stable Diffusion 3 shows improvement in this area.
  • 🚀 The Turbo model of Stable Diffusion 3 is faster but produces lower resolution images.
  • 💬 Prompt adherence for positive prompts seems to be very good, reducing the need for negative prompts in image generation.

Q & A

  • What is the name of the latest release by Stability AI?

    -The latest releases by Stability AI are Stable Diffusion 3 and Stable Diffusion 3 Turbo.

  • How are Stable Diffusion 3 and Stable Diffusion 3 Turbo made available to users?

    -Both Stable Diffusion 3 and Stable Diffusion 3 Turbo are available via API and are hosted on an API platform called Fireworks AI.

  • What is the key commitment Stability AI has made regarding their generative AI?

    -Stability AI has committed to making the model weights available for self-hosting with a Stability AI membership in the near future.

  • How quickly was Stable Diffusion 3 beta set up on Pixel Dojo after its release?

    -Stable Diffusion 3 beta was set up on Pixel Dojo within 3 hours of its release.

  • What is the pricing structure for the Stable Diffusion 3 API?

    -The pricing for the API is based on credits, with about $10 per thousand credits. Generating an image with Stable Diffusion 3 costs 6 to 12 credits per image.

  • How does the cost of generating an image with Stable Diffusion 3 compare to Stable Diffusion XL 1.0?

    -Generating an image with Stable Diffusion 3 is about 32 times more expensive than with Stable Diffusion XL 1.0.

  • What is the starting price for a Pro Plan on Pixel Dojo?

    -The starting price for a Pro Plan on Pixel Dojo is $9.95 per month, which includes unlimited image generations.

  • What is the main concern when a new model like Stable Diffusion 3 is released?

    -The main concern is whether the images displayed on their website are cherry-picked to show the best results and not representative of the average output.

  • How does the quality of images generated by Stable Diffusion 3 compare to the images on their website?

    -The quality of images generated by Stable Diffusion 3 is quite good and does not seem to be too far off from the images displayed on their website.

  • What is a challenge that most AI generators have faced when it comes to text in images?

    -Most AI generators have struggled with text coherence, ensuring that the text in generated images is legible and contextually accurate.

  • How did Stable Diffusion 3 perform with text in images during the demonstration?

    -Stable Diffusion 3 showed mixed results with text in images. Some text was accurately represented, while other instances had text that was not fully coherent.

  • What additional feature can users play around with to improve image generation?

    -Users can experiment with negative prompts to refine the image generation process and achieve better results.

Outlines

00:00

🚀 Introduction to Stable Diffusion 3 and Turbo Models

Stability AI has launched Stable Diffusion 3 and its Turbo variant, accessible only via API. They have partnered with Fireworks AI for hosting and fast access. The model weights will be available for self-hosting with a Stability AI membership soon. The API pricing is relatively high, at around $10 per thousand credits, making image generation with Stable Diffusion 3 about 32 times more expensive than with Stable Diffusion XL 1.0. The speaker quickly implemented Stable Diffusion 3 beta on Pixel Dojo, allowing users to generate images with prompts and negative prompts, and to choose between the two models. The speaker also discusses the cost of the Pro Plan and shares initial generated images to evaluate the model's performance without cherry-picking.

05:02

🖼️ Testing Image Generation and Text Coherence

The speaker tests the image generation capabilities of Stable Diffusion 3 and its Turbo model using various prompts from press releases to ensure the images are not cherry-picked. The results are compared to those on the website, and the speaker notes that the quality is consistent with what is advertised. The speaker is particularly interested in how the models handle text within images, which has been a challenge for AI generators. Despite some initial mishaps, the text coherence appears to be significantly improved compared to previous versions. The Turbo model is noted to be quicker but with lower quality and resolution. The speaker concludes that Stable Diffusion 3 largely meets expectations, with good prompt adherence and image quality, and suggests that negative prompts may be less necessary due to the improved performance.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 is an advanced AI model developed by Stability AI for generating images from textual descriptions. It represents a significant upgrade from previous versions, offering improved image quality and faster processing times. In the video, the host discusses the capabilities and performance of Stable Diffusion 3, highlighting its ability to generate detailed and coherent images based on prompts.

💡API

API stands for Application Programming Interface, which is a set of rules and protocols that allows software applications to communicate and interact with each other. In the context of the video, Stability AI has made Stable Diffusion 3 available via an API, which means users can access the image generation capabilities programmatically, typically for integration into other software or services.
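To make "programmatic access" concrete, here is a minimal sketch of how a client might assemble one image-generation request from the inputs the video describes: a prompt, an optional negative prompt, and a choice between the standard and Turbo models. The endpoint URL, the JSON field names, and the model identifiers `sd3` and `sd3-turbo` are all assumptions made for illustration, not Stability AI's documented API. The request is built but never sent.

```python
import json
from urllib import request

# Hypothetical endpoint: a placeholder, not Stability AI's real URL.
API_URL = "https://api.example.com/v1/generate"

# Assumed model identifiers for the standard and Turbo variants.
KNOWN_MODELS = ("sd3", "sd3-turbo")


def build_generation_request(api_key, prompt, model="sd3", negative_prompt=None):
    """Build (but do not send) an HTTP POST request for one image."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model}")

    # Field names are illustrative assumptions.
    payload = {"prompt": prompt, "model": model}
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt

    return request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_generation_request(
    api_key="YOUR_KEY",
    prompt="a man with a retro TV for a head",
    model="sd3-turbo",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

A real integration would follow the provider's API reference for the actual URL, encoding, and parameter names, and would pass the request to `urllib.request.urlopen` or an HTTP client library.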

💡Fireworks AI

Fireworks AI is mentioned as an API platform that partners with Stability AI to provide hosting and fast, stable access to AI models like Stable Diffusion 3. This partnership ensures that users can reliably and efficiently use the AI model for their image generation needs.

💡Model Weights

Model weights refer to the parameters within a machine learning model that are learned from the training data. In the video, it is mentioned that Stability AI plans to make the model weights of Stable Diffusion 3 available for self-hosting to members, which means that users with the appropriate technical skills and resources can run the model independently on their own servers.

💡Pixel Dojo

Pixel Dojo is a platform mentioned in the video where the host has implemented the Stable Diffusion 3 beta for users to generate images. It serves as an interface for interacting with the AI model without the need for direct API access, making it more accessible for a broader audience.

💡Prompt

A prompt is a textual description or a set of instructions given to an AI model to guide the generation of an image. In the context of the video, the host uses various prompts to demonstrate the capabilities of Stable Diffusion 3, such as generating an anthropomorphic tortoise or a man with a retro TV for a head.

💡Negative Prompt

A negative prompt is a type of prompt that specifies what should be avoided or not included in the generated image. The host mentions the option to provide a negative prompt in Pixel Dojo, which can help refine the image generation process by excluding unwanted elements.

💡Credits

In the context of the video, credits refer to a form of virtual currency used to pay for the usage of the Stable Diffusion 3 API. The host discusses the pricing structure, mentioning that it costs about $10 per thousand credits, with different credit costs associated with generating images using Stable Diffusion 3 and its Turbo version.
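The credit figures quoted in the video translate into dollars with a little arithmetic. The sketch below uses only the numbers stated in this summary (about $10 per thousand credits, 6 to 12 credits per Stable Diffusion 3 image, and a roughly 32x gap to Stable Diffusion XL 1.0). The implied SDXL price is derived from that ratio rather than taken from a price list, so treat it as an estimate.

```python
from fractions import Fraction

# Figures as quoted in the video summary (approximate).
USD_PER_CREDIT = Fraction(10, 1000)   # about $10 per thousand credits
SD3_CREDITS_RANGE = (6, 12)           # credits per SD3 image
SDXL_COST_RATIO = 32                  # SD3 is ~32x the cost of SDXL 1.0


def image_cost_usd(credits):
    """Dollar cost of one image that consumes `credits` credits."""
    return credits * USD_PER_CREDIT


low, high = (image_cost_usd(c) for c in SD3_CREDITS_RANGE)
print(f"SD3: ${float(low):.2f} to ${float(high):.2f} per image")

# Implied SDXL 1.0 cost, derived from the quoted ratio (an estimate).
print(f"SDXL 1.0 (implied): ${float(low / SDXL_COST_RATIO):.4f} "
      f"to ${float(high / SDXL_COST_RATIO):.4f} per image")
```

At these rates a single Stable Diffusion 3 image costs between 6 and 12 cents, which is why the host describes the API pricing as relatively high.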

💡Pro Plan

The Pro Plan is a paid subscription plan mentioned in the video that offers unlimited usage of Pixel Dojo, including access to the Stable Diffusion 3 model. It is a way for users to gain full access to the platform's features for a monthly fee.
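One way to put the $9.95 monthly fee in perspective is to ask how many images it would take for direct API usage to cost the same amount. The sketch below assumes the API figures quoted in the video (about $10 per thousand credits, 6 to 12 credits per Stable Diffusion 3 image). It is a rough comparison only, since the summary does not say how the platform's own costs are structured.

```python
import math
from fractions import Fraction

# Figures as quoted in the video summary (approximate).
PRO_PLAN_USD = Fraction("9.95")       # monthly Pro Plan price
USD_PER_CREDIT = Fraction(10, 1000)   # about $10 per thousand credits


def break_even_images(credits_per_image):
    """Smallest image count whose API cost reaches the monthly fee."""
    cost_per_image = credits_per_image * USD_PER_CREDIT
    return math.ceil(PRO_PLAN_USD / cost_per_image)


for credits in (6, 12):
    print(f"{credits} credits/image: {break_even_images(credits)} images")
```

Under these assumptions the fee matches the API cost of roughly 83 to 166 images per month, depending on the per-image credit cost.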

💡Text Coherence

Text coherence refers to the ability of the AI model to understand and incorporate textual elements accurately into the generated images. The host tests this feature by generating images with text on them, such as a cardboard box with a specific phrase, to evaluate how well the model handles text within its outputs.

💡Cherry Picking

Cherry picking is the practice of selecting only the best or most favorable results to present, often to make a product or service appear better than it is in reality. The host discusses the concern of cherry picking when evaluating the images generated by Stable Diffusion 3, aiming to test whether the images displayed on the website are representative of the typical output of the model.

Highlights

Stability AI has released Stable Diffusion 3 and Stable Diffusion 3 Turbo, available only via API.

Partnership with Fireworks AI for hosting and fast access to the models.

Commitment to open generative AI, with model weights to be made available for self-hosting with a Stability AI membership soon.

Stable Diffusion 3 beta was set up on Pixel Dojo within 3 hours of its release.

Users can generate images with prompts, optionally using negative prompts, and choose between Stable Diffusion 3 and Turbo versions.

API pricing is relatively high at about $10 per thousand credits.

Stable Diffusion 3 costs 6 to 12 credits per image, making it 32 times more expensive than Stable Diffusion XL 1.0.

A Pro Plan starting at $9.95 per month offers unlimited usage of Pixel Dojo.

The quality of images generated by the model is comparable to those displayed on the website, suggesting minimal cherry-picking.

The model's prompt adherence is strong, producing images that closely match the input prompts.

Text coherence in images is generally good, although some inconsistencies were observed.

Stable Diffusion 3 Turbo model is faster but produces lower quality images compared to the standard model.

The Turbo model struggles slightly more with text in images but still provides reasonable results.

The standard model of Stable Diffusion 3 performs well with complex prompts that include text.

Stable Diffusion 3 seems to live up to the hype, with good prompt adherence and image quality.

Negative prompts were not used in the tests, but could be an area for further exploration.

Pixel Dojo offers a Pro membership for $9.95 a month, which includes unlimited generations and access to other Stable Diffusion models.

More features and models will be added to Pixel Dojo in the future.