Stable Diffusion: DALL-E 2 For Free, For Everyone!

Two Minute Papers
6 Sept 2022 10:59

TLDR: Stable Diffusion, an open-source AI model, revolutionizes image generation by making model weights and source code accessible. It enables users to create and customize high-quality images and videos from text prompts, offering features like dreaming, interpolation, and image inpainting. This breakthrough allows tinkering with internal parameters, leading to unique creations and animations, all achievable on consumer-grade hardware, heralding a new era of democratized AI-driven art.


  • 🤖 Stable Diffusion is an AI model that can generate images and videos from text prompts, offering an alternative to closed solutions like DALL-E 2.
  • 🔍 The model weights and full source code for Stable Diffusion are publicly available, allowing users to modify and experiment with the AI's internal parameters.
  • 🛠️ Users can now 'tinker' with the AI, adjusting parameters to create variations and interpolate between images for smoother transitions.
  • 🎨 Stable Diffusion is capable of producing high-quality images, including fantasy landscapes, tree houses, and realistic human figures.
  • 🖼️ The AI can create collages and perform image inpainting, blending images into a coherent whole even when there is no space between them.
  • 📚 It can generate a series of similar images and stitch them together to create hypnotic 'random noise walks' videos.
  • 👁️ With additional work, Stable Diffusion can produce animations by creating images with slight variations and blending them together.
  • 🖼️‍💻 The AI can generate portraits and interpolate between them, creating videos that show a progression of changes.
  • 🔄 Variant generation allows the AI to repaint an input image in different styles, offering a range of creative possibilities.
  • 🎨 Users can select parts of an image they like and discard the rest, creating unique montages from the AI-generated content.
  • 🌐 The open-source nature of Stable Diffusion democratizes AI-based image generation, making it more accessible to a wider audience.
  • 💡 Training Stable Diffusion cost approximately $600,000, a significant reduction compared with the tens of millions of dollars that advanced AI models cost in the past.

Q & A

  • What is the significance of Stable Diffusion in the context of AI-driven image generation?

    -Stable Diffusion is significant because it offers a free and open-source solution for AI-based image generation, providing access to model weights and full source code, which allows users to modify and experiment with the AI unlike closed solutions like DALL-E 2 and Imagen.

  • How does the diffusion-based model work in creating images from text prompts?

    -Diffusion-based models start from a pattern of random noise and refine it step by step. At each step a little noise is removed, so the image gradually comes to resemble what the text prompt describes, until the desired image emerges.

  • What are the two main reasons mentioned in the script that make Stable Diffusion so amazing?

    -The two main reasons are: 1) Users can tinker with the internal parameters of the AI, allowing for adjustments and customizations not possible with closed solutions. 2) The availability of the model for personal use, enabling users to experiment and create their own variations of image generation.

  • How can Stable Diffusion be used to create videos from images?

    -By making small changes to the internal parameters and creating a series of similar outputs, these images can be stitched together to form a video, allowing for smoother transitions and exploration of ideas.

  • What is image inpainting, and how does Stable Diffusion utilize this feature?

    -Image inpainting is a process where a selected region of an image is deleted, and the AI fills it in with information based on the surrounding context. Stable Diffusion uses this to create coherent images from separate elements, blending them seamlessly.

  • Can Stable Diffusion generate animations in addition to static images?

    -Yes, with some additional work, Stable Diffusion can generate animations. For example, by creating the same image with different states (like eyes open and closed) and blending them, it can produce an animated sequence.

  • How does Stable Diffusion handle the creation of fantasy imagery, such as landscapes and tree houses?

    -Stable Diffusion demonstrates competence in creating fantasy imagery by generating detailed and imaginative landscapes and tree houses, showcasing its ability to interpret and visualize abstract concepts.

  • What is the 'First Law of Papers' mentioned in the script, and why is it important?

    -The 'First Law of Papers' states that research is a process. It's important because it encourages looking beyond the current state of technology and imagining the potential advancements in the future, as seen in the rapid evolution from DALL-E 1 to DALL-E 2.

  • What does the script suggest about the future of AI-based image generation?

    -The script suggests that the future of AI-based image generation will be characterized by increased democratization and affordability, with open-source competition driving innovation and making the technology more accessible.

  • How much did it cost to train Stable Diffusion, and what does this imply for the future of AI development?

    -It cost approximately 600 thousand dollars to train Stable Diffusion. This implies that the cost of developing AI models has significantly decreased from tens of millions of dollars, indicating a more accessible future for AI development.



🎨 AI-Driven Image Generation with Stable Diffusion

The script introduces the audience to the capabilities of AI in generating images and videos, highlighting the emergence of OpenAI's DALL-E 2 and its counterparts from Google. It emphasizes the novelty of Stable Diffusion, an open-source alternative that provides access to model weights and source code, allowing users to modify internal parameters and create customized outputs. The paragraph showcases the potential of this technology through various examples, such as creating videos from images, interpolating between images for smooth transitions, and generating fantasy imagery. It also touches on the ability to perform image inpainting to create coherent collages from separate images.


🛠 Customizing and Experimenting with AI Image Generation

This paragraph delves into the creative possibilities and technical advantages offered by Stable Diffusion. It discusses the ability to adjust the initial noise patterns to generate a series of similar images, which can be compiled into hypnotic videos. The script also mentions the potential for creating animations by blending images with slight variations, such as open and closed eyes. Furthermore, it covers the capability to interpolate between portraits and generate variant images based on input, emphasizing the user's newfound freedom to experiment with AI image generation. The paragraph concludes with an invitation for viewers to try Stable Diffusion themselves, hinting at the democratization of AI technology and the collaborative potential of open-source solutions.



💡AI-driven image generation

AI-driven image generation refers to the process where artificial intelligence algorithms create images based on textual descriptions or prompts. It's a form of synthetic content creation that has been revolutionized by models like DALL-E 2 and Stable Diffusion. In the video, this concept is central as it discusses the capabilities of these AI models to generate images from text prompts, illustrating the advancement in AI's ability to understand and visualize human language.

💡Diffusion-based models

Diffusion-based models are a type of generative model that starts with a noise pattern and gradually refines it to produce an image that matches a given text description. The script mentions this as the underlying technology for models like Stable Diffusion, emphasizing how these models transform random noise into coherent images that align with the textual input, showcasing the power of AI in image synthesis.
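The noise-to-image idea described above can be sketched with a toy loop. This is only an illustration under strong assumptions: there is no trained network and no text prompt here, and the hypothetical `toy_denoise` function simply moves a noise vector toward a known `target`, where a real diffusion model would use a learned, prompt-conditioned prediction at every step.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from Gaussian noise and step toward `target`.

    Assumption for illustration: the "denoiser" already knows the target.
    A real diffusion model replaces this with a trained neural network
    conditioned on the text prompt.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise to begin with
    history = [list(x)]
    for step in range(steps):
        # Move 1/(remaining steps) of the way, so the final step lands on the target.
        alpha = 1.0 / (steps - step)
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, target)]
        history.append(list(x))
    return history


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


if __name__ == "__main__":
    target = [0.2, 0.8, 0.5, 0.1]  # stand-in for "the image the prompt describes"
    history = toy_denoise(target)
    for step in (0, 10, 25, 50):
        print(step, round(distance(history[step], target), 4))
```

The printed distances shrink as the steps progress, which mirrors the gradual refinement the summary describes, though none of the real model's machinery is present.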

💡Model weights

In the context of AI, model weights are the parameters that the model learns during training to make predictions or generate outputs. The script highlights the significance of Stable Diffusion making its model weights publicly available, which allows users to understand, modify, and customize the AI's behavior, unlike closed solutions where these are kept proprietary.

💡Source code

The source code is the original code, usually written in a programming language, from which a computer program is built. The video emphasizes the importance of having the full source code available for Stable Diffusion, which enables transparency, community contributions, and further development by anyone interested in the technology.

💡Digital wrench

A 'digital wrench' is a metaphor used in the script to describe the ability to manipulate and adjust the internal parameters of a software or system, in this case, the AI model. It signifies the empowerment of users to modify how the AI operates, which is a key advantage of open-source solutions like Stable Diffusion.


💡Dreaming

In the script, 'dreaming' refers to the process of making slight alterations to the internal parameters of the AI model to generate a series of similar outputs, which can then be compiled into a video. This showcases the creative potential of AI in exploring ideas and generating smooth transitions between images.
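One way to picture the "slight alterations" is a random walk over the initial noise vector, where each frame's noise differs only a little from the previous one. The sketch below is an assumption about how such a walk could be set up, not the method used in the video: `noise_walk` and its `step_size` parameter are hypothetical names, and the image generator that would turn each vector into a frame is left out.

```python
import math
import random


def noise_walk(dim, frames, step_size=0.05, seed=0):
    """Return `frames` noise vectors, each a small perturbation of the last.

    The update sqrt(1 - s^2) * z + s * eps keeps each component at roughly
    unit variance, so every vector still looks like ordinary Gaussian noise.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    walk = [z]
    keep = math.sqrt(1.0 - step_size ** 2)
    for _ in range(frames - 1):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        z = [keep * zi + step_size * ei for zi, ei in zip(z, eps)]
        walk.append(z)
    return walk


if __name__ == "__main__":
    walk = noise_walk(dim=64, frames=5)
    for a, b in zip(walk, walk[1:]):
        step = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        print(round(step, 3))  # small compared with the vector's own length
```

Feeding each vector to the same prompt would, under this assumption, yield a series of similar images that can be stitched into a video.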


💡Interpolation

Interpolation in this context is the process of creating a transition between two images by gradually morphing one into the other. The video script uses this term to describe how Stable Diffusion can generate a sequence of images that transition smoothly from one scene to another, creating a kind of visual novel.
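A common way to interpolate between two noise vectors is spherical linear interpolation (slerp), which keeps the in-between vectors at a similar length to the endpoints. The video does not say which interpolation scheme is used, so treat the following as a minimal sketch of one plausible choice; the function name `slerp` and the two-dimensional example vectors are illustrative only.

```python
import math


def slerp(a, b, t):
    """Spherically interpolate between vectors a and b for t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)  # angle between the two vectors
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Nearly parallel or opposite vectors: fall back to a straight blend.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    start, end = [1.0, 0.0], [0.0, 1.0]
    for i in range(5):
        t = i / 4
        print(t, [round(v, 3) for v in slerp(start, end, t)])
```

Generating one image per interpolated vector and playing them in order would give the smooth transition described above.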

💡Image inpainting

Image inpainting is a technique used in image processing to fill in missing or selected parts of an image with new pixels that are consistent with the surrounding image content. The script mentions this feature of Stable Diffusion, which allows users to create a coherent image by blending separate generated images or filling in deleted areas.
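The idea of filling a deleted region from its surroundings can be shown with a much simpler, classical technique: repeatedly replacing each masked pixel with the average of its neighbours. This is a stand-in for illustration and not how Stable Diffusion inpaints (it generates new content with the diffusion model); the function name `inpaint_by_averaging` and the tiny grayscale grid are assumptions.

```python
def inpaint_by_averaging(image, mask, iterations=200):
    """Fill masked pixels of a 2D grayscale grid from their 4-neighbours.

    image: list of rows of numbers; mask: same shape, True where the
    pixel was deleted. Unmasked pixels are never modified.
    """
    height, width = len(image), len(image[0])
    out = [[0.0 if mask[y][x] else float(image[y][x]) for x in range(width)]
           for y in range(height)]
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    continue
                neighbours = [out[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < height and 0 <= nx < width]
                nxt[y][x] = sum(neighbours) / len(neighbours)
        out = nxt
    return out


if __name__ == "__main__":
    image = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
    mask = [[False, True, False]] * 3  # the middle column was deleted
    for row in inpaint_by_averaging(image, mask):
        print([round(v, 2) for v in row])
```

The filled column settles to values between its left and right neighbours, which is the "consistent with the surrounding content" behaviour in its most basic form.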

💡Variant generation

Variant generation refers to the ability of an AI model to create different versions or variations of an image based on a single input. The video script illustrates this with examples of how Stable Diffusion can generate multiple interpretations of a scene, offering users a range of creative options to choose from.


💡Montage

A montage is a compilation of images or scenes arranged to create a composite whole. In the script, the term is used to describe how users can select parts of different images generated by Stable Diffusion and combine them to create a new, unique composition, demonstrating the flexibility of AI in artistic creation.

💡Consumer graphics card

A consumer graphics card is a type of computer hardware used for rendering images, videos, and animations, typically used in gaming and multimedia applications. The video script mentions the ability to run Stable Diffusion on such a card, indicating that advanced AI models can now be utilized on more accessible and affordable hardware by enthusiasts and creators.


AI has created images and videos of impressive quality that can potentially be run at home.

We are entering the age of AI-driven image generation with text prompts transforming into visual representations.

Text-to-image models such as DALL-E 2, Parti, and Imagen are capable of creating highly creative images.

The weights and source code of these models are not publicly available, making them closed solutions.

Stable Diffusion offers model weights and full source code, allowing for openness and tinkering.

Stable Diffusion enables adjusting internal parameters for customized image generation.

It allows for creating videos by stitching similar outputs together.

Interpolation between images creates smooth transitions, which can be strung together into a kind of visual novel.

Stable Diffusion excels in generating fantasy imagery, including landscapes and tree houses.

The model can create realistic human images, such as fairy princesses.

Collages can be created and blended into coherent images using image inpainting.

The model is sensitive to initial noise patterns, enabling creative manipulation for unique image sets.

Animations can be generated with additional work by blending images with different features.

Portraits can be created and interpolated to form videos with smooth or jumpy transitions.

Variant generation allows for repainting images in different styles based on input.

Images from Stable Diffusion can be selectively used to create impressive montages.

The open-source nature of Stable Diffusion invites experimentation and collaboration.

Stable Diffusion can be run on consumer graphics cards, making AI image generation more accessible.

The cost of training AI models like Stable Diffusion is decreasing, making advanced AI more democratized.

The development of smaller and cheaper models indicates a future of affordable AI image generation.