* This blog post is a summary of this video.

Stable Video Diffusion: Generating Moving Images from a Single Photo

Author: enigmatic_e
Time: 2024-03-23 15:30:01

Introduction to Stable Video Diffusion

Stable Video Diffusion is an exciting new AI technique that can generate short video clips from a single input image. This technology leverages diffusion models, which have become very popular in recent years for creating realistic and diverse synthetic media.

At a high level, Stable Video Diffusion works by predicting a sequence of video frames that plausibly extend and animate the static input image. The diffusion model is trained on large datasets of videos to learn realistic motions and transformations that it can then apply to novel inputs.

What is Stable Video Diffusion?

Stable Video Diffusion is a machine learning model that produces short video clips from a single input image. It builds on recent advances in diffusion models and video prediction to generate smooth, varied motion that continues an input scene. The technology was developed by Stability AI, a London-based AI company. Its researchers trained specialized diffusion models on large video datasets, teaching the models to predict realistic motion and scene dynamics that can be added to still images.

How Does It Work?

Under the hood, Stable Video Diffusion uses a type of neural network called a diffusion model, which is trained on millions of natural videos. When given a new still image, the model uses it as a reference and imagines new ways the scene could plausibly continue as a video. Generation starts from random noise and refines it over a series of denoising steps. The frames of the clip are refined together rather than strictly one after another, which helps keep the result temporally consistent and realistic. This predictive capability comes from training: by learning from its video dataset how real-world scenes change over time, the model can generate smooth, natural-looking motion when animating new input images.
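The idea of iterative refinement can be pictured with a toy loop. This is only a sketch and not the real model: the "denoiser" here is a hypothetical stand-in that simply pulls noisy values towards the input image, whereas a trained model predicts motion. The function name and its parameters are made up for illustration.

```python
import random


def toy_denoise_clip(image, num_frames=14, steps=20, seed=0):
    """Toy stand-in for a video diffusion sampler (NOT the real SVD model).

    image is a flat list of pixel values; the result is a list of frames.
    """
    if num_frames < 1 or steps < 1:
        raise ValueError("num_frames and steps must be at least 1")
    rng = random.Random(seed)
    # Every frame starts out as pure Gaussian noise, the same size as the image.
    clip = [[rng.gauss(0.0, 1.0) for _ in image] for _ in range(num_frames)]
    for step in range(steps):
        # Fraction of the remaining gap closed at this step (1.0 on the last step).
        blend = 1.0 / (steps - step)
        # Hypothetical "denoiser": nudge all frames towards the input image.
        # Note that every frame is updated in each step, not one frame at a time.
        clip = [
            [px + blend * (target - px) for px, target in zip(frame, image)]
            for frame in clip
        ]
    return clip


if __name__ == "__main__":
    frames = toy_denoise_clip([0.2, 0.8, 0.5], num_frames=4, steps=10)
    print(len(frames), "frames of", len(frames[0]), "values each")
```

Because the stand-in only knows about the input image, every frame ends up matching it and the "clip" is static. The part worth noticing is the shape of the loop: start from noise, then repeatedly refine the whole clip.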

Installing Stable Video Diffusion Models in Comfy UI

To start experimenting with Stable Video Diffusion, you'll first need to set up the models within Comfy UI, a popular interface for generating AI art. The exact steps are:

  1. Download the Stable Video Diffusion model files from the links in the video description

  2. Update your Comfy UI manager to the latest version

  3. Restart Comfy UI to refresh the available models

  4. Load the Stable Video Diffusion workflow into your Comfy UI workspace

Downloading the Models

There are a few model variants available for Stable Video Diffusion. For this tutorial, we'll download two options:

  • SVD-XT, which generates 25 video frames
  • SVD, which creates 14-frame videos

These models have different tradeoffs in smoothness vs. speed, so it's useful to try both. Simply visit the download links in the video description and save the model files wherever you store your Comfy UI checkpoints.
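Before restarting, it can help to confirm that the files landed in the right place. The sketch below is a small check under two assumptions that are not stated in the video: that the files are named svd.safetensors and svd_xt.safetensors, and that checkpoints live in ComfyUI/models/checkpoints. Adjust both to match your own setup.

```python
from pathlib import Path

# Assumed file names and folder; change these to match your own download.
EXPECTED_CHECKPOINTS = ("svd.safetensors", "svd_xt.safetensors")
DEFAULT_CHECKPOINT_DIR = Path("ComfyUI") / "models" / "checkpoints"


def missing_checkpoints(checkpoint_dir=DEFAULT_CHECKPOINT_DIR,
                        expected=EXPECTED_CHECKPOINTS):
    """Return the expected file names that are not present in checkpoint_dir."""
    folder = Path(checkpoint_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All expected model files were found.")
```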

Updating and Restarting Comfy UI

Before the new models can be used, we need to update Comfy UI's model manager to the latest version. You can trigger an update check via the manager panel. After updating concludes, close and restart the Comfy UI application. This will refresh the set of available models and workflows, allowing the Stable Video Diffusion options to be loaded in the next step.

Loading the Workflow

With the models installed and UI restarted, we can now import the Stable Video Diffusion workflow. This adds preconfigured nodes for running the models with reasonable default settings. Simply drag the JSON file from the video description into your Comfy UI browser window. The workflow will automatically be added to your workspaces. You can now start experimenting with generating AI videos!
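If you are curious what the workflow file contains before dragging it in, you can peek inside with a few lines of Python. This sketch assumes the JSON has a top-level "nodes" list whose entries carry a "type" field, which is how exported Comfy UI workflows are commonly laid out. The node names in the example are illustrative, not copied from the video's file.

```python
import json
from collections import Counter


def load_workflow(path):
    """Read a workflow JSON file from disk and return it as a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def count_node_types(workflow):
    """Count how often each node type appears in the workflow's "nodes" list."""
    nodes = workflow.get("nodes", [])
    return Counter(node.get("type", "<unknown>") for node in nodes)


if __name__ == "__main__":
    # Hand-written, hypothetical example; a real export holds many more fields.
    example = {
        "nodes": [
            {"id": 1, "type": "LoadImage"},
            {"id": 2, "type": "KSampler"},
            {"id": 3, "type": "VAEDecode"},
        ]
    }
    for node_type, count in sorted(count_node_types(example).items()):
        print(f"{node_type}: {count}")
```

To inspect a real file, pass the result of load_workflow("your_workflow.json") to count_node_types, replacing the placeholder file name with your own.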

Generating a Video from a Single Image

Running Stable Video Diffusion on an image is straightforward with the preloaded workflow. Here are the key steps:

  1. Select a source image from Comfy UI outputs or drag in any JPEG/PNG file

  2. Pick the Stable Video Diffusion model variant to use

  3. Tweak parameters like seed and motion properties (optional)

  4. Execute the video generation process

  5. Review and refine the resulting AI animation

Choosing an Input Image

Stable Video Diffusion can ingest any standard image format like JPG or PNG. Good starting images have defined subjects against relatively simple backgrounds. You can use an existing Comfy UI output or drag in external image files. Pick visually interesting images with a clear focal point for the best animations.
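File extensions are sometimes wrong, so a quick sanity check on an input file can save a confusing error later. The sketch below looks at the first bytes of a file and reports whether they match the standard PNG or JPEG signatures. The function names are made up for this example.

```python
# Standard file signatures ("magic bytes") for the two formats.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(header):
    """Return "png", "jpeg", or None based on the leading bytes of a file."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def image_format_of(path):
    """Open a file and sniff its format from the first eight bytes."""
    with open(path, "rb") as handle:
        return sniff_image_format(handle.read(8))
```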

Adjusting Parameters

The workflow comes preloaded with reasonable defaults, but adjusting parameters such as the motion bucket value and random seed can produce widely varying videos. As a start, try increasing the motion bucket setting to encourage more scene transformation. Locking the random seed also helps you compare setting changes without other sources of variation.

Experimenting with Different Settings

Don't be afraid to push settings like the motion bucket and augmentation level to extremes! The outcomes can be surprisingly impressive as the model compensates in creative ways. You can also switch between the 14-frame and 25-frame model variants to explore different smoothness and pacing in the generated motion.
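To put the two variants in perspective, clip length is simply the number of frames divided by the playback frame rate. The frame rate in this sketch is an assumed example value, not a setting taken from the video.

```python
def clip_seconds(num_frames, fps):
    """Return the playback length in seconds of a clip with num_frames frames."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    assumed_fps = 6  # example playback rate; use whatever your workflow exports
    for name, frames in (("SVD", 14), ("SVD-XT", 25)):
        print(f"{name}: {frames} frames = {clip_seconds(frames, assumed_fps):.2f} s")
```

Either way the clips are only a few seconds long, so lowering the playback frame rate stretches the clip at the cost of choppier motion.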

Tips for Creating Better Stable Video Diffusion Videos

Generating quality AI animations with Stable Video Diffusion requires some experimentation. Here are a few key tips:

  1. Adjust the motion bucket setting for more or less transformation

  2. Modulate augmentation level to control distortion artifacts

  3. Fine-tune noise reduction to balance detail against flicker

  4. Lock random seeds across setting changes for controlled testing

Varying the Motion Bucket

The "motion bucket" parameter directly controls how much scene transformation occurs across video frames. Lower values around 100-200 give subtle motions. Pushing towards 500+ triggers wild mutations - great when you want sudden video effects or transitions. Find a middle ground for smooth scene evolution.

Changing the Augmentation Level

Augmentation adds noise to the input image before the model uses it as a reference, so higher levels let the video drift further from the source, with more distortion and color shifting. This makes motions more varied but can introduce undesirable artifacts. If you notice strange deformations in your videos, try lowering the level from the workflow's default of 0.4 to 0.2-0.3 for cleaner output.
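One common way image-to-video pipelines implement this kind of setting is to add Gaussian noise to the input image, with the level acting as the noise strength. The sketch below assumes that behaviour and works on a plain list of pixel values. The function name is hypothetical and the code is an illustration, not the workflow's actual implementation.

```python
import random


def add_conditioning_noise(pixels, augmentation_level, seed=0):
    """Return a copy of pixels with zero-mean Gaussian noise added.

    augmentation_level is used as the standard deviation of the noise,
    so 0.0 leaves the image untouched (an assumption for this sketch).
    """
    if augmentation_level < 0:
        raise ValueError("augmentation_level must not be negative")
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, augmentation_level) for value in pixels]


if __name__ == "__main__":
    image = [0.1, 0.5, 0.9, 0.3]
    for level in (0.0, 0.2, 0.4):
        noisy = add_conditioning_noise(image, level, seed=7)
        drift = sum(abs(a - b) for a, b in zip(noisy, image)) / len(image)
        print(f"level {level}: mean absolute change {drift:.3f}")
```

The more noise the reference image carries, the less tightly the generated frames are tied to it, which is why lower levels tend to look cleaner.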

Adjusting Denoising

The included workflow features optional video denoising to smooth out flicker across frames. More noise reduction softens sudden changes but blurs intricate textures. Tune this vs. the augmentation level to hit the sweet spot where videos stay sharp without severe frame-to-frame variation.
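The trade-off between flicker and detail is easy to see in a simplified form. The sketch below is not the workflow's denoiser, whose method the video does not specify. It swaps in a basic exponential moving average across frames, with a made-up strength parameter, purely to show why more smoothing means less flicker but softer changes.

```python
def smooth_frames(frames, strength):
    """Blend each frame with the previous smoothed frame.

    frames is a list of equal-length lists of pixel values.
    strength is in [0, 1): 0 leaves frames unchanged, higher values smooth more.
    """
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must be in the range [0, 1)")
    smoothed = []
    previous = None
    for frame in frames:
        if previous is None:
            current = list(frame)  # the first frame has nothing to blend with
        else:
            current = [
                strength * old + (1.0 - strength) * new
                for old, new in zip(previous, frame)
            ]
        smoothed.append(current)
        previous = current
    return smoothed


if __name__ == "__main__":
    flickering = [[0.0], [1.0], [0.0], [1.0]]  # one pixel that flips every frame
    print(smooth_frames(flickering, 0.5))
```

With a strength of 0.5, the pixel that flips between 0 and 1 settles into much smaller jumps. The same averaging would also soften genuine fast motion, which is the blur the text warns about.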

Locking the Random Seed

Stable Video Diffusion relies on randomness during video generation, so even identical settings can produce different results from one run to the next. Check the "Lock Seed" toggle before adjusting parameters like motion and augmentation. With the seed fixed, any change you see in the output comes from the setting you adjusted rather than from random variation.
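For controlled testing it helps to write the plan down before you start clicking. The sketch below builds a list of runs that share one seed and vary only the motion bucket and augmentation level. The dictionary keys are names chosen for this example and are not guaranteed to match the labels in the workflow.

```python
from itertools import product


def build_sweep(seed, motion_bucket_values, augmentation_levels):
    """Return one settings dict per combination, all sharing the same seed."""
    return [
        {"seed": seed, "motion_bucket": motion, "augmentation_level": level}
        for motion, level in product(motion_bucket_values, augmentation_levels)
    ]


if __name__ == "__main__":
    # Example values taken from the ranges discussed in this post.
    for run in build_sweep(42, [100, 300, 500], [0.2, 0.4]):
        print(run)
```

Working through the list one run at a time, and noting which clip came from which settings, makes it much easier to tell what each parameter actually does.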


Conclusion

Stable Video Diffusion opens new creative frontiers by extending AI image generation into synthetic video. With just a little guidance, these models can produce smoothly animated scenes from still inputs, although results still vary and often need a few attempts.

While the technology is still in its early days, Stable Video Diffusion clips are likely to become more common as the methods continue to improve. The future looks bright - and full of unique AI motion!


FAQ

Q: What images work best for Stable Video Diffusion?
A: Images with clear foreground elements against simpler backgrounds tend to produce better Stable Video Diffusion videos. Try portraits, anime/cartoon art, or images with distinct shapes and forms.

Q: How do I make my video longer?
A: The SVD-XT model generates 25 frames per clip, compared with 14 for the base SVD model, so it is the one to pick for longer clips. You can also adjust the number of frames parameter to change the clip length.

Q: Why is my video output low quality?
A: Try reducing the augmentation level and denoising parameters. Lower values can help reduce artifacts and noise.

Q: Can I add text prompts?
A: Unfortunately the current workflow does not allow adding text prompts to influence the generated video.

Q: How detailed can the movements be?
A: The motion is randomly generated so the level of detail varies. Adjusting parameters like the motion bucket can increase the complexity of movements.

Q: Do I need a GPU?
A: Yes, Stable Video Diffusion requires a powerful GPU for video generation and processing.

Q: What AI engine is used?
A: The workflow runs in Comfy UI and uses Stability AI's Stable Video Diffusion models.

Q: Can I use my own AI models?
A: Not directly, as the workflow is designed for Stable Video Diffusion. But the concepts can be adapted to other models.

Q: Is there an online demo?
A: Some sites offer Stable Video Diffusion online, but options are limited compared to running locally.

Q: Can I use DALL-E images as input?
A: Yes, as long as you have an image file, you can use it as input for Stable Video Diffusion.