* This blog post is a summary of this video.

Cut Image Generation Time in Half with InvokeAI and Stable Diffusion on M1 Mac

Author: Sam Razi
Time: 2024-03-23 00:50:00


Introduction to Improving InvokeAI Speed and Performance Benchmarks

InvokeAI with Stable Diffusion can generate amazingly realistic images, but it does take time for the AI models to process each image. In this blog post, we'll explore some ways to optimize the configuration to improve InvokeAI speed on an M1 MacBook Pro. Specifically, we'll look at adjusting the cache size to reduce the lag when switching between the base SD model and refiner model.

We ran benchmark tests with a small 10GB cache versus a larger 20GB cache. Doubling the cache size cut the total image generation time in half, from about 1 minute 25 seconds down to only 45 seconds per image! Keep reading to understand the details and see if faster AI image generation would benefit your creative workflows.

Overview of Speed Improvements

When generating images with InvokeAI using Stable Diffusion, there are two main steps:

  1. The base SD model runs for a number of steps to create a draft image
  2. The SD refiner model then kicks in to enhance and finalize the image

The refiner helps make the end results look more coherent and realistic. However, there is a lag whenever InvokeAI has to switch between loading the base model and the refiner model. If your system doesn't have enough free RAM to keep both models cached and loaded simultaneously, InvokeAI ends up wasting time repeatedly reloading each model. So the key to faster performance is giving InvokeAI enough cache to keep both models in memory at once.

Benchmarking InvokeAI Performance

To test the impact of cache size, we ran InvokeAI on an M1 Max MacBook Pro with the following benchmarks:

  • 10GB Max Cache Size: Total time 1 min 25 sec per image
  • 20GB Max Cache Size: Total time 45 sec per image

As you can see, simply doubling the cache size cuts the processing time almost in half! The extra free RAM lets InvokeAI keep both models ready to go, eliminating the delays of reloading each one.

Tuning the InvokeAI Configuration File

The InvokeAI cache settings live inside the invokeai.yaml file that gets created automatically upon installation. Here are the two key settings and what they control:

  • max_cache_size: The total RAM (in GB) allotted for caching AI models such as the base SD and refiner models, enabling faster switching between them

  • max_vram_cache_size: The dedicated GPU VRAM allotted for model caching; effectively unused on Apple Silicon Macs, which have no separate VRAM

Adjusting the Cache Size

The default max_cache_size is far too small at only 2GB, which forces constant reloading of models and hurts performance. On an M1 MacBook Pro, try raising it to at least 10GB, depending on your total RAM capacity. We used 20GB, which provides ample room to cache both the base and refiner models simultaneously.
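As a concrete sketch, the relevant portion of invokeai.yaml might look like the excerpt below with the cache raised to 20GB. The exact section names and nesting vary between InvokeAI versions, so treat the structure as illustrative and check your own generated file; the max_cache_size key is the setting that matters:

    # invokeai.yaml (excerpt) - structure illustrative, varies by InvokeAI version
    InvokeAI:
      Memory/Performance:
        max_cache_size: 20    # RAM (in GB) reserved for keeping models loaded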

Setting the VRAM Cache Size

Apple Silicon MacBook Pros use a fast unified memory architecture in which the CPU and GPU share a single pool of RAM, so there is no dedicated VRAM to configure. Always set max_vram_cache_size to 0 on Macs; leaving it at the default can cause out-of-memory issues during generation.
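Putting both settings together, the hedged excerpt from above becomes (again, the exact nesting depends on your InvokeAI version):

    # invokeai.yaml (excerpt) - tuned for an Apple Silicon Mac
    InvokeAI:
      Memory/Performance:
        max_cache_size: 20        # room to hold base SD and refiner together
        max_vram_cache_size: 0    # unified memory: no dedicated VRAM to cache into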

Comparing Benchmark Results

With the right cache settings tuned in invokeai.yaml, let's look at the huge real-world speed improvement:

10GB Cache Size Test

  • Total image generation time: 1 min 24 sec
  • Base SD model load time: 30 sec
  • Refiner model load time: 10 sec

20GB Cache Size Test

  • Total image generation time: 45 sec (2X speedup)
  • Almost no load time for either model
  • The 45 sec covers only the sampling and refinement steps themselves, with virtually no time spent loading models

Additional Performance Considerations

Image Resolution and Steps

Beyond cache settings, also consider lowering the image resolution or the number of model steps to improve speed, if your quality standards allow. 512x512 pixel images strike a good balance of quality and performance. For complex images, try 50 steps for the base SD model and 10 steps for the refiner.

Scheduler and Model Choices

The LMS Karras scheduler tends to be faster than DPM Karras on M1 Macs. Also try sd-v1-4 over sd-v2 for potentially better speed without much quality tradeoff.

Conclusion and Summary

Tuning invokeai.yaml settings for ample cache size unlocks significantly faster Stable Diffusion performance on M1 Macs. With updated benchmarks, you can better set expectations around AI image generation speed.

Combined with adjustments to resolution, steps, and models, an optimized InvokeAI setup lets you create with fewer frustrations over slow processing. More iteration and experimentation make for better end results. Enjoy exploring the creative possibilities!

FAQ

Q: How much faster is InvokeAI with a 20GB cache?
A: With a 20GB cache, InvokeAI was nearly twice as fast, generating images in 45 seconds instead of 1 minute 25 seconds.

Q: What causes the slow performance on M1 Macs?
A: The main performance bottleneck is the switching time between the Stable Diffusion base model and refiner model when the cache size is too small.

Q: Does image resolution impact generation time?
A: Yes, higher image resolutions require longer generation times per image.

Q: Can using different models or schedulers improve speed?
A: Possibly, but the cache tuning makes the biggest impact. Different models and schedulers may yield modest gains.