Cut Image Generation Time in Half with InvokeAI and Stable Diffusion on M1 Mac
Table of Contents
- Introduction
- Overview of Speed Improvements
- Benchmarking InvokeAI Performance
- Tuning the InvokeAI Configuration File
- Comparing Benchmark Results
- Additional Performance Considerations
- Conclusion and Summary
- FAQ
Introduction
InvokeAI with Stable Diffusion can generate amazingly realistic images, but it does take time for the AI models to process each image. In this blog post, we'll explore some ways to optimize the configuration to improve InvokeAI speed on an M1 MacBook Pro. Specifically, we'll look at adjusting the cache size to reduce the lag when switching between the base SD model and refiner model.
We ran benchmark tests with a small 10GB cache versus a larger 20GB cache. Doubling the cache size cut the total generation time nearly in half, from about 1 minute 25 seconds down to 45 seconds per image! Keep reading for the details and to see whether faster AI image generation would benefit your creative workflow.
Overview of Speed Improvements
When generating images with InvokeAI using Stable Diffusion, there are two main steps:
- The base SD model runs for a number of steps to create a draft image
- Then the SD refiner model kicks in to enhance and finalize the image

The refiner helps make the end result look more coherent and realistic. However, there is a lag whenever InvokeAI has to switch between the base model and the refiner model. If your system doesn't have enough free RAM to keep both models cached simultaneously, InvokeAI wastes time repeatedly reloading each one. So the key to faster performance is giving InvokeAI enough cache to hold both models in memory at once.
Benchmarking InvokeAI Performance
To test the impact of cache size, we ran InvokeAI on an M1 Max MacBook Pro with the following benchmarks:
- 10GB Max Cache Size: Total time 1 min 25 sec per image
- 20GB Max Cache Size: Total time 45 sec per image

As you can see, simply doubling the cache size cuts the processing time nearly in half! The extra free RAM lets InvokeAI keep both models ready to go, eliminating the delays of reloading each one.
Tuning the InvokeAI Configuration File
The InvokeAI cache settings live inside the invokeai.yaml file that gets created automatically upon installation. Here are the two key settings and what they control:
- max_cache_size: The total RAM (in GB) allotted for caching AI models, such as the base SD and refiner models, for faster switching
- max_vram_cache_size: Unused on Apple Silicon Macs, which lack a dedicated VRAM GPU cache
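For reference, here is a minimal sketch of where these keys typically live, modeled on an InvokeAI 3.x-style invokeai.yaml. The Memory/Performance section name and exact nesting are assumptions that may vary between versions, so check your own file:

```yaml
# invokeai.yaml (sketch only - nesting and section names vary by InvokeAI version)
InvokeAI:
  Memory/Performance:
    max_cache_size: 2.0        # RAM model cache in GB; a small value forces model reloads
    max_vram_cache_size: 0.0   # dedicated-VRAM cache; not applicable on Apple Silicon
```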
Adjusting the Cache Size
The default max_cache_size is far too small at only 2GB, which forces constant reloading of models and hurts performance. On an M1 MacBook Pro, try raising it to at least 10GB, depending on your total RAM capacity. We used 20GB, which leaves ample headroom to cache both the base and refiner models simultaneously.
Setting the VRAM Cache Size
Apple Silicon MacBook Pros use a unified memory architecture rather than separate CPU and GPU memory, so there is no dedicated VRAM pool to configure. Always set max_vram_cache_size to 0 on Macs; leaving it at the default can cause out-of-memory errors during generation.
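Putting both changes together, a tuned configuration for the machine in our tests might look like the sketch below (assuming a Mac with well over 20GB of unified memory; scale max_cache_size to your own RAM):

```yaml
# invokeai.yaml - tuned for Apple Silicon with ample RAM (sketch; nesting may vary)
InvokeAI:
  Memory/Performance:
    max_cache_size: 20.0       # enough RAM to keep base SD and refiner cached together
    max_vram_cache_size: 0.0   # always 0 on Macs: unified memory has no separate VRAM
```

Restart InvokeAI after editing invokeai.yaml so the new cache limits take effect.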
Comparing Benchmark Results
With the right cache settings tuned in invokeai.yaml, let's look at the huge real-world speed improvement:
10GB Cache Size Test
- Total image generation time: 1 min 24 sec
- Base SD model load time: 30 sec
- Refiner model load time: 10 sec
20GB Cache Size Test
- Total image generation time: 45 sec (nearly a 2X speedup)
- Almost no load time for either model
- The 45 sec is essentially pure sampling time: subtracting the 40 sec of combined model loading from the 10GB run leaves roughly the same 44 sec of compute
Additional Performance Considerations
Image Resolution and Steps
Beyond cache settings, also consider lowering the image resolution or the number of model steps if your quality requirements allow. 512x512 pixel images strike a good balance of quality and performance. For complex images, try 50 steps for the base SD model and 10 steps for the refiner.
Scheduler and Model Choices
The LMS Karras scheduler tends to be faster than DPM Karras on M1 Macs. Also try sd-v1-4 over sd-v2 for potentially better speed without much quality tradeoff.
Conclusion and Summary
Tuning invokeai.yaml settings for ample cache size unlocks significantly faster Stable Diffusion performance on M1 Macs. With updated benchmarks, you can better set expectations around AI image generation speed.
Combined with adjustments to resolution, steps, and models, an optimized InvokeAI setup lets you create with fewer frustrations over slow processing. More iterations and experimentation make for better end results. Enjoy exploring the creative possibilities!
FAQ
Q: How much faster is InvokeAI with a 20GB cache?
A: With a 20GB cache, InvokeAI was nearly twice as fast, generating images in 45 seconds instead of 1 minute 25 seconds.
Q: What causes the slow performance on M1 Macs?
A: The main performance bottleneck is the switching time between the Stable Diffusion base model and refiner model when the cache size is too small.
Q: Does image resolution impact generation time?
A: Yes, higher image resolutions require longer generation times per image.
Q: Can using different models or schedulers improve speed?
A: Possibly, but the cache tuning makes the biggest impact. Different models and schedulers may yield modest gains.