* This blog post is a summary of this video.

Merging Images with SD and CLIP for Unique AI Art

Author: Scott Detweiler
Time: 2024-03-23 11:25:00

Introduction to Merging Images with AI

This blog post explores an exciting AI technique for combining two separate images into a third, merged image. While not a photorealistic merge like one you would create in Photoshop, this technique blends the essence of both images into a new conceptual image.

We'll cover the background on how this new AI capability works, as well as outline the requirements and models needed to try it yourself.

How This New AI Technique Works

This image merging technique combines Stable Diffusion models with CLIP image encoders. Specifically, the SDXL model has unCLIP-like capabilities, allowing it to accept CLIP image encodings as conditioning inputs. By encoding two images with CLIP and then adjusting the conditioning strengths, we can control how much of each image's essence flows into the final generated image. Adding text prompts on top provides another way to guide the outcome.
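The core idea can be sketched in a few lines. This is a toy illustration only, not the model's actual internals: the vectors stand in for real CLIP image embeddings, and the blend function shows how per-image strengths scale each embedding's contribution to the conditioning.

```python
import numpy as np

# Hypothetical stand-ins for real CLIP image embeddings
# (actual embeddings are much higher-dimensional).
emb_a = np.array([1.0, 0.0, 0.5, 0.2])
emb_b = np.array([0.0, 1.0, 0.3, 0.8])

def blend_conditioning(emb_a, emb_b, strength_a, strength_b):
    """Weight each image embedding by its strength before the
    diffusion model uses the result as conditioning."""
    return strength_a * emb_a + strength_b * emb_b

# Image A at 0.9 influences the result more than image B at 0.7.
cond = blend_conditioning(emb_a, emb_b, 0.9, 0.7)
print(cond)  # [0.9  0.7  0.66 0.74]
```

Raising one strength relative to the other shifts the balance of which image's "essence" dominates the output.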

Requirements and Models Needed

To follow along with this image merging technique, you'll need access to the SDXL Stable Diffusion model. You'll also need a CLIP vision (image encoder) model. The blog post includes links to download the recommended models into ComfyUI. With ComfyUI set up and these models downloaded, you'll have everything needed to start experimenting with merging your own images.

Step-by-Step Process for Merging Images

Now that we've covered the background on how this new AI image merging works, let's walk through the step-by-step process for setting it up and generating your own merged images.

We'll build the graph in ComfyUI, configure the Stable Diffusion and CLIP models, encode our input images, adjust the conditioning strengths, and optionally add text prompts to help guide the outcome.

Setting Up the SD Graph

First, set up a Stable Diffusion graph in ComfyUI and configure it to use your SDXL base model. Be sure to use SDXL specifically, as it has the unCLIP-like capabilities this technique needs. Also load your downloaded CLIP vision encoder so ComfyUI can find it. We'll use this model to encode our input images in the next steps.

Adding and Encoding Images

Next, add your two input images to the graph using Load Image nodes. These can be any images you want to merge: AI-generated, stock photos, your own creations, etc. Bring out a CLIP Vision Encode node and connect each image into it. This encodes each image individually based on what the CLIP model sees in it. We'll use this encoded data as conditioning inputs.
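To build intuition for what the encode step produces, here is a toy stand-in for a CLIP image encoder. The projection is random and purely illustrative; the one property it shares with real CLIP embeddings is that each image becomes a fixed-size, L2-normalized vector regardless of its pixel content.

```python
import numpy as np

def toy_clip_encode(image, dim=8, seed=0):
    """Toy stand-in for a CLIP image encoder: project the pixels
    to a fixed-size vector and L2-normalize the result."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((image.size, dim))
    v = image.flatten() @ proj
    return v / np.linalg.norm(v)

# Stand-ins for the two input images.
img_a = np.ones((4, 4))
img_b = np.linspace(0, 1, 16).reshape(4, 4)

emb_a = toy_clip_encode(img_a)
emb_b = toy_clip_encode(img_b)
print(emb_a.shape)  # (8,)
```

Both images end up as comparable vectors in the same embedding space, which is what makes blending them as conditioning possible.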

Configuring Sampling and Settings

With the images encoded, open the KSampler node and adjust the sampling settings. Items to configure here include steps, CFG scale, sampler method (try DPM++ SDE), and scheduler. The recommended settings provide a good starting point, but you can tweak them based on your preferences and hardware capabilities.

Adjusting Image Weights

A core part of this merging approach is controlling the conditioning strength of each input image. This determines how much essence from image A versus image B makes it into the final output. Use the unCLIP Conditioning nodes to adjust the strength value for each image. For example, set one image to 0.9 strength and the other to 0.7 so the first image features more prominently.
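The effect of unequal strengths can be checked numerically with a toy example. Using orthogonal stand-in embeddings (an assumption made so each image's contribution is cleanly separable), the blended conditioning ends up closer, in cosine similarity, to the higher-weighted image:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Orthogonal toy embeddings standing in for two encoded images.
emb_a = np.array([1.0, 0.0])
emb_b = np.array([0.0, 1.0])

# Image A weighted higher (0.9) than image B (0.7).
cond = 0.9 * emb_a + 0.7 * emb_b

# The blend sits closer to image A's embedding.
print(cosine(cond, emb_a) > cosine(cond, emb_b))  # True
```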

Adding Text Prompts (Optional)

You can further guide the merged image by adding text prompts. Attach prompts to the positive prompt input of the Stable Diffusion sampler. The text prompt acts as another conditioning input, along with the two encoded images. Adjust prompt weighting accordingly if needed. Text prompts work well for pushing the image in a certain stylistic direction.
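Conceptually, the sampler now receives three conditioning signals: two image embeddings and one text embedding, each with its own weight. The sketch below is a toy illustration of that idea; the vectors and the (embedding, strength) list structure are assumptions for clarity, not the real data format.

```python
import numpy as np

# Toy stand-ins: in the real graph the prompt is encoded by the
# CLIP text encoder and passed alongside the image conditionings.
text_emb  = np.array([0.2, 0.2, 1.0])
img_a_emb = np.array([1.0, 0.0, 0.0])
img_b_emb = np.array([0.0, 1.0, 0.0])

# Each conditioning carries its own strength.
conditionings = [
    (img_a_emb, 0.9),
    (img_b_emb, 0.7),
    (text_emb,  1.0),  # prompt weight; raise or lower to taste
]

# Combine everything into one conditioning signal.
combined = sum(s * e for e, s in conditionings)
print(combined)
```

Lowering the prompt weight lets the images dominate; raising it pushes the result toward the described style.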

Experimenting with Image Merging

With the graph set up and models configured, now the fun begins! Here are some tips for experimenting with different image combinations and continuing to refine your merged outputs.

Trying Different Image Combinations

Feel free to try merging all kinds of random image combinations to see what intriguing results you can get. Pick images with very different styles or themes and let the AI interpret the essences in new creative ways. You might find unexpected matches that produce something uniquely interesting when put together into the SDXL Stable Diffusion model with this technique.

Adjusting Weights and Prompts

As you experiment, continue tweaking the conditioning input strengths of each encoded image to emphasize different aspects. Adjust text prompts as well to help guide the output. Getting the balance of weights and prompts right can take some trial and error, but have fun with it! Notice how even small adjustments dramatically affect what comes through from each input image.

Conclusion and Next Steps

This new AI technique for merging separate images opens up amazing creative possibilities within Stable Diffusion and CLIP. With the power to combine disparate images into imaginative new blended art, the potential is unlimited.

Keep refining your process, try wild image combinations, and adjust the influence of each input. We're only scratching the surface of what's achievable. Have fun with image merging!


Q: What models do I need for image merging?
A: You need an SD model such as SDXL and a CLIP vision model. Links are provided in the post.

Q: Do the images have to be AI generated?
A: No, you can use any images including photos.

Q: How are the images combined?
A: CLIP analyzes and encodes the images which SD then merges based on the encodings and weights.

Q: Can I merge more than 2 images?
A: Yes, you can link as many images as your computer can handle.

Q: Do image sizes matter?
A: No, CLIP will encode the images regardless of size.

Q: Why add text prompts?
A: Adding prompts can help guide the merging process in a specific direction.

Q: How do I adjust merging?
A: Play with the image encoding weights to shift the balance.