InvokeAI 3.4 Release - LCM LoRAs, Multi-Image IP Adapter, SD1.5 High Res, and more

Invoke
22 Nov 2023 · 15:34

TLDR: The video discusses the release of version 3.4, highlighting new features such as the LCM technique for image generation, high-resolution fixes, and the ability to use ControlNets and T2I adapters simultaneously. It also introduces multi-image IP adapters for blending concepts, improvements in speed and efficiency, and acknowledges the contributions of various community members in translations and bug fixes.

Takeaways

  • 🚀 Introduction of LCM (Latent Consistency Model) for optimizing diffusion process with a new scheduler, reducing steps needed to generate images.
  • 📷 Quality trade-off with LCM: While more efficient, there is a slight loss of detail in image generation.
  • 🌐 Showcase of model-generated images before and after applying the LCM scheduler to illustrate quality differences.
  • 🔧 Adjusting CFG scale affects adherence to the prompt and can introduce saturation and quality adjustments in the generated images.
  • 🔄 High-resolution fix feature returns, enabling upscaling of images with reduced repeating patterns.
  • 🎨 ControlNet and T2I adapter features are now compatible, allowing for more versatile image generation.
  • 🌈 Use of multi-image IP adapters, termed 'instant LoRAs', for blending concepts and creating complex compositions.
  • 🔧 Workflow editor enhancements with new nodes for advanced users, including the ability to pass multiple images to the same IP adapter.
  • 🌐 Contributor acknowledgments for their work on various features, bug fixes, and translations in the 3.4 release.
  • 📈 Performance improvements in 3.4, particularly in loading times for LoRAs and other text encoders.
  • 🔜 Tease of more updates to come, encouraging users to stay engaged with the platform and community.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is the release of version 3.4 of InvokeAI, with a focus on explaining its new features and improvements.

  • What does LCM stand for and what is its purpose?

    -LCM stands for Latent Consistency Model, a new technique introduced in version 3.4 to optimize and make the diffusion process more efficient, using a new scheduler called the LCM scheduler.

  • What is the LCM scheduler and how does it work?

    -The LCM scheduler is a new component of the software that reduces the number of steps needed to generate an image, making the generation process extremely efficient. It achieves this by altering the way the software processes images, though it may result in some loss of detail.

  • What is the significance of the LCM LoRA and where can it be downloaded?

    -The LCM LoRA is a LoRA that works with both SDXL and SD 1.5 models to enable the low-step generation of the LCM scheduler. It can be downloaded from the Latent Consistency Hugging Face repository.

  • How does changing the CFG scale affect the image generation?

    -Altering the CFG scale impacts the adherence to the prompt and the overall quality of the generated images. Higher values increase adherence but may lead to saturation and quality adjustments, while lower values may result in less detail but a more varied output.

  • What is the high-resolution fix feature and how does it work?

    -The high-resolution fix is a feature that allows users to generate larger images from the linear UI without needing to go into the workflow. It increases the size of the original generation to the selected dimensions by first generating at a lower resolution and then upscaling and denoising at the higher resolution using techniques like ESRGAN.

  • Can ControlNet and T2I adapter features be used simultaneously in version 3.4?

    -Yes, in version 3.4, the ControlNet and T2I adapter features are no longer mutually exclusive, meaning they can be used at the same time on the same generation.

  • What is the workflow editor and what new features has it been updated with?

    -The workflow editor is a part of the software where advanced users can manipulate and create complex image generations. It has been updated with new nodes for multi-image IP adapters, allowing users to blend different concepts together more easily.

  • How does the multi-image IP adapter work?

    -The multi-image IP adapter allows users to pass in multiple images of the same concept into the same IP adapter. This helps the software to blend the average of those images into the final generation, creating a more coherent and blended concept in the output image.

  • What are some other smaller features introduced in version 3.4?

    -Other smaller features in version 3.4 include the ability to recall VAE metadata for any generation, the addition of RGBA value fields in the color picker, and numerous speed increases for various functions.

  • How can users stay updated with future releases and improvements?

    -Users are encouraged to stay tuned for more updates, like and subscribe to the software's channel, and join the community on Discord for the latest news and discussions.

Outlines

00:00

🚀 Introduction to Release 3.4 and LCM Scheduler

The video begins with an introduction to the late release of version 3.4 and an overview of the new features. The first feature discussed is the LCM (Latent Consistency Model) scheduler, a new technique for optimizing the diffusion process. The presenter explains that while LCM increases efficiency, it may result in some loss of detail. To illustrate this, four images are generated using the standard process and compared with four others generated using the LCM scheduler with specific settings adjustments. The presenter also mentions the availability of the LCM LoRA, which can be downloaded from the Hugging Face repo and works with both SDXL and SD 1.5 models. The segment concludes with a brief mention of other new features in 3.4.

05:01

🌟 High-Resolution Fix and ControlNet Features

This paragraph delves into the return of the high-resolution fix feature in 3.4, which allows for the upscaling of images while avoiding common issues like repeating patterns. The presenter demonstrates this by generating a high-resolution image of a cyborg king. Additionally, the paragraph discusses the new ability to use ControlNet and T2I adapter features simultaneously. The presenter shows how the T2I color adapter can be used to modify the color of an image, and how adjusting the end step percentage can mitigate pixelation caused by the control adapter. The summary also touches on the importance of understanding the impact of these features on image generation.

10:03

🎨 Advanced Workflow Editor Features and Multi-Image IP Adapters

The focus of this paragraph is on the advanced features added to the workflow editor in version 3.4. The presenter introduces multi-image IP adapters, which allow users to blend different concepts by adding multiple images to the same IP adapter. This is demonstrated by combining images of spiders with concept art sketches of a Yeti-like creature. The paragraph also explains how adjusting the weight of each concept can significantly alter the resulting image. The presenter emphasizes the creative potential of blending concepts and provides a detailed breakdown of the workflow setup for achieving this. Additionally, the paragraph mentions other minor features such as the ability to recall VAE metadata for generations, the inclusion of RGBA value fields in the color picker, and contributions from various community members.

15:05

🔧 Performance Improvements and Community Contributions

The final paragraph of the video script highlights the various speed improvements made in 3.4, particularly for LoRA and other text encoder loading times. It also mentions backend updates that have increased the efficiency of certain functions within the engine. The presenter expresses gratitude for the community contributions that have made these updates possible, including translations into Dutch, Italian, and Chinese. The video concludes with an invitation for viewers to join the Discord community and stay updated on future releases.

Keywords

💡LCM

LCM stands for Latent Consistency Model, a new technique introduced in the video for optimizing the diffusion process in image generation. It involves using a special scheduler called the LCM scheduler, which reduces the number of steps needed to generate an image, making the process more efficient. However, it may result in some loss of detail compared to non-LCM generated images. In the context of the video, the presenter demonstrates how to use LCM and its impact on image quality and generation speed.

💡CFG scale

The CFG scale refers to the classifier-free guidance scale, a parameter that influences how closely the generated image adheres to the input prompt. In the video, the presenter experiments with different CFG scale values, noting that higher values increase adherence to the prompt but can also lead to saturation and quality adjustments in the generated images.
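
The guidance arithmetic behind the CFG scale can be sketched in a few lines. This is a toy illustration only: plain Python lists stand in for the model's real noise-prediction tensors, and `apply_cfg` is a hypothetical helper, not part of InvokeAI.

```python
# Classifier-free guidance: the final noise prediction is the unconditional
# prediction pushed toward the prompt-conditioned prediction. A larger
# cfg_scale means stronger prompt adherence (and, pushed too far, the
# oversaturation mentioned above).
def apply_cfg(uncond_pred, cond_pred, cfg_scale):
    return [u + cfg_scale * (c - u) for u, c in zip(uncond_pred, cond_pred)]

uncond = [0.0, 0.0]   # placeholder "no prompt" prediction
cond = [1.0, -1.0]    # placeholder prompt-conditioned prediction

print(apply_cfg(uncond, cond, 1.5))  # low range, as recommended for LCM
print(apply_cfg(uncond, cond, 7.5))  # a typical non-LCM default
```

With a scale of 1.0 the output equals the conditioned prediction; values above 1.0 amplify the difference between conditioned and unconditional predictions, which is why high settings over-commit to the prompt.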

💡High-resolution fix

The high-resolution fix is a feature that allows for the upscaling of images generated by the model. It works by first creating the core composition at a lower resolution and then using techniques like ESRGAN or straight resizing to increase the image size to the desired dimensions. This feature is particularly useful for achieving larger images without losing quality and is available on SD 1.5 models.
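
The two-pass flow can be sketched as follows. This is a minimal structural sketch, not InvokeAI's implementation: `generate` and `denoise` are hypothetical stand-ins for the model calls, and only the resize step does real work.

```python
def generate(width, height):
    # Stand-in for the first, low-resolution diffusion pass that fixes
    # the core composition. Returns a nested-list "image".
    return [[0.5] * width for _ in range(height)]

def upscale(image, factor):
    # Nearest-neighbour resize (ESRGAN or straight resizing in the real UI).
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def denoise(image, strength):
    # Stand-in for the second, image-to-image pass at the target
    # resolution that restores detail after upscaling.
    return image

low = generate(512 // 8, 512 // 8)   # compose at low resolution
big = upscale(low, 2)                # resize toward the selected dimensions
final = denoise(big, strength=0.5)   # re-denoise at the higher resolution
print(len(final), len(final[0]))
```

The point of the split is that repeating patterns come from composing directly at sizes the model was not trained on; composing small and only then upscaling avoids them.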

💡ControlNet

A ControlNet is a model that guides the generation process with a conditioning image, giving the user fine-grained control over the composition of the output. In the context of the video, the presenter mentions that the ControlNet feature and the T2I adapter feature are no longer mutually exclusive, meaning they can be used simultaneously for more advanced and customized image generation.

💡T2I adapter

The T2I (text-to-image) adapter is a lightweight conditioning model that, like a ControlNet, guides image generation with a control image. It can be used alongside a ControlNet to refine the output. In the video, the presenter shows how to use the T2I adapter with a color processor to modify the color scheme of generated images.

💡Multi-image IP adapters

Multi-image IP adapters are a feature that allows users to input multiple images into a single IP adapter, blending different concepts together in the generated image. This advanced tool is particularly useful for creating complex and nuanced images by combining various visual elements. The video explains how to use this feature to blend concepts like spider drawings with concept art sketches of a Yeti-like creature.

💡Instant LoRAs

'Instant LoRAs' is a term used in the community to describe the process of passing multiple images of the same concept into an IP adapter, aiming to blend the average essence of those images into the generated content. This technique is showcased in the video as a way to create a more coherent and unified visual representation of a concept by averaging multiple inputs.
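
The averaging idea can be sketched with plain lists standing in for encoded image features. This is a toy illustration under that assumption: `blend_image_embeddings` is a hypothetical helper, and real IP adapters operate on image embeddings produced by an encoder, not raw numbers.

```python
# Blend several "embeddings" of reference images into one, optionally
# weighted, mirroring how weighting concepts shifts the blended result.
def blend_image_embeddings(embeddings, weights=None):
    if weights is None:
        weights = [1.0 / len(embeddings)] * len(embeddings)
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings))
            for i in range(dim)]

spider_sketch = [1.0, 0.0, 2.0]  # placeholder features for one reference
yeti_sketch = [0.0, 2.0, 0.0]    # placeholder features for another

equal = blend_image_embeddings([spider_sketch, yeti_sketch])
spider_heavy = blend_image_embeddings([spider_sketch, yeti_sketch],
                                      weights=[0.8, 0.2])
print(equal)         # equal-weight blend of both concepts
print(spider_heavy)  # shifting weight pulls the blend toward one concept
```

This mirrors the video's demonstration that changing the weight of each concept significantly alters the resulting image: the blend moves through the space between the references rather than switching between them.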

💡VAE

VAE stands for Variational Autoencoder, a type of artificial intelligence model used for efficient compression and generation of high-dimensional data. In the context of the video, the presenter mentions updating the VAE when encountering black images in SDXL generations, indicating its importance in the image generation process.

💡Workflow editor

The workflow editor is a visual tool used for constructing and editing the process or 'workflow' of image generation. It allows users to add, remove, and connect different nodes and adapters to create a customized pipeline for generating images. The video highlights new nodes and features added to the workflow editor for advanced users, such as multi-image IP adapters.

💡Discord

Discord is a communication platform where users can join various communities, including those related to the software or technology discussed in the video. In this context, the presenter encourages viewers to join the Discord community for Invoke AI to engage with others, share experiences, and stay updated on new features and improvements.

Highlights

Introduction of LCM, a new technique for optimizing the diffusion process using the LCM scheduler.

Reduction in the number of steps needed to generate an image with the LCM scheduler, enabling the creation of visually impressive images seen recently on the internet.

The quality of the model before using the LCM scheduler is demonstrated with four images of a cyborg king.

Adjustment of settings for LCM, including changing the CFG scale and adding the LCM LoRA, which can be downloaded from the Latent Consistency Hugging Face repo.

Comparison of image quality between standard generation and LCM generation, noting a loss of detail with LCM but a significant increase in speed.

Explanation of the impact of CFG scale on adherence to the prompt and the resulting image saturation and quality adjustments.

Recommendation to stay in the lower ranges of the CFG scale for optimal results.

Return of a simple high-resolution fix in version 3.4, allowing for the generation of larger images from the linear UI without complex workflows.

Integration of the high-resolution fix feature in SD 1.5 models, ensuring core composition at a lower resolution before upscaling and denoising at a higher resolution.

ControlNet and T2I adapter features are now compatible, allowing both to be used simultaneously on the same generation.

Demonstration of the T2I color adapter, using a specific color to process and adapt the image.

The ability to decrease the end step percentage to solve for jagged edges caused by the control adapter, resulting in cleaner image edges.

Introduction of multi-image IP adapters in the workflow editor, allowing for the blending of different concepts into a single image.

Explanation of 'instant LoRAs', a community term for passing multiple images of the same concept into the same IP adapter to blend the average of those images into the final output.

Demonstration of blending two distinct concepts together using multi-image IP adapters, resulting in unique and creative images.

Mention of the ability to recall VAE metadata for any generation, thanks to contributor Stefan Tobler.

Addition of RGBA value fields in the Color Picker inside the unified canvas, a contribution by Rines 404.

Acknowledgment of numerous contributors for their work on bug fixes, translations, and other improvements in version 3.4.

Special recognition of the completion of Dutch, Italian, and Chinese translations of the Invoke AI app, almost fully thanks to the contributors' efforts.

Announcement of speed increases and backend updates in 3.4 that make functions in the engine more efficient.

Invitation to join the Discord community for further updates and engagement.