4 Nov 2023 07:13

TL;DR: In this video, hosts Jack and Aili discuss recent developments with TensorRT, an NVIDIA tool that promises to significantly speed up image generation. They explain its limitations and installation process, including the choice between static and dynamic model conversion and the impact on tools like ControlNet and FreeU. They also cover baking LoRA into models and provide tips for efficient use and removal of TensorRT models. The video is a practical guide for users who want to optimize generation performance and save on electricity costs.


  • 📣 Introduction to TensorRT: A feature claimed to double image generation speed, but initially hampered by complicated usage and spotty support.
  • 🚀 NVIDIA's updated extension: NVIDIA, the creator of TensorRT, has released an update that makes it more accessible and efficient to use.
  • 🔄 Model Conversion: Users need to convert their models into the TensorRT (TRT) format, which comes with certain limitations.
  • ⚙️ Limitations of TensorRT: Tools that modify the U-Net, such as ControlNet and FreeU, do not work with TensorRT.
  • 💡 Performance Advantage: A 20% increase in image generation speed means roughly 20% less GPU time per image, which translates into lower electricity costs.
  • 🔧 SDXL Web UI Requirement: SDXL users must switch to the dev branch of the web UI to use TensorRT, or wait for the main branch to be updated.
  • 🛠️ Installation Process: Installing the TensorRT extension currently requires extra environment packages because of existing bugs.
  • 🔄 Dynamic vs. Static Models: TensorRT supports both dynamic models with adjustable parameters and static models with fixed values at conversion.
  • 📏 VRAM Consumption: Static models consume less VRAM than dynamic models; the example discussed shows a difference of about 1 GB.
  • 🔄 Exporting TensorRT Models: The conversion process takes about 3-5 minutes, and users can choose the appropriate model type based on their needs.
  • 🔍 Testing and Comparison: Users are encouraged to test the speed improvement on their own hardware; the example shown gains about 30%.
  • 🔄 LoRA Integration: Instructions on how to incorporate LoRA into models and convert them for use with TensorRT, including potential bugs and the need for base model compatibility.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is an introduction to TensorRT, its benefits, limitations, and a step-by-step guide on how to install and use it for accelerating deep learning models.

  • What is TensorRT's primary function?

    -TensorRT's primary function is to accelerate deep learning models by allowing faster computation, potentially doubling the speed of image generation.

  • What are the limitations of using TensorRT?

    -The limitations of using TensorRT include the requirement to convert models into the TRT format, incompatibility with tools such as ControlNet and FreeU, and the inability to adjust parameters like resolution in static models after conversion.

  • How does TensorRT save on electricity costs?

    -By increasing computation speed, TensorRT cuts the GPU time spent per image, and electricity use falls roughly in proportion: generating images about 20% faster means about 20% less energy for the same workload, assuming the GPU's power draw stays constant.
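The proportionality claimed above can be made precise: if the GPU's power draw is roughly constant, energy scales with runtime, so a throughput speedup factor s cuts energy by 1 − 1/s. A minimal sketch (the function name is mine, not from the video):

```python
def energy_saving_from_speedup(speedup: float) -> float:
    """Fractional energy saving for a given throughput speedup,
    assuming constant GPU power draw (energy proportional to runtime)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup

# A strict 1.2x throughput gain saves ~16.7% energy; saving a full
# 20% of the energy corresponds to a 1.25x throughput gain.
print(f"{energy_saving_from_speedup(1.2):.1%}")   # -> 16.7%
print(f"{energy_saving_from_speedup(1.25):.1%}")  # -> 20.0%
```

This also shows why loose phrasings like "20% faster, 20% cheaper" are close but not exact.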

  • What are the two types of models that TensorRT supports?

    -TensorRT supports dynamic models, which allow adjustable parameters like image dimensions and batch size during computation, and static models, which use fixed values set during conversion.

  • What is the difference in VRAM consumption between dynamic and static models?

    -Static models generally consume less VRAM compared to dynamic models. The video mentions a difference of approximately 1GB in VRAM consumption between the two.

  • How to install the TensorRT extension if there is a bug in the direct installation process?

    -To install the TensorRT extension when there is a bug, one must first install certain environment packages by downloading and executing a batch file provided in the instructions. After that, the extension can be installed through the web UI from a URL.

  • How to enable automatic switching to TensorRT models in the user interface?

    -To enable automatic switching, open the settings, find the 'SD' (Stable Diffusion) section, and add 'sd_unet' to the quick settings list. After applying, the UI reloads and shows an option to switch to TensorRT models automatically when one is available.

  • What is the process for converting a model to TensorRT?

    -To convert a model to TensorRT, go to the TensorRT menu, select the model weights file, and choose to export an engine with either static or dynamic shapes based on the desired parameters. The conversion process takes about 3-5 minutes.
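Under the hood, that export step amounts to an ONNX export followed by a TensorRT engine build, which the extension's menu performs for you. The sketch below only constructs illustrative `trtexec` command lines to show how static and dynamic engines differ; the input name "sample", the shapes, and the file paths are assumptions for illustration, not values from the video:

```python
def trtexec_cmd(onnx_path, engine_path, static=True,
                min_shape="1x4x64x64", opt_shape="1x4x64x64",
                max_shape="4x4x96x96"):
    """Build an illustrative trtexec command line for a static or
    dynamic engine. Shapes are batch x channels x height x width."""
    cmd = ["trtexec", f"--onnx={onnx_path}",
           f"--saveEngine={engine_path}", "--fp16"]
    if not static:
        # A dynamic engine carries an optimization profile:
        # min/opt/max bounds for each input tensor.
        cmd += [f"--minShapes=sample:{min_shape}",
                f"--optShapes=sample:{opt_shape}",
                f"--maxShapes=sample:{max_shape}"]
    return " ".join(cmd)

print(trtexec_cmd("unet.onnx", "unet_static.trt"))
print(trtexec_cmd("unet.onnx", "unet_dynamic.trt", static=False))
```

The static engine bakes one fixed shape in; the dynamic engine accepts any shape inside the min/max bounds, which is why it stays usable across resolutions at the cost of extra VRAM.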

  • How can Lora be used with TensorRT?

    -LoRA can be baked directly into the model before conversion. For dynamic adjustment of LoRA weights, one can use the official test version available in the 'lora_v2' branch of the extension. The converted LoRA model can then be used in text-to-image and image-to-image tasks.

  • How to remove the TensorRT models if needed?

    -To remove the TensorRT models, go to the web UI's models folder and delete the contents of the 'Unet-onnx' and 'Unet-trt' folders. Avoid deleting only some of the models unless you understand the 'model.json' file, or you may trigger additional errors.
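The all-or-nothing cleanup described above can be sketched in a few lines. This is my own illustration, not the extension's code; the demo runs against a throwaway stand-in directory so nothing real is touched:

```python
import shutil
import tempfile
from pathlib import Path

def clear_trt_models(models_dir: Path) -> None:
    """Delete the contents of Unet-onnx and Unet-trt, leaving the
    folders in place (the safe, all-or-nothing route; removing single
    engines would require editing model.json by hand)."""
    for sub in ("Unet-onnx", "Unet-trt"):
        target = models_dir / sub
        if not target.exists():
            continue
        for child in target.iterdir():
            shutil.rmtree(child) if child.is_dir() else child.unlink()

# Demo against a temporary stand-in; in real use, point models_dir
# at your web UI's actual models folder instead.
models = Path(tempfile.mkdtemp())
(models / "Unet-onnx").mkdir()
(models / "Unet-trt").mkdir()
(models / "Unet-trt" / "model.trt").touch()
clear_trt_models(models)
print([p.name for p in (models / "Unet-trt").iterdir()])  # -> []
```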



🚀 Introduction to TensorRT and Its Benefits

The paragraph introduces the speakers, Jack and Aili, and their return after a hiatus. They discuss TensorRT, a feature developed by NVIDIA that promises to double image generation speed. Despite its historically complex usage and support issues, a recent update by NVIDIA has reignited interest. The speakers explain the limitations of TensorRT, such as incompatibility with U-Net tools like ControlNet and FreeU, and the requirement to convert models to the TRT format. They highlight the potential energy cost savings from faster generation and walk through the installation process, including working around bugs and updating drivers. The paragraph concludes with a brief mention of the SDXL web UI and the anticipated update to its main branch.


🛠️ Utilizing TensorRT with Dynamic and Static Models

This paragraph delves into the specifics of using TensorRT with dynamic and static models. The speakers explain the difference between the two, noting that dynamic models allow adjustable parameters like image dimensions and batch size during processing, while static models use fixed values set during conversion. They guide the audience through the process of converting a model to a static version, including setting parameters and exporting the engine. The paragraph also touches on the benefits of static models, such as lower VRAM consumption, and provides a comparison of VRAM usage between dynamic and static models. Additionally, the speakers discuss the process for using dynamic models and the advantages of not needing to switch models at high resolutions. The section ends with instructions on how to apply settings for automatic model switching in the user interface and a comparison of processing speeds with TensorRT enabled.




💡TensorRT

TensorRT is an NVIDIA library designed to accelerate deep learning inference. It compiles models into an optimized engine format for deployment, which can significantly increase computation speed. In the video, TensorRT is introduced as a feature that can potentially double image generation speed, although it has some limitations and requires specific steps for installation and model conversion.

💡Model Conversion

Model conversion refers to the process of transforming a pre-trained model into a format that is optimized for inference with TensorRT. This involves converting the original model to a TensorRT engine file, which can be either a dynamic or static model, depending on the user's needs. The conversion process is crucial for utilizing TensorRT's acceleration capabilities.

💡Dynamic Model

A dynamic model in the context of TensorRT is one where certain parameters, such as image dimensions and batch size, can be adjusted during the computation. This flexibility allows for optimization based on the specific requirements of the task at hand, but it may consume more VRAM compared to a static model.

💡Static Model

A static model in TensorRT is one where the parameters, such as the input dimensions, are fixed at the time of conversion. This model type has the advantage of lower VRAM consumption but lacks the flexibility of dynamic models, as it can only use the values set during the conversion process.


💡VRAM

Video RAM (VRAM) is the memory used by graphics processing units (GPUs) to store the data they process. In the context of the video, VRAM consumption is an important consideration when choosing between dynamic and static models, as static models typically use less VRAM, which can be beneficial for users with limited GPU memory.


💡SDXL

SDXL refers to Stable Diffusion XL, a larger Stable Diffusion model used for image generation. The script notes that SDXL users need to switch the web UI to the development branch for TensorRT integration.

💡Web UI

Web UI stands for Web User Interface, which is the visual and interactive part of a web application that allows users to access and use the services provided by the application. In the video, the Web UI is an important component for installing TensorRT and managing the conversion of models.


💡LoRA

LoRA, or Low-Rank Adaptation, is a technique for efficiently fine-tuning large neural networks: instead of updating all of a model's weights, it trains a pair of small low-rank matrices whose product is added to the existing weights. In the video, LoRA is discussed as a feature that can be baked into a model before conversion and used with TensorRT.
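The low-rank idea can be shown in a few lines: the adapted weight is W + scale · (A · B), and "baking LoRA into the model" before TensorRT conversion simply materializes that sum once. A toy, pure-Python sketch with made-up 2×2 numbers (all names and values here are illustrative):

```python
def matmul(A, B):
    """Plain-Python matrix product for the toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def merge_lora(W, A, B, scale=1.0):
    """Merged weight = W + scale * (A @ B), where A and B are the
    small low-rank factors (rank r much smaller than the weight dim)."""
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
A = [[1.0], [0.0]]             # 2x1 factor (rank 1)
B = [[0.0, 2.0]]               # 1x2 factor
print(merge_lora(W, A, B, scale=0.5))  # -> [[1.0, 1.0], [0.0, 1.0]]
```

Once merged this way, the weights are ordinary dense matrices again, which is why the baked-in model can be converted to a TRT engine like any other; the extension's 'lora_v2' branch is what adds weight adjustment at runtime instead.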


💡Installation

Installation in this context refers to the process of setting up and preparing software, like TensorRT, for use on a computer. The video outlines the specific steps and requirements for installing the NVIDIA TensorRT extension, including updating drivers and working around known bugs.


💡Performance

Performance in the context of the video pertains to the speed and efficiency of running AI models, particularly after applying TensorRT optimizations. The improvement in performance is measured by the increased speed of computation and reduced energy consumption.


💡Bug

A bug in software refers to an error, flaw, or unintended behavior that can cause the program to produce incorrect or unexpected results. In the video, the mention of bugs relates to issues encountered during the installation and use of TensorRT, as well as potential problems with LoRA integration.