Releases: NVIDIA/TensorRT

TensorRT OSS v10.5.0

10 Oct 19:47
c8a5043

Release 10.5-GA

Key Features and Updates:

  • Demo changes
  • Sample changes
    • None
  • Plugin changes
    • Migrated IPluginV2-descendant versions of bertQKVToContextPlugin (1, 2, 3) to newer versions (4, 5, 6 respectively) which implement IPluginV3.
    • Note:
      • The newer versions preserve the attributes and I/O of the corresponding older plugin version.
      • The older plugin versions are deprecated and will be removed in a future release
  • Quickstart guide
    • None
  • Parser changes
    • Added support for real-valued STFT operations
    • Improved error handling in IParser

Known issues:

  • Demos:
    • The TensorRT engine might not build successfully when using the --fp8 flag on H100 GPUs.

TensorRT OSS v10.4.0

12 Sep 00:59
866548c

10.4.0 GA - 2024-09-11

Key Features and Updates:

  • Demo changes
    • Added Stable Cascade pipeline.
    • Enabled INT8 and FP8 quantization for Stable Diffusion v1.5, v2.0 and v2.1 pipelines.
    • Enabled FP8 quantization for Stable Diffusion XL pipeline.
  • Sample changes
    • Added a new Python sample, aliased_io_plugin, which demonstrates how in-place updates to plugin inputs can be achieved through I/O aliasing.
  • Plugin changes
    • Migrated IPluginV2-descendant versions (a) of the following plugins to newer versions (b) which implement IPluginV3 (a->b):
      • scatterElementsPlugin (1->2)
      • skipLayerNormPlugin (1->5, 2->6, 3->7, 4->8)
      • embLayerNormPlugin (2->4, 3->5)
      • bertQKVToContextPlugin (1->4, 2->5, 3->6)
    • Note:
      • The newer versions preserve the attributes and I/O of the corresponding older plugin version.
      • The older plugin versions are deprecated and will be removed in a future release.
  • Quickstart guide
  • Parser changes
    • Added support for tensor axes for Pad operations.
    • Added support for BlackmanWindow, HammingWindow, and HannWindow operations.
    • Improved error handling in IParserRefitter.
    • Fixed kernel shape inference in multi-input convolutions.
  • Updated tooling
    • polygraphy-extension-trtexec v0.0.9
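
The a->b version mapping in the plugin changes above can be kept as a lookup table when auditing which deprecated plugin versions a network still relies on. The following Python sketch is a hypothetical helper, not part of the TensorRT API; the plugin names are copied from these release notes and may differ from the names under which the plugins are registered in the TensorRT plugin registry.

```python
from typing import Dict, Optional

# Hypothetical lookup table (not a TensorRT API): deprecated IPluginV2-based
# plugin versions -> superseding IPluginV3-based versions, as listed in the
# TensorRT OSS v10.4.0 notes. Names follow the notes, not the plugin registry.
IPLUGINV3_REPLACEMENTS: Dict[str, Dict[int, int]] = {
    "scatterElementsPlugin": {1: 2},
    "skipLayerNormPlugin": {1: 5, 2: 6, 3: 7, 4: 8},
    "embLayerNormPlugin": {2: 4, 3: 5},
    "bertQKVToContextPlugin": {1: 4, 2: 5, 3: 6},
}


def ipluginv3_replacement(plugin: str, version: int) -> Optional[int]:
    """Return the IPluginV3 version superseding (plugin, version),
    or None if the notes list no migration for that pair."""
    return IPLUGINV3_REPLACEMENTS.get(plugin, {}).get(version)


if __name__ == "__main__":
    print(ipluginv3_replacement("skipLayerNormPlugin", 3))
    print(ipluginv3_replacement("embLayerNormPlugin", 1))
```

Returning None rather than raising keeps the helper usable for plugin/version pairs the notes do not mention, such as embLayerNormPlugin version 1.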

TensorRT OSS v10.3.0

08 Aug 23:23
c5b9de3

10.3.0 GA

Key Features and Updates:

  • Demo changes
  • Plugin changes
    • Deprecated Version 1 of ScatterElements plugin. It is superseded by Version 2, which implements the IPluginV3 interface.
  • Quickstart guide
  • Parser changes
    • Added support for tensor axes inputs for Slice node.
    • Updated ScatterElements importer to use Version 2 of ScatterElements plugin, which implements the IPluginV3 interface.
  • Updated tooling
    • Polygraphy v0.49.13

TensorRT OSS v10.2.0

15 Jul 16:16
2332a71

Key Features and Updates:

  • Demo changes
  • Plugin changes
    • Version 3 of the InstanceNormalization plugin (InstanceNormalization_TRT) has been added. This version is based on the IPluginV3 interface and is used by the TensorRT ONNX parser when native InstanceNormalization is disabled.
  • Tooling changes
    • PyTorch Quantization development has transitioned to TensorRT Model Optimizer. All developers are encouraged to use TensorRT Model Optimizer to benefit from the latest advancements in quantization and compression.
  • Build containers
    • Updated the default CUDA version to 12.5.0.

TensorRT OSS v10.1.0

18 Jun 00:26
9db1508

Key Features and Updates:

  • Parser changes
    • Added supportsModelV2 API
    • Added support for DeformConv operation
    • Added support for PluginV3 TensorRT Plugins
    • Marked all IParser and IParserRefitter APIs as noexcept
  • Plugin changes
    • Added version 2 of the ROIAlign_TRT plugin, which implements the IPluginV3 plugin interface. When importing an ONNX model with the RoiAlign op, this new version of the plugin will be inserted into the TensorRT network.
  • Samples changes
  • Updated tooling
    • Polygraphy v0.49.12
    • ONNX-GraphSurgeon v0.5.3

TensorRT OSS v10.0.1

30 Apr 18:05
d2f4ef7

Key Features and Updates:

  • Parser changes
    • Added support for building with protobuf-lite.
    • Fixed issue when parsing and refitting models with nested BatchNormalization nodes.
    • Added support for empty inputs in custom plugin nodes.
  • Demo changes
    • The following demos have been removed: Jasper, Tacotron2, HuggingFace Diffusers notebook
  • Updated tooling
    • Polygraphy v0.49.10
    • ONNX-GraphSurgeon v0.5.2
  • Build Containers
    • Updated the default CUDA version to 12.4.0.
    • Added Rocky Linux 8 and Rocky Linux 9 build containers.

TensorRT v10.0.0

03 Apr 21:45

Key Features and Updates:

  • Samples changes
    • Added a sample showcasing weight-stripped engines.
    • Added a sample demonstrating the use of custom tactics with IPluginV3.
    • Added a sample to showcase plugins with data-dependent output shapes, using IPluginV3.
  • Parser changes
    • Added a new class IParserRefitter that can be used to refit a TensorRT engine with the weights of an ONNX model.
    • kNATIVE_INSTANCENORM is now set to ON by default.
    • Added support for IPluginV3 interfaces from TensorRT.
    • Added support for INT4 quantization.
    • Added support for the reduction attribute in ScatterElements.
    • Added support for wrap padding mode in Pad.
  • Plugin changes
    • A new plugin has been added in compliance with ONNX ScatterElements.
    • The TensorRT plugin library no longer has a load-time link dependency on cuBLAS or cuDNN libraries.
    • All plugins which relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() have moved to use cuBLAS/cuDNN resources initialized by the plugin library itself. This works by dynamically loading the required cuBLAS/cuDNN library. Additionally, plugins which independently initialized their cuBLAS/cuDNN resources have also moved to dynamically loading the required library. If the respective library is not discoverable through the library path(s), these plugins will not work.
    • bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
    • reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
    • disentangledAttentionPlugin: Fixed a kernel bug.
  • Demo changes
    • HuggingFace demos have been removed. Users accelerating Large Language Model inference with TensorRT should use TensorRT-LLM.
  • Updated tooling
    • Polygraphy v0.49.9
    • ONNX-GraphSurgeon v0.5.1
    • TensorRT Engine Explorer v0.1.8
  • Build Containers
    • RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.

TensorRT OSS v9.3.0

09 Feb 22:30
6d1397e

TensorRT OSS release corresponding to TensorRT 9.3.0.1 release.

Updates since TensorRT 9.2.0 release.

Key Features and Updates:

  • Faster text-to-image generation with SDXL and INT8 quantization using AMMO
  • Updated Polygraphy to v0.49.7

TensorRT OSS v9.2.0

05 Dec 00:30

TensorRT OSS release corresponding to TensorRT 9.2.0.5 release.

Updates since TensorRT 9.1.0 release.

Key Features and Updates:

  • trtexec enhancement: Added --weightless flag to mark the engine as weightless.
  • Parser changes
    • Added support for Hardmax operator.
    • Changes to a few operator importers to ensure that TensorRT preserves the precision of operations when using strongly typed mode.
  • Plugin changes
    • Explicit INT8 support added to bertQKVToContextPlugin.
    • Various bug fixes.
  • Updated HuggingFace demo to use transformers v4.31.0 and PyTorch v2.1.0.

TensorRT OSS v9.1.0

20 Oct 00:34
b8ada01

TensorRT OSS release corresponding to TensorRT 9.1.0.4 GA release.

Updates since TensorRT 8.6.1 GA release.

Key Features and Updates:

  • Updated the trt_python_plugin sample.
    • The Python plugins API reference is part of the official TRT Python API.
  • Added samples demonstrating the usage of the progress monitor API.
  • Removed dependencies related to Python < 3.8 from the Python samples, since Python < 3.8 is no longer supported for Python samples.
  • Demo changes
    • Added LAMBADA dataset accuracy checks in the HuggingFace demo.
    • Enabled structured sparsity and FP8-quantized batch matrix multiplications (BMMs) in attention in the NeMo demo.
    • Replaced deprecated APIs in the BERT demo.
  • Updated tooling
    • Polygraphy v0.49.1