TensorRT 9.3 updates (#3661)
* TensorRT 9.3 updates (no submodule updates)

Signed-off-by: Michal Guzek <[email protected]>

* Update to ONNX-TensorRT 9.3

Signed-off-by: Michal Guzek <[email protected]>

---------

Signed-off-by: Michal Guzek <[email protected]>
Co-authored-by: Michal Guzek <[email protected]>
moraxu authored Feb 9, 2024
1 parent 93b6044 commit 6d1397e
Showing 111 changed files with 5,491 additions and 680 deletions.
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,15 @@
# TensorRT OSS Release Changelog

-## 9.2.0 GA - 2023-12-04
+## 9.3.0 GA - 2024-02-09

Key Features and Updates:

- Demo changes
  - Faster Text-to-image using SDXL & INT8 quantization using AMMO
- Updated tooling
  - Polygraphy v0.49.7

## 9.2.0 GA - 2023-11-27

Key Features and Updates:

10 changes: 5 additions & 5 deletions README.md
@@ -26,7 +26,7 @@ You can skip the **Build** section to enjoy TensorRT with Python.
To build the TensorRT-OSS components, you will first need the following software packages.

**TensorRT GA build**
-* TensorRT v9.2.0.5
+* TensorRT v9.3.0.1
* Available from direct download links listed below

**System Packages**
@@ -73,16 +73,16 @@ To build the TensorRT-OSS components, you will first need the following software
If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.

Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-- [TensorRT 9.2.0.5 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 9.2.0.5 for CUDA 12.2, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-12.2.tar.gz)
+- [TensorRT 9.3.0.1 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.3.0/tensorrt-9.3.0.1.linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 9.3.0.1 for CUDA 12.2, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.3.0/tensorrt-9.3.0.1.linux.x86_64-gnu.cuda-12.2.tar.gz)


**Example: Ubuntu 20.04 on x86-64 with cuda-12.2**

```bash
cd ~/Downloads
-tar -xvzf tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-12.2.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-9.2.0.5
+tar -xvzf tensorrt-9.3.0.1.linux.x86_64-gnu.cuda-12.2.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-9.3.0.1
```

## Setting Up The Build Environment
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-9.2.0.5
+9.3.0.1
18 changes: 13 additions & 5 deletions demo/Diffusion/README.md
@@ -7,7 +7,7 @@ This demo application ("demoDiffusion") showcases the acceleration of Stable Dif
### Clone the TensorRT OSS repository

```bash
-git clone git@github.com:NVIDIA/TensorRT.git -b release/9.2 --single-branch
+git clone git@github.com:NVIDIA/TensorRT.git -b release/9.3 --single-branch
cd TensorRT
```

@@ -16,7 +16,7 @@ cd TensorRT
Install nvidia-docker using [these instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).

```bash
-docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.07-py3 /bin/bash
+docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.12-py3 /bin/bash
```

### Install latest TensorRT release
@@ -26,7 +26,7 @@
python3 -m pip install --upgrade pip
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt
```

-> NOTE: TensorRT 9.0 is only available as a pre-release
+> NOTE: TensorRT 9.x is only available as a pre-release
Check your installed version using:
`python3 -c 'import tensorrt;print(tensorrt.__version__)'`
@@ -48,8 +48,8 @@
diffusers 0.23.1
onnx 1.14.0
onnx-graphsurgeon 0.3.26
onnxruntime 1.15.1
-polygraphy 0.49.1
-tensorrt 9.2.0.5
+polygraphy 0.49.7
+tensorrt 9.3.0.1
tokenizers 0.13.2
torch 2.1.0
transformers 4.31.0
@@ -137,6 +137,14 @@ It is also possible to combine multiple LoRAs.
python3 demo_txt2img_xl.py "Picture of a rustic Italian village with Olive trees and mountains" --version=xl-1.0 --lora-path "ostris/crayon_style_lora_sdxl" "ostris/watercolor_style_lora_sdxl" --lora-scale 0.3 0.7 --onnx-dir onnx-sdxl-lora --engine-dir engine-sdxl-lora --build-enable-refit
```

### Faster Text-to-image using SDXL & INT8 quantization using AMMO

```bash
python3 demo_txt2img_xl.py "a photo of an astronaut riding a horse on mars" --version xl-1.0 --onnx-dir onnx-sdxl --engine-dir engine-sdxl --int8 --quantization-level 3
```

Note that the calibration process can be quite time-consuming, and will be repeated if `--quantization-level`, `--denoising-steps`, or `--onnx-dir` is changed.

### Faster Text-to-Image using SDXL + LCM (Latent Consistency Model) LoRA weights
[LCM-LoRA](https://arxiv.org/abs/2311.05556) produces good-quality images in 4 to 8 denoising steps instead of the 30+ needed by the base model. Note that we use the LCM scheduler and disable classifier-free guidance by setting `--guidance-scale` to 0.
LoRA weights are fused into the ONNX and finalized TensorRT plan files in this example.
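
As a hedged sketch only (not the exact command from this README), an LCM-LoRA run might look like the following, reusing flags that already appear in the examples above (`--lora-path`, `--lora-scale`, `--onnx-dir`, `--engine-dir`, `--denoising-steps`, `--guidance-scale`). The LoRA weight name `latent-consistency/lcm-lora-sdxl` is an assumption, and selecting the LCM scheduler may require an additional option not shown in this excerpt.

```bash
# Hedged sketch, not the exact command from this README: flags are reused from
# the other examples above; the LoRA repository name is an assumption.
python3 demo_txt2img_xl.py "a beautiful photograph of Mt. Fuji during cherry blossom" \
  --version=xl-1.0 \
  --lora-path "latent-consistency/lcm-lora-sdxl" --lora-scale 1.0 \
  --onnx-dir onnx-sdxl-lcm --engine-dir engine-sdxl-lcm \
  --denoising-steps 4 --guidance-scale 0
# --guidance-scale 0 disables classifier-free guidance, as described above;
# the LCM scheduler may need an extra flag not listed in this excerpt.
```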