TensorRT C++ Samples

Real-time inference using TensorRT.

  • convert an ONNX model to a TensorRT (trt) engine on the target hardware
  • run yolo models on the target hardware (the trt engine file is created automatically)
  • run an existing engine file on the target hardware

Getting Started

  1. Download a yolo model
  2. Update the Makefile with your ARCH_BIN (see the ARCH_BIN notes further down for details)
  3. Start the docker container.
make build
make run
  4. Build the code
  • place your images in src/images
# inside the docker container
mkdir build && cd build
cmake ..
make -j -l4
  • alternatively, to build a single module on its own, follow these steps
# go to module of interest
$ cd /src/engine
# create build directory
$ mkdir build && cd build
# build the project
$ cmake .. && make -j
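
The per-module steps above can be wrapped in a small helper. This is only a sketch: `build_module` is a hypothetical name, not something shipped in the repository, and it assumes each module folder has its own CMakeLists.txt. It uses `mkdir -p` so it can be re-run, and a subshell so your working directory is left alone.

```shell
# Hypothetical helper (not part of the repo): configure and build one module
# in its own build/ directory.
build_module() {
    module_dir="$1"
    if [ ! -f "${module_dir}/CMakeLists.txt" ]; then
        echo "no CMakeLists.txt in ${module_dir}" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        mkdir -p "${module_dir}/build" &&
        cd "${module_dir}/build" &&
        cmake .. &&
        make -j -l4
    )
}

# Example (inside the docker container):
#   for m in converter engine yolo; do build_module "/src/$m" || break; done
```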


The main CMakeLists.txt builds these folders:

converter ----> converts a yolo model to a TensorRT serialized engine file (trt engine file)
engine    ----> runs a trt engine file
yolo      ----> runs a yolo model (converts it to a trt engine, then runs it)

When you build from the main directory, the outputs look like this:

/src/build# tree -L 2 -I 'CMakeFiles'
|-- CMakeCache.txt
|-- Makefile
|-- cmake_install.cmake
|-- converter
|   |-- Makefile
|   |-- cmake_install.cmake
|   `-- onnx2trt                <<<<<<<<<< convert onnx 2 trt
|-- engine
|   |-- Makefile
|   |-- cmake_install.cmake
|   `-- engine                  <<<<<<<<<< run serialized engine file
`-- yolo
    |-- Makefile
    |-- cmake_install.cmake
    |-- detectImage             <<<<<<<<<< run object detection on an image with yolo model
    |-- detectWebcam            <<<<<<<<<< run object detection on a webcam with yolo model
    `-- profile                 <<<<<<<<<< calculate yolo model execution time when doing detection on image
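
To confirm that a full build produced every binary shown in the tree above, something like the following could be used. It is a sketch under assumptions: `check_build_outputs` is a made-up name, and the default build directory `/src/build` is taken from the tree listing.

```shell
# Hypothetical helper: report which of the expected binaries exist and are
# executable. Returns non-zero if any is missing.
check_build_outputs() {
    build_dir="${1:-/src/build}"
    missing=0
    for bin in converter/onnx2trt engine/engine \
               yolo/detectImage yolo/detectWebcam yolo/profile; do
        if [ -x "${build_dir}/${bin}" ]; then
            echo "ok      ${bin}"
        else
            echo "missing ${bin}"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example: check_build_outputs /src/build
```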



The Dockerfile has an ARG ARCH_BIN that is used to build OpenCV with CUDA support. ARCH_BIN is the compute capability of your GPU: look it up in the NVIDIA docs and set ARCH_BIN in the Makefile to match.

# here we have a GeForce GTX 1050; the docs list it as compute capability 6.1, so ARCH_BIN=6.1
$ nvidia-smi
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce GTX 1050        Off | 00000000:01:00.0 Off |                  N/A |
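
On recent drivers the compute capability can often be read directly instead of looked up by hand. The sketch below assumes your `nvidia-smi` supports the `compute_cap` query field (older drivers may not, in which case fall back to the NVIDIA docs). `detect_arch_bin` is a hypothetical name, and it only looks at the first GPU.

```shell
# Hypothetical helper: print the compute capability of the first GPU,
# e.g. 6.1 for a GTX 1050. Assumes nvidia-smi supports --query-gpu=compute_cap.
detect_arch_bin() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader \
          | head -n 1 | tr -d '[:space:]')
    if [ -z "$cap" ]; then
        echo "could not read compute capability" >&2
        return 1
    fi
    echo "$cap"
}

# Example: detect_arch_bin   # then copy the value into ARCH_BIN in the Makefile
```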

version check

  • check your versions (inside docker container)
# TensorRT version
$ find / -name NvInferVersion.h -type f

# this displays TensorRT version 8.6.1
$ cat /usr/include/x86_64-linux-gnu/NvInferVersion.h | grep NV_TENSORRT | head -n 3
#define NV_TENSORRT_MAJOR 8 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 6 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.

# this displays cuDNN version 8.9.1
$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2 | head -n 3
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 1
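
If you want the TensorRT version as a single string (for example to log it), the `#define` lines can be parsed with awk. This is a minimal sketch: `define_value` and `tensorrt_version` are hypothetical names, and the header path is whatever the `find` command above reports on your system.

```shell
# Hypothetical helpers: read "#define NAME value" lines from a header.
define_value() {
    # $1 = header file, $2 = macro name; prints the macro's value
    awk -v name="$2" '$1 == "#define" && $2 == name { print $3; exit }' "$1"
}

# Print MAJOR.MINOR.PATCH from an NvInferVersion.h file.
tensorrt_version() {
    printf '%s.%s.%s\n' \
        "$(define_value "$1" NV_TENSORRT_MAJOR)" \
        "$(define_value "$1" NV_TENSORRT_MINOR)" \
        "$(define_value "$1" NV_TENSORRT_PATCH)"
}

# Example: tensorrt_version /usr/include/x86_64-linux-gnu/NvInferVersion.h
```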

tao converter

# make run puts you inside the docker container

# before running this, check the README.txt in /src/scripts/tao-converter and install any dependencies and set paths
/tmp/tao-converter# export MODEL_PATH=~/path/to/folder
/tmp/tao-converter# export MODEL=replace_with_model_name
/tmp/tao-converter# export KEY=replace_with_nvidia_key
/tmp/tao-converter# ./tao-converter -k "${KEY}" -t fp16 -e "${MODEL_PATH}/${MODEL}.engine" -o output "${MODEL_PATH}/${MODEL}.etlt"

[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filer9wcjU
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    pytorch
[INFO] Producer version: 1.10
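
Because the export lines above contain placeholders, it is easy to launch tao-converter with one of them still unset. A small guard like the one below can catch that first. It is a sketch only: `check_tao_env` is a hypothetical name, and it simply rejects empty values and values that still start with `replace_with_`.

```shell
# Hypothetical guard: fail if MODEL_PATH, MODEL or KEY is empty or still
# holds a "replace_with_..." placeholder.
check_tao_env() {
    status=0
    for name in MODEL_PATH MODEL KEY; do
        eval "value=\${$name:-}"
        case "$value" in
            ""|replace_with_*)
                echo "please set $name" >&2
                status=1
                ;;
        esac
    done
    return "$status"
}

# Example: check_tao_env && ./tao-converter -k "${KEY}" ...   (rest as above)
```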


  • ask for help
$ /usr/src/tensorrt/bin/trtexec --help
  • profile model speed
# load in an ONNX file
$ export MODEL_PATH=/path/to/folder
$ export ONNX_NAME=model.onnx
$ export TRT_NAME=model.engine
$ /usr/src/tensorrt/bin/trtexec --onnx="${MODEL_PATH}/${ONNX_NAME}" --iterations=5 --workspace=4096
# load in a trt engine file
$ /usr/src/tensorrt/bin/trtexec --loadEngine="${MODEL_PATH}/${TRT_NAME}" --fp16 --batch=1 --iterations=50 --workspace=4096
# save logs to a file
$ /usr/src/tensorrt/bin/trtexec --loadEngine="${MODEL_PATH}/${TRT_NAME}" --fp16 --batch=1 --iterations=50 --workspace=4096 > stats.log 
  • model conversion
$ export MODEL_PATH=/path/to/folder
$ export MODEL_NAME=model
# convert the model to FP16 (if supported on hardware)
$ /usr/src/tensorrt/bin/trtexec --onnx="${MODEL_PATH}/${MODEL_NAME}.onnx" --saveEngine="${MODEL_PATH}/${MODEL_NAME}_fp16.engine" --useCudaGraph --fp16 > "${MODEL_NAME}_fp16.log" 
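
The conversion command above can be wrapped so that the input file is checked first and the engine name follows the same `<model>_<precision>.engine` pattern. This is a sketch under assumptions: `convert_onnx` and the `TRTEXEC` override variable are hypothetical, and "fp32" here just means that no precision flag is passed, leaving trtexec at its default.

```shell
# Hypothetical wrapper around trtexec for ONNX -> engine conversion.
# Usage: convert_onnx MODEL_PATH MODEL_NAME [fp16|fp32]
convert_onnx() {
    model_path="$1"
    model_name="$2"
    precision="${3:-fp16}"
    trtexec_bin="${TRTEXEC:-/usr/src/tensorrt/bin/trtexec}"
    onnx="${model_path}/${model_name}.onnx"
    engine="${model_path}/${model_name}_${precision}.engine"

    if [ ! -f "$onnx" ]; then
        echo "missing ${onnx}" >&2
        return 1
    fi
    case "$precision" in
        fp16) flag="--fp16" ;;
        fp32) flag="" ;;          # no flag: trtexec default precision
        *) echo "unsupported precision: ${precision}" >&2; return 2 ;;
    esac
    # $flag is left unquoted on purpose so an empty value adds no argument.
    "$trtexec_bin" --onnx="$onnx" --saveEngine="$engine" $flag
}

# Example: convert_onnx /path/to/folder model fp16 > model_fp16.log
```

Setting `TRTEXEC=echo` prints the command arguments instead of running the conversion, which is a cheap way to check the paths before a long build.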


Examples using the TensorRT C++ api





