Questions about polygraphy conversion #4218

Open
EmmaThompson123 opened this issue Oct 22, 2024 · 0 comments
I use Polygraphy to convert an ONNX model to TensorRT:

import numpy as np
import os
from polygraphy.backend.trt import CreateConfig as CreateTrtConfig, EngineFromNetwork, EngineFromBytes, NetworkFromOnnxPath, Profile, TrtRunner, SaveEngine
import time
from polygraphy.logger import G_LOGGER
G_LOGGER.severity = G_LOGGER.VERBOSE

def GB_to_bytes(gb):
    return int(gb * 1024 * 1024 * 1024)

def main():
    workspace_GB = 1.7
    dynamic = False
    bs = 4
    if not dynamic:
        assert bs
    fp16 = True
    onnx_path = f"weights/model_bs{bs}.onnx"
    if dynamic:
        suffix1 = "_dynamic"
    else:
        suffix1 = f"_bs{bs}"
    if fp16:
        suffix2 = "_fp16"
    else:
        suffix2 = "_fp32"
    tactic_sources = ['CUBLAS_LT', 'CUDNN']
    tactic_sources_suffix = "_".join(tactic_sources)
    engine_path = f"weights/model{suffix1}{suffix2}_{workspace_GB}GB_{tactic_sources_suffix}.trt"
    print("build engine")
    profiles = [
        Profile().add('audio_seqs__0', min=[1, 1, 80, 16], opt=[bs, 1, 80, 16], max=[2*bs, 1, 80, 16]).add('img_seqs__1', min=[1, 6, 256, 256], opt=[bs, 6, 256, 256], max=[2*bs, 6, 256, 256]) \
            if dynamic else Profile().add('audio_seqs__0', min=[bs, 1, 80, 16], opt=[bs, 1, 80, 16], max=[bs, 1, 80, 16]).add('img_seqs__1', min=[bs, 6, 256, 256], opt=[bs, 6, 256, 256], max=[bs, 6, 256, 256])
    ]
    create_trt_config = CreateTrtConfig(max_workspace_size=GB_to_bytes(workspace_GB), fp16=fp16, profiles=profiles)
    build_engine = EngineFromNetwork(
        NetworkFromOnnxPath(onnx_path), config=create_trt_config)
    build_engine = SaveEngine(build_engine, path=engine_path)
    with TrtRunner(build_engine) as runner:
        audio_seqs__0 = np.random.randn(bs, 1, 80, 16).astype(np.float32)
        img_seqs__1 = np.random.randn(bs, 6, 256, 256).astype(np.float32)
        outputs = runner.infer(feed_dict={"audio_seqs__0": audio_seqs__0, "img_seqs__1": img_seqs__1})

if __name__ == "__main__":
    main()
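As a sanity check on the workspace-size arithmetic, the `GB_to_bytes` helper from the script above (reproduced standalone so it runs on its own) converts 1.7 GB to exactly the byte count that appears later in the build log:

```python
def GB_to_bytes(gb):
    # 1 GB = 1024**3 bytes; int() truncates the fractional byte
    return int(gb * 1024 * 1024 * 1024)

print(GB_to_bytes(1.7))          # 1825361100, matching "Workspace | 1825361100 bytes"
print(GB_to_bytes(1.7) / 2**20)  # ~1740.8, matching "(1740.80 MiB)"
```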

Its log is:

[V] Loaded Module: tensorrt           | Version: 8.0.3.4  | Path: ['/opt/conda/lib/python3.8/site-packages/tensorrt']
build engine
[V] Loaded Module: polygraphy.util    | Path: ['/opt/conda/lib/python3.8/site-packages/polygraphy/util']
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +503, GPU +0, now: CPU 515, GPU 521 (MiB)
[TensorRT] INFO: ----------------------------------------------------------------
[TensorRT] INFO: Input filename:   weights/model_bs4.onnx
[TensorRT] INFO: ONNX IR version:  0.0.7
[TensorRT] INFO: Opset version:    9
[TensorRT] INFO: Producer name:    pytorch
[TensorRT] INFO: Producer version: 1.10
[TensorRT] INFO: Domain:           
[TensorRT] INFO: Model version:    0
[TensorRT] INFO: Doc string:       
[TensorRT] INFO: ----------------------------------------------------------------
[V]     Setting TensorRT Optimization Profiles
[V]     Input tensor: audio_seqs__0 (dtype=DataType.FLOAT, shape=(4, 1, 80, 16)) | Setting input tensor shapes to: (min=[4, 1, 80, 16], opt=[4, 1, 80, 16], max=[4, 1, 80, 16])
[V]     Input tensor: img_seqs__1 (dtype=DataType.FLOAT, shape=(4, 6, 256, 256)) | Setting input tensor shapes to: (min=[4, 6, 256, 256], opt=[4, 6, 256, 256], max=[4, 6, 256, 256])
[I]     Configuring with profiles: [Profile().add(audio_seqs__0, min=[4, 1, 80, 16], opt=[4, 1, 80, 16], max=[4, 1, 80, 16]).add(img_seqs__1, min=[4, 6, 256, 256], opt=[4, 6, 256, 256], max=[4, 6, 256, 256])]
[I] Building engine with configuration:
    Workspace            | 1825361100 bytes (1740.80 MiB)
    Precision            | TF32: False, FP16: True, INT8: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 721 MiB, GPU 521 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +793, GPU +342, now: CPU 1598, GPU 863 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +198, GPU +342, now: CPU 1796, GPU 1205 (MiB)
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 2 inputs and 1 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 172288
[TensorRT] INFO: Total Device Persistent Memory: 63409664
[TensorRT] INFO: Total Scratch Memory: 119289856
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 187 MiB, GPU 4 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2602, GPU 1685 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2602, GPU 1693 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2602, GPU 1677 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2602, GPU 1659 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 2517 MiB, GPU 1659 MiB
[I] Finished engine building in 89.164 seconds
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2537, GPU 1535 (MiB)
[TensorRT] INFO: Loaded engine size: 123 MB
[TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine begin: CPU 2537 MiB, GPU 1535 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2538, GPU 1669 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2538, GPU 1677 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2538, GPU 1659 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine end: CPU 2538 MiB, GPU 1659 MiB
[I] Saving engine to weights/wav2lip/wav2lip_bs4_fp16_1.7GB_CUBLAS_LT_CUDNN.trt
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 2414 MiB, GPU 1659 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2414, GPU 1669 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2414, GPU 1677 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 2414 MiB, GPU 1953 MiB
[V] Loaded Module: numpy              | Version: 1.23.5   | Path: ['/opt/conda/lib/python3.8/site-packages/numpy']
[V] Runner input metadata is: {audio_seqs__0 [dtype=float32, shape=(4, 1, 80, 16)],
     img_seqs__1 [dtype=float32, shape=(4, 6, 256, 256)]}
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2423, GPU 1873 (MiB)
Deactivated tensorrt runner

My first question: I set tactic_sources = ['CUBLAS_LT', 'CUDNN'] (my model is FP16, so I assume I don't need CUBLAS, and I want to reduce GPU memory consumption), yet the log shows the CUBLAS tactic source is still used. Why?
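(Note that in the script above, the tactic_sources list is only used to build the engine filename and is never passed to CreateConfig, so the builder falls back to its default tactic sources.) For reference, TensorRT's IBuilderConfig.set_tactic_sources expects a bitmask built from trt.TacticSource enum values. A minimal pure-Python sketch of that mask arithmetic, with the TensorRT 8.0 enum values hard-coded as an assumption so the snippet runs without the tensorrt package:

```python
# Enum values as in TensorRT 8.0's trt.TacticSource (hard-coded assumption here,
# so this runs without the tensorrt package installed).
TACTIC_SOURCES = {"CUBLAS": 0, "CUBLAS_LT": 1, "CUDNN": 2}

def tactic_mask(names):
    """Build the bitmask shape expected by IBuilderConfig.set_tactic_sources."""
    mask = 0
    for name in names:
        mask |= 1 << TACTIC_SOURCES[name]
    return mask

# Excluding CUBLAS yields 0b110; enabling all three sources yields 0b111.
print(bin(tactic_mask(["CUBLAS_LT", "CUDNN"])))  # 0b110
```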
Then I use trtexec to run the converted TRT engine: trtexec --loadEngine=weights/model_bs4_fp16_1.7GB_CUBLAS_LT_CUDNN.trt --shapes=audio_seqs__0:4x1x80x16,img_seqs__1:4x6x256x256. The log is:

[10/21/2024-21:45:10] [I] === Model Options ===
[10/21/2024-21:45:10] [I] Format: *
[10/21/2024-21:45:10] [I] Model: 
[10/21/2024-21:45:10] [I] Output:
[10/21/2024-21:45:10] [I] === Build Options ===
[10/21/2024-21:45:10] [I] Max batch: explicit
[10/21/2024-21:45:10] [I] Workspace: 16 MiB
[10/21/2024-21:45:10] [I] minTiming: 1
[10/21/2024-21:45:10] [I] avgTiming: 8
[10/21/2024-21:45:10] [I] Precision: FP32
[10/21/2024-21:45:10] [I] Calibration: 
[10/21/2024-21:45:10] [I] Refit: Disabled
[10/21/2024-21:45:10] [I] Sparsity: Disabled
[10/21/2024-21:45:10] [I] Safe mode: Disabled
[10/21/2024-21:45:10] [I] Restricted mode: Disabled
[10/21/2024-21:45:10] [I] Save engine: 
[10/21/2024-21:45:10] [I] Load engine: weights/wav2lip/wav2lip_bs4_fp16_1.5GB_CUBLAS_LT_CUDNN.trt
[10/21/2024-21:45:10] [I] NVTX verbosity: 0
[10/21/2024-21:45:10] [I] Tactic sources: Using default tactic sources
[10/21/2024-21:45:10] [I] timingCacheMode: local
[10/21/2024-21:45:10] [I] timingCacheFile: 
[10/21/2024-21:45:10] [I] Input(s)s format: fp32:CHW
[10/21/2024-21:45:10] [I] Output(s)s format: fp32:CHW
[10/21/2024-21:45:10] [I] Input build shape: audio_seqs__0=4x1x80x16+4x1x80x16+4x1x80x16
[10/21/2024-21:45:10] [I] Input build shape: img_seqs__1=4x6x256x256+4x6x256x256+4x6x256x256
[10/21/2024-21:45:10] [I] Input calibration shapes: model
[10/21/2024-21:45:10] [I] === System Options ===
[10/21/2024-21:45:10] [I] Device: 0
[10/21/2024-21:45:10] [I] DLACore: 
[10/21/2024-21:45:10] [I] Plugins:
[10/21/2024-21:45:10] [I] === Inference Options ===
[10/21/2024-21:45:10] [I] Batch: Explicit
[10/21/2024-21:45:10] [I] Input inference shape: img_seqs__1=4x6x256x256
[10/21/2024-21:45:10] [I] Input inference shape: audio_seqs__0=4x1x80x16
[10/21/2024-21:45:10] [I] Iterations: 10
[10/21/2024-21:45:10] [I] Duration: 3s (+ 200ms warm up)
[10/21/2024-21:45:10] [I] Sleep time: 0ms
[10/21/2024-21:45:10] [I] Streams: 1
[10/21/2024-21:45:10] [I] ExposeDMA: Disabled
[10/21/2024-21:45:10] [I] Data transfers: Enabled
[10/21/2024-21:45:10] [I] Spin-wait: Disabled
[10/21/2024-21:45:10] [I] Multithreading: Disabled
[10/21/2024-21:45:10] [I] CUDA Graph: Disabled
[10/21/2024-21:45:10] [I] Separate profiling: Disabled
[10/21/2024-21:45:10] [I] Time Deserialize: Disabled
[10/21/2024-21:45:10] [I] Time Refit: Disabled
[10/21/2024-21:45:10] [I] Skip inference: Disabled
[10/21/2024-21:45:10] [I] Inputs:
[10/21/2024-21:45:10] [I] === Reporting Options ===
[10/21/2024-21:45:10] [I] Verbose: Disabled
[10/21/2024-21:45:10] [I] Averages: 10 inferences
[10/21/2024-21:45:10] [I] Percentile: 99
[10/21/2024-21:45:10] [I] Dump refittable layers:Disabled
[10/21/2024-21:45:10] [I] Dump output: Disabled
[10/21/2024-21:45:10] [I] Profile: Disabled
[10/21/2024-21:45:10] [I] Export timing to JSON file: 
[10/21/2024-21:45:10] [I] Export output to JSON file: 
[10/21/2024-21:45:10] [I] Export profile to JSON file: 
[10/21/2024-21:45:10] [I] 
[10/21/2024-21:45:11] [I] === Device Information ===
[10/21/2024-21:45:11] [I] Selected Device: NVIDIA A30
[10/21/2024-21:45:11] [I] Compute Capability: 8.0
[10/21/2024-21:45:11] [I] SMs: 56
[10/21/2024-21:45:11] [I] Compute Clock Rate: 1.44 GHz
[10/21/2024-21:45:11] [I] Device Global Memory: 24060 MiB
[10/21/2024-21:45:11] [I] Shared Memory per SM: 164 KiB
[10/21/2024-21:45:11] [I] Memory Bus Width: 3072 bits (ECC enabled)
[10/21/2024-21:45:11] [I] Memory Clock Rate: 1.215 GHz
[10/21/2024-21:45:11] [I] 
[10/21/2024-21:45:11] [I] TensorRT version: 8003
[10/21/2024-21:45:16] [I] [TRT] [MemUsageChange] Init CUDA: CPU +502, GPU +0, now: CPU 633, GPU 521 (MiB)
[10/21/2024-21:45:16] [I] [TRT] Loaded engine size: 123 MB
[10/21/2024-21:45:16] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 633 MiB, GPU 521 MiB
[10/21/2024-21:45:24] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +793, GPU +342, now: CPU 1426, GPU 987 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +198, GPU +342, now: CPU 1624, GPU 1329 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1624, GPU 1311 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1624 MiB, GPU 1311 MiB
[10/21/2024-21:45:30] [I] Engine loaded in 19.0488 sec.
[10/21/2024-21:45:30] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1500 MiB, GPU 1311 MiB
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +10, now: CPU 1501, GPU 1321 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1501, GPU 1329 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1501 MiB, GPU 1605 MiB
[10/21/2024-21:45:30] [I] Created input binding for audio_seqs__0 with dimensions 4x1x80x16
[10/21/2024-21:45:30] [I] Created input binding for img_seqs__1 with dimensions 4x6x256x256
[10/21/2024-21:45:30] [I] Created output binding for value__0 with dimensions 4x3x256x256
[10/21/2024-21:45:30] [I] Starting inference
[10/21/2024-21:45:33] [I] Warmup completed 26 queries over 200 ms
[10/21/2024-21:45:33] [I] Timing trace has 607 queries over 3.01887 s
[10/21/2024-21:45:33] [I] 
[10/21/2024-21:45:33] [I] === Trace details ===
[10/21/2024-21:45:33] [I] Trace averages of 10 runs:
[10/21/2024-21:45:33] [I] Average on 10 runs - GPU latency: 6.70789 ms - Host latency: 7.19922 ms (end to end 13.276 ms, enqueue 1.39257 ms)
...
[10/21/2024-21:45:33] [I] Average on 10 runs - GPU latency: 4.63984 ms - Host latency: 5.05623 ms (end to end 8.15396 ms, enqueue 1.21301 ms)
[10/21/2024-21:45:33] [I] 
[10/21/2024-21:45:33] [I] === Performance summary ===
[10/21/2024-21:45:33] [I] Throughput: 201.069 qps
[10/21/2024-21:45:33] [I] Latency: min = 4.92627 ms, max = 7.21841 ms, mean = 5.10661 ms, median = 5.06763 ms, percentile(99%) = 7.19328 ms
[10/21/2024-21:45:33] [I] End-to-End Host Latency: min = 5.026 ms, max = 13.315 ms, mean = 9.14545 ms, median = 9.25452 ms, percentile(99%) = 13.2415 ms
[10/21/2024-21:45:33] [I] Enqueue Time: min = 0.482788 ms, max = 2.88159 ms, mean = 1.24856 ms, median = 1.29077 ms, percentile(99%) = 2.15625 ms
[10/21/2024-21:45:33] [I] H2D Latency: min = 0.258789 ms, max = 0.338821 ms, mean = 0.267017 ms, median = 0.264648 ms, percentile(99%) = 0.321045 ms
[10/21/2024-21:45:33] [I] GPU Compute Time: min = 4.53687 ms, max = 6.72342 ms, mean = 4.71299 ms, median = 4.67749 ms, percentile(99%) = 6.70314 ms
[10/21/2024-21:45:33] [I] D2H Latency: min = 0.124268 ms, max = 0.170563 ms, mean = 0.126614 ms, median = 0.125732 ms, percentile(99%) = 0.168869 ms
[10/21/2024-21:45:33] [I] Total Host Walltime: 3.01887 s
[10/21/2024-21:45:33] [I] Total GPU Compute Time: 2.86078 s
[10/21/2024-21:45:33] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/21/2024-21:45:33] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8003] # trtexec --loadEngine=weights/model_bs4_fp16_1.5GB_CUBLAS_LT_CUDNN.trt --shapes=audio_seqs__0:4x1x80x16,img_seqs__1:4x6x256x256
[10/21/2024-21:45:33] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1501, GPU 1525 (MiB)

Note that I set max_workspace_size=1.7 GB and fp16=True in the Polygraphy conversion Python script, but the trtexec log shows:

[10/21/2024-21:45:10] [I] Workspace: 16 MiB
...
[10/21/2024-21:45:10] [I] Precision: FP32

Why are they inconsistent?
