enhancement: Dynamically updating CUDA EP options #256

Merged: 4 commits into main on Jun 6, 2024

Conversation

@krishung5 (Contributor) commented on Jun 6, 2024:

This PR builds on #242 to let users provide CUDA EP options in a flexible fashion. Note that this PR doesn't include the corresponding update for TRT options.

Previously, CUDA EP options were set within the model config's parameters field:

...
parameters { key: "cudnn_conv_algo_search" value: { string_value: "0" } }
parameters { key: "gpu_mem_limit" value: { string_value: "4294967200" } }
...

After this enhancement, users can set any CUDA EP option through the execution accelerators block, just like the TRT EP:

optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "cuda"
    parameters { key: "cudnn_conv_use_max_workspace" value: "0" }
    parameters { key: "use_ep_level_unified_stream" value: "1" }
  } ]
}}
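
For reference, this mirrors the block structure the TRT EP already accepts in the ORT backend; the keys below are illustrative examples from the existing TRT EP docs, not part of this PR:

optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" }
    parameters { key: "max_workspace_size_bytes" value: "1073741824" }
  } ]
}}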

For backward compatibility, setting CUDA EP options within the parameters field continues to work, as sketched below.
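
For instance, a minimal legacy-style config sketch (the model name here is hypothetical) that should still load unchanged:

name: "densenet_onnx"    # hypothetical model name
backend: "onnxruntime"
parameters { key: "cudnn_conv_algo_search" value: { string_value: "0" } }
parameters { key: "gpu_mem_limit" value: { string_value: "4294967200" } }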

Related PRs:
triton-inference-server/backend#100
triton-inference-server/core#368
Testing: triton-inference-server/server#7328

@krishung5 (Contributor, Author) commented:

Updated the doc in commit 6ecda12; the formatting-only commit can be ignored.

@gedoensmax (Contributor) commented:

Thanks a lot for taking a look at my PR. This looks great to me.

@tanmayv25 (Contributor) commented:

@gedoensmax @krishung5 Great work improving the parameter specifications in the ORT backend!

krishung5 merged commit d992c5b into main on Jun 6, 2024. 3 checks passed.