enhancement: Dynamically updating CUDA EP options #256

Merged: 4 commits into main on Jun 6, 2024

Conversation

@krishung5 (Contributor) commented on Jun 6, 2024:

This PR builds on #242 to let users provide CUDA EP options in a flexible fashion. Note that this PR doesn't include the corresponding update for TRT options.

Previously, CUDA EP options were set within the model config's parameters field:

...
parameters { key: "cudnn_conv_algo_search" value: { string_value: "0" } }
parameters { key: "gpu_mem_limit" value: { string_value: "4294967200" } }
...

After this enhancement, users can set any CUDA EP option through the execution accelerators block, just like the TRT EP:

optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "cuda"
    parameters { key: "cudnn_conv_use_max_workspace" value: "0" }
    parameters { key: "use_ep_level_unified_stream" value: "1" }
  } ]
}}
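
For reference, this mirrors the block structure the TRT EP already accepts in the ORT backend; the keys below are illustrative examples from the existing TRT EP docs, not part of this PR:

optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" }
    parameters { key: "max_workspace_size_bytes" value: "1073741824" }
  } ]
}}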

For backward compatibility, setting CUDA EP options within the parameters field continues to work, as sketched below.
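
For instance, a minimal legacy-style config sketch (the model name here is hypothetical) that should still load unchanged:

name: "densenet_onnx"    # hypothetical model name
backend: "onnxruntime"
parameters { key: "cudnn_conv_algo_search" value: { string_value: "0" } }
parameters { key: "gpu_mem_limit" value: { string_value: "4294967200" } }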

Related PRs:
triton-inference-server/backend#100
triton-inference-server/core#368
Testing: triton-inference-server/server#7328

@krishung5 (Contributor, Author) commented:

Updated the doc in commit 6ecda12; the formatting-only commit can be ignored.

@gedoensmax (Contributor) commented:

Thanks a lot for taking a look at my PR. This looks great to me.

@tanmayv25 (Contributor) commented:

@gedoensmax @krishung5 Great work improving the parameter specifications in the ORT backend!

krishung5 merged commit d992c5b into main on Jun 6, 2024. 3 checks passed.