Enable calls to GenAI-Perf for profile subcommand #52
Conversation
Great work David! Only some minor tweaks and clarifications. Just a thought: should we be setting a default value for the required arguments? CC: @rmccorm4 for thoughts.
I think we're 100% going for a passthrough approach here. This makes the Triton CLI extensible and keeps all logic unique to the tools in their own repositories. If we start moving logic over, then responsibilities begin to overlap and we could end up duplicating code. If there is a required arg for profile that we think should have a default, Model Analyzer is the right place to fix that.
This work is currently on hold. I will comment on the ticket once that status changes.
I don't think it's a strong requirement at this time; it's probably more of a "nice to have". I added it to the CLI because it was easy when starting from scratch, and I believe PyTriton supports 3.8+. I don't mind removing the support for now if it's not a simple fix or is just unwanted.
If you're okay with it, I pushed a commit dropping it. I understand the desire for broader support, though, so if we wanted to support 3.8-3.9, it would just require updating GenAI-Perf to remove the 3.10+ features and then moving testing to 3.8. I'll start a conversation about it in the morning.
Looking really good! Only minor comments
Co-authored-by: Ryan McCormick <[email protected]>
As per the offline discussion, I have removed the 3.8 test for now. @rmccorm4 @fpetrini15 This is ready for another round of review.
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
LGTM other than this fix for test_non_llm: https://github.com/triton-inference-server/triton_cli/pull/52/files#r1596116006
Nice work David! 🚀
🥳
This pull request makes it so that Triton CLI calls GenAI-Perf for its profile subcommand. All arguments get passed through to GenAI-Perf.
As part of these changes, the previous profiler functionality has been removed from Triton CLI to avoid maintaining this behavior in both places.
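The passthrough approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Triton CLI implementation: it assumes GenAI-Perf is installed as a `genai-perf` executable, and uses `argparse.parse_known_args` so that any option the CLI itself does not recognize is forwarded untouched.

```python
import argparse


def build_profile_command(argv):
    """Split the CLI's own options from the args forwarded to GenAI-Perf.

    Any argument not recognized by the `profile` subparser is collected
    by parse_known_args and passed straight through to GenAI-Perf.
    """
    parser = argparse.ArgumentParser(prog="triton")
    subparsers = parser.add_subparsers(dest="command")
    # No profile-specific options: everything after `profile` passes through.
    subparsers.add_parser("profile")
    _args, passthrough = parser.parse_known_args(argv)
    return ["genai-perf"] + passthrough


# Example: the unrecognized flags are forwarded verbatim.
cmd = build_profile_command(["profile", "-m", "gpt2", "--streaming"])
# cmd == ["genai-perf", "-m", "gpt2", "--streaming"]
```

In a real CLI, the returned command list would then be executed with something like `subprocess.run(cmd)`. The advantage of this design is the one noted in the review discussion: new GenAI-Perf options work immediately without any change to the wrapper.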
Unit tests have successfully passed with these changes in place in an environment with PA and GenAI-Perf.
GenAI-Perf demo below. Apologies for the extra errors beforehand from some arg types. Note: this was a previous iteration; --task-llm is no longer used. The output token counts are off because the mock Python model doesn't actually use the max_token inputs; it just returns the input as output.

https://github.com/triton-inference-server/triton_cli/assets/58150256/97056e68-afb8-49e9-9e07-8f6601952c3a