
Enable calls to GenAI-Perf for profile subcommand #52

Merged: 26 commits into main on May 10, 2024
Conversation

@dyastremsky (Contributor) commented Apr 29, 2024

This pull request makes it so that Triton CLI calls GenAI-Perf for its profile subcommand. All arguments get passed through to GenAI-Perf.

As part of these changes, the previous profiler functionality has been removed from Triton CLI to avoid maintaining this behavior in both places.
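A rough sketch of the passthrough idea (hypothetical code, not the implementation in this PR; the function name and wiring are illustrative, and it assumes the genai-perf executable is on PATH):

```python
# Hypothetical sketch: forward the profile subcommand's arguments to
# GenAI-Perf unchanged, so new GenAI-Perf flags require no CLI changes.
import subprocess
import sys


def profile(passthrough_args):
    """Run genai-perf with whatever arguments the user supplied."""
    cmd = ["genai-perf"] + list(passthrough_args)
    print("Running:", " ".join(cmd))
    # Propagate GenAI-Perf's exit code so failures surface to the caller.
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Example: python profile_sketch.py -m my_model --concurrency 4
    sys.exit(profile(sys.argv[1:]))
```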

Unit tests have successfully passed with these changes in place in an environment with PA and GenAI-Perf.
[screenshot: unit test results]

GenAI-Perf demo below. Apologies for the extra errors beforehand caused by some argument types. Note: this was a previous iteration, and --task-llm is no longer used. The output token counts are off because the mock Python model doesn't actually use the max_token inputs; it just returns the input as output.
https://github.com/triton-inference-server/triton_cli/assets/58150256/97056e68-afb8-49e9-9e07-8f6601952c3a

@dyastremsky marked this pull request as ready for review May 1, 2024 01:18
@dyastremsky self-assigned this May 1, 2024
@dyastremsky requested review from fpetrini15 and rmccorm4 and removed request for fpetrini15 May 1, 2024 15:29
Resolved review threads on .gitignore and src/triton_cli/parser.py (outdated).
@fpetrini15 (Collaborator) commented:

Great work David! Only some minor tweaks and clarifications.

Just a thought: should we be setting default values for the required arguments? For example, the profile subcommand requires concurrency as a parameter. During the arg sanitization process, would it make sense to add a default for it if the user didn't provide one? I think this would abstract some complexity away from the user.

CC: @rmccorm4 for thoughts.

@dyastremsky (Contributor, Author) commented:

> Great work David! Only some minor tweaks and clarifications.
>
> Just a thought: should we be setting default values for the required arguments? For example, the profile subcommand requires concurrency as a parameter. During the arg sanitization process, would it make sense to add a default for it if the user didn't provide one? I think this would abstract some complexity away from the user.
>
> CC: @rmccorm4 for thoughts.

I think we're 100% going for a passthrough approach here. This keeps the Triton CLI extensible and keeps all logic unique to each tool in its own repository. If we start moving logic over, responsibilities begin to overlap and we risk duplicating code. If there is a required arg for profile and we think it should have a default, Model Analyzer is the right place to fix that.
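For illustration, here is roughly what the passthrough looks like in argparse terms (a hypothetical sketch, not this PR's parser.py; the flags shown are made up):

```python
# Hypothetical sketch: the CLI parses only what it owns and returns
# everything else untouched for the downstream tool.
import argparse

parser = argparse.ArgumentParser(prog="triton")
subparsers = parser.add_subparsers(dest="subcommand")

# Made-up "profile" subparser; the real Triton CLI flags may differ.
profile_parser = subparsers.add_parser("profile")
profile_parser.add_argument("--verbose", action="store_true")

# parse_known_args() keeps the Triton CLI thin: unrecognized arguments come
# back untouched and can be appended to the genai-perf command line.
args, passthrough = parser.parse_known_args(
    ["profile", "--verbose", "--model=gpt2", "--concurrency=4"]
)
print(args.subcommand, args.verbose)  # profile True
print(passthrough)                    # ['--model=gpt2', '--concurrency=4']
```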

Resolved review thread on tests/test_e2e.py (outdated).
@dyastremsky (Contributor, Author) commented:

This work is currently on hold. I will comment on the ticket once that status changes.

@dyastremsky marked this pull request as draft May 2, 2024 17:02
@rmccorm4 (Collaborator) commented May 9, 2024:

I don't think it [Python 3.8 support] is a strong requirement at this time; probably more of a "nice to have". I added it to the CLI because it was easy when starting from scratch, and I believe PyTriton supports 3.8+.

I don't mind removing the support for now if it's not a simple fix to support or just unwanted.

@dyastremsky (Contributor, Author) commented May 9, 2024:

> I don't think it's a strong requirement at this time. I added it to CLI because it was easy and I believe PyTriton supported 3.8+.
>
> I don't mind removing the support for now if it's not a simple fix to support or just unwanted.

If you're okay with it, I pushed a commit dropping it. I understand the desire for broader support though, so if we wanted to support 3.8-3.9, it would just require updating GenAI-Perf to remove the 3.10+ features and then moving testing to 3.8. I'll start a conversation about it in the morning.

Resolved review threads on README.md and src/triton_cli/parser.py (outdated).
@rmccorm4 (Collaborator) left a review:

Looking really good! Only minor comments

Review threads on README.md, pyproject.toml, src/triton_cli/parser.py, tests/test_cli.py, tests/test_e2e.py, and src/triton_cli/profile.py (resolved).
@dyastremsky (Contributor, Author) commented:

As per the offline discussion, I have removed the 3.8 test for now.

@rmccorm4 @fpetrini15 This is ready for another round of review.

Resolved review thread on src/triton_cli/__init__.py (outdated).
@rmccorm4 (Collaborator) left a review:

LGTM other than this fix for test_non_llm: https://github.com/triton-inference-server/triton_cli/pull/52/files#r1596116006

Nice work David! 🚀

@rmccorm4 (Collaborator) left a review:

🥳

@dyastremsky merged commit 085ad83 into main May 10, 2024
2 checks passed