
Add profile subcommand to run perf analyzer #13

Merged — 3 commits merged into main on Dec 15, 2023
Conversation

@matthewkotila (Contributor) commented Dec 15, 2023

Example output:

$ triton model profile -m llama
pull_engine()
run_server()
profile()
Warming up...
Warmed up, profiling now...
[ BENCHMARK SUMMARY ]
Prompt size: --
  * Max first token latency: -- ms
  * Min first token latency: -- ms
  * Avg first token latency: -- ms
  * p50 first token latency: -- ms
  * p90 first token latency: -- ms
  * p95 first token latency: -- ms
  * p99 first token latency: -- ms
  * Max generation latency: -- ms
  * Min generation latency: -- ms
  * Avg generation latency: -- ms
  * p50 generation latency: -- ms
  * p90 generation latency: -- ms
  * p95 generation latency: -- ms
  * p99 generation latency: -- ms
  * Avg output token latency: -- ms/output token
  * Avg total token-to-token latency: -- ms
  * Max end-to-end latency: -- ms
  * Min end-to-end latency: -- ms
  * Avg end-to-end latency: -- ms
  * p50 end-to-end latency: -- ms
  * p90 end-to-end latency: -- ms
  * p95 end-to-end latency: -- ms
  * p99 end-to-end latency: -- ms
  * Max end-to-end throughput: -- tokens/s
  * Min end-to-end throughput: -- tokens/s
  * Avg end-to-end throughput: -- tokens/s
  * p50 end-to-end throughput: -- tokens/s
  * p90 end-to-end throughput: -- tokens/s
  * p95 end-to-end throughput: -- tokens/s
  * p99 end-to-end throughput: -- tokens/s
  * Max generation throughput: -- output tokens/s
  * Min generation throughput: -- output tokens/s
  * Avg generation throughput: -- output tokens/s
  * p50 generation throughput: -- output tokens/s
  * p90 generation throughput: -- output tokens/s
  * p95 generation throughput: -- output tokens/s
  * p99 generation throughput: -- output tokens/s

cc @nv-hwoo
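The summary above reports max/min/avg and p50/p90/p95/p99 for each latency metric. As a rough sketch (not the PR's actual implementation), these statistics could be derived from a list of per-request latencies with Python's standard library; the function name `summarize` is assumed here:

```python
import statistics

def summarize(latencies_ms):
    """Return max/min/avg/p50/p90/p95/p99 for a list of latencies in ms.

    Hypothetical helper illustrating the stats in the benchmark summary;
    the real profiler's computation may differ (e.g. percentile method).
    """
    s = sorted(latencies_ms)
    # quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile
    q = statistics.quantiles(s, n=100)
    return {
        "max": s[-1],
        "min": s[0],
        "avg": statistics.mean(s),
        "p50": q[49],
        "p90": q[89],
        "p95": q[94],
        "p99": q[98],
    }
```

The same shape of summary would apply to first-token, generation, and end-to-end latencies alike, just fed different latency lists.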

@matthewkotila force-pushed the hwoo-triton-profile branch 2 times, most recently from 8058cab to 733d5b3 on December 15, 2023 01:57
input_data = {
    "data": [
        {
            "prompt": [""],
Collaborator:
needs to be updated to new text_input based inputs etc.
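To illustrate the reviewer's point: per the comment, the newer perf analyzer LLM input schema keys prompts under `text_input` instead of `prompt`. The surrounding structure below is assumed to mirror the snippet above; only the field rename comes from the review comment:

```python
import json

# Hypothetical updated input payload: "text_input" replaces "prompt"
# (field name per the reviewer's comment; overall structure assumed).
input_data = {
    "data": [
        {"text_input": [""]},
    ],
}

# Such a payload would typically be serialized to a JSON input file:
serialized = json.dumps(input_data, indent=2)
```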

@rmccorm4 merged commit c85480a into main on Dec 15, 2023 (3 checks passed)
@matthewkotila deleted the hwoo-triton-profile branch on October 21, 2024 23:52