
Add profile subcommand to run perf analyzer #13

Merged — 3 commits merged into main on Dec 15, 2023
Conversation

@matthewkotila (Contributor) commented Dec 15, 2023

Example output:

$ triton model profile -m llama
pull_engine()
run_server()
profile()
Warming up...
Warmed up, profiling now...
[ BENCHMARK SUMMARY ]
Prompt size: --
  * Max first token latency: -- ms
  * Min first token latency: -- ms
  * Avg first token latency: -- ms
  * p50 first token latency: -- ms
  * p90 first token latency: -- ms
  * p95 first token latency: -- ms
  * p99 first token latency: -- ms
  * Max generation latency: -- ms
  * Min generation latency: -- ms
  * Avg generation latency: -- ms
  * p50 generation latency: -- ms
  * p90 generation latency: -- ms
  * p95 generation latency: -- ms
  * p99 generation latency: -- ms
  * Avg output token latency: -- ms/output token
  * Avg total token-to-token latency: -- ms
  * Max end-to-end latency: -- ms
  * Min end-to-end latency: -- ms
  * Avg end-to-end latency: -- ms
  * p50 end-to-end latency: -- ms
  * p90 end-to-end latency: -- ms
  * p95 end-to-end latency: -- ms
  * p99 end-to-end latency: -- ms
  * Max end-to-end throughput: -- tokens/s
  * Min end-to-end throughput: -- tokens/s
  * Avg end-to-end throughput: -- tokens/s
  * p50 end-to-end throughput: -- tokens/s
  * p90 end-to-end throughput: -- tokens/s
  * p95 end-to-end throughput: -- tokens/s
  * p99 end-to-end throughput: -- tokens/s
  * Max generation throughput: -- output tokens/s
  * Min generation throughput: -- output tokens/s
  * Avg generation throughput: -- output tokens/s
  * p50 generation throughput: -- output tokens/s
  * p90 generation throughput: -- output tokens/s
  * p95 generation throughput: -- output tokens/s
  * p99 generation throughput: -- output tokens/s

cc @nv-hwoo
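The summary above reports max/min/avg and p50/p90/p95/p99 for each latency metric. As a rough sketch (not the PR's actual implementation), these statistics could be derived from a list of per-request latencies with Python's standard library; the function name `summarize` is assumed here:

```python
import statistics

def summarize(latencies_ms):
    """Return max/min/avg/p50/p90/p95/p99 for a list of latencies in ms.

    Hypothetical helper illustrating the stats in the benchmark summary;
    the real profiler's computation may differ (e.g. percentile method).
    """
    s = sorted(latencies_ms)
    # quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile
    q = statistics.quantiles(s, n=100)
    return {
        "max": s[-1],
        "min": s[0],
        "avg": statistics.mean(s),
        "p50": q[49],
        "p90": q[89],
        "p95": q[94],
        "p99": q[98],
    }
```

The same shape of summary would apply to first-token, generation, and end-to-end latencies alike, just fed different latency lists.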

@matthewkotila force-pushed the hwoo-triton-profile branch 2 times, most recently from 8058cab to 733d5b3 on December 15, 2023 01:57
input_data = {
    "data": [
        {
            "prompt": [""],
Collaborator:
needs to be updated to new text_input based inputs etc.
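To illustrate the reviewer's point: per the comment, the newer perf analyzer LLM input schema keys prompts under `text_input` instead of `prompt`. The surrounding structure below is assumed to mirror the snippet above; only the field rename comes from the review comment:

```python
import json

# Hypothetical updated input payload: "text_input" replaces "prompt"
# (field name per the reviewer's comment; overall structure assumed).
input_data = {
    "data": [
        {"text_input": [""]},
    ],
}

# Such a payload would typically be serialized to a JSON input file:
serialized = json.dumps(input_data, indent=2)
```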

@rmccorm4 merged commit c85480a into main on Dec 15, 2023 (3 checks passed)
@matthewkotila deleted the hwoo-triton-profile branch on October 21, 2024 23:52