
Fix vLLM profiler bug, add fallback logic to server start, cleanup #20

Merged
merged 4 commits into main on Jan 10, 2024

Conversation

rmccorm4
Collaborator

@rmccorm4 rmccorm4 commented Jan 10, 2024

  • Fix vLLM profiler bug
  • the profiler script hard-coded the TRT-LLM backend and sent the wrong payload (sampling parameters) to the vLLM backend, skewing the calculations and the expected number of tokens
    • vLLM has a default "max_tokens" of 16, and we weren't correctly overriding it if backend == "trtllm"
    • Added logic to get model backend and pass it to profiler script for context
  • Add fallback logic to triton server start
  • Rather than picking a single default of "local" or "docker" (the better default depends on the environment), the default is now "fallback" logic: try "local" first, and if the "tritonserver" binary can't be found, fall back to "docker" mode.
  • If you explicitly specify a mode, only that mode is tried.
  • Unify logic and helper functions between "bench" and other commands
  • Add more defaults to argparse help texts
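The fallback behavior described above can be sketched roughly as follows. This is an illustrative helper, not the PR's actual code; the function name `choose_server_mode` and the `which` parameter are hypothetical, but the decision order (explicit mode wins, then "local" if `tritonserver` is on PATH, else "docker") matches the description.

```python
import shutil


def choose_server_mode(mode=None, which=shutil.which):
    """Pick a server launch mode (hypothetical sketch of the fallback logic).

    - If a mode is explicitly given, use only that mode.
    - Otherwise try "local" first, falling back to "docker" when the
      "tritonserver" binary is not found on PATH.
    """
    if mode is not None:
        return mode
    if which("tritonserver"):
        return "local"
    return "docker"
```

The `which` parameter is injectable only to make the sketch testable; the key point is that an explicit `--mode` disables the fallback entirely.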

Locally fixed and verified that the "all-in-one" bench workflow and the individual subcommand workflows behave the same:

triton bench -m gpt2

and

triton repo clear
triton repo add -m gpt2
triton server start
triton model profile -m gpt2
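The payload mismatch behind the profiler bug can be sketched like this. The helper name `build_profile_inputs` and the exact field names are illustrative assumptions, not taken from the PR; the point is that the vLLM backend reads sampling options such as `max_tokens` from a JSON `sampling_parameters` input (defaulting `max_tokens` to 16 when it is absent), so the profiler must build a different payload per backend rather than hard-coding the TRT-LLM shape.

```python
import json


def build_profile_inputs(backend, prompt, max_tokens):
    """Build per-backend profiler request inputs (illustrative sketch).

    vLLM expects sampling options packed into a JSON "sampling_parameters"
    string; if max_tokens is omitted there, vLLM falls back to its default
    of 16. The TRT-LLM payload shape shown here is an assumption for
    contrast, not the backend's exact schema.
    """
    if backend == "vllm":
        return {
            "text_input": prompt,
            "sampling_parameters": json.dumps({"max_tokens": max_tokens}),
        }
    # Assumed TRT-LLM-style payload: max_tokens passed as its own field.
    return {"text_input": prompt, "max_tokens": max_tokens}
```

Sending the TRT-LLM-shaped payload to vLLM leaves `sampling_parameters` unset, which is how the silent `max_tokens=16` default crept into the token-count calculations.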

@fpetrini15
Collaborator

LGTM, except for the nit and the undefined server error. A lot of great adds here!

@rmccorm4 rmccorm4 merged commit 45d2f36 into main Jan 10, 2024
3 checks passed
@rmccorm4 rmccorm4 deleted the rmccormick-vllm branch January 10, 2024 20:03