
Fix vLLM profiler bug, add fallback logic to server start, cleanup #20

Merged
merged 4 commits into main on Jan 10, 2024

Conversation

rmccorm4
Collaborator

@rmccorm4 rmccorm4 commented Jan 10, 2024

  • Fix vLLM profiler bug
  • the profiler script hard-coded the TRT-LLM backend and sent the wrong payload (sampling parameters) to the vLLM backend, skewing the calculations and the expected number of tokens
    • vLLM has a default "max_tokens" of 16, and we weren't correctly overriding it if backend == "trtllm"
    • Added logic to get model backend and pass it to profiler script for context
  • Add fallback logic to triton server start
  • Rather than picking a single default of "local" or "docker" (the better default depends on the environment), the default is now "fallback" logic: try "local" first, and if the "tritonserver" binary can't be found, fall back to "docker" mode.
  • If you explicitly specify a mode, only that mode is tried.
  • Unify logic and helper functions between "bench" and other commands
  • Add more defaults to argparse help texts
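The fallback behavior described above can be sketched roughly as follows. This is an illustrative helper, not the PR's actual code; the function name `choose_server_mode` and the `which` parameter are hypothetical, but the decision order (explicit mode wins, then "local" if `tritonserver` is on PATH, else "docker") matches the description.

```python
import shutil


def choose_server_mode(mode=None, which=shutil.which):
    """Pick a server launch mode (hypothetical sketch of the fallback logic).

    - If a mode is explicitly given, use only that mode.
    - Otherwise try "local" first, falling back to "docker" when the
      "tritonserver" binary is not found on PATH.
    """
    if mode is not None:
        return mode
    if which("tritonserver"):
        return "local"
    return "docker"
```

The `which` parameter is injectable only to make the sketch testable; the key point is that an explicit `--mode` disables the fallback entirely.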

Locally fixed and verified that the "all-in-one" bench workflow and the individual subcommand workflows behave the same:

triton bench -m gpt2

and

triton repo clear
triton repo add -m gpt2
triton server start
triton model profile -m gpt2
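The payload mismatch behind the profiler bug can be sketched like this. The helper name `build_profile_inputs` and the exact field names are illustrative assumptions, not taken from the PR; the point is that the vLLM backend reads sampling options such as `max_tokens` from a JSON `sampling_parameters` input (defaulting `max_tokens` to 16 when it is absent), so the profiler must build a different payload per backend rather than hard-coding the TRT-LLM shape.

```python
import json


def build_profile_inputs(backend, prompt, max_tokens):
    """Build per-backend profiler request inputs (illustrative sketch).

    vLLM expects sampling options packed into a JSON "sampling_parameters"
    string; if max_tokens is omitted there, vLLM falls back to its default
    of 16. The TRT-LLM payload shape shown here is an assumption for
    contrast, not the backend's exact schema.
    """
    if backend == "vllm":
        return {
            "text_input": prompt,
            "sampling_parameters": json.dumps({"max_tokens": max_tokens}),
        }
    # Assumed TRT-LLM-style payload: max_tokens passed as its own field.
    return {"text_input": prompt, "max_tokens": max_tokens}
```

Sending the TRT-LLM-shaped payload to vLLM leaves `sampling_parameters` unset, which is how the silent `max_tokens=16` default crept into the token-count calculations.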

@fpetrini15
Collaborator

LGTM, except for the nit and the undefined server error. A lot of great adds here!

@rmccorm4 rmccorm4 merged commit 45d2f36 into main Jan 10, 2024
3 checks passed
@rmccorm4 rmccorm4 deleted the rmccormick-vllm branch January 10, 2024 20:03