[Hardware][Tenstorrent] Modify offline_inference_tt.py to include max_tokens arg #25

milank94 · 2024-10-21T11:17:39Z

Modify offline_inference_tt.py to include the following changes:

- Add max_tokens as an input argument so the user can set the desired output length.
- Set the default max_tokens value to 128.
- Report TTFT output stats per user instead of per batch.
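
For readers without the diff at hand, the changes listed above could look roughly like the following. This is a minimal sketch and not the code merged in this PR: the flag spelling `--max_tokens`, the helper names `build_parser` and `ttft_per_user_ms`, and the interpretation of "per user" as batch TTFT divided by the number of users are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for an offline inference example script."""
    parser = argparse.ArgumentParser(description="Offline inference example (sketch)")
    # Assumed flag name; the default of 128 matches the PR description.
    parser.add_argument(
        "--max_tokens",
        type=int,
        default=128,
        help="Maximum number of output tokens to generate per prompt",
    )
    return parser


def ttft_per_user_ms(batch_ttft_s: float, num_users: int) -> float:
    """Convert a batch-level time-to-first-token (seconds) to ms per user.

    Assumption: "per user" means the batch TTFT divided evenly across users.
    """
    if num_users <= 0:
        raise ValueError("num_users must be positive")
    return batch_ttft_s * 1000.0 / num_users


# Usage with an explicit argument list, so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--max_tokens", "64"])
print(f"max_tokens={args.max_tokens}")
print(f"TTFT: {ttft_per_user_ms(0.5, 4)} ms/user")
```

In a real script the parsed value would presumably be passed on to the sampling configuration; that wiring is omitted here because it depends on the script's actual structure.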

github-actions · 2024-10-21T11:17:50Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock them, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of the following:

- Add the ready label to the PR.
- Enable auto-merge.

🚀

examples/offline_inference_tt.py

milank94 requested a review from skhorasganiTT October 21, 2024 11:17

milank94 requested a review from cglagovichTT October 21, 2024 11:21

cglagovichTT approved these changes Oct 21, 2024

skhorasganiTT reviewed Oct 21, 2024

Modify offline_inference_tt.py to include max_tokens arg

f8e8324

milank94 force-pushed the mkordic/tt_offline_inference branch from f69748e to f8e8324 Compare October 21, 2024 15:00

skhorasganiTT approved these changes Oct 21, 2024

milank94 merged commit 18f0e91 into dev Oct 21, 2024

milank94 deleted the mkordic/tt_offline_inference branch October 21, 2024 15:41
