forked from vllm-project/vllm
Pull requests: HabanaAI/vllm-fork
[BugFix][Habana_main][Multistep] Fix multistep deepcopy overhead (#452, opened Nov 1, 2024 by xuechendi)
Resolved alibi bias issue due to porting flat PA PR (#437, opened Oct 28, 2024 by tannervoas742)
Add DeepSeek-V2-Lite/DeepSeek-V2-Lite-Chat model support (#410, opened Oct 21, 2024 by hlin99)
[PoC] Add max padding ratio to padding aware scheduler (#407, opened Oct 18, 2024 by kzawora-intel) [Draft]
WA for OOM in qwen 2 - sync after loading weights (#398, opened Oct 16, 2024 by michalkuligowski)
[bucketing overhaul 2/n] Delegate bucket management to HPUBucketingContext (#395, opened Oct 15, 2024 by kzawora-intel)
[New Feature][Habana-Main] speculative_decoding HPU support (#375, opened Oct 8, 2024 by xuechendi)
Add bucket calibration, allow reading/writing bucketing configs to file (#345, opened Sep 27, 2024 by kzawora-intel)
Optimize LoRA mask creation (#285, opened Sep 13, 2024 by SanjuCSudhakaran) [Draft] [label: habana (Issues or PRs submitted by Habana Labs)]
Draft: Add max-num-prefill-seqs parameter (#253, opened Sep 6, 2024 by kzawora-intel) [Draft] [label: habana]
enabling multi-node serving on Gaudi ray cluster (#218, opened Aug 29, 2024 by vishnumadhu365) [label: intel (Issues or PRs submitted by Intel)]