forked from vllm-project/vllm
Pull requests: HabanaAI/vllm-fork
[BugFix][Habana_main][Multistep] Fix multistep deepcopy overhead (#452, opened Nov 1, 2024 by xuechendi)
Resolved alibi bias issue due to porting flat PA PR (#437, opened Oct 28, 2024 by tannervoas742)
Add DeepSeek-V2-Lite/DeepSeek-V2-Lite-Chat model support (#410, opened Oct 21, 2024 by hlin99)
[PoC] Add max padding ratio to padding aware scheduler (#407, opened Oct 18, 2024 by kzawora-intel) [Draft]
WA for OOM in qwen 2 - sync after loading weights (#398, opened Oct 16, 2024 by michalkuligowski)
[bucketing overhaul 2/n] Delegate bucket management to HPUBucketingContext (#395, opened Oct 15, 2024 by kzawora-intel)
[New Feature][Habana-Main] speculative_decoding HPU support (#375, opened Oct 8, 2024 by xuechendi)
Add bucket calibration, allow reading/writing bucketing configs to file (#345, opened Sep 27, 2024 by kzawora-intel)
Optimize LoRA mask creation (#285, opened Sep 13, 2024 by SanjuCSudhakaran) [Draft] [label: habana (Issues or PRs submitted by Habana Labs)]
Draft: Add max-num-prefill-seqs parameter (#253, opened Sep 6, 2024 by kzawora-intel) [Draft] [label: habana]
enabling multi-node serving on Gaudi ray cluster (#218, opened Aug 29, 2024 by vishnumadhu365) [label: intel (Issues or PRs submitted by Intel)]