Releases: OpenCSGs/llm-inference
v0.1.0
What's Changed
- Format Python code using autopep8 and add pylint by @jasonhe258 in #12
- Fix default pipeline output problem by @jasonhe258 in #17
- Enable multiple workers to cooperate on batch prompts by @depenglee1707 in #19
- Push image to opencsg registry by @SeanHH86 in #25
- Refine some parameters for initialization (will keep refining...) and fix Qwen issues by @depenglee1707 in #24
- Remove initializer: transformerpipeline by @depenglee1707 in #26
- Update warmup mechanism by @depenglee1707 in #27
- Fix output pipeline format not working by @depenglee1707 in #29
- Fix issue where from_pretrained has no device parameter by @depenglee1707 in #28
- Enable warmup for defaulttransformers by @depenglee1707 in #30
- Fix warmup breaking non-text-generation models by @depenglee1707 in #31
- Fix bug when setting pad_token by @SeanHH86 in #32
- Fix device_map not working on MPS: data was placed on the wrong device by @depenglee1707 in #34
- Refine config for model: deepseek-coder-1.3b by @depenglee1707 in #35
- Refactor "defaulttransformers" to match the common design of the "pipeline" class by @depenglee1707 in #36
- Fix broken YAML files by @depenglee1707 in #39
- Add opencsg-deepseek-coder-1.3b by @SeanHH86 in #38
- Remove some abandoned implementations by @depenglee1707 in #40
- Fix issue caused by the Hugging Face text-generation pipeline by @depenglee1707 in #41
- Update config.py for new model by @SeanHH86 in #42
- Update deepseek yaml file by @SeanHH86 in #43
- Add Qwen1.5-72B-chat by @SeanHH86 in #44
- Add parameter for timeout by @SeanHH86 in #46
- Fix max-token conflict with DS by @depenglee1707 in #49
- Fix output issue for the UI by @depenglee1707 in #50
- Enable "use_bettertransformer" and "torch_compile" by @depenglee1707 in #51 (see the optimization sketch after this list)
- Enable chat template for Hugging Face transformers by @depenglee1707 in #54 (see the chat-template sketch after this list)
- Update Ray to 2.9.3 by @SeanHH86 in #56
- Enable prompt template for GGUF-format inference by @depenglee1707 in #57
- Refactor the vLLM integration by @depenglee1707 in #60
- Fix failure loading JSON data containing '\n' by @SeanHH86 in #62
- Fix JSON format issue for "transformerpipeline" by @depenglee1707 in #63
- Enable chat template for the vLLM integration by @depenglee1707 in #65
- Add streaming API support by @SeanHH86 in #66
- Make scale-out policy consistent across deployments by @depenglee1707 in #70
- Add Qwen1.5-72B-GGUF YAML and fix JSON input loading error by @SeanHH86 in #71
- Correct vLLM version by @depenglee1707 in #73
- Fix generation bug in the llama.cpp streaming API by @SeanHH86 in #74 (see the streaming sketch after this list)
- Fix streaming without prompt format by @SeanHH86 in #75
- Fix path params issue and make the interface consistent by @depenglee1707 in #78
- Enhance router naming for the comparison scenario by @depenglee1707 in #79
- Fix issue: stream generation is slow by @depenglee1707 in #80
- Fix bug where prompt is not a string by @SeanHH86 in #81
- Refactor streaming by @depenglee1707 in #82
- Enhance llama.cpp integration to share some logic between streaming and predict by @depenglee1707 in #83
- Fix issue: pipelines without streaming support fail when called as a stream by @depenglee1707 in #84
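For #51, the two toggles map onto standard Hugging Face/PyTorch optimizations rather than anything project-specific. A minimal sketch of what enabling them typically looks like, assuming `optimum` is installed (it backs `to_bettertransformer`) and PyTorch >= 2.0; the opt-125m model name is borrowed from the v0.0.1 notes below:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# "use_bettertransformer": swap supported layers for fused
# BetterTransformer kernels (requires the `optimum` package).
model = model.to_bettertransformer()

# "torch_compile": JIT-compile the forward pass; the first
# call pays the one-time compilation cost.
model = torch.compile(model)
```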
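For the chat-template changes (#54 and #65), here is a minimal sketch of applying a chat template through the standard `transformers` tokenizer API; the model name is illustrative, not taken from this project's configs:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about GPUs."},
]

# Render the conversation into the model's expected prompt string;
# add_generation_prompt appends the marker for the assistant's turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```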
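For the llama.cpp streaming fixes (#74, #82, #83), here is a minimal sketch of token streaming with `llama-cpp-python`; the GGUF path and parameters are hypothetical:

```python
from llama_cpp import Llama

# Path is hypothetical; any local GGUF model file works.
llm = Llama(model_path="./qwen1_5-72b-chat-q4_k_m.gguf", n_ctx=2048)

# With stream=True the call returns a generator of partial completions
# instead of a single response, so tokens can be printed as they arrive.
for chunk in llm("Explain KV caching in one sentence.", max_tokens=64, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```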
New Contributors
- @pulltheflower made their first contribution in #9
Full Changelog: v0.0.1...v0.1.0
v0.0.1 tag release
First tagged release of the project.
What's Changed
- Set min replica to 1 for opt-125m by @SeanHH86 in #1
- Replace / with -- in model ID by @SeanHH86 in #2
- Enhance model ID for CLI by @jasonhe258 in #4
- Fix loading issue for non text-generation models by @depenglee1707 in #5
- Fix output issue for default transformers pipeline by @jasonhe258 in #6
- Add Chinese README and license by @jasonhe258 in #7
New Contributors
- @SeanHH86 made their first contribution in #1
- @jasonhe258 made their first contribution in #4
- @depenglee1707 made their first contribution in #5
Full Changelog: https://github.com/OpenCSGs/llm-inference/commits/v0.0.1