What's Changed
- Format Python code using autopep8 and add pylint by @jasonhe258 in #12
- Fix default pipeline output problem by @jasonhe258 in #17
- Enable multiple workers to cooperate on batch prompts by @depenglee1707 in #19
- Push image to opencsg registry by @SeanHH86 in #25
- Refine some parameters for initialization (will keep refining...) and fix Qwen issues by @depenglee1707 in #24
- Remove initializer: transformerpipeline by @depenglee1707 in #26
- Update warmup mechanism by @depenglee1707 in #27
- Fix output pipeline format not working by @depenglee1707 in #29
- Fix issue where from_pretrained has no device parameter by @depenglee1707 in #28
- Enable warmup for defaulttransformers by @depenglee1707 in #30
- Fix warmup breaking non-text-generation models by @depenglee1707 in #31
- Fix bug when setting pad_token by @SeanHH86 in #32
- Fix device_map not working on MPS: data was put on the wrong device by @depenglee1707 in #34
- Refine config for model: deepseek-coder-1.3b by @depenglee1707 in #35
- Refactor the "defaulttransformers" to meet the common design of class "pipeline" by @depenglee1707 in #36
- Fix broken yamls by @depenglee1707 in #39
- Add opencsg-deepseek-coder-1.3b by @SeanHH86 in #38
- Remove some abandoned implementations by @depenglee1707 in #40
- Fix issue caused by the Hugging Face pipeline with text-generation by @depenglee1707 in #41
- Update config.py for new model by @SeanHH86 in #42
- Update the deepseek YAML file by @SeanHH86 in #43
- Add Qwen1.5-72B-chat by @SeanHH86 in #44
- Add a timeout parameter by @SeanHH86 in #46
- Fix max-token conflict with DS by @depenglee1707 in #49
- Fix output issue for the UI by @depenglee1707 in #50
- Enable "use_bettertransformer" and "torch_compile" by @depenglee1707 in #51
- Enable chat template for Hugging Face transformers by @depenglee1707 in #54
- Update ray to 2.9.3 by @SeanHH86 in #56
- Enable prompt template for GGUF-format inference by @depenglee1707 in #57
- Refactor the vLLM integration by @depenglee1707 in #60
- Fix failure when loading JSON data containing '\n' by @SeanHH86 in #62
- Fix JSON format issue for "transformerpipeline" by @depenglee1707 in #63
- Enable chat template for the vLLM integration by @depenglee1707 in #65
- Add streaming API support by @SeanHH86 in #66 (usage sketch after this list)
- Make scale-out policy consistent across deployments by @depenglee1707 in #70
- Add Qwen1.5-72B-GGUF YAML and fix JSON input loading error by @SeanHH86 in #71
- Correct vLLM version by @depenglee1707 in #73
- Fix generation bug in the llamacpp streaming API by @SeanHH86 in #74
- Fix streaming not applying the prompt format by @SeanHH86 in #75
- Fix path params issue and make the interface consistent by @depenglee1707 in #78
- Improve router naming for the comparison scenario by @depenglee1707 in #79
- Fix issue: stream generation is slow by @depenglee1707 in #80
- Fix bug where prompt is not a string by @SeanHH86 in #81
- Refactor streaming by @depenglee1707 in #82
- Enhance llamacpp integration to share some logic between streaming and predict by @depenglee1707 in #83
- Fix issue where pipelines without streaming support fail when called as a stream by @depenglee1707 in #84
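The streaming API added in #66 can be tried with a short client sketch. This is a minimal, hypothetical example: the endpoint path, port, and payload field names below are illustrative assumptions and are not specified by this changelog; check the project docs for the actual interface.

```python
# Minimal streaming-client sketch. The URL and JSON payload fields are
# assumptions for illustration, not the project's confirmed interface.
import requests

# Hypothetical endpoint exposed by the serving deployment.
URL = "http://localhost:8000/api/v1/stream"

payload = {
    "prompt": "Write a haiku about code review.",  # assumed field name
    "max_tokens": 128,                             # assumed field name
}

# stream=True keeps the HTTP connection open so chunks can be printed
# as the server produces them instead of waiting for the full response.
with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```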
New Contributors
- @pulltheflower made their first contribution in #9
Full Changelog: v0.0.1...v0.1.0