v0.1.5
Major changes
- added `stream_options` with `include_usage` to return token usage info in streaming responses
- fixed CUDA graph capture issue on multi-GPU setups
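Following the OpenAI streaming convention, when `include_usage` is enabled the final streamed chunk carries a populated `usage` object and an empty `choices` list. A minimal sketch of consuming such a stream; the chunk dicts below are hypothetical stand-ins for server responses:

```python
def consume_stream(chunks):
    """Accumulate streamed text and pick up usage from the final chunk.

    With stream_options={"include_usage": True}, the last chunk has an
    empty `choices` list and a populated `usage` object.
    """
    text_parts = []
    usage = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]  # only present on the final chunk
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if "content" in delta:
                text_parts.append(delta["content"])
    return "".join(text_parts), usage

# Hypothetical chunks mimicking an OpenAI-compatible stream.
chunks = [
    {"choices": [{"delta": {"content": "Hello"}}], "usage": None},
    {"choices": [{"delta": {"content": ", world"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 5,
                              "completion_tokens": 2,
                              "total_tokens": 7}},
]
text, usage = consume_stream(chunks)
```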
What's Changed
- feat: added include_usage into stream options for stream scenarios by @guocuimi in #243
- feat: added unittests for openai server by @guocuimi in #244
- [minor] use available memory to calculate cache_size by default. by @liutongxuan in #245
- refactor: only do sampling in driver worker (rank=0) by @guocuimi in #247
- fix multiple devices cuda graph capture issue by @guocuimi in #248
- revert torch.cuda.empty_cache change by @guocuimi in #249
- ci: added release workflow by @guocuimi in #250
- fix workflow by @guocuimi in #251
- fix: pass in secrets for workflow calls. by @guocuimi in #252
Full Changelog: v0.1.4...v0.1.5