v0.1.5
Major changes
- added `stream_options` with `include_usage` to return token usage info in streaming responses
- fixed CUDA graph capture issue on multi-GPU setups
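Following the OpenAI streaming convention, when `include_usage` is enabled the final streamed chunk carries a populated `usage` object and an empty `choices` list. A minimal sketch of consuming such a stream; the chunk dicts below are hypothetical stand-ins for server responses:

```python
def consume_stream(chunks):
    """Accumulate streamed text and pick up usage from the final chunk.

    With stream_options={"include_usage": True}, the last chunk has an
    empty `choices` list and a populated `usage` object.
    """
    text_parts = []
    usage = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]  # only present on the final chunk
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if "content" in delta:
                text_parts.append(delta["content"])
    return "".join(text_parts), usage

# Hypothetical chunks mimicking an OpenAI-compatible stream.
chunks = [
    {"choices": [{"delta": {"content": "Hello"}}], "usage": None},
    {"choices": [{"delta": {"content": ", world"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 5,
                              "completion_tokens": 2,
                              "total_tokens": 7}},
]
text, usage = consume_stream(chunks)
```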
What's Changed
- feat: added include_usage into stream options for stream scenarios by @guocuimi in #243
- feat: added unittests for openai server by @guocuimi in #244
- [minor] use available memory to calculate cache_size by default. by @liutongxuan in #245
- refactor: only do sampling in driver worker (rank=0) by @guocuimi in #247
- fix multiple devices cuda graph capture issue by @guocuimi in #248
- revert torch.cuda.empty_cache change by @guocuimi in #249
- ci: added release workflow by @guocuimi in #250
- fix workflow by @guocuimi in #251
- fix: pass in secrets for workflow calls. by @guocuimi in #252
Full Changelog: v0.1.4...v0.1.5