Releases: feifeibear/LLMSpeculativeSampling

0.1.1

21 Sep 03:53
1da8d0a

Adds serving logic: BLOOM-based LLM speculative sampling can now be launched as a server.

0.1.0

19 Sep 08:07
c3e97f5

Demonstrates speculative sampling using the BLOOM 560m and 7b1 models.
Supports KV cache optimization.
Only works with a batch size of 1.
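
For readers new to the technique, here is a minimal sketch of one speculative sampling step for a single sequence (batch size 1). It is not the repository's code. The names `draft_probs`, `target_probs`, and `speculative_step` are hypothetical, both models are stand-ins that return a plain probability list over a toy vocabulary, and the KV cache is not modeled. A real implementation would score all drafted positions with the target model in one parallel forward pass; the sketch calls it position by position for clarity.

```python
import random

def sample(probs, rng):
    # Draw one token id from a probability list.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def residual(p, q):
    # Normalized max(0, p - q): the distribution to resample from on rejection.
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:  # p and q are identical; fall back to p
        return list(p)
    return [d / total for d in diff]

def speculative_step(prefix, draft_probs, target_probs, gamma, rng):
    # draft_probs / target_probs are hypothetical callables: context -> probability list.
    # 1) The small draft model proposes gamma tokens autoregressively.
    ctx, drafted, q_dists = list(prefix), [], []
    for _ in range(gamma):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2) The large target model verifies the proposals left to right.
    out = list(prefix)
    for tok, q in zip(drafted, q_dists):
        p = target_probs(out)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)  # accepted
        else:
            out.append(sample(residual(p, q), rng))  # rejected: resample and stop
            return out

    # 3) Every draft was accepted: take one extra token from the target model.
    out.append(sample(target_probs(out), rng))
    return out
```

Each call appends between 1 and `gamma + 1` tokens, and the accept/resample rule keeps the output distributed as if it had been sampled from the target model alone. For example, `speculative_step([], lambda ctx: [0.2, 0.3, 0.5], lambda ctx: [0.7, 0.2, 0.1], gamma=4, rng=random.Random(0))` returns a list of 1 to 5 token ids.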