[LayerSkip] Self-Speculative Decoding #642

Open
mostafaelhoushi opened this issue Jul 8, 2024 · 0 comments
Labels
enhancement New feature or request

Describe the solution you would like:
Implement self-speculative decoding as described in this paper, where the earlier layers of the model act as the draft stage and the remaining layers act as the verification stage.

Describe the alternatives you have considered:
There are two ways to implement this:

  • Implement regular speculative decoding, where the draft stage is a separate model. Self-speculative decoding can then be implemented by providing a subset of the layers as the draft model (e.g., this implementation).
    • With this setup, we can add flags that tell the earlier layers whether they are running the draft stage or the verification stage.
  • Implement self-speculative decoding directly, as done here.
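
To make the two stages concrete, here is a minimal sketch of the direct approach, restricted to greedy decoding. It is a toy under stated assumptions, not a proposed API: the "model" works on integers, and every name in it (`run_layers`, `next_token`, `EXIT_LAYER`, `num_speculations`, and so on) is hypothetical. The draft stage exits after the first `EXIT_LAYER` layers, and the verification stage runs the full stack and keeps draft tokens only while they match. A real implementation would verify all draft tokens in one batched forward pass and reuse the draft stage's cache; this sketch recomputes everything for clarity.

```python
from typing import List

VOCAB_SIZE = 7   # toy vocabulary size (assumption)
NUM_LAYERS = 6   # toy model depth (assumption)
EXIT_LAYER = 2   # layers [0, EXIT_LAYER) form the draft stage (assumption)


def run_layers(hidden: int, start: int, stop: int) -> int:
    """Apply toy layers start..stop-1 to an integer 'hidden state'."""
    for i in range(start, stop):
        if i < EXIT_LAYER:
            hidden = (hidden * 5 + i + 1) % 101  # early layers always transform
        elif hidden % 3 == 0:
            hidden = (hidden + i) % 101          # later layers only sometimes do
    return hidden


def next_token(tokens: List[int], num_layers: int) -> int:
    """Greedy next token from the first num_layers layers and a shared LM head."""
    hidden = sum((pos + 1) * tok for pos, tok in enumerate(tokens)) % 101  # toy embedding
    hidden = run_layers(hidden, 0, num_layers)
    return hidden % VOCAB_SIZE  # toy LM head, shared by both stages


def greedy_generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: ordinary greedy decoding with the full model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens, NUM_LAYERS))
    return tokens


def self_speculative_generate(
    prompt: List[int], max_new_tokens: int, num_speculations: int = 4
) -> List[int]:
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # Draft stage: generate tokens autoregressively with the early layers only.
        draft: List[int] = []
        for _ in range(num_speculations):
            draft.append(next_token(tokens + draft, EXIT_LAYER))

        # Verification stage: the full model checks each draft token in order.
        accepted: List[int] = []
        for draft_token in draft:
            verified = next_token(tokens + accepted, NUM_LAYERS)
            accepted.append(verified)  # always keep the full model's token
            if verified != draft_token:
                break                  # first mismatch: discard the rest of the draft
        else:
            # Every draft token matched, so verification yields one extra token.
            accepted.append(next_token(tokens + accepted, NUM_LAYERS))
        tokens.extend(accepted)
    return tokens[:target_len]
```

Because every token that is kept comes from the full model, `self_speculative_generate(prompt, n)` returns the same sequence as `greedy_generate(prompt, n)`. Any speedup in a real system comes from how many draft tokens are accepted per verification pass.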

Additional Context:
