Add sliding window attention to Mistral and Phi 3 #1741

Merged
merged 2 commits into from
Sep 24, 2024


rasbt
Collaborator

@rasbt rasbt commented Sep 24, 2024

The original Mistral v0.1 and Phi 3 models have sliding window attention. Adding it here for consistency with the original models.

Fixes #1598
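
For readers unfamiliar with the term, here is a minimal sketch of what a sliding window attention mask looks like next to a plain causal mask. It is pure Python for illustration and is not the code from this PR; the function names are hypothetical, and the convention that a query attends to itself plus the previous `window - 1` positions is an assumption (implementations differ by one on this boundary).

```python
def causal_mask(T):
    """True where query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(T)] for i in range(T)]


def sliding_window_mask(T, window):
    """Causal mask restricted to the `window` most recent positions.

    Assumed convention: position i attends to itself and the previous
    `window - 1` positions, i.e. all j with 0 <= i - j < window.
    """
    return [[(j <= i) and (i - j < window) for j in range(T)] for i in range(T)]


# Print the mask for 6 tokens and a window of 3 ("x" = may attend).
for row in sliding_window_mask(6, 3):
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

From the fourth row on, the earliest tokens drop out of view, which is the behavior a global causal mask does not have.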

@rasbt rasbt requested a review from lantiga as a code owner September 24, 2024 19:45
@rasbt rasbt merged commit be8b28d into main Sep 24, 2024
8 of 9 checks passed
@rasbt rasbt deleted the rasbt-patch-1 branch September 24, 2024 20:47
intermediate_size=86,
)

T = 5
Collaborator

Do not reassign T here.
Otherwise, the input is too short to show the difference between a sliding window of half the context size and the global window.
In that case the test passes regardless, which is a false positive.
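
To illustrate the point, here is a small sketch that counts how many (query, key) pairs a sliding window blocks compared with a global causal mask. The helper name and `block_size = 16` are hypothetical values chosen for illustration, not taken from the test in this PR, and the window convention (a query sees itself plus the previous `window - 1` positions) is an assumption.

```python
def count_mask_differences(T, window):
    """Count causal (query i, key j) pairs that a sliding window would block.

    A pair is allowed by the causal mask when j <= i, and blocked by the
    sliding window when i - j >= window (assumed convention).
    """
    return sum(1 for i in range(T) for j in range(i + 1) if i - j >= window)


block_size = 16            # hypothetical context size
window = block_size // 2   # sliding window of half the context size

print(count_mask_differences(5, window))           # 0: masks are identical
print(count_mask_differences(block_size, window))  # 36: masks differ
```

With `T = 5` the sequence never exceeds the window, so both masks are the same and a comparison between the two attention variants cannot fail. Only once `T` is larger than the window does the test exercise the sliding window path.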

Collaborator Author


Oops, this must have been a copy-and-paste error. Thanks! Addressing it in #1742.

Development

Successfully merging this pull request may close these issues.

Apply Sliding Window Attention to Mistral