This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Support for Llama 70-B #409

Merged
LLukas22 merged 27 commits into rustformers:main on Aug 19, 2023

Conversation

@AmineDiro (Contributor) commented Aug 17, 2023

Hello,
Solves #402.

This is a temporary fix to support the Llama-2 70B model. I wanted to open a draft PR to get your feedback on this implementation of the n_gqa parameter:

  • Added n_gqa as an optional parameter in ModelParameters
  • Added a LlamaModelVersion enum, akin to the e_model enum in llama.cpp
  • Modified the self-attention evaluation to use n_head_kv for K and V instead of n_head (see the sketch after this list)

Here is the llama-2-70b-chat.ggmlv3.q4_0.bin model loaded on an A100 GPU:

[screenshot: Annotation 2023-08-17 203513]

@AmineDiro AmineDiro marked this pull request as draft August 17, 2023 14:48
@AmineDiro AmineDiro marked this pull request as ready for review August 17, 2023 18:36
@@ -332,6 +332,7 @@ fn enable_cublas(build: &mut cc::Build, out_dir: &Path) {
.arg("static")
.arg("--generate-code=arch=compute_52,code=[compute_52,sm_52]")
.arg("--generate-code=arch=compute_61,code=[compute_61,sm_61]")
.arg("--generate-code=arch=compute_75,code=[compute_75,sm_75]")
Contributor

Is this needed for newer cards?

Contributor Author

Yes, I had issues with my RTX 2070. I guess we should also support 8.x architectures like the A100; a sketch of what that could look like follows.

Comment on lines 491 to 498
enum LlamaModelVersion {
    Model3b,
    Model7b,
    Model13b,
    Model30b,
    Model65b,
    Model70b,
}
Contributor

Version hints at the actual Llama model version (1 or 2); maybe rename it to LlamaModelType?

Contributor Author

That's true, I changed this in a16caba.

@LLukas22 (Contributor)
Looks good; some small nitpicks, but if the CI passes it should be good to go 👍

@AmineDiro (Contributor Author)

@LLukas22 Thanks for the review 👍🏼 !

@LLukas22 (Contributor)

Thanks for implementing this :D

@LLukas22 LLukas22 merged commit 2f6ffd4 into rustformers:main Aug 19, 2023
14 checks passed