Qwen2.5 #1863
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1863
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9271c5c with merge base 09c2619.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I'm interested to get more opinions here, but I think it would be clearer to have a fully separate 2.5 folder even if a few of the models have the exact same definitions. For the 0.5B, 1.5B, and 7B models you can directly have them call the qwen2 builders if you want. But this way we have separate tokenizers (without introducing the concept of tokenizer versions) and separate recipes.
Additionally, I want to see more information on how the new models were checked for accuracy and more configs for the 2.5 model.
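To make the delegation idea concrete, here is a minimal sketch of how a separate `qwen2_5` module could still call the shared `qwen2` builders for the sizes whose definitions are identical. The function names, return type, and hyperparameter values below are illustrative stand-ins, not the actual torchtune API.

```python
# Illustrative only: a stand-in for the real qwen2 component builder,
# which in torchtune would construct and return a model.
def qwen2(vocab_size: int, num_layers: int, embed_dim: int) -> dict:
    return {
        "vocab_size": vocab_size,
        "num_layers": num_layers,
        "embed_dim": embed_dim,
    }


def qwen2_5_7b() -> dict:
    # Qwen2.5-7B shares its architecture with Qwen2-7B, so the 2.5 builder
    # can delegate to the qwen2 builder while living in its own qwen2_5
    # folder, keeping tokenizers and recipes separate.
    return qwen2(vocab_size=152064, num_layers=28, embed_dim=3584)
```

This keeps a clean `qwen2_5` namespace for users without duplicating the shared architecture definitions.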
Codecov Report
Attention: Patch coverage is

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1863      +/-   ##
==========================================
+ Coverage   67.30%   69.32%    +2.02%
==========================================
  Files         304      311        +7
  Lines       16000    16227      +227
==========================================
+ Hits        10768    11250      +482
+ Misses       5232     4977      -255

☔ View full report in Codecov by Sentry.
Hi, thank you for your great work! I wonder whether it will support Qwen2.5-Math (instead of Qwen2.5)?
@fzyzcjy Yep! It's the same model, just with different weights. For best results, you should preprocess your dataset to match the expected data format of Qwen2.5-Math. They recommend CoT or TIR reasoning, so a system prompt like "Please reason step by step, and put your final answer within \boxed{}." is recommended.
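As a sketch of that preprocessing step, one could prepend the recommended CoT system prompt to each chat-format sample before fine-tuning. The helper name and sample schema below are assumptions for illustration, not torchtune's actual dataset API.

```python
# System prompt recommended by the Qwen2.5-Math authors for CoT reasoning.
COT_SYSTEM_PROMPT = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)


def add_cot_system_prompt(sample: dict) -> dict:
    """Prepend the CoT system message to one chat-format sample.

    Hypothetical helper: assumes samples look like
    {"messages": [{"role": ..., "content": ...}, ...]}.
    """
    messages = [{"role": "system", "content": COT_SYSTEM_PROMPT}]
    messages.extend(sample["messages"])
    return {"messages": messages}


sample = {"messages": [{"role": "user", "content": "What is 2 + 2?"}]}
out = add_cot_system_prompt(sample)
```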
Left a bunch of comments, also wanna +1 @pbontrager's comment here. We probably wanna at least have some E2E runs with eval results for a couple of different model sizes
def __call__(
    self,
    messages: List[Message],
    inference: bool = False,
I know this is in the interface, but I don't see it used here (or in any of our other prompt templates). Can we remove it? @pbontrager
Oh @ebsmothers, I just realized what happened: I was looking at the Qwen2.5 instruct configs and you were looking at the base model configs. I just assumed they would be the same models with different params, since instruct should just be a finetune of the base model. I'll create separate model builders for the instruct and base models.
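A sketch of what separate base and instruct builders could look like if they share one architecture and differ only in a hyperparameter such as the default sequence length. All names and values here are illustrative assumptions, not the actual torchtune builders.

```python
# Illustrative only: shared architecture definition for one model size.
def _qwen2_5_7b(max_seq_len: int) -> dict:
    return {"embed_dim": 3584, "num_layers": 28, "max_seq_len": max_seq_len}


def qwen2_5_7b_base() -> dict:
    # Base checkpoints would use the full pretraining context length
    # (value assumed here for illustration).
    return _qwen2_5_7b(max_seq_len=131072)


def qwen2_5_7b_instruct() -> dict:
    # Instruct checkpoints would use a shorter default context
    # (value assumed here for illustration).
    return _qwen2_5_7b(max_seq_len=32768)
```

Documenting the difference in each builder's docstring, as suggested below, would make the base/instruct split self-explanatory.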
Looks awesome, I just left a few minor comments. Will let @ebsmothers comment more on the model builders.
The extra text will still get tokenized as normal text, not as special tokens.
Default: None
errors (str): Paradigm to follow when decoding bytes to UTF-8. Defaults to "replace".
Would make this parameter name a bit more descriptive, like `decode_error_handler`?
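For context on why the name matters: this parameter is the standard error-handler argument that Python's `bytes.decode` accepts, which controls what happens when a byte sequence is not valid UTF-8. A quick stdlib-only demonstration:

```python
# A byte string containing 0xff, which is never valid in UTF-8.
invalid = b"Qwen\xff2.5"

# "replace" substitutes U+FFFD for undecodable bytes (lossy but safe).
replaced = invalid.decode("utf-8", errors="replace")

# "ignore" silently drops undecodable bytes.
ignored = invalid.decode("utf-8", errors="ignore")

print(replaced)  # "Qwen\ufffd2.5" (0xff becomes the replacement character)
print(ignored)   # "Qwen2.5"
```

A name like `decode_error_handler` would make this pass-through behavior explicit to users who never read the `bytes.decode` docs.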
I think you're missing some default configs.
If it's not too difficult, would be awesome if you can expose `packed` and `compile` in all the configs.
I'm also a bit confused on why some model builders have a base and instruct version and some don't. Would also be great if you could add to the docstring why the instruct is different from the base (seems like different seq len?)
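For reference, exposing those two knobs might look something like the fragment below. This is a hedged sketch: the field names follow common torchtune config conventions (`_component_`, top-level `compile`), but the exact dataset component and defaults are assumptions, not taken from this PR.

```yaml
# Illustrative config fragment, not an actual config from this PR.
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  packed: False   # set True to pack samples up to max_seq_len

compile: False    # set True to compile the model for faster training
```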
Legendary PR, thanks for adding this model and all its variants
By default, we set the cache size equal to the size of the official Qwen2 tokenizer.
Example:
    >>> tokenizer = Qwen2Tokenizer(
Could you update this example?
Context
What is the purpose of this PR? Is it to
Issue #1624
Changelog
Test plan
- Checked tokenizer and chat template parity with the current official Qwen2.5 implementation (Hugging Face).
- Ran Qwen2.5 finetunes.
Checklist
- Ran pre-commit hooks (`pre-commit install`)
- Ran `pytest tests`
- Ran `pytest tests -m integration_test`