Qwen2.5 #1863
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1863
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9271c5c with merge base 09c2619.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I'm interested to get more opinions here, but I think it would be clearer to have a fully separate 2.5 folder even if a few of the models have the exact same definitions. For the 0.5B, 1.5B, and 7B models you can directly have them call the qwen2 builders if you want. But this way we have separate tokenizers (without introducing the concept of tokenizer versions) and separate recipes.
Additionally, I want to see more information on how the new models were checked for accuracy and more configs for the 2.5 model.
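To make the delegation idea concrete, here is a minimal sketch of how a separate `qwen2_5` module could still call the shared `qwen2` builders for the sizes whose definitions are identical. The function names, return type, and hyperparameter values below are illustrative stand-ins, not the actual torchtune API.

```python
# Illustrative only: a stand-in for the real qwen2 component builder,
# which in torchtune would construct and return a model.
def qwen2(vocab_size: int, num_layers: int, embed_dim: int) -> dict:
    return {
        "vocab_size": vocab_size,
        "num_layers": num_layers,
        "embed_dim": embed_dim,
    }


def qwen2_5_7b() -> dict:
    # Qwen2.5-7B shares its architecture with Qwen2-7B, so the 2.5 builder
    # can delegate to the qwen2 builder while living in its own qwen2_5
    # folder, keeping tokenizers and recipes separate.
    return qwen2(vocab_size=152064, num_layers=28, embed_dim=3584)
```

This keeps a clean `qwen2_5` namespace for users without duplicating the shared architecture definitions.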
Codecov Report
Attention: Patch coverage is

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1863      +/-   ##
==========================================
+ Coverage   67.30%   69.32%    +2.02%
==========================================
  Files         304      311        +7
  Lines       16000    16227      +227
==========================================
+ Hits        10768    11250      +482
+ Misses       5232     4977      -255

☔ View full report in Codecov by Sentry.
Hi, thank you for your great work! I wonder whether it will support Qwen2.5-Math (instead of Qwen2.5)?
@fzyzcjy Yep! It's the same model, just with different weights. For best results, you should preprocess your dataset to match the expected data format of Qwen2.5-Math. They recommend CoT or TIR reasoning, so a system prompt like "Please reason step by step, and put your final answer within \boxed{}." is recommended.
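As a sketch of that preprocessing step, one could prepend the recommended CoT system prompt to each chat-format sample before fine-tuning. The helper name and sample schema below are assumptions for illustration, not torchtune's actual dataset API.

```python
# System prompt recommended by the Qwen2.5-Math authors for CoT reasoning.
COT_SYSTEM_PROMPT = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)


def add_cot_system_prompt(sample: dict) -> dict:
    """Prepend the CoT system message to one chat-format sample.

    Hypothetical helper: assumes samples look like
    {"messages": [{"role": ..., "content": ...}, ...]}.
    """
    messages = [{"role": "system", "content": COT_SYSTEM_PROMPT}]
    messages.extend(sample["messages"])
    return {"messages": messages}


sample = {"messages": [{"role": "user", "content": "What is 2 + 2?"}]}
out = add_cot_system_prompt(sample)
```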
Left a bunch of comments, also wanna +1 @pbontrager's comment here. We probably wanna at least have some E2E runs with eval results for a couple of different model sizes
def __call__(
    self,
    messages: List[Message],
    inference: bool = False,
I know this is in the interface, but I don't see it used here (or in any of our other prompt templates). Can we remove it? @pbontrager
Oh @ebsmothers, I just realized what happened: I was looking at the Qwen2.5 instruct configs and you were looking at the base model configs. I just assumed they would be the same models with different params, since instruct should just be a finetune of the base model. I'll create separate model builders for the instruct and base models.
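A sketch of what separate base and instruct builders could look like if they share one architecture and differ only in a hyperparameter such as the default sequence length. All names and values here are illustrative assumptions, not the actual torchtune builders.

```python
# Illustrative only: shared architecture definition for one model size.
def _qwen2_5_7b(max_seq_len: int) -> dict:
    return {"embed_dim": 3584, "num_layers": 28, "max_seq_len": max_seq_len}


def qwen2_5_7b_base() -> dict:
    # Base checkpoints would use the full pretraining context length
    # (value assumed here for illustration).
    return _qwen2_5_7b(max_seq_len=131072)


def qwen2_5_7b_instruct() -> dict:
    # Instruct checkpoints would use a shorter default context
    # (value assumed here for illustration).
    return _qwen2_5_7b(max_seq_len=32768)
```

Documenting the difference in each builder's docstring, as suggested below, would make the base/instruct split self-explanatory.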
Looks awesome, I just left a few minor comments. Will let @ebsmothers comment more on the model builders.
The extra text will still get tokenized as normal text, not as special tokens.
Default: None
errors (str): Paradigm to follow when decoding bytes to UTF-8. Defaults to "replace".
Would make this parameter name a bit more descriptive, like `decode_error_handler`?
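For context on why the name matters: this parameter is the standard error-handler argument that Python's `bytes.decode` accepts, which controls what happens when a byte sequence is not valid UTF-8. A quick stdlib-only demonstration:

```python
# A byte string containing 0xff, which is never valid in UTF-8.
invalid = b"Qwen\xff2.5"

# "replace" substitutes U+FFFD for undecodable bytes (lossy but safe).
replaced = invalid.decode("utf-8", errors="replace")

# "ignore" silently drops undecodable bytes.
ignored = invalid.decode("utf-8", errors="ignore")

print(replaced)  # "Qwen\ufffd2.5" (0xff becomes the replacement character)
print(ignored)   # "Qwen2.5"
```

A name like `decode_error_handler` would make this pass-through behavior explicit to users who never read the `bytes.decode` docs.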
I think you're missing some default configs.
If it's not too difficult, would be awesome if you can expose `packed` and `compile` in all the configs.
I'm also a bit confused on why some model builders have a base and instruct version and some don't. Would also be great if you could add to the docstring why the instruct is different from the base (seems like different seq len?)
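For reference, exposing those two knobs might look something like the fragment below. This is a hedged sketch: the field names follow common torchtune config conventions (`_component_`, top-level `compile`), but the exact dataset component and defaults are assumptions, not taken from this PR.

```yaml
# Illustrative config fragment, not an actual config from this PR.
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  packed: False   # set True to pack samples up to max_seq_len

compile: False    # set True to compile the model for faster training
```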
Legendary PR, thanks for adding this model and all its variants
By default, we set the cache size equal to the size of the official Qwen2 tokenizer.
Example:
    >>> tokenizer = Qwen2Tokenizer(
Could you update this example?
Context
What is the purpose of this PR? Is it to
Issue #1624
Changelog
Test plan
- Checked tokenizer and chat template parity with the current official Qwen2.5 implementation (Hugging Face).
- Ran Qwen2.5 finetunes.
Checklist
- Ran pre-commit hooks (`pre-commit install`)
- Ran `pytest tests`
- Ran `pytest tests -m integration_test`