[BOUNTY] Add Qwen1.5 0.5B #37

Merged: 13 commits into tenstorrent:mkordic/rc_20240830 on Sep 3, 2024

Conversation

@JushBJJ
Contributor Author

JushBJJ commented Apr 14, 2024

With all these patches, we internally have Qwen1.5 0.5B working on a Grayskull e150 (thanks @JonathanALevine). We're waiting for @marty1885 to test all of this on his e75. Should we convert this out of a draft PR?

@nvukobratTT @milank94

@JushBJJ
Contributor Author

JushBJJ commented Apr 14, 2024

A lot of optimizations will need to be made to speed this model up; stay tuned. I gotta do my uni assignments, which are due tonight 😭

@nvukobratTT
Contributor

Great work @JushBJJ @marty1885 @JonathanALevine! Let us know once you get silicon results out of the e75 :)))

We're working through some internal checks; as soon as we finalize those on our side, we'll complete code reviews and start with PR approvals.

Once again, great progress and thanks for hunting these bounties! :)))

@marty1885

marty1885 commented Apr 15, 2024

I think there are some problems with the e75 - I ran inference for an hour, but it never finished. I hope this doesn't affect the bounty, as it works on the e150. It feels more like a low-level compilation bug.

NVM. IT WORKS!! It's just so slow that running it with the default settings takes forever. I can get it to finish in a reasonable time if I set max_length to something short, like 2.
[screenshot]
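Since each on-device forward pass was so slow, total generation time scales roughly linearly with max_length (one pass per generated token), which is why shrinking it helps. A minimal stdlib sketch of that relationship (`slow_forward` and the token values are hypothetical stand-ins, not the tt-buda or transformers API):

```python
def slow_forward(tokens):
    """Hypothetical stand-in for one slow on-device forward pass."""
    return tokens[-1] + 1  # pretend the model emits the next token id

def generate(prompt, max_length):
    """Greedy decode: one forward pass per new token, so runtime grows
    linearly with max_length when each pass is expensive."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        tokens.append(slow_forward(tokens))
    return tokens

# max_length=2 needs a single pass; a default of several hundred would
# repeat the slow pass hundreds of times.
print(generate([1], max_length=2))  # -> [1, 2]
```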

@JushBJJ
Contributor Author

JushBJJ commented Jul 26, 2024

@nvukobratTT @milank94 Update :)

@milank94
Collaborator

@JushBJJ way to go!

As we don't have an automated CI setup for this project yet, we'll manually test the model.

Before that, there are a few miscellaneous components that should be added, such as updating the support table, adding a test case, adding the license header, etc. You can see a breakdown here: https://github.com/tenstorrent/tt-buda-demos/blob/main/CONTRIBUTING.md#adding-models

Mind adding these details so we can continue the review process?

@JushBJJ
Contributor Author

JushBJJ commented Jul 26, 2024

Sure, I'll do it for the Phi 2 bounty as well; let me know if you had any issues running it.
Also, before manually testing it, add these workarounds:
JushBJJ/tt-buda@f765838

I raised an issue (tenstorrent/tt-buda#42) because I'm not quite sure what a proper implementation would look like for DynamicCache, so it would be great if anyone on the TT team familiar with it could make a proper fix :)
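For context on why DynamicCache is awkward in an ahead-of-time compiled flow: transformers' DynamicCache grows its key/value tensors every decode step, so tensor shapes change between iterations, while a compiled graph expects fixed shapes. A toy stdlib sketch of the fixed-size alternative (all names here are hypothetical illustrations, not the actual transformers or tt-buda API):

```python
MAX_SEQ = 8  # hypothetical compile-time sequence budget

def make_static_cache():
    """Pre-allocated cache: its shape is MAX_SEQ from step 0, so a
    compiled graph sees the same tensor shape on every decode step."""
    return {"buf": [0.0] * MAX_SEQ, "len": 0}

def append_kv(cache, value):
    """Write into the next free slot instead of growing the buffer
    (a dynamic cache would append, changing the shape each step)."""
    if cache["len"] >= MAX_SEQ:
        raise ValueError("exceeded compiled sequence length")
    cache["buf"][cache["len"]] = value
    cache["len"] += 1

cache = make_static_cache()
for v in (0.1, 0.2, 0.3):
    append_kv(cache, v)
print(len(cache["buf"]), cache["len"])  # -> 8 3 (shape fixed, 3 slots used)
```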

@milank94
Collaborator

> Sure I'll do it too with the Phi 2 bounty as well, let me know if you guys had any issue running it. Also before manually testing it, add these workarounds: JushBJJ/tt-buda@f765838
>
> I raised an issue (tenstorrent/tt-buda#42) because I'm not quite sure how a proper implementation would look like for DynamicCaching so it would be great if anyone in the TT team familiar with it can make a proper fix :)

Thanks for the update. We'll get someone to review and integrate the workarounds. We can look to package this as an alpha release @staylorTT

FYI: @Shubhamsaboo for bounty tracking

@JushBJJ
Contributor Author

JushBJJ commented Jul 28, 2024

Made some changes:

  • Cleaned up the code and added batching like the other examples
  • Renamed the qwen filename
  • Adjusted parameters to reduce weird token generation
  • Added a chat version of Qwen1.5
  • Added a test case for Qwen1.5

I also tested on previous transformers versions; it seems like v4.42.0 is some magic number??? On any version below that, Qwen does not work at all.
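If the floor really is transformers v4.42.0, a temporary pin in this repo could look something like the fragment below (hypothetical: the actual table names and layout of model_demos/pyproject.toml may differ):

```toml
# Hypothetical pyproject.toml fragment: pin the transformers floor this
# model needs until tt-buda uplifts its own transformers==4.41.0 pin.
[project]
dependencies = [
    "transformers>=4.42.0",
]
```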

@milank94
Collaborator

> Made some changes:
>
> • Cleaned up code and added batching like the other examples
> • Renamed filename for qwen
> • Adjusted parameters to reduce weird token generation
> • Added chat version of qwen1.5
> • Added test case for qwen1.5
>
> I also tested on previous transformers versions, seems like v4.42.0 is some magic number??? Any version below that Qwen does not work at all.

Thanks @JushBJJ, I will have a look.

That's good to note. The latest release of Buda is on transformers==4.41.0, so we will need to uplift it in tt-buda or add it here as a temporary requirement.

@milank94 milank94 mentioned this pull request Jul 29, 2024
@JushBJJ
Contributor Author

JushBJJ commented Jul 30, 2024

See tenstorrent/tt-buda#42 (comment)

I'll rewrite Qwen's file with a custom wrapper/generate forward function
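The idea behind a custom wrapper/generate function is that user code, not transformers, owns the decode loop and threads the cache through explicitly as plain tuples, so the compiled model never sees a DynamicCache object. A hypothetical stdlib sketch of the shape of such a loop (`model_forward` is a stand-in for the compiled pybuda forward pass, not real API):

```python
def model_forward(token, past):
    """Hypothetical stand-in for the compiled forward pass: returns the
    next token id and the cache extended as a plain tuple."""
    return token + 1, past + (token,)

def generate_with_wrapper(first_token, steps):
    """Hand-rolled greedy decode loop: the wrapper owns generation and
    passes the cache back explicitly, legacy-tuple style."""
    past = ()  # plain tuple cache, never a cache object
    token = first_token
    out = [token]
    for _ in range(steps):
        token, past = model_forward(token, past)
        out.append(token)
    return out, past

tokens, past = generate_with_wrapper(1, steps=3)
print(tokens)  # -> [1, 2, 3, 4]
```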

@milank94 milank94 self-assigned this Jul 31, 2024
@JushBJJ
Contributor Author

JushBJJ commented Aug 1, 2024

This should be good now for final review and testing.

See tenstorrent/tt-buda#42 (comment) for my comment on disabling DynamicCache on new models

@JushBJJ
Contributor Author

JushBJJ commented Aug 9, 2024

Any updates on this on TT's side? @milank94

@milank94
Collaborator

milank94 commented Aug 9, 2024

> Any updates on this on TT's side? @milank94

Hey @JushBJJ -- we're currently working on integrating all of the necessary changes into a new Buda release. This should add support for this model and Phi-2, along with some additional features.

Once we have that RC ready, we'll test and merge these models.

Thanks for your patience.

@milank94
Collaborator

Hey @JushBJJ, can you switch the target branch to mkordic/rc_20240830?

@JushBJJ JushBJJ changed the base branch from main to mkordic/rc_20240830 August 31, 2024 00:38
@JushBJJ
Contributor Author

JushBJJ commented Aug 31, 2024

Done

@milank94
Collaborator

milank94 commented Sep 3, 2024

Looks great @JushBJJ - can you address the merge conflict around model_demos/pyproject.toml?

I merged the changes for #117; I think this branch just needs a rebase onto mkordic/rc_20240830.

The plan is to merge these two models into an RC branch (mkordic/rc_20240830) that we are using to stage the changes for the next TT-Buda release, then merge into main and complete the bounties.

@JushBJJ
Contributor Author

JushBJJ commented Sep 3, 2024

@milank94 Rebased it; I normally don't do it, so I hope I did it right 😅

@milank94 milank94 merged commit 2c6b89e into tenstorrent:mkordic/rc_20240830 Sep 3, 2024
anirudTT pushed a commit that referenced this pull request Sep 24, 2024
* Qwen1.5 0.5B pybuda implementation

* remove unneeded requirement

* Update env vars and compiler configs

* remove undefined device_map

* Remove misleading and unnecessary environment variables

* Refine qwen solution

* Rename qwen file

* Rename qwen filename and added qwen1.5-chat

* Add qwen1.5 test case

* Fix typo in pyproject.toml

* Disable dynamic caching

* Add extra whitespace below model title commment

* Fix typo "moadl" to "model"