Thanks so much for taking a look at torchtune and starting this discussion. This is a great topic for this forum. We're quite excited about the options available in the PyTorch ecosystem for fine-tuning LLMs. Not just lit-gpt, but libraries like Axolotl, Unsloth, and of course PEFT have really pushed the community forward, and we encourage users to pick the set of libraries and tools that works best for them. In our discussions with users, we found some gaps in the ecosystem that torchtune tries to plug, both in our current alpha release and in future releases.

**Just write PyTorch**

Our goal is to showcase the use of native PyTorch for fine-tuning LLMs. Where needed we provide utilities (e.g. checkpointing, logging), but these live in the repo and should be easy to use out-of-the-box, or to clone-and-modify when they need to be customized (see the sketches at the end of this reply). Staying away from abstractions and trainers is a very conscious design principle. This does lead to some code duplication, but it improves code readability. In fact, we go to the extreme of "no implementation inheritance" in the codebase to keep this lean, abstraction-free design; we use inheritance only for interfaces. Longer term, we view torchtune as the repo that showcases the latest relevant PyTorch features and how to use them for fine-tuning LLMs: compatibility with torch.compile, new distributed APIs, ongoing PyTorch work on memory efficiency, and so on. One area where PyTorch is a bit lacking is clean documentation on how users can apply these APIs, and with torchtune we'll showcase well-documented examples.

**Capabilities over Models**

torchtune provides first-class support for models that introduce core capabilities and align with community needs, i.e. we take a capabilities-driven approach to adding model support. This does mean we won't have 100% model coverage in the repo, but we will try to support popular features while ensuring correctness. For example, we recently landed sample packing with in-sample position encoding and masking. Based on our research, torchtune is the first library to enable proper masking across samples, which has been a heavily requested feature from the community. Moving forward, we'll continue to focus on core capabilities rather than out-of-the-box support for every model released by the community. That said, we designed our code to be very easy to understand and reuse, so community members can add models themselves. We also have some interesting WIP features, which we'll share very soon.

**Interoperability with the Ecosystem**

A big point of emphasis for us is showing how users can go through the LLM lifecycle using native PyTorch. This means interoperability with PyTorch repos like torchtitan and gpt-fast, but also with community tools and libraries like EleutherAI's Eval Harness, bitsandbytes, transformers, and PEFT. This isn't unique to torchtune, but over time it helps build a more inclusive ecosystem, and it's a very conscious focus for the library.

Hope this helps.
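To make the "just write PyTorch" point concrete, here is a minimal sketch of what a clone-and-modify checkpointing utility in this style could look like. The function names below are invented for the example; this is not torchtune's actual checkpointer API.

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Everything is plain PyTorch: state dicts in, torch.save out.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
        },
        path,
    )

def load_checkpoint(model, optimizer, path):
    # Restore model and optimizer state, return the step to resume from.
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]
```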
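A minimal sketch of the "inheritance only for interfaces" principle: subclasses implement an abstract contract but never inherit concrete training logic, so each recipe stays a self-contained, readable file. The class names here are simplified stand-ins, not torchtune's real recipe classes.

```python
from abc import ABC, abstractmethod

class RecipeInterface(ABC):
    """Pure interface: no shared implementation to inherit."""

    @abstractmethod
    def setup(self) -> None: ...

    @abstractmethod
    def train(self) -> None: ...

class LoRAFinetuneRecipe(RecipeInterface):
    # All logic lives here, inline and readable, rather than being
    # spread across a base-class hierarchy.
    def setup(self) -> None:
        print("load model, apply LoRA adapters, build dataloader")

    def train(self) -> None:
        print("plain training loop written out in full")

recipe = LoRAFinetuneRecipe()
recipe.setup()
recipe.train()
```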
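On the PyTorch-features point: torch.compile already works on plain nn.Modules, so a training loop written in native PyTorch can pick it up with a one-line change. A toy example:

```python
import torch

model = torch.nn.Linear(16, 16)
compiled = torch.compile(model)  # optimizes the forward pass
out = compiled(torch.randn(2, 16))
```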
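And to illustrate what "proper masking across samples" means for sample packing: below is a hedged sketch (not torchtune's actual implementation) of a block-diagonal causal mask, where a token may attend only to earlier tokens from its own sample, plus position ids that restart at each sample boundary ("in-sample position encoding").

```python
import torch

def packed_causal_mask(sample_lengths: list[int]) -> torch.Tensor:
    """Causal mask for a sequence packed from several samples.

    True = attention allowed. Tokens attend only to earlier tokens
    within their own sample, never across sample boundaries.
    """
    total = sum(sample_lengths)
    # Sample id for each position, e.g. lengths [3, 2] -> [0, 0, 0, 1, 1]
    ids = torch.repeat_interleave(
        torch.arange(len(sample_lengths)), torch.tensor(sample_lengths)
    )
    same_sample = ids.unsqueeze(0) == ids.unsqueeze(1)  # (total, total)
    causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
    return same_sample & causal

# Position ids that restart at 0 for each packed sample:
# lengths [3, 2] -> [0, 1, 2, 0, 1]
position_ids = torch.cat([torch.arange(n) for n in (3, 2)])

mask = packed_causal_mask([3, 2])  # token 3 cannot attend to tokens 0-2
```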
What are the differences between torchtune and litgpt? To my understanding, litgpt uses Fabric and requires converting the model into a GPT class, while torchtune also has a TransformerDecoder class. Aside from the different ownership groups, I don't see many differences. Could you elaborate on the specific differences?