The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
🐠 NeMo PPO and SFT support
This release introduces NeMo-backed PPO and SFT implementations, bringing new capabilities and improved system performance for large-scale training.
- NeMo PPO by @cat-state in #472
- Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in #353
🦆 PEFT Migration
`trlx` now supports parameter-efficient tuning methods via the `peft` library, which we hope will provide greater access to RLHF training in low-resource settings.
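As a minimal sketch of what a LoRA-style parameter-efficient setup might look like, the snippet below builds a configuration dict whose keys mirror the `peft` library's `LoraConfig` arguments; the exact trlx wiring (field names and where the dict is attached) is an assumption for illustration, not the confirmed trlx API.

```python
# Hedged sketch: a LoRA-style parameter-efficient tuning configuration.
# The keys mirror the peft library's LoraConfig arguments; how trlx
# consumes this dict is an assumption for illustration only.
peft_config = {
    "peft_type": "LORA",       # use low-rank adapters
    "task_type": "CAUSAL_LM",  # decoder-only language modeling
    "r": 8,                    # rank of the LoRA update matrices
    "lora_alpha": 32,          # scaling factor applied to the update
    "lora_dropout": 0.1,       # dropout on the adapter branch
}
```

With adapters like LoRA, only the small low-rank update matrices are trained while the base model stays frozen, which is what makes RLHF feasible on much smaller hardware.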
Fixes and more!
- Set pad_token for all tokenizers in tests by @cat-state in #414
- Convert tensors in the stats dict into scalars by @ZHAOTING in #417
- Add Translation Finetuning Example with T5 by @alexandremuzio in #392
- set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in #409
- [fix] add `position_ids` to `LlamaModelBranch` by @jon-tow in #418
- fix(CI): use pinned deps for CI testing by @jon-tow in #423
- Minibatch impl by @Dahoas in #364
- [feat] Support tying metadata to each prompt by @maxreciprocate in #421
- feat(examples): revamp simulacra example by @maxreciprocate in #430
- [fix] update pairwise dataloader. by @Chen9154 in #395
- fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in #432
- fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in #429
- fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in #435
- fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in #441
- Create Example training scripts to run in Stability cluster by @alexandremuzio in #419
- Upgrade official released Ray instead of an unstable one. by @jovany-wang in #455
- Pin transformers<=4.27.1 by @jovany-wang in #458
- fix(ppo_gpt): prevent position_ids being None by @li-plus in #451
- fix(trainer): init self.generate_sweep_kwarg at self.init by @mymusise in #460
- Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in #420
- Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in #422
- docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in #449
- fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in #450
- fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in #444
- feat(requirements.txt): upgrade dependencies by @maxreciprocate in #465
- fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in #475
- hotfix(bnb): install `scipy` with `bitsandbytes` to avoid `ModuleNotFoundError` by @jon-tow in #492
- fix type hint in PromptPipeline.init by @g-simmons in #496
- fix(modeling_ilql): single q-head indexing by @maxreciprocate in #471
- Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in #506
- Fix PPO log_ratio bug by @TobiasNorlund in #509
- fix(ppo_trainer): default gen kwargs by @maxreciprocate in #510
New Contributors
- @ZHAOTING made their first contribution in #417
- @cauyxy made their first contribution in #409
- @Chen9154 made their first contribution in #395
- @jovany-wang made their first contribution in #455
- @li-plus made their first contribution in #451
- @mymusise made their first contribution in #460
- @mikljohansson made their first contribution in #420
- @g-simmons made their first contribution in #496
- @iwiwi made their first contribution in #506
- @TobiasNorlund made their first contribution in #509
- @glerzing made their first contribution in #486
Full Changelog: v0.6.0...v0.7.0