The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
🐠 NeMo PPO and SFT support
This release introduces NeMo-backed PPO and SFT implementations, bringing new capabilities and improved system performance for large-scale training.
- NeMo PPO by @cat-state in #472
- Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in #353
🦆 PEFT Migration
`trlx` now supports parameter-efficient tuning methods via the `peft` library, which we hope will provide greater access to RLHF training in low-resource settings.
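As a minimal sketch of what a LoRA-style parameter-efficient setup might look like, the snippet below builds a configuration dict whose keys mirror the `peft` library's `LoraConfig` arguments; the exact trlx wiring (field names and where the dict is attached) is an assumption for illustration, not the confirmed trlx API.

```python
# Hedged sketch: a LoRA-style parameter-efficient tuning configuration.
# The keys mirror the peft library's LoraConfig arguments; how trlx
# consumes this dict is an assumption for illustration only.
peft_config = {
    "peft_type": "LORA",       # use low-rank adapters
    "task_type": "CAUSAL_LM",  # decoder-only language modeling
    "r": 8,                    # rank of the LoRA update matrices
    "lora_alpha": 32,          # scaling factor applied to the update
    "lora_dropout": 0.1,       # dropout on the adapter branch
}
```

With adapters like LoRA, only the small low-rank update matrices are trained while the base model stays frozen, which is what makes RLHF feasible on much smaller hardware.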
Fixes and more!
- Set pad_token for all tokenizers in tests by @cat-state in #414
- Convert tensors in the stats dict into scalars by @ZHAOTING in #417
- Add Translation Finetuning Example with T5 by @alexandremuzio in #392
- set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in #409
- [fix] add `position_ids` to `LlamaModelBranch` by @jon-tow in #418
- fix(CI): use pinned deps for CI testing by @jon-tow in #423
- Minibatch impl by @Dahoas in #364
- [feat] Support tying metadata to each prompt by @maxreciprocate in #421
- feat(examples): revamp simulacra example by @maxreciprocate in #430
- [fix] update pairwise dataloader. by @Chen9154 in #395
- fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in #432
- fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in #429
- fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in #435
- fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in #441
- Create Example training scripts to run in Stability cluster by @alexandremuzio in #419
- Upgrade official released Ray instead of an unstable one. by @jovany-wang in #455
- Pin transformers<=4.27.1 by @jovany-wang in #458
- fix(ppo_gpt): prevent position_ids being None by @li-plus in #451
- fix(trainer): init self.generate_sweep_kwarg at self.init by @mymusise in #460
- Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in #420
- Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in #422
- docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in #449
- fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in #450
- fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in #444
- feat(requirements.txt): upgrade dependencies by @maxreciprocate in #465
- fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in #475
- hotfix(bnb): install `scipy` with `bitsandbytes` to avoid `ModuleNotFoundError` by @jon-tow in #492
- fix type hint in PromptPipeline.init by @g-simmons in #496
- fix(modeling_ilql): single q-head indexing by @maxreciprocate in #471
- Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in #506
- Fix PPO log_ratio bug by @TobiasNorlund in #509
- fix(ppo_trainer): default gen kwargs by @maxreciprocate in #510
New Contributors
- @ZHAOTING made their first contribution in #417
- @cauyxy made their first contribution in #409
- @Chen9154 made their first contribution in #395
- @jovany-wang made their first contribution in #455
- @li-plus made their first contribution in #451
- @mymusise made their first contribution in #460
- @mikljohansson made their first contribution in #420
- @g-simmons made their first contribution in #496
- @iwiwi made their first contribution in #506
- @TobiasNorlund made their first contribution in #509
- @glerzing made their first contribution in #486
Full Changelog: v0.6.0...v0.7.0