trlx

https://github.com/carperai/trlx - v0.7.0: NeMo PPO, PEFT Migration, and Fixes

The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:

🐠 NeMo PPO and SFT support

This release introduces NeMo-backed PPO and SFT implementations, which extend trlx's capabilities and improve system performance in large-scale training.

NeMo PPO by @cat-state in https://github.com/CarperAI/trlx/pull/472
Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in https://github.com/CarperAI/trlx/pull/353

🦆 PEFT Migration

trlx now supports parameter-efficient tuning methods via the peft library, which we hope will provide greater access to RLHF training in low-resource settings.

peft to opendelta migration (#434) + memory optimization (#320) by @glerzing in https://github.com/CarperAI/trlx/pull/486
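
To give a rough sense of why parameter-efficient methods help in low-resource settings, the sketch below counts the trainable parameters of a single LoRA adapter against full fine-tuning of the same weight matrix. The layer size and rank are hypothetical values chosen for illustration, and the helper `lora_trainable_params` is not part of trlx or peft.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter on a (d_out x d_in) weight.

    LoRA keeps the original weight W frozen and learns a low-rank update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


# Hypothetical layer size and rank, chosen only for illustration.
d_in, d_out, rank = 4096, 4096, 8

full = d_in * d_out                               # parameters trained by full fine-tuning
lora = lora_trainable_params(d_in, d_out, rank)   # parameters trained by LoRA
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={rank}): {lora:,} trainable parameters ({fraction:.2%} of full)")
```

Fewer trainable parameters also means fewer gradients and optimizer states to store, which is where most of the memory saving comes from. The actual saving depends on the model and on which layers are adapted.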

Fixes and more!

Set pad_token for all tokenizers in tests by @cat-state in https://github.com/CarperAI/trlx/pull/414
Convert tensors in the stats dict into scalars by @ZHAOTING in https://github.com/CarperAI/trlx/pull/417
Add Translation Finetuning Example with T5 by @alexandremuzio in https://github.com/CarperAI/trlx/pull/392
set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in https://github.com/CarperAI/trlx/pull/409
[fix] add position_ids to LlamaModelBranch by @jon-tow in https://github.com/CarperAI/trlx/pull/418
fix(CI): use pinned deps for CI testing by @jon-tow in https://github.com/CarperAI/trlx/pull/423
Minibatch impl by @Dahoas in https://github.com/CarperAI/trlx/pull/364
[feat] Support tying metadata to each prompt by @maxreciprocate in https://github.com/CarperAI/trlx/pull/421
feat(examples): revamp simulacra example by @maxreciprocate in https://github.com/CarperAI/trlx/pull/430
[fix] update pairwise dataloader. by @Chen9154 in https://github.com/CarperAI/trlx/pull/395
fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/432
fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in https://github.com/CarperAI/trlx/pull/429
fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/435
fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in https://github.com/CarperAI/trlx/pull/441
Create Example training scripts to run in Stability cluster by @alexandremuzio in https://github.com/CarperAI/trlx/pull/419
Upgrade official released Ray instead of an unstable one. by @jovany-wang in https://github.com/CarperAI/trlx/pull/455
Pin transformers<=4.27.1 by @jovany-wang in https://github.com/CarperAI/trlx/pull/458
fix(ppo_gpt): prevent position_ids being None by @li-plus in https://github.com/CarperAI/trlx/pull/451
fix(trainer): init self.generate_sweep_kwarg at self.__init__ by @mymusise in https://github.com/CarperAI/trlx/pull/460
Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in https://github.com/CarperAI/trlx/pull/420
Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in https://github.com/CarperAI/trlx/pull/422
docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in https://github.com/CarperAI/trlx/pull/449
fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/450
fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in https://github.com/CarperAI/trlx/pull/444
feat(requirements.txt): upgrade dependencies by @maxreciprocate in https://github.com/CarperAI/trlx/pull/465
fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/475
hotfix(bnb): install scipy with bitsandbytes to avoid ModuleNotFoundError by @jon-tow in https://github.com/CarperAI/trlx/pull/492
fix type hint in PromptPipeline.__init__ by @g-simmons in https://github.com/CarperAI/trlx/pull/496
fix(modeling_ilql): single q-head indexing by @maxreciprocate in https://github.com/CarperAI/trlx/pull/471
Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in https://github.com/CarperAI/trlx/pull/506
Fix PPO log_ratio bug by @TobiasNorlund in https://github.com/CarperAI/trlx/pull/509
fix(ppo_trainer): default gen kwargs by @maxreciprocate in https://github.com/CarperAI/trlx/pull/510
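
One of the fixes above changes the PPO trainer to compute mean KL sequence-wise. The sketch below illustrates that distinction in plain Python: averaging KL over every token versus summing KL within each sequence and then averaging over sequences. It is not the trainer's code; the function names and the per-token values are hypothetical.

```python
from typing import List


def mean_kl_per_token(kl: List[List[float]]) -> float:
    """Average the KL estimates over every token in the batch."""
    tokens = [value for sequence in kl for value in sequence]
    return sum(tokens) / len(tokens)


def mean_kl_per_sequence(kl: List[List[float]]) -> float:
    """Sum the KL estimates within each sequence, then average over sequences."""
    return sum(sum(sequence) for sequence in kl) / len(kl)


# Hypothetical per-token KL estimates for two generated responses.
kl = [
    [0.5, 0.25, 0.25],
    [0.0, 0.5, 0.5],
]

token_mean = mean_kl_per_token(kl)
sequence_mean = mean_kl_per_sequence(kl)

print(f"token-wise mean KL:    {token_mean:.4f}")
print(f"sequence-wise mean KL: {sequence_mean:.4f}")
```

The two quantities differ by roughly a factor of the response length, so a KL target tuned for one is not interchangeable with the other.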

New Contributors

@ZHAOTING made their first contribution in https://github.com/CarperAI/trlx/pull/417
@cauyxy made their first contribution in https://github.com/CarperAI/trlx/pull/409
@Chen9154 made their first contribution in https://github.com/CarperAI/trlx/pull/395
@jovany-wang made their first contribution in https://github.com/CarperAI/trlx/pull/455
@li-plus made their first contribution in https://github.com/CarperAI/trlx/pull/451
@mymusise made their first contribution in https://github.com/CarperAI/trlx/pull/460
@mikljohansson made their first contribution in https://github.com/CarperAI/trlx/pull/420
@g-simmons made their first contribution in https://github.com/CarperAI/trlx/pull/496
@iwiwi made their first contribution in https://github.com/CarperAI/trlx/pull/506
@TobiasNorlund made their first contribution in https://github.com/CarperAI/trlx/pull/509
@glerzing made their first contribution in https://github.com/CarperAI/trlx/pull/486

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.6.0...v0.7.0

- Python
Published by jon-tow almost 3 years ago

https://github.com/carperai/trlx - v0.6.0: LLaMa (Alpaca), Benchmark Util, T5 ILQL, Tests

The v0.6.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:

📏 Benchmarking and Improved Unit Tests

This release introduces a new benchmark util to more easily track regressions in our training pipeline along with improved unit tests with the help of the hypothesis package:

[feat] Add benchmark tools by @reciprocated in https://github.com/CarperAI/trlx/pull/357
Add hypothesis tests for ILQL and fix edge cases by @cat-state in https://github.com/CarperAI/trlx/pull/370

🦙 LLaMa and Alpaca PPO/SFT Support

PPO support and examples for LLaMa are now available and we’ve baked in an example for instruction fine-tuning models with the Alpaca dataset using our SFT trainer:

[feat] Add LLaMa Model support for PPO by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/375
Add Alpaca by @cat-state in https://github.com/CarperAI/trlx/pull/400

5️⃣ T5 ILQL Support

T5 models can now be fine-tuned with ILQL:

Support ILQL for T5 model, Fix PPO T5 for refactored code by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/290

Fixes

Remove example usage of deprecating trlx.train dataset arg by @jon-tow in https://github.com/CarperAI/trlx/pull/331
Remove logit_mask unused argument by @cat-state in https://github.com/CarperAI/trlx/pull/332
[fix] Convert the rest of configs from ymls by @reciprocated in https://github.com/CarperAI/trlx/pull/346
fix default_ilql_config in notebook by @xu-song in https://github.com/CarperAI/trlx/pull/350
hot-fix: update PPOConfig import in examples by @jon-tow in https://github.com/CarperAI/trlx/pull/352
[fix] Update AdaptiveKLController with correct KL by @reciprocated in https://github.com/CarperAI/trlx/pull/361
[fix] Drop <eos> from ILQL sample's phrases by @reciprocated in https://github.com/CarperAI/trlx/pull/362
fixes half exp not implemented error by @Dahoas in https://github.com/CarperAI/trlx/pull/363
[fix] ILQL total_steps calculation when running distributed by @reciprocated in https://github.com/CarperAI/trlx/pull/374
[fix] split for validation by @hzwer in https://github.com/CarperAI/trlx/pull/369
fix(docs): Update incorrect PPORLElement logprob tensor shape hint by @jon-tow in https://github.com/CarperAI/trlx/pull/377
[fix] Enable HF downloads from a revision by @reciprocated in https://github.com/CarperAI/trlx/pull/382
[fix] Fix ILQL head sync under ZeRO3 by @reciprocated in https://github.com/CarperAI/trlx/pull/387
[fix] Preserve <eos> token and in-place it after trimming by @reciprocated in https://github.com/CarperAI/trlx/pull/401
Nemo ILQL fixes by @cat-state in https://github.com/CarperAI/trlx/pull/404
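
Among the fixes above, the AdaptiveKLController update depends on being fed the right KL estimate. As a minimal sketch of how such a controller typically works, the class below follows the proportional scheme described in "Fine-Tuning Language Models from Human Preferences" (Ziegler et al.). The class name, the constructor arguments, and the example numbers are assumptions for illustration, not a copy of the trlx implementation.

```python
class AdaptiveKLSketch:
    """Proportional controller for the KL penalty coefficient (illustrative only)."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int) -> None:
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # desired KL between policy and reference model
        self.horizon = horizon     # number of samples over which to adapt

    def update(self, current_kl: float, n_steps: int) -> None:
        # Relative error between observed and target KL, clipped to [-0.2, 0.2]
        # so that a single noisy estimate cannot swing the coefficient too far.
        proportional_error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        self.value *= 1.0 + proportional_error * n_steps / self.horizon


# Hypothetical settings and observations.
controller = AdaptiveKLSketch(init_kl_coef=0.1, target=6.0, horizon=10_000)

controller.update(current_kl=12.0, n_steps=1_000)  # KL above target: coefficient grows
after_high = controller.value

controller.update(current_kl=3.0, n_steps=1_000)   # KL below target: coefficient shrinks
after_low = controller.value

print(f"after high KL: {after_high:.5f}")
print(f"after low KL:  {after_low:.5f}")
```

Because the update is driven by the ratio of observed to target KL, passing an estimate on a different scale (for example a per-token mean where a per-sequence value is expected) would push the coefficient in the wrong direction.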

What's Changed

Move to Python config classes instead of ymls by @cat-state in https://github.com/CarperAI/trlx/pull/306
Add intermediate checkpointing to accelerate trainers by @jon-tow in https://github.com/CarperAI/trlx/pull/349
Enable infinite dataloader for prompt_dataloader in PPO Trainer by @alexandremuzio in https://github.com/CarperAI/trlx/pull/358
[feat] Add optional dependency list by @reciprocated in https://github.com/CarperAI/trlx/pull/381
Add some synchronization to the db download in the simulacra example by @dakinggg in https://github.com/CarperAI/trlx/pull/406

New Contributors

@xu-song made their first contribution in https://github.com/CarperAI/trlx/pull/350
@hzwer made their first contribution in https://github.com/CarperAI/trlx/pull/369
@alexandremuzio made their first contribution in https://github.com/CarperAI/trlx/pull/358
@dakinggg made their first contribution in https://github.com/CarperAI/trlx/pull/406

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.5.0...v0.6.0

- Python
Published by jon-tow about 3 years ago

https://github.com/carperai/trlx - v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration

Highlights

Initial NeMo ILQL integration, leading the way to large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
In-depth example showcasing trlx usage on AnthropicAI's Helpful & Harmless dataset https://github.com/CarperAI/trlx/tree/main/examples/hh
Improved ILQL modeling integration with Hugging Face transformers. Users can now work with AutoModelForCausalLMWithILQLHeads objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.

What's Changed

Add wandb group naming by @jon-tow in https://github.com/CarperAI/trlx/pull/188
Update reward_fn signatures in examples by @jon-tow in https://github.com/CarperAI/trlx/pull/190
Add tokenizer config by @reciprocated in https://github.com/CarperAI/trlx/pull/189
Fix extraction of mixed_precision option for deepspeed by @reciprocated in https://github.com/CarperAI/trlx/pull/197
Fix summarize_rlhf inference checkpoint paths by @jon-tow in https://github.com/CarperAI/trlx/pull/194
Make the config loading consistent across all example scripts. by @shermansiu in https://github.com/CarperAI/trlx/pull/192
Make Trainer.save_pretrained sub-directory optional by @jon-tow in https://github.com/CarperAI/trlx/pull/201
Update Readme to include T5 models by @aaronrmm in https://github.com/CarperAI/trlx/pull/198
Make make_head accept dtype parameter by @reciprocated in https://github.com/CarperAI/trlx/pull/213
Enable training with Tensorboard tracking by @marcobellagente93 in https://github.com/CarperAI/trlx/pull/209
Support nested updates in merge by @cat-state in https://github.com/CarperAI/trlx/pull/219
Fix typo reward normalize summarize by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/221
Update stale comment from results table by @jon-tow in https://github.com/CarperAI/trlx/pull/222
Fix undefined trackers property by @alan-cooney in https://github.com/CarperAI/trlx/pull/224
Fix tokenizer missing from config.to_dict() by @alan-cooney in https://github.com/CarperAI/trlx/pull/228
Make experiment tracking optional by @jon-tow in https://github.com/CarperAI/trlx/pull/226
read tokenizer path from config correctly by @JustinAWei in https://github.com/CarperAI/trlx/pull/230
Add devcontainer support by @alan-cooney in https://github.com/CarperAI/trlx/pull/196
fix: change lora_a: float to lora_r: int by @aaronrmm in https://github.com/CarperAI/trlx/pull/235
Bump isort to hotfix CI code quality workflow by @jon-tow in https://github.com/CarperAI/trlx/pull/237
Fix optional tracking in accelerator.log by @jon-tow in https://github.com/CarperAI/trlx/pull/233
Improve documentation/comments on the random walk example by @alan-cooney in https://github.com/CarperAI/trlx/pull/208
Update link to "Learning to Summarize from Human Feedback" by @jon-tow in https://github.com/CarperAI/trlx/pull/241
Fix deepspeed state saving under save_best condition by @reciprocated in https://github.com/CarperAI/trlx/pull/242
added colab notebook by @smellslikeml in https://github.com/CarperAI/trlx/pull/244
[style] Increase black's line length by @reciprocated in https://github.com/CarperAI/trlx/pull/250
Add help string to get_advantages_and_returns by @pesvut in https://github.com/CarperAI/trlx/pull/225
Filter out empty responses by @reciprocated in https://github.com/CarperAI/trlx/pull/265
NeMo Integrate by @cat-state in https://github.com/CarperAI/trlx/pull/125
Add multi-process logger utility for status monitoring by @jon-tow in https://github.com/CarperAI/trlx/pull/254
Add NeMo support info to README by @jon-tow in https://github.com/CarperAI/trlx/pull/275
Fix distributed dataloaders & deduplicate eval by @reciprocated in https://github.com/CarperAI/trlx/pull/276
Improve PPO readability by @alan-cooney in https://github.com/CarperAI/trlx/pull/210
Add T5 to delta modifier map by @aaronrmm in https://github.com/CarperAI/trlx/pull/234
[fix] Set deepspeed's fp16 auto_cast to false by @reciprocated in https://github.com/CarperAI/trlx/pull/279
Rename remaining logprobs_from_logits call by @jon-tow in https://github.com/CarperAI/trlx/pull/281
[feat] Add Accelerate SFT Trainer by @reciprocated in https://github.com/CarperAI/trlx/pull/280
Add Colab Notebook for Sentiment by @zswitten in https://github.com/CarperAI/trlx/pull/285
Remove pylance installs from devcontainer by @jon-tow in https://github.com/CarperAI/trlx/pull/296
Move notebooks to examples dir by @jon-tow in https://github.com/CarperAI/trlx/pull/294
[fix] Summarize config discrepancy by @reciprocated in https://github.com/CarperAI/trlx/pull/293
Make Git check optional by @cat-state in https://github.com/CarperAI/trlx/pull/299
refactor: remove orchestrator abstraction from API by @jon-tow in https://github.com/CarperAI/trlx/pull/289
Set add_special_tokens=False to not add EOS unexpectedly by @cat-state in https://github.com/CarperAI/trlx/pull/287
[feat] Gather experience samples by @reciprocated in https://github.com/CarperAI/trlx/pull/305
[fix] Make gather_for_metrics usage more strict by @reciprocated in https://github.com/CarperAI/trlx/pull/315
Add helpful and harmless example by @reciprocated in https://github.com/CarperAI/trlx/pull/128
Adopt PreTrainedModelWrapper for Hugging Face models by @jon-tow in https://github.com/CarperAI/trlx/pull/215

New Contributors

@shermansiu made their first contribution in https://github.com/CarperAI/trlx/pull/192
@aaronrmm made their first contribution in https://github.com/CarperAI/trlx/pull/198
@marcobellagente93 made their first contribution in https://github.com/CarperAI/trlx/pull/209
@alan-cooney made their first contribution in https://github.com/CarperAI/trlx/pull/224
@JustinAWei made their first contribution in https://github.com/CarperAI/trlx/pull/230
@smellslikeml made their first contribution in https://github.com/CarperAI/trlx/pull/244
@pesvut made their first contribution in https://github.com/CarperAI/trlx/pull/225
@zswitten made their first contribution in https://github.com/CarperAI/trlx/pull/285

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.4...v0.5.0

- Python
Published by jon-tow over 3 years ago

https://github.com/carperai/trlx - v0.4

Summary of release notes:

Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:

Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization.
Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF in low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wall-clock time for the same performance (quick report here).
Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report).
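
As a back-of-the-envelope illustration of where the 8-bit optimizer saving comes from, the sketch below compares the memory needed for Adam's two moment estimates at 32-bit and 8-bit precision. The model size is hypothetical, and the calculation ignores the small overhead of the quantization constants that 8-bit optimizers also store. It covers optimizer states only, which is why the end-to-end figure quoted above is much smaller than the ratio computed here.

```python
def adam_state_bytes(n_params: int, bits_per_value: int) -> int:
    """Bytes needed for Adam's optimizer states.

    Adam keeps two running estimates per parameter (first and second moment),
    so the state size is 2 * n_params values at the given precision.
    """
    return 2 * n_params * bits_per_value // 8


# Hypothetical model size, chosen only for illustration.
n_params = 1_000_000_000

fp32_bytes = adam_state_bytes(n_params, bits_per_value=32)
int8_bytes = adam_state_bytes(n_params, bits_per_value=8)
saved_bytes = fp32_bytes - int8_bytes

print(f"32-bit Adam states: {fp32_bytes / 1e9:.1f} GB")
print(f"8-bit Adam states:  {int8_bytes / 1e9:.1f} GB")
print(f"saved:              {saved_bytes / 1e9:.1f} GB")
```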

Other interesting examples are in the works, so stay tuned!

What's Changed

ILQL indices on wrong device by @cat-state in https://github.com/CarperAI/trlx/pull/105
Fix ppo ratio inaccuracy by @reciprocated in https://github.com/CarperAI/trlx/pull/108
Set RNG seeds across multiple dependencies by @jon-tow in https://github.com/CarperAI/trlx/pull/113
Set seed after default config instantiation by @jon-tow in https://github.com/CarperAI/trlx/pull/114
Move queries on the device by @reciprocated in https://github.com/CarperAI/trlx/pull/115
Add ppo randomwalks example by @reciprocated in https://github.com/CarperAI/trlx/pull/119
Add unit tests to ensure valid example configs by @jon-tow in https://github.com/CarperAI/trlx/pull/120
updating gptj-config by @Dahoas in https://github.com/CarperAI/trlx/pull/109
Fix get distributed config by @reciprocated in https://github.com/CarperAI/trlx/pull/122
Add local rollout logging by @thomfoster in https://github.com/CarperAI/trlx/pull/124
Add support for more CausalLMs by @jon-tow in https://github.com/CarperAI/trlx/pull/103
Add hydra head support for GPTNeo by @jon-tow in https://github.com/CarperAI/trlx/pull/126
Add BloomModel hydra support by @jon-tow in https://github.com/CarperAI/trlx/pull/129
Simplifying logic to merge configs by @leshanbog in https://github.com/CarperAI/trlx/pull/134
add: load function for AccelerateRLModel by @dongs0104 in https://github.com/CarperAI/trlx/pull/136
Add OptimizerConfig and SchedulerConfig by @jon-tow in https://github.com/CarperAI/trlx/pull/135
Remove incorrect default config settings by @jon-tow in https://github.com/CarperAI/trlx/pull/137
Update TRL acknowledgement by @osanseviero in https://github.com/CarperAI/trlx/pull/138
Fix context overflow by @reciprocated in https://github.com/CarperAI/trlx/pull/131
Fix seeding per process by @reciprocated in https://github.com/CarperAI/trlx/pull/141
Set device-specific seeding with global rank by @jon-tow in https://github.com/CarperAI/trlx/pull/143
Freeze hydra model branches by @jon-tow in https://github.com/CarperAI/trlx/pull/140
Refactor RL model wrapper into a trainer module by @jon-tow in https://github.com/CarperAI/trlx/pull/144
Logging learning rate by @leshanbog in https://github.com/CarperAI/trlx/pull/147
Fix instantiating base transformer from a custom config by @reciprocated in https://github.com/CarperAI/trlx/pull/149
Linear LR scheduler by @leshanbog in https://github.com/CarperAI/trlx/pull/150
Update pre-commit version and add isort by @jon-tow in https://github.com/CarperAI/trlx/pull/152
fix: configure flake8, fix errors, add trackers config by @Mistobaan in https://github.com/CarperAI/trlx/pull/157
Features/use-python-3.8-in-ci by @Mistobaan in https://github.com/CarperAI/trlx/pull/159
Add bitsandbytes optimizer support by @aicrumb in https://github.com/CarperAI/trlx/pull/133
initial commit for trlx LORA support by @ethankim00 in https://github.com/CarperAI/trlx/pull/110
Fix default delta_kwargs handling by @jon-tow in https://github.com/CarperAI/trlx/pull/171
Add T5 model by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/145
Fix wandb.errors.RequireError as reported in #162 by @ayulockin in https://github.com/CarperAI/trlx/pull/167
Update README.md by @LouisCastricato in https://github.com/CarperAI/trlx/pull/180
Update ILQL details by @reciprocated in https://github.com/CarperAI/trlx/pull/156
Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/175
Fix HuggingFace model.save_pretrained for DDP by @jon-tow in https://github.com/CarperAI/trlx/pull/181
Update generation utilities by @reciprocated in https://github.com/CarperAI/trlx/pull/172
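
Several entries above concern seeding per process and by global rank. A common way to do this, sketched below under the assumption that each process simply offsets a shared base seed by its rank, is to give every process a distinct but reproducible random stream. The helper names are hypothetical and only the standard-library generator is seeded here; a real trainer would also seed its numerical and deep-learning libraries.

```python
import random
from typing import List


def seed_for_rank(base_seed: int, rank: int) -> int:
    """Derive a process-specific seed from a shared base seed (assumed scheme)."""
    return base_seed + rank


def first_draws(base_seed: int, rank: int, n: int = 3) -> List[float]:
    """Return the first n random numbers a process with this rank would draw."""
    rng = random.Random(seed_for_rank(base_seed, rank))
    return [rng.random() for _ in range(n)]


base_seed = 1000  # hypothetical value taken from a training config

for rank in range(2):
    print(f"rank {rank}: seed={seed_for_rank(base_seed, rank)}, draws={first_draws(base_seed, rank)}")
```

Re-running with the same base seed reproduces each rank's stream, while different ranks sample different data, which avoids every process generating identical rollouts.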

New Contributors

@thomfoster made their first contribution in https://github.com/CarperAI/trlx/pull/124
@leshanbog made their first contribution in https://github.com/CarperAI/trlx/pull/134
@dongs0104 made their first contribution in https://github.com/CarperAI/trlx/pull/136
@osanseviero made their first contribution in https://github.com/CarperAI/trlx/pull/138
@Mistobaan made their first contribution in https://github.com/CarperAI/trlx/pull/157
@aicrumb made their first contribution in https://github.com/CarperAI/trlx/pull/133
@ethankim00 made their first contribution in https://github.com/CarperAI/trlx/pull/110
@PhungVanDuy made their first contribution in https://github.com/CarperAI/trlx/pull/145

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.3...v0.4

- Python
Published by LouisCastricato over 3 years ago

https://github.com/carperai/trlx - Pre alpha v0.3

What's Changed

Download simulacra by @reciprocated in https://github.com/CarperAI/trlx/pull/62
Update documentation (first review) by @simoninithomas in https://github.com/CarperAI/trlx/pull/64
Add ckpt/ to gitignore by @ayulockin in https://github.com/CarperAI/trlx/pull/70
change version in package to match lib by @cat-state in https://github.com/CarperAI/trlx/pull/73
Docs by @shahbuland in https://github.com/CarperAI/trlx/pull/71
[fix] Remove stale options from ppo_gptj.yml by @jon-tow in https://github.com/CarperAI/trlx/pull/77
Add entity name config for wandb logging by @jon-tow in https://github.com/CarperAI/trlx/pull/78
EXAMPLE : Interpreter grounded Neural Program Synthesis [WIP] by @reshinthadithyan in https://github.com/CarperAI/trlx/pull/81
Update TrainConfig optimizer hyperparameters by @jon-tow in https://github.com/CarperAI/trlx/pull/82
Add examples tip to contribution guide by @jon-tow in https://github.com/CarperAI/trlx/pull/84
Fix pipeline's context overflow by @reciprocated in https://github.com/CarperAI/trlx/pull/87
Refactor PPO objective function by @jon-tow in https://github.com/CarperAI/trlx/pull/88
Fix slow ilql eval by @reciprocated in https://github.com/CarperAI/trlx/pull/91
rerun https://github.com/CarperAI/trlx/pull/89 by @cat-state in https://github.com/CarperAI/trlx/pull/92
Hyperparameter Optimization with Ray Tune and Weights and Biases by @ayulockin in https://github.com/CarperAI/trlx/pull/76
Update readme instructions by @reciprocated in https://github.com/CarperAI/trlx/pull/93
Update README to align nomenclature correctness by @ayulockin in https://github.com/CarperAI/trlx/pull/97
Add optional reward scaling by @reciprocated in https://github.com/CarperAI/trlx/pull/95
Force class registry via imports by @jon-tow in https://github.com/CarperAI/trlx/pull/100
Add optional normalization (cont.) by @reciprocated in https://github.com/CarperAI/trlx/pull/98
Restructure sweeps for reuse by @reciprocated in https://github.com/CarperAI/trlx/pull/102

New Contributors

@simoninithomas made their first contribution in https://github.com/CarperAI/trlx/pull/64
@ayulockin made their first contribution in https://github.com/CarperAI/trlx/pull/70
@reshinthadithyan made their first contribution in https://github.com/CarperAI/trlx/pull/81

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.2...v0.3

- Python
Published by LouisCastricato over 3 years ago

https://github.com/carperai/trlx - Alpha v0.2

Complete revamp of our initial release.

New features:

Hydra models, 20x faster than vanilla PPO with minimal performance hits at large scales.
Massively revamped API with significantly less boilerplate.
Save/load callbacks.
Greatly improved orchestrator.
Better commented RL code that makes it easier to understand what's going on.
Cool examples, including architext and simulacra.
Better extensibility and standardized styling.

Features coming soon:

Megatron support! We're already working on this.
More interesting examples that are relevant to production use cases of TRLX.
Better integration of W&B, including sweeps.
Evaluation and benchmarking.

Autogenerated release notes below:

What's Changed

Fix typo by @mrm8488 in https://github.com/CarperAI/trlx/pull/2
Create LICENSE by @LouisCastricato in https://github.com/CarperAI/trlx/pull/3
QOL fixes by @LouisCastricato in https://github.com/CarperAI/trlx/pull/5
stage ilql by @reciprocated in https://github.com/CarperAI/trlx/pull/6
Adds style file and reward function capabilities to ppo orchestrator by @LouisCastricato in https://github.com/CarperAI/trlx/pull/8
Update ppo value head + print logs by @Dahoas in https://github.com/CarperAI/trlx/pull/11
Make ilql respect the config & remove sin by @reciprocated in https://github.com/CarperAI/trlx/pull/22
Docs by @shahbuland in https://github.com/CarperAI/trlx/pull/31
Implemented hydra heads + adaptive kl by @Dahoas in https://github.com/CarperAI/trlx/pull/33
Add pre-commit with black by @cat-state in https://github.com/CarperAI/trlx/pull/36
[update] Improve package setup by @jon-tow in https://github.com/CarperAI/trlx/pull/42
Add initial issue templates by @jon-tow in https://github.com/CarperAI/trlx/pull/45
Some readme improvements by @thedch in https://github.com/CarperAI/trlx/pull/44
Add initial GitHub workflows by @jon-tow in https://github.com/CarperAI/trlx/pull/43
[docs] Add CONTRIBUTING.md by @jon-tow in https://github.com/CarperAI/trlx/pull/52
Simplify api by @reciprocated in https://github.com/CarperAI/trlx/pull/24

New Contributors

@mrm8488 made their first contribution in https://github.com/CarperAI/trlx/pull/2
@LouisCastricato made their first contribution in https://github.com/CarperAI/trlx/pull/3
@reciprocated made their first contribution in https://github.com/CarperAI/trlx/pull/6
@Dahoas made their first contribution in https://github.com/CarperAI/trlx/pull/11
@shahbuland made their first contribution in https://github.com/CarperAI/trlx/pull/31
@cat-state made their first contribution in https://github.com/CarperAI/trlx/pull/36
@jon-tow made their first contribution in https://github.com/CarperAI/trlx/pull/42
@thedch made their first contribution in https://github.com/CarperAI/trlx/pull/44

Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2

- Python
Published by LouisCastricato over 3 years ago
