Recent Releases of https://github.com/carperai/trlx

https://github.com/carperai/trlx - v0.7.0: NeMo PPO, PEFT Migration, and Fixes

The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:

🐠 NeMo PPO and SFT support

This release introduces NeMo-backed PPO and SFT implementations, adding new capabilities and improving system performance under large-scale training.

  • NeMo PPO by @cat-state in https://github.com/CarperAI/trlx/pull/472
  • Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in https://github.com/CarperAI/trlx/pull/353

🦆 PEFT Migration

trlx now supports parameter-efficient tuning methods via the peft library, which we hope will provide greater access to RLHF training in low-resource settings.

  • peft to opendelta migration (#434) + memory optimization (#320) by @glerzing in https://github.com/CarperAI/trlx/pull/486
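
To give a rough sense of why parameter-efficient methods help in low-resource settings, the sketch below counts trainable parameters for a LoRA-style low-rank update of a single weight matrix. The layer size and rank are illustrative assumptions, not values taken from trlx or peft.

```python
# Illustrative arithmetic only: the dimensions and rank are assumed values.


def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update W + B @ A, where B is
    d_out x rank and A is rank x d_in (the original W stays frozen)."""
    return d_out * rank + rank * d_in


d_out, d_in, rank = 4096, 4096, 8  # hypothetical projection layer and LoRA rank
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Fewer trainable parameters also means fewer gradients and optimizer states to hold in memory, which is where most of the savings would be expected to come from.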

Fixes and more!

  • Set pad_token for all tokenizers in tests by @cat-state in https://github.com/CarperAI/trlx/pull/414
  • Convert tensors in the stats dict into scalars by @ZHAOTING in https://github.com/CarperAI/trlx/pull/417
  • Add Translation Finetuning Example with T5 by @alexandremuzio in https://github.com/CarperAI/trlx/pull/392
  • set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in https://github.com/CarperAI/trlx/pull/409
  • [fix] add position_ids to LlamaModelBranch by @jon-tow in https://github.com/CarperAI/trlx/pull/418
  • fix(CI): use pinned deps for CI testing by @jon-tow in https://github.com/CarperAI/trlx/pull/423
  • Minibatch impl by @Dahoas in https://github.com/CarperAI/trlx/pull/364
  • [feat] Support tying metadata to each prompt by @maxreciprocate in https://github.com/CarperAI/trlx/pull/421
  • feat(examples): revamp simulacra example by @maxreciprocate in https://github.com/CarperAI/trlx/pull/430
  • [fix] update pairwise dataloader. by @Chen9154 in https://github.com/CarperAI/trlx/pull/395
  • fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/432
  • fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in https://github.com/CarperAI/trlx/pull/429
  • fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/435
  • fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in https://github.com/CarperAI/trlx/pull/441
  • Create Example training scripts to run in Stability cluster by @alexandremuzio in https://github.com/CarperAI/trlx/pull/419
  • Upgrade official released Ray instead of an unstable one. by @jovany-wang in https://github.com/CarperAI/trlx/pull/455
  • Pin transformers<=4.27.1 by @jovany-wang in https://github.com/CarperAI/trlx/pull/458
  • fix(ppo_gpt): prevent position_ids being None by @li-plus in https://github.com/CarperAI/trlx/pull/451
  • fix(trainer): init self.generate_sweep_kwarg at self.__init__ by @mymusise in https://github.com/CarperAI/trlx/pull/460
  • Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in https://github.com/CarperAI/trlx/pull/420
  • Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in https://github.com/CarperAI/trlx/pull/422
  • docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in https://github.com/CarperAI/trlx/pull/449
  • fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/450
  • fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in https://github.com/CarperAI/trlx/pull/444
  • feat(requirements.txt): upgrade dependencies by @maxreciprocate in https://github.com/CarperAI/trlx/pull/465
  • fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/475
  • hotfix(bnb): install scipy with bitsandbytes to avoid ModuleNotFoundError by @jon-tow in https://github.com/CarperAI/trlx/pull/492
  • fix type hint in PromptPipeline.__init__ by @g-simmons in https://github.com/CarperAI/trlx/pull/496
  • fix(modeling_ilql): single q-head indexing by @maxreciprocate in https://github.com/CarperAI/trlx/pull/471
  • Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in https://github.com/CarperAI/trlx/pull/506
  • Fix PPO log_ratio bug by @TobiasNorlund in https://github.com/CarperAI/trlx/pull/509
  • fix(ppo_trainer): default gen kwargs by @maxreciprocate in https://github.com/CarperAI/trlx/pull/510
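
As one reading of the "compute mean KL sequence-wise" fix listed above, the sketch below contrasts averaging per-token KL estimates over all tokens with summing them within each sequence and then averaging over sequences. The function names and toy numbers are hypothetical; this is not the trlx implementation.

```python
from typing import List


def mean_kl_per_token(kl: List[List[float]], mask: List[List[int]]) -> float:
    """Average the per-token KL estimates over every unmasked token in the batch."""
    total = sum(k * m for ks, ms in zip(kl, mask) for k, m in zip(ks, ms))
    count = sum(m for ms in mask for m in ms)
    return total / count


def mean_kl_per_sequence(kl: List[List[float]], mask: List[List[int]]) -> float:
    """Sum KL over the unmasked tokens of each sequence, then average over sequences."""
    per_seq = [sum(k * m for k, m in zip(ks, ms)) for ks, ms in zip(kl, mask)]
    return sum(per_seq) / len(per_seq)


# Toy batch: two sequences padded to length 3 (mask 0 marks padding).
kl = [[0.1, 0.2, 0.3], [0.4, 0.0, 0.0]]
mask = [[1, 1, 1], [1, 0, 0]]

print("per-token mean:", mean_kl_per_token(kl, mask))
print("per-sequence mean:", mean_kl_per_sequence(kl, mask))
```

The two quantities differ whenever sequences have more than one token, so which one feeds a KL controller or a log changes the scale of the reported value.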

New Contributors

  • @ZHAOTING made their first contribution in https://github.com/CarperAI/trlx/pull/417
  • @cauyxy made their first contribution in https://github.com/CarperAI/trlx/pull/409
  • @Chen9154 made their first contribution in https://github.com/CarperAI/trlx/pull/395
  • @jovany-wang made their first contribution in https://github.com/CarperAI/trlx/pull/455
  • @li-plus made their first contribution in https://github.com/CarperAI/trlx/pull/451
  • @mymusise made their first contribution in https://github.com/CarperAI/trlx/pull/460
  • @mikljohansson made their first contribution in https://github.com/CarperAI/trlx/pull/420
  • @g-simmons made their first contribution in https://github.com/CarperAI/trlx/pull/496
  • @iwiwi made their first contribution in https://github.com/CarperAI/trlx/pull/506
  • @TobiasNorlund made their first contribution in https://github.com/CarperAI/trlx/pull/509
  • @glerzing made their first contribution in https://github.com/CarperAI/trlx/pull/486

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.6.0...v0.7.0

- Python
Published by jon-tow over 2 years ago

https://github.com/carperai/trlx - v0.6.0: LLaMa (Alpaca), Benchmark Util, T5 ILQL, Tests

The v0.6.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:

📏 Benchmarking and Improved Unit Tests

This release introduces a new benchmark util to more easily track regressions in our training pipeline, along with improved unit tests written with the help of the hypothesis package:

  • [feat] Add benchmark tools by @reciprocated in https://github.com/CarperAI/trlx/pull/357
  • Add hypothesis tests for ILQL and fix edge cases by @cat-state in https://github.com/CarperAI/trlx/pull/370

🦙 LLaMa and Alpaca PPO/SFT Support

PPO support and examples for LLaMa are now available, and we’ve baked in an example for instruction fine-tuning models with the Alpaca dataset using our SFT trainer:

  • [feat] Add LLaMa Model support for PPO by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/375
  • Add Alpaca by @cat-state in https://github.com/CarperAI/trlx/pull/400

5️⃣ T5 ILQL Support

T5 models can now be fine-tuned with ILQL:

  • Support ILQL for T5 model, Fix PPO T5 for refactored code by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/290

Fixes

  • Remove example usage of deprecating trlx.train dataset arg by @jon-tow in https://github.com/CarperAI/trlx/pull/331
  • Remove logit_mask unused argument by @cat-state in https://github.com/CarperAI/trlx/pull/332
  • [fix] Convert the rest of configs from ymls by @reciprocated in https://github.com/CarperAI/trlx/pull/346
  • fix default_ilql_config in notebook by @xu-song in https://github.com/CarperAI/trlx/pull/350
  • hot-fix: update PPOConfig import in examples by @jon-tow in https://github.com/CarperAI/trlx/pull/352
  • [fix] Update AdaptiveKLController with correct KL by @reciprocated in https://github.com/CarperAI/trlx/pull/361
  • [fix] Drop <eos> from ILQL sample's phrases by @reciprocated in https://github.com/CarperAI/trlx/pull/362
  • fixes half exp not implemented error by @Dahoas in https://github.com/CarperAI/trlx/pull/363
  • [fix] ILQL total_steps calculation when running distributed by @reciprocated in https://github.com/CarperAI/trlx/pull/374
  • [fix] split for validation by @hzwer in https://github.com/CarperAI/trlx/pull/369
  • fix(docs): Update incorrect PPORLElement logprob tensor shape hint by @jon-tow in https://github.com/CarperAI/trlx/pull/377
  • [fix] Enable HF downloads from a revision by @reciprocated in https://github.com/CarperAI/trlx/pull/382
  • [fix] Fix ILQL head sync under ZeRO3 by @reciprocated in https://github.com/CarperAI/trlx/pull/387
  • [fix] Preserve <eos> token and in-place it after trimming by @reciprocated in https://github.com/CarperAI/trlx/pull/401
  • Nemo ILQL fixes by @cat-state in https://github.com/CarperAI/trlx/pull/404
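
For context on the AdaptiveKLController fix listed above, here is a minimal sketch of a proportional KL-coefficient controller in the style of Ziegler et al. (2019). The class is a hypothetical stand-in with assumed names and constants, and it may differ from trlx's implementation.

```python
class AdaptiveKLSketch:
    """Hypothetical proportional controller for the KL penalty coefficient."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # desired KL between policy and reference model
        self.horizon = horizon     # number of samples over which to adapt

    def update(self, current_kl: float, n_steps: int) -> None:
        # Relative error, clipped to [-0.2, 0.2] so a single update stays small.
        error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        # Raise the coefficient when KL is above target, lower it when below.
        self.value *= 1.0 + error * n_steps / self.horizon


ctl = AdaptiveKLSketch(init_kl_coef=0.05, target=6.0, horizon=10000)
ctl.update(current_kl=12.0, n_steps=256)  # observed KL above target
print(ctl.value)
```

Because the update is driven by the measured KL, feeding the controller a mis-scaled KL estimate shifts the coefficient in the wrong direction or by the wrong amount, which is why the choice of KL statistic matters.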

What's Changed

  • Move to Python config classes instead of ymls by @cat-state in https://github.com/CarperAI/trlx/pull/306
  • Add intermediate checkpointing to accelerate trainers by @jon-tow in https://github.com/CarperAI/trlx/pull/349
  • Enable infinite dataloader for prompt_dataloader in PPO Trainer by @alexandremuzio in https://github.com/CarperAI/trlx/pull/358
  • [feat] Add optional dependency list by @reciprocated in https://github.com/CarperAI/trlx/pull/381
  • Add some synchronization to the db download in the simulacra example by @dakinggg in https://github.com/CarperAI/trlx/pull/406

New Contributors

  • @xu-song made their first contribution in https://github.com/CarperAI/trlx/pull/350
  • @hzwer made their first contribution in https://github.com/CarperAI/trlx/pull/369
  • @alexandremuzio made their first contribution in https://github.com/CarperAI/trlx/pull/358
  • @dakinggg made their first contribution in https://github.com/CarperAI/trlx/pull/406

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.5.0...v0.6.0

Published by jon-tow almost 3 years ago

https://github.com/carperai/trlx - v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration

Highlights

  • Initial NeMo ILQL integration, leading the way to large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
  • In-depth example showcasing trlx usage on AnthropicAI's Helpful & Harmless dataset https://github.com/CarperAI/trlx/tree/main/examples/hh
  • Improved ILQL modeling integration with Hugging Face transformers. Users can now work with AutoModelForCausalLMWithILQLHeads objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.

What's Changed

  • Add wandb group naming by @jon-tow in https://github.com/CarperAI/trlx/pull/188
  • Update reward_fn signatures in examples by @jon-tow in https://github.com/CarperAI/trlx/pull/190
  • Add tokenizer config by @reciprocated in https://github.com/CarperAI/trlx/pull/189
  • Fix extraction of mixed_precision option for deepspeed by @reciprocated in https://github.com/CarperAI/trlx/pull/197
  • Fix summarize_rlhf inference checkpoint paths by @jon-tow in https://github.com/CarperAI/trlx/pull/194
  • Make the config loading consistent across all example scripts. by @shermansiu in https://github.com/CarperAI/trlx/pull/192
  • Make Trainer.save_pretrained sub-directory optional by @jon-tow in https://github.com/CarperAI/trlx/pull/201
  • Update Readme to include T5 models by @aaronrmm in https://github.com/CarperAI/trlx/pull/198
  • Make make_head accept dtype parameter by @reciprocated in https://github.com/CarperAI/trlx/pull/213
  • Enable training with Tensorboard tracking by @marcobellagente93 in https://github.com/CarperAI/trlx/pull/209
  • Support nested updates in merge by @cat-state in https://github.com/CarperAI/trlx/pull/219
  • Fix typo reward normalize summarize by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/221
  • Update stale comment from results table by @jon-tow in https://github.com/CarperAI/trlx/pull/222
  • Fix undefined trackers property by @alan-cooney in https://github.com/CarperAI/trlx/pull/224
  • Fix tokenizer missing from config.to_dict() by @alan-cooney in https://github.com/CarperAI/trlx/pull/228
  • Make experiment tracking optional by @jon-tow in https://github.com/CarperAI/trlx/pull/226
  • read tokenizer path from config correctly by @JustinAWei in https://github.com/CarperAI/trlx/pull/230
  • Add devcontainer support by @alan-cooney in https://github.com/CarperAI/trlx/pull/196
  • fix: change lora_a:float to lora_r:int by @aaronrmm in https://github.com/CarperAI/trlx/pull/235
  • Bump isort to hotfix CI code quality workflow by @jon-tow in https://github.com/CarperAI/trlx/pull/237
  • Fix optional tracking in accelerator.log by @jon-tow in https://github.com/CarperAI/trlx/pull/233
  • Improve documentation/comments on the random walk example by @alan-cooney in https://github.com/CarperAI/trlx/pull/208
  • Update link to "Learning to Summarize from Human Feedback" by @jon-tow in https://github.com/CarperAI/trlx/pull/241
  • Fix deepspeed state saving under save_best condition by @reciprocated in https://github.com/CarperAI/trlx/pull/242
  • added colab notebook by @smellslikeml in https://github.com/CarperAI/trlx/pull/244
  • [style] Increase black's line length by @reciprocated in https://github.com/CarperAI/trlx/pull/250
  • Add help string to get_advantages_and_returns by @pesvut in https://github.com/CarperAI/trlx/pull/225
  • Filter out empty responses by @reciprocated in https://github.com/CarperAI/trlx/pull/265
  • NeMo Integrate by @cat-state in https://github.com/CarperAI/trlx/pull/125
  • Add multi-process logger utility for status monitoring by @jon-tow in https://github.com/CarperAI/trlx/pull/254
  • Add NeMo support info to README by @jon-tow in https://github.com/CarperAI/trlx/pull/275
  • Fix distributed dataloaders & deduplicate eval by @reciprocated in https://github.com/CarperAI/trlx/pull/276
  • Improve PPO readability by @alan-cooney in https://github.com/CarperAI/trlx/pull/210
  • Add T5 to delta modifier map by @aaronrmm in https://github.com/CarperAI/trlx/pull/234
  • [fix] Set deepspeed's fp16 auto_cast to false by @reciprocated in https://github.com/CarperAI/trlx/pull/279
  • Rename remaining logprobs_from_logits call by @jon-tow in https://github.com/CarperAI/trlx/pull/281
  • [feat] Add Accelerate SFT Trainer by @reciprocated in https://github.com/CarperAI/trlx/pull/280
  • Add Colab Notebook for Sentiment by @zswitten in https://github.com/CarperAI/trlx/pull/285
  • Remove pylance installs from devcontainer by @jon-tow in https://github.com/CarperAI/trlx/pull/296
  • Move notebooks to examples dir by @jon-tow in https://github.com/CarperAI/trlx/pull/294
  • [fix] Summarize config discrepancy by @reciprocated in https://github.com/CarperAI/trlx/pull/293
  • Make Git check optional by @cat-state in https://github.com/CarperAI/trlx/pull/299
  • refactor: remove orchestrator abstraction from API by @jon-tow in https://github.com/CarperAI/trlx/pull/289
  • Set add_special_tokens=False to not add EOS unexpectedly by @cat-state in https://github.com/CarperAI/trlx/pull/287
  • [feat] Gather experience samples by @reciprocated in https://github.com/CarperAI/trlx/pull/305
  • [fix] Make gather_for_metrics usage more strict by @reciprocated in https://github.com/CarperAI/trlx/pull/315
  • Add helpful and harmless example by @reciprocated in https://github.com/CarperAI/trlx/pull/128
  • Adopt PreTrainedModelWrapper for Hugging Face models by @jon-tow in https://github.com/CarperAI/trlx/pull/215

New Contributors

  • @shermansiu made their first contribution in https://github.com/CarperAI/trlx/pull/192
  • @aaronrmm made their first contribution in https://github.com/CarperAI/trlx/pull/198
  • @marcobellagente93 made their first contribution in https://github.com/CarperAI/trlx/pull/209
  • @alan-cooney made their first contribution in https://github.com/CarperAI/trlx/pull/224
  • @JustinAWei made their first contribution in https://github.com/CarperAI/trlx/pull/230
  • @smellslikeml made their first contribution in https://github.com/CarperAI/trlx/pull/244
  • @pesvut made their first contribution in https://github.com/CarperAI/trlx/pull/225
  • @zswitten made their first contribution in https://github.com/CarperAI/trlx/pull/285

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.4...v0.5.0

Published by jon-tow about 3 years ago

https://github.com/carperai/trlx - v0.4

Summary of release notes:

Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:

  • Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization.

  • Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF under low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wallclock time for the same performance (quick report here).

  • Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report).
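
As a back-of-the-envelope illustration of where 8-bit optimizers save memory, the sketch below compares the size of Adam's two per-parameter moment buffers when stored as 32-bit versus 8-bit values. The model size is an assumed example, and the estimate ignores quantization bookkeeping as well as weights, gradients, and activations, so it is not expected to reproduce the percentage reported above.

```python
def adam_state_bytes(n_params: int, bytes_per_value: int) -> int:
    """Adam keeps two moment estimates (first and second) per parameter."""
    return 2 * n_params * bytes_per_value


n_params = 1_000_000_000  # hypothetical 1B-parameter model
fp32_state = adam_state_bytes(n_params, 4)  # 32-bit moments
int8_state = adam_state_bytes(n_params, 1)  # 8-bit moments

print(f"fp32 Adam state:  {fp32_state / 2**30:.2f} GiB")
print(f"8-bit Adam state: {int8_state / 2**30:.2f} GiB")
```

The optimizer state shrinks by roughly a factor of four in this idealized count; the overall saving for a training run is smaller because the other memory consumers are unchanged.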

Other interesting examples are in the works, so stay tuned!

What's Changed

  • ILQL indices on wrong device by @cat-state in https://github.com/CarperAI/trlx/pull/105
  • Fix ppo ratio inaccuracy by @reciprocated in https://github.com/CarperAI/trlx/pull/108
  • Set RNG seeds across multiple dependencies by @jon-tow in https://github.com/CarperAI/trlx/pull/113
  • Set seed after default config instantiation by @jon-tow in https://github.com/CarperAI/trlx/pull/114
  • Move queries on the device by @reciprocated in https://github.com/CarperAI/trlx/pull/115
  • Add ppo randomwalks example by @reciprocated in https://github.com/CarperAI/trlx/pull/119
  • Add unit tests to ensure valid example configs by @jon-tow in https://github.com/CarperAI/trlx/pull/120
  • updating gptj-config by @Dahoas in https://github.com/CarperAI/trlx/pull/109
  • Fix get distributed config by @reciprocated in https://github.com/CarperAI/trlx/pull/122
  • Add local rollout logging by @thomfoster in https://github.com/CarperAI/trlx/pull/124
  • Add support for more CausalLMs by @jon-tow in https://github.com/CarperAI/trlx/pull/103
  • Add hydra head support for GPTNeo by @jon-tow in https://github.com/CarperAI/trlx/pull/126
  • Add BloomModel hydra support by @jon-tow in https://github.com/CarperAI/trlx/pull/129
  • Simplifying logic to merge configs by @leshanbog in https://github.com/CarperAI/trlx/pull/134
  • add: load function for AccelerateRLModel by @dongs0104 in https://github.com/CarperAI/trlx/pull/136
  • Add OptimizerConfig and SchedulerConfig by @jon-tow in https://github.com/CarperAI/trlx/pull/135
  • Remove incorrect default config settings by @jon-tow in https://github.com/CarperAI/trlx/pull/137
  • Update TRL acknowledgement by @osanseviero in https://github.com/CarperAI/trlx/pull/138
  • Fix context overflow by @reciprocated in https://github.com/CarperAI/trlx/pull/131
  • Fix seeding per process by @reciprocated in https://github.com/CarperAI/trlx/pull/141
  • Set device-specific seeding with global rank by @jon-tow in https://github.com/CarperAI/trlx/pull/143
  • Freeze hydra model branches by @jon-tow in https://github.com/CarperAI/trlx/pull/140
  • Refactor RL model wrapper into a trainer module by @jon-tow in https://github.com/CarperAI/trlx/pull/144
  • Logging learning rate by @leshanbog in https://github.com/CarperAI/trlx/pull/147
  • Fix instantiating base transformer from a custom config by @reciprocated in https://github.com/CarperAI/trlx/pull/149
  • Linear LR scheduler by @leshanbog in https://github.com/CarperAI/trlx/pull/150
  • Update pre-commit version and add isort by @jon-tow in https://github.com/CarperAI/trlx/pull/152
  • fix: configure flake8, fix errors, add trackers config by @Mistobaan in https://github.com/CarperAI/trlx/pull/157
  • Features/use-python-3.8-in-ci by @Mistobaan in https://github.com/CarperAI/trlx/pull/159
  • Add bitsandbytes optimizer support by @aicrumb in https://github.com/CarperAI/trlx/pull/133
  • initial commit for trlx LORA support by @ethankim00 in https://github.com/CarperAI/trlx/pull/110
  • Fix default delta_kwargs handling by @jon-tow in https://github.com/CarperAI/trlx/pull/171
  • Add T5 model by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/145
  • Fix wandb.errors.RequireError as reported in #162 by @ayulockin in https://github.com/CarperAI/trlx/pull/167
  • Update README.md by @LouisCastricato in https://github.com/CarperAI/trlx/pull/180
  • Update ILQL details by @reciprocated in https://github.com/CarperAI/trlx/pull/156
  • Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/175
  • Fix HuggingFace model.save_pretrained for DDP by @jon-tow in https://github.com/CarperAI/trlx/pull/181
  • Update generation utilities by @reciprocated in https://github.com/CarperAI/trlx/pull/172

New Contributors

  • @thomfoster made their first contribution in https://github.com/CarperAI/trlx/pull/124
  • @leshanbog made their first contribution in https://github.com/CarperAI/trlx/pull/134
  • @dongs0104 made their first contribution in https://github.com/CarperAI/trlx/pull/136
  • @osanseviero made their first contribution in https://github.com/CarperAI/trlx/pull/138
  • @Mistobaan made their first contribution in https://github.com/CarperAI/trlx/pull/157
  • @aicrumb made their first contribution in https://github.com/CarperAI/trlx/pull/133
  • @ethankim00 made their first contribution in https://github.com/CarperAI/trlx/pull/110
  • @PhungVanDuy made their first contribution in https://github.com/CarperAI/trlx/pull/145

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.3...v0.4

Published by LouisCastricato about 3 years ago

https://github.com/carperai/trlx - Pre alpha v0.3

What's Changed

  • Download simulacra by @reciprocated in https://github.com/CarperAI/trlx/pull/62
  • Update documentation (first review) by @simoninithomas in https://github.com/CarperAI/trlx/pull/64
  • Add ckpt/ to gitignore by @ayulockin in https://github.com/CarperAI/trlx/pull/70
  • change version in package to match lib by @cat-state in https://github.com/CarperAI/trlx/pull/73
  • Docs by @shahbuland in https://github.com/CarperAI/trlx/pull/71
  • [fix] Remove stale options from ppo_gptj.yml by @jon-tow in https://github.com/CarperAI/trlx/pull/77
  • Add entity name config for wandb logging by @jon-tow in https://github.com/CarperAI/trlx/pull/78
  • EXAMPLE : Interpreter grounded Neural Program Synthesis [WIP] by @reshinthadithyan in https://github.com/CarperAI/trlx/pull/81
  • Update TrainConfig optimizer hyperparameters by @jon-tow in https://github.com/CarperAI/trlx/pull/82
  • Add examples tip to contribution guide by @jon-tow in https://github.com/CarperAI/trlx/pull/84
  • Fix pipeline's context overflow by @reciprocated in https://github.com/CarperAI/trlx/pull/87
  • Refactor PPO objective function by @jon-tow in https://github.com/CarperAI/trlx/pull/88
  • Fix slow ilql eval by @reciprocated in https://github.com/CarperAI/trlx/pull/91
  • rerun https://github.com/CarperAI/trlx/pull/89 by @cat-state in https://github.com/CarperAI/trlx/pull/92
  • Hyperparameter Optimization with Ray Tune and Weights and Biases by @ayulockin in https://github.com/CarperAI/trlx/pull/76
  • Update readme instructions by @reciprocated in https://github.com/CarperAI/trlx/pull/93
  • Update README to align nomenclature correctness by @ayulockin in https://github.com/CarperAI/trlx/pull/97
  • Add optional reward scaling by @reciprocated in https://github.com/CarperAI/trlx/pull/95
  • Force class registry via imports by @jon-tow in https://github.com/CarperAI/trlx/pull/100
  • Add optional normalization (cont.) by @reciprocated in https://github.com/CarperAI/trlx/pull/98
  • Restructure sweeps for reuse by @reciprocated in https://github.com/CarperAI/trlx/pull/102

New Contributors

  • @simoninithomas made their first contribution in https://github.com/CarperAI/trlx/pull/64
  • @ayulockin made their first contribution in https://github.com/CarperAI/trlx/pull/70
  • @reshinthadithyan made their first contribution in https://github.com/CarperAI/trlx/pull/81

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.2...v0.3

Published by LouisCastricato over 3 years ago

https://github.com/carperai/trlx - Alpha v0.2

Complete revamp of our initial release.

New features:

  • Hydra models, 20x faster than vanilla PPO with minimal performance hits at large scales.
  • Massively revamped API with significantly less boilerplate.
  • Save/load callbacks.
  • Greatly improved orchestrator.
  • Better-commented RL code that makes it easier to understand what's going on.
  • Cool examples, including architext and simulacra.
  • Better extensibility and standardized styling.

Features coming soon:

  • Megatron support! We're already working on this.
  • More interesting examples that are relevant to production use cases of TRLX.
  • Better integration of W&B, including sweeps.
  • Evaluation and benchmarking.

:)

Autogenerated release notes below:

What's Changed

  • Fix typo by @mrm8488 in https://github.com/CarperAI/trlx/pull/2
  • Create LICENSE by @LouisCastricato in https://github.com/CarperAI/trlx/pull/3
  • QOL fixes by @LouisCastricato in https://github.com/CarperAI/trlx/pull/5
  • stage ilql by @reciprocated in https://github.com/CarperAI/trlx/pull/6
  • Adds style file and reward function capabilities to ppo orchestrator by @LouisCastricato in https://github.com/CarperAI/trlx/pull/8
  • Update ppo value head + print logs by @Dahoas in https://github.com/CarperAI/trlx/pull/11
  • Make ilql respect the config & remove sin by @reciprocated in https://github.com/CarperAI/trlx/pull/22
  • Docs by @shahbuland in https://github.com/CarperAI/trlx/pull/31
  • Implemented hydra heads + adaptive kl by @Dahoas in https://github.com/CarperAI/trlx/pull/33
  • Add pre-commit with black by @cat-state in https://github.com/CarperAI/trlx/pull/36
  • [update] Improve package setup by @jon-tow in https://github.com/CarperAI/trlx/pull/42
  • Add initial issue templates by @jon-tow in https://github.com/CarperAI/trlx/pull/45
  • Some readme improvements by @thedch in https://github.com/CarperAI/trlx/pull/44
  • Add initial GitHub workflows by @jon-tow in https://github.com/CarperAI/trlx/pull/43
  • [docs] Add CONTRIBUTING.md by @jon-tow in https://github.com/CarperAI/trlx/pull/52
  • Simplify api by @reciprocated in https://github.com/CarperAI/trlx/pull/24

New Contributors

  • @mrm8488 made their first contribution in https://github.com/CarperAI/trlx/pull/2
  • @LouisCastricato made their first contribution in https://github.com/CarperAI/trlx/pull/3
  • @reciprocated made their first contribution in https://github.com/CarperAI/trlx/pull/6
  • @Dahoas made their first contribution in https://github.com/CarperAI/trlx/pull/11
  • @shahbuland made their first contribution in https://github.com/CarperAI/trlx/pull/31
  • @cat-state made their first contribution in https://github.com/CarperAI/trlx/pull/36
  • @jon-tow made their first contribution in https://github.com/CarperAI/trlx/pull/42
  • @thedch made their first contribution in https://github.com/CarperAI/trlx/pull/44

Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2

Published by LouisCastricato over 3 years ago