Recent Releases of https://github.com/carperai/trlx
https://github.com/carperai/trlx - v0.7.0: NeMo PPO, PEFT Migration, and Fixes
The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
🐠 NeMo PPO and SFT support
This release introduces NeMo-backed PPO and SFT implementations, adding new capabilities and improving system performance for large-scale training.
- NeMo PPO by @cat-state in https://github.com/CarperAI/trlx/pull/472
- Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in https://github.com/CarperAI/trlx/pull/353
🦆 PEFT Migration
trlx now supports parameter-efficient tuning methods via the peft library, which we hope will provide greater access to RLHF training in low-resource settings.
- peft to opendelta migration (#434) + memory optimization (#320) by @glerzing in https://github.com/CarperAI/trlx/pull/486
Fixes and more!
- Set pad_token for all tokenizers in tests by @cat-state in https://github.com/CarperAI/trlx/pull/414
- Convert tensors in the stats dict into scalars by @ZHAOTING in https://github.com/CarperAI/trlx/pull/417
- Add Translation Finetuning Example with T5 by @alexandremuzio in https://github.com/CarperAI/trlx/pull/392
- set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in https://github.com/CarperAI/trlx/pull/409
- [fix] add `position_ids` to `LlamaModelBranch` by @jon-tow in https://github.com/CarperAI/trlx/pull/418
- fix(CI): use pinned deps for CI testing by @jon-tow in https://github.com/CarperAI/trlx/pull/423
- Minibatch impl by @Dahoas in https://github.com/CarperAI/trlx/pull/364
- [feat] Support tying metadata to each prompt by @maxreciprocate in https://github.com/CarperAI/trlx/pull/421
- feat(examples): revamp simulacra example by @maxreciprocate in https://github.com/CarperAI/trlx/pull/430
- [fix] update pairwise dataloader. by @Chen9154 in https://github.com/CarperAI/trlx/pull/395
- fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/432
- fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in https://github.com/CarperAI/trlx/pull/429
- fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/435
- fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in https://github.com/CarperAI/trlx/pull/441
- Create Example training scripts to run in Stability cluster by @alexandremuzio in https://github.com/CarperAI/trlx/pull/419
- Upgrade official released Ray instead of an unstable one. by @jovany-wang in https://github.com/CarperAI/trlx/pull/455
- Pin transformers<=4.27.1 by @jovany-wang in https://github.com/CarperAI/trlx/pull/458
- fix(ppo_gpt): prevent position_ids being None by @li-plus in https://github.com/CarperAI/trlx/pull/451
- fix(trainer): init self.generate_sweep_kwarg at self.__init__ by @mymusise in https://github.com/CarperAI/trlx/pull/460
- Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in https://github.com/CarperAI/trlx/pull/420
- Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in https://github.com/CarperAI/trlx/pull/422
- docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in https://github.com/CarperAI/trlx/pull/449
- fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/450
- fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in https://github.com/CarperAI/trlx/pull/444
- feat(requirements.txt): upgrade dependencies by @maxreciprocate in https://github.com/CarperAI/trlx/pull/465
- fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/475
- hotfix(bnb): install `scipy` with `bitsandbytes` to avoid `ModuleNotFoundError` by @jon-tow in https://github.com/CarperAI/trlx/pull/492
- fix type hint in `PromptPipeline.__init__` by @g-simmons in https://github.com/CarperAI/trlx/pull/496
- fix(modeling_ilql): single q-head indexing by @maxreciprocate in https://github.com/CarperAI/trlx/pull/471
- Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in https://github.com/CarperAI/trlx/pull/506
- Fix PPO log_ratio bug by @TobiasNorlund in https://github.com/CarperAI/trlx/pull/509
- fix(ppo_trainer): default gen kwargs by @maxreciprocate in https://github.com/CarperAI/trlx/pull/510
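As a rough illustration of what computing the mean KL sequence-wise can mean (the `fix(ppo_trainer)` entry above), the sketch below contrasts averaging per-token KL estimates over every token in a batch with summing them within each sequence and then averaging over sequences. The function names and numbers are hypothetical, and this is an assumption about the intent of the fix, not the code from the pull request.

```python
from typing import List


def mean_kl_per_token(kl: List[List[float]]) -> float:
    """Average the per-token KL estimates over every token in the batch."""
    tokens = [x for seq in kl for x in seq]
    return sum(tokens) / len(tokens)


def mean_kl_per_sequence(kl: List[List[float]]) -> float:
    """Sum the per-token KL within each sequence, then average over sequences."""
    return sum(sum(seq) for seq in kl) / len(kl)


# Made-up per-token KL values for two sequences of different lengths.
batch = [[0.1, 0.2, 0.3], [0.4]]
print(mean_kl_per_token(batch), mean_kl_per_sequence(batch))
```

The two statistics differ whenever sequences contain more than one token, so a KL target tuned against one of them does not carry over directly to the other.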
New Contributors
- @ZHAOTING made their first contribution in https://github.com/CarperAI/trlx/pull/417
- @cauyxy made their first contribution in https://github.com/CarperAI/trlx/pull/409
- @Chen9154 made their first contribution in https://github.com/CarperAI/trlx/pull/395
- @jovany-wang made their first contribution in https://github.com/CarperAI/trlx/pull/455
- @li-plus made their first contribution in https://github.com/CarperAI/trlx/pull/451
- @mymusise made their first contribution in https://github.com/CarperAI/trlx/pull/460
- @mikljohansson made their first contribution in https://github.com/CarperAI/trlx/pull/420
- @g-simmons made their first contribution in https://github.com/CarperAI/trlx/pull/496
- @iwiwi made their first contribution in https://github.com/CarperAI/trlx/pull/506
- @TobiasNorlund made their first contribution in https://github.com/CarperAI/trlx/pull/509
- @glerzing made their first contribution in https://github.com/CarperAI/trlx/pull/486
Full Changelog: https://github.com/CarperAI/trlx/compare/v0.6.0...v0.7.0
- Python
Published by jon-tow over 2 years ago
https://github.com/carperai/trlx - v0.6.0: LLaMa (Alpaca), Benchmark Util, T5 ILQL, Tests
The v0.6.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
📏 Benchmarking and Improved Unit Tests
This release introduces a new benchmark util to more easily track regressions in our training pipeline along with improved unit tests with the help of the hypothesis package:
* [feat] Add benchmark tools by @reciprocated in https://github.com/CarperAI/trlx/pull/357
* Add hypothesis tests for ILQL and fix edge cases by @cat-state in https://github.com/CarperAI/trlx/pull/370
🦙 LLaMa and Alpaca PPO/SFT Support
PPO support and examples for LLaMa are now available, and we’ve baked in an example for instruction fine-tuning models with the Alpaca dataset using our SFT trainer:
* [feat] Add LLaMa Model support for PPO by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/375
* Add Alpaca by @cat-state in https://github.com/CarperAI/trlx/pull/400
5️⃣ T5 ILQL Support
T5 models can now be fine-tuned with ILQL:
* Support ILQL for T5 model, Fix PPO T5 for refactored code by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/290
Fixes
- Remove example usage of deprecating `trlx.train` `dataset` arg by @jon-tow in https://github.com/CarperAI/trlx/pull/331
- Remove logit_mask unused argument by @cat-state in https://github.com/CarperAI/trlx/pull/332
- [fix] Convert the rest of configs from `yml`s by @reciprocated in https://github.com/CarperAI/trlx/pull/346
- fix default_ilql_config in notebook by @xu-song in https://github.com/CarperAI/trlx/pull/350
- hot-fix: update PPOConfig import in examples by @jon-tow in https://github.com/CarperAI/trlx/pull/352
- [fix] Update `AdaptiveKLController` with correct KL by @reciprocated in https://github.com/CarperAI/trlx/pull/361
- [fix] Drop `<eos>` from ILQL sample's phrases by @reciprocated in https://github.com/CarperAI/trlx/pull/362
- fixes half exp not implemented error by @Dahoas in https://github.com/CarperAI/trlx/pull/363
- [fix] ILQL `total_steps` calculation when running distributed by @reciprocated in https://github.com/CarperAI/trlx/pull/374
- [fix] split for validation by @hzwer in https://github.com/CarperAI/trlx/pull/369
- fix(docs): Update incorrect `PPORLElement` logprob tensor shape hint by @jon-tow in https://github.com/CarperAI/trlx/pull/377
- [fix] Enable HF downloads from a revision by @reciprocated in https://github.com/CarperAI/trlx/pull/382
- [fix] Fix ILQL head sync under ZeRO3 by @reciprocated in https://github.com/CarperAI/trlx/pull/387
- [fix] Preserve `<eos>` token and in-place it after trimming by @reciprocated in https://github.com/CarperAI/trlx/pull/401
- Nemo ILQL fixes by @cat-state in https://github.com/CarperAI/trlx/pull/404
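For context on the `AdaptiveKLController` fix listed above, the sketch below shows the proportional update rule for an adaptive KL coefficient as described by Ziegler et al. (2019) in "Fine-Tuning Language Models from Human Preferences". The class name mirrors the one in the fix title, but the constructor arguments, the clipping range, and the numbers are assumptions for illustration, not necessarily trlx's exact implementation.

```python
class AdaptiveKLController:
    """Adjusts a KL penalty coefficient so the observed KL tracks a target."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> float:
        # Relative error between observed and target KL, clipped to +/-20%.
        proportional_error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        # Scale the coefficient up when KL is too high, down when it is too low.
        self.value *= 1.0 + proportional_error * n_steps / self.horizon
        return self.value


# Hypothetical settings, chosen only to exercise both branches.
controller = AdaptiveKLController(init_kl_coef=0.05, target=6.0, horizon=10_000)
print(controller.update(current_kl=12.0, n_steps=256))  # observed KL above target
print(controller.update(current_kl=3.0, n_steps=256))   # observed KL below target
```

Because the rule depends on the ratio of observed to target KL, feeding it a KL estimate on the wrong scale pushes the coefficient in the wrong direction, which is why the correctness of the KL passed in matters.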
What's Changed
- Move to Python config classes instead of `yml`s by @cat-state in https://github.com/CarperAI/trlx/pull/306
- Add intermediate checkpointing to `accelerate` trainers by @jon-tow in https://github.com/CarperAI/trlx/pull/349
- Enable infinite dataloader for prompt_dataloader in PPO Trainer by @alexandremuzio in https://github.com/CarperAI/trlx/pull/358
- [feat] Add optional dependency list by @reciprocated in https://github.com/CarperAI/trlx/pull/381
- Add some synchronization to the db download in the simulacra example by @dakinggg in https://github.com/CarperAI/trlx/pull/406
New Contributors
- @xu-song made their first contribution in https://github.com/CarperAI/trlx/pull/350
- @hzwer made their first contribution in https://github.com/CarperAI/trlx/pull/369
- @alexandremuzio made their first contribution in https://github.com/CarperAI/trlx/pull/358
- @dakinggg made their first contribution in https://github.com/CarperAI/trlx/pull/406
Full Changelog: https://github.com/CarperAI/trlx/compare/v0.5.0...v0.6.0
Published by jon-tow almost 3 years ago
https://github.com/carperai/trlx - v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration
Highlights
- Initial NeMo ILQL integration, leading the way to large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
- In-depth example showcasing `trlx` usage on AnthropicAI's Helpful & Harmless dataset: https://github.com/CarperAI/trlx/tree/main/examples/hh
- Improved ILQL modeling integration with Hugging Face `transformers`. Users can now work with `AutoModelForCausalLMWithILQLHeads` objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.
What's Changed
- Add `wandb` group naming by @jon-tow in https://github.com/CarperAI/trlx/pull/188
- Update `reward_fn` signatures in examples by @jon-tow in https://github.com/CarperAI/trlx/pull/190
- Add tokenizer config by @reciprocated in https://github.com/CarperAI/trlx/pull/189
- Fix extraction of `mixed_precision` option for deepspeed by @reciprocated in https://github.com/CarperAI/trlx/pull/197
- Fix `summarize_rlhf` inference checkpoint paths by @jon-tow in https://github.com/CarperAI/trlx/pull/194
- Make the config loading consistent across all example scripts. by @shermansiu in https://github.com/CarperAI/trlx/pull/192
- Make `Trainer.save_pretrained` sub-directory optional by @jon-tow in https://github.com/CarperAI/trlx/pull/201
- Update Readme to include T5 models by @aaronrmm in https://github.com/CarperAI/trlx/pull/198
- Make `make_head` accept dtype parameter by @reciprocated in https://github.com/CarperAI/trlx/pull/213
- Enable training with Tensorboard tracking by @marcobellagente93 in https://github.com/CarperAI/trlx/pull/209
- Support nested updates in `merge` by @cat-state in https://github.com/CarperAI/trlx/pull/219
- Fix typo reward normalize summarize by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/221
- Update stale comment from results table by @jon-tow in https://github.com/CarperAI/trlx/pull/222
- Fix undefined trackers property by @alan-cooney in https://github.com/CarperAI/trlx/pull/224
- Fix tokenizer missing from config.to_dict() by @alan-cooney in https://github.com/CarperAI/trlx/pull/228
- Make experiment tracking optional by @jon-tow in https://github.com/CarperAI/trlx/pull/226
- read tokenizer path from config correctly by @JustinAWei in https://github.com/CarperAI/trlx/pull/230
- Add devcontainer support by @alan-cooney in https://github.com/CarperAI/trlx/pull/196
- fix: change lora_a:float to lora_r:int by @aaronrmm in https://github.com/CarperAI/trlx/pull/235
- Bump `isort` to hotfix CI code quality workflow by @jon-tow in https://github.com/CarperAI/trlx/pull/237
- Fix optional tracking in `accelerator.log` by @jon-tow in https://github.com/CarperAI/trlx/pull/233
- Improve documentation/comments on the random walk example by @alan-cooney in https://github.com/CarperAI/trlx/pull/208
- Update link to "Learning to Summarize from Human Feedback" by @jon-tow in https://github.com/CarperAI/trlx/pull/241
- Fix deepspeed state saving under `save_best` condition by @reciprocated in https://github.com/CarperAI/trlx/pull/242
- added colab notebook by @smellslikeml in https://github.com/CarperAI/trlx/pull/244
- [style] Increase black's line length by @reciprocated in https://github.com/CarperAI/trlx/pull/250
- Add help string to get_advantages_and_returns by @pesvut in https://github.com/CarperAI/trlx/pull/225
- Filter out empty responses by @reciprocated in https://github.com/CarperAI/trlx/pull/265
- NeMo Integrate by @cat-state in https://github.com/CarperAI/trlx/pull/125
- Add multi-process logger utility for status monitoring by @jon-tow in https://github.com/CarperAI/trlx/pull/254
- Add `NeMo` support info to `README` by @jon-tow in https://github.com/CarperAI/trlx/pull/275
- Fix distributed dataloaders & deduplicate eval by @reciprocated in https://github.com/CarperAI/trlx/pull/276
- Improve PPO readability by @alan-cooney in https://github.com/CarperAI/trlx/pull/210
- Add T5 to delta modifier map by @aaronrmm in https://github.com/CarperAI/trlx/pull/234
- [fix] Set deepspeed's fp16 `auto_cast` to false by @reciprocated in https://github.com/CarperAI/trlx/pull/279
- Rename remaining `logprobs_from_logits` call by @jon-tow in https://github.com/CarperAI/trlx/pull/281
- [feat] Add Accelerate SFT Trainer by @reciprocated in https://github.com/CarperAI/trlx/pull/280
- Add Colab Notebook for Sentiment by @zswitten in https://github.com/CarperAI/trlx/pull/285
- Remove `pylance` installs from devcontainer by @jon-tow in https://github.com/CarperAI/trlx/pull/296
- Move notebooks to examples dir by @jon-tow in https://github.com/CarperAI/trlx/pull/294
- [fix] Summarize config discrepancy by @reciprocated in https://github.com/CarperAI/trlx/pull/293
- Make Git check optional by @cat-state in https://github.com/CarperAI/trlx/pull/299
- refactor: remove orchestrator abstraction from API by @jon-tow in https://github.com/CarperAI/trlx/pull/289
- Set `add_special_tokens=False` to not add EOS unexpectedly by @cat-state in https://github.com/CarperAI/trlx/pull/287
- [feat] Gather experience samples by @reciprocated in https://github.com/CarperAI/trlx/pull/305
- [fix] Make `gather_for_metrics` usage more strict by @reciprocated in https://github.com/CarperAI/trlx/pull/315
- Add helpful and harmless example by @reciprocated in https://github.com/CarperAI/trlx/pull/128
- Adopt `PreTrainedModelWrapper` for Hugging Face models by @jon-tow in https://github.com/CarperAI/trlx/pull/215
New Contributors
- @shermansiu made their first contribution in https://github.com/CarperAI/trlx/pull/192
- @aaronrmm made their first contribution in https://github.com/CarperAI/trlx/pull/198
- @marcobellagente93 made their first contribution in https://github.com/CarperAI/trlx/pull/209
- @alan-cooney made their first contribution in https://github.com/CarperAI/trlx/pull/224
- @JustinAWei made their first contribution in https://github.com/CarperAI/trlx/pull/230
- @smellslikeml made their first contribution in https://github.com/CarperAI/trlx/pull/244
- @pesvut made their first contribution in https://github.com/CarperAI/trlx/pull/225
- @zswitten made their first contribution in https://github.com/CarperAI/trlx/pull/285
Full Changelog: https://github.com/CarperAI/trlx/compare/v0.4...v0.5.0
Published by jon-tow about 3 years ago
https://github.com/carperai/trlx - v0.4
Summary of release notes:
Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:
- Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization.
- Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF under low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wall-clock time for the same performance (quick report here).
- Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report).
Other interesting examples are in the works, so stay tuned!
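To give a rough sense of why LoRA can help in low-resource settings, the sketch below counts trainable parameters for a single weight matrix when it is fully fine-tuned versus adapted with a low-rank update. The layer shape and rank are illustrative assumptions rather than values from the trlx reports, and the memory and wall-clock figures quoted above depend on many other factors (activations, optimizer state, frozen weights).

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` LoRA update W + B @ A,
    where B is d_out x rank and A is rank x d_in (the base W stays frozen)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Fewer trainable parameters also means less gradient and optimizer state to hold in memory, which is one plausible contributor to the savings reported above.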
What's Changed
- ILQL indices on wrong device by @cat-state in https://github.com/CarperAI/trlx/pull/105
- Fix ppo ratio inaccuracy by @reciprocated in https://github.com/CarperAI/trlx/pull/108
- Set RNG seeds across multiple dependencies by @jon-tow in https://github.com/CarperAI/trlx/pull/113
- Set seed after default config instantiation by @jon-tow in https://github.com/CarperAI/trlx/pull/114
- Move queries on the device by @reciprocated in https://github.com/CarperAI/trlx/pull/115
- Add ppo randomwalks example by @reciprocated in https://github.com/CarperAI/trlx/pull/119
- Add unit tests to ensure valid example configs by @jon-tow in https://github.com/CarperAI/trlx/pull/120
- updating gptj-config by @Dahoas in https://github.com/CarperAI/trlx/pull/109
- Fix get distributed config by @reciprocated in https://github.com/CarperAI/trlx/pull/122
- Add local rollout logging by @thomfoster in https://github.com/CarperAI/trlx/pull/124
- Add support for more `CausalLM`s by @jon-tow in https://github.com/CarperAI/trlx/pull/103
- Add hydra head support for `GPTNeo` by @jon-tow in https://github.com/CarperAI/trlx/pull/126
- Add `BloomModel` hydra support by @jon-tow in https://github.com/CarperAI/trlx/pull/129
- Simplifying logic to merge configs by @leshanbog in https://github.com/CarperAI/trlx/pull/134
- add: load function for AccelerateRLModel by @dongs0104 in https://github.com/CarperAI/trlx/pull/136
- Add `OptimizerConfig` and `SchedulerConfig` by @jon-tow in https://github.com/CarperAI/trlx/pull/135
- Remove incorrect default config settings by @jon-tow in https://github.com/CarperAI/trlx/pull/137
- Update TRL acknowledgement by @osanseviero in https://github.com/CarperAI/trlx/pull/138
- Fix context overflow by @reciprocated in https://github.com/CarperAI/trlx/pull/131
- Fix seeding per process by @reciprocated in https://github.com/CarperAI/trlx/pull/141
- Set device-specific seeding with global rank by @jon-tow in https://github.com/CarperAI/trlx/pull/143
- Freeze hydra model branches by @jon-tow in https://github.com/CarperAI/trlx/pull/140
- Refactor RL model wrapper into a `trainer` module by @jon-tow in https://github.com/CarperAI/trlx/pull/144
- Logging learning rate by @leshanbog in https://github.com/CarperAI/trlx/pull/147
- Fix instantiating base transformer from a custom config by @reciprocated in https://github.com/CarperAI/trlx/pull/149
- Linear LR scheduler by @leshanbog in https://github.com/CarperAI/trlx/pull/150
- Update `pre-commit` version and add `isort` by @jon-tow in https://github.com/CarperAI/trlx/pull/152
- fix: configure flake8, fix errors, add `trackers` config by @Mistobaan in https://github.com/CarperAI/trlx/pull/157
- Features/use-python-3.8-in-ci by @Mistobaan in https://github.com/CarperAI/trlx/pull/159
- Add `bitsandbytes` optimizer support by @aicrumb in https://github.com/CarperAI/trlx/pull/133
- initial commit for trlx LORA support by @ethankim00 in https://github.com/CarperAI/trlx/pull/110
- Fix default `delta_kwargs` handling by @jon-tow in https://github.com/CarperAI/trlx/pull/171
- Add T5 model by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/145
- Fix wandb.errors.RequireError as reported in #162 by @ayulockin in https://github.com/CarperAI/trlx/pull/167
- Update README.md by @LouisCastricato in https://github.com/CarperAI/trlx/pull/180
- Update ILQL details by @reciprocated in https://github.com/CarperAI/trlx/pull/156
- Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/175
- Fix HuggingFace `model.save_pretrained` for DDP by @jon-tow in https://github.com/CarperAI/trlx/pull/181
- Update generation utilities by @reciprocated in https://github.com/CarperAI/trlx/pull/172
New Contributors
- @thomfoster made their first contribution in https://github.com/CarperAI/trlx/pull/124
- @leshanbog made their first contribution in https://github.com/CarperAI/trlx/pull/134
- @dongs0104 made their first contribution in https://github.com/CarperAI/trlx/pull/136
- @osanseviero made their first contribution in https://github.com/CarperAI/trlx/pull/138
- @Mistobaan made their first contribution in https://github.com/CarperAI/trlx/pull/157
- @aicrumb made their first contribution in https://github.com/CarperAI/trlx/pull/133
- @ethankim00 made their first contribution in https://github.com/CarperAI/trlx/pull/110
- @PhungVanDuy made their first contribution in https://github.com/CarperAI/trlx/pull/145
Full Changelog: https://github.com/CarperAI/trlx/compare/v0.3...v0.4
Published by LouisCastricato about 3 years ago
https://github.com/carperai/trlx - Pre alpha v0.3
What's Changed
- Download simulacra by @reciprocated in https://github.com/CarperAI/trlx/pull/62
- Update documentation (first review) by @simoninithomas in https://github.com/CarperAI/trlx/pull/64
- Add ckpt/ to gitignore by @ayulockin in https://github.com/CarperAI/trlx/pull/70
- change version in package to match lib by @cat-state in https://github.com/CarperAI/trlx/pull/73
- Docs by @shahbuland in https://github.com/CarperAI/trlx/pull/71
- [fix] Remove stale options from `ppo_gptj.yml` by @jon-tow in https://github.com/CarperAI/trlx/pull/77
- Add `entity` name config for `wandb` logging by @jon-tow in https://github.com/CarperAI/trlx/pull/78
- EXAMPLE : Interpreter grounded Neural Program Synthesis [WIP] by @reshinthadithyan in https://github.com/CarperAI/trlx/pull/81
- Update `TrainConfig` optimizer hyperparameters by @jon-tow in https://github.com/CarperAI/trlx/pull/82
- Add examples tip to contribution guide by @jon-tow in https://github.com/CarperAI/trlx/pull/84
- Fix pipeline's context overflow by @reciprocated in https://github.com/CarperAI/trlx/pull/87
- Refactor PPO objective function by @jon-tow in https://github.com/CarperAI/trlx/pull/88
- Fix slow ilql eval by @reciprocated in https://github.com/CarperAI/trlx/pull/91
- rerun https://github.com/CarperAI/trlx/pull/89 by @cat-state in https://github.com/CarperAI/trlx/pull/92
- Hyperparameter Optimization with Ray Tune and Weights and Biases by @ayulockin in https://github.com/CarperAI/trlx/pull/76
- Update readme instructions by @reciprocated in https://github.com/CarperAI/trlx/pull/93
- Update README to align nomenclature correctness by @ayulockin in https://github.com/CarperAI/trlx/pull/97
- Add optional reward scaling by @reciprocated in https://github.com/CarperAI/trlx/pull/95
- Force class registry via imports by @jon-tow in https://github.com/CarperAI/trlx/pull/100
- Add optional normalization (cont.) by @reciprocated in https://github.com/CarperAI/trlx/pull/98
- Restructure sweeps for reuse by @reciprocated in https://github.com/CarperAI/trlx/pull/102
New Contributors
- @simoninithomas made their first contribution in https://github.com/CarperAI/trlx/pull/64
- @ayulockin made their first contribution in https://github.com/CarperAI/trlx/pull/70
- @reshinthadithyan made their first contribution in https://github.com/CarperAI/trlx/pull/81
Full Changelog: https://github.com/CarperAI/trlx/compare/v0.2...v0.3
Published by LouisCastricato over 3 years ago
https://github.com/carperai/trlx - Alpha v0.2
Complete revamp of our initial release.
New features:
- Hydra models, 20x faster than vanilla PPO with minimal performance hits at large scales.
- Massively revamped API, significantly less boilerplate.
- Save/load callbacks.
- Greatly improved orchestrator.
- Better commented RL code, easier to understand what's going on.
- Cool examples, including architext and simulacra.
- Better extendability, and standardized styling.
Features coming soon:
- Megatron support! We're already working on this.
- More interesting examples that are relevant to production use cases of TRLX.
- Better integration of W&B, including sweeps.
- Evaluation and benchmarking.
:)
Autogenerated release notes below:
What's Changed
- Fix typo by @mrm8488 in https://github.com/CarperAI/trlx/pull/2
- Create LICENSE by @LouisCastricato in https://github.com/CarperAI/trlx/pull/3
- QOL fixes by @LouisCastricato in https://github.com/CarperAI/trlx/pull/5
- stage ilql by @reciprocated in https://github.com/CarperAI/trlx/pull/6
- Adds style file and reward function capabilities to ppo orchestrator by @LouisCastricato in https://github.com/CarperAI/trlx/pull/8
- Update ppo value head + print logs by @Dahoas in https://github.com/CarperAI/trlx/pull/11
- Make ilql respect the config & remove sin by @reciprocated in https://github.com/CarperAI/trlx/pull/22
- Docs by @shahbuland in https://github.com/CarperAI/trlx/pull/31
- Implemented hydra heads + adaptive kl by @Dahoas in https://github.com/CarperAI/trlx/pull/33
- Add pre-commit with `black` by @cat-state in https://github.com/CarperAI/trlx/pull/36
- [update] Improve package setup by @jon-tow in https://github.com/CarperAI/trlx/pull/42
- Add initial issue templates by @jon-tow in https://github.com/CarperAI/trlx/pull/45
- Some readme improvements by @thedch in https://github.com/CarperAI/trlx/pull/44
- Add initial GitHub workflows by @jon-tow in https://github.com/CarperAI/trlx/pull/43
- [docs] Add `CONTRIBUTING.md` by @jon-tow in https://github.com/CarperAI/trlx/pull/52
- Simplify api by @reciprocated in https://github.com/CarperAI/trlx/pull/24
New Contributors
- @mrm8488 made their first contribution in https://github.com/CarperAI/trlx/pull/2
- @LouisCastricato made their first contribution in https://github.com/CarperAI/trlx/pull/3
- @reciprocated made their first contribution in https://github.com/CarperAI/trlx/pull/6
- @Dahoas made their first contribution in https://github.com/CarperAI/trlx/pull/11
- @shahbuland made their first contribution in https://github.com/CarperAI/trlx/pull/31
- @cat-state made their first contribution in https://github.com/CarperAI/trlx/pull/36
- @jon-tow made their first contribution in https://github.com/CarperAI/trlx/pull/42
- @thedch made their first contribution in https://github.com/CarperAI/trlx/pull/44
Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2
Published by LouisCastricato over 3 years ago