Recent Releases of torchtune
torchtune - v0.6.0
Highlights
We are releasing torchtune v0.6.0 with exciting new features and improved distributed training support! This release includes Tensor Parallel (TP) + FSDP training, TP inference, multinode training, and a fully distributed DPO recipe. We also landed Phi 4, logging with MLflow, and improved support for NPUs.
Tensor Parallel training + inference (#2245) (#2330)
Tensor parallelism is a model parallelism technique for distributed training. When combined with FSDP, TP allows more efficient training of large models across many GPUs than FSDP alone. While FSDP shards model parameters, gradients, and optimizer states across GPUs (with each GPU still processing its own slice of the data), TP splits the computation of each model layer across GPUs, allowing model layers to be computed much faster at larger scales. In addition to training, we've also enabled TP inference, which is crucial for generating text or doing reinforcement learning when your model doesn't fit on a single GPU. To learn more about how to define a TP model, take a look here.
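To build intuition for what TP does, here is a toy, pure-Python sketch of column-wise tensor parallelism. No GPUs or collectives are involved; the real implementation uses DTensors and communication ops, and all names below are illustrative:

```python
# Toy illustration of column-wise tensor parallelism (pure Python, no GPUs).
# A linear layer's weight matrix is split by columns across "ranks"; each rank
# computes a partial output, and concatenating the shards reproduces the full
# output. Real TP does this with sharded tensors and collectives.

def matmul(x, w):
    """x: vector of length k, w: k x n matrix (list of rows) -> vector of length n."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]

def split_columns(w, num_ranks):
    """Shard a k x n weight matrix column-wise into num_ranks pieces."""
    n = len(w[0])
    per = n // num_ranks
    return [[row[r * per:(r + 1) * per] for row in w] for r in range(num_ranks)]

x = [1.0, 2.0]                      # activation, replicated on every rank
w = [[1.0, 2.0, 3.0, 4.0],          # full 2 x 4 weight
     [5.0, 6.0, 7.0, 8.0]]

shards = split_columns(w, num_ranks=2)
partials = [matmul(x, shard) for shard in shards]   # each rank's local matmul
tp_out = partials[0] + partials[1]                  # "all-gather" along columns

assert tp_out == matmul(x, w)       # sharded result matches the full matmul
```

Each rank holds only a slice of the weight, so the per-GPU memory and compute for the layer shrink proportionally to the number of ranks.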
Multinode training support (#2301)
Multinode finetuning is now supported, allowing you to train larger models faster. Using SLURM, you can launch `tune run` across multiple nodes and train just as you would on a single machine. We include an example SLURM recipe and a tutorial for getting started here.
Full Distributed DPO recipe (#2275)
We've had DPO support for some time, but you can now train DPO with all of the distributed features described above. This improves our coverage of recipes you can use on the growing number of 70B+ models. To finetune Llama 3.1 8B with Full DPO Distributed, you can run:
```
# Download Llama 3.1 8B
tune download meta-llama/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth"

# Finetune on four devices
tune run --nnodes 1 --nproc_per_node 4 full_dpo_distributed --config llama3_1/8B_full_dpo
```
A special thanks to @sam-pi for adding this recipe.
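For reference, DPO optimizes a contrastive preference objective over chosen/rejected response pairs. A minimal sketch of the per-example loss, assuming the summed log-probabilities from the policy and a frozen reference model are already computed (the names here are illustrative, not torchtune's API):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp      # log pi(y_w|x) - log ref(y_w|x)
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))        # -log sigmoid(logits)

# The loss shrinks as the policy prefers the chosen response more strongly
# (relative to the reference) than it prefers the rejected response.
easy = dpo_loss(-10.0, -30.0, -12.0, -25.0)   # policy favors chosen
hard = dpo_loss(-30.0, -10.0, -25.0, -12.0)   # policy favors rejected
assert easy < hard
```

The distributed recipe shards both the policy and the frozen reference model so this objective can be trained on models that don't fit on one GPU.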
Phi 4 models (#2197)
We now support Phi 4! This includes the 14B model for now, with recipes for full, LoRA, and QLoRA finetuning on one or more devices. For example, you can full finetune Phi 4 14B on a single GPU by running:
```
# Download Phi 4 14B
tune download microsoft/phi-4
pip install bitsandbytes

# Finetune on a single GPU
tune run full_finetune_single_device --config phi4/14B_full_low_memory
```
A huge thanks to @krammnic for landing these models!
Improved NPU support (#2234)
We are continuing to improve our support for Ascend NPU devices. This release includes fixes and enhancements to give you better performance with the NPU backend. Thank you to @Nicorgi for the help!
What's Changed
- Small readme, config updates by @ebsmothers in https://github.com/pytorch/torchtune/pull/2157
- Using `FormattedCheckpointFiles` in configs by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2147
- Move `get_world_size_and_rank` to utils by @joecummings in https://github.com/pytorch/torchtune/pull/2155
- Faster intermediate checkpoints with DCP async save in TorchTune by @saumishr in https://github.com/pytorch/torchtune/pull/2006
- torchdata integration - multi-dataset and streaming support by @andrewkho in https://github.com/pytorch/torchtune/pull/1929
- Allow higher version of lm-eval by @joecummings in https://github.com/pytorch/torchtune/pull/2165
- Using `FormattedCheckpointFiles` in configs... round 2 by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2167
- [EZ] Fix `set_torch_num_threads` in multi-node. by @EugenHotaj in https://github.com/pytorch/torchtune/pull/2164
- Fix `adapter_config.json` saving in DPO recipes by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2162
- Fix excessive QAT warning by @andrewor14 in https://github.com/pytorch/torchtune/pull/2174
- Add output dir to top of all configs by @ebsmothers in https://github.com/pytorch/torchtune/pull/2183
- change saving logic by @felipemello1 in https://github.com/pytorch/torchtune/pull/2182
- output_dir not in ckpt dir by @felipemello1 in https://github.com/pytorch/torchtune/pull/2181
- Set teacher ckptr output_dir to match student in KD configs by @ebsmothers in https://github.com/pytorch/torchtune/pull/2185
- raise compile error by @felipemello1 in https://github.com/pytorch/torchtune/pull/2188
- Update DPO Max Seq Len by @pbontrager in https://github.com/pytorch/torchtune/pull/2176
- Llama3.2 3B eval by @ReemaAlzaid in https://github.com/pytorch/torchtune/pull/2186
- Update typo in docstring for `generation.get_causal_mask_from_padding`… by @psoulos in https://github.com/pytorch/torchtune/pull/2187
- new docs for checkpointing by @felipemello1 in https://github.com/pytorch/torchtune/pull/2189
- Update E2E Tutorial w/ vLLM and HF Hub by @joecummings in https://github.com/pytorch/torchtune/pull/2192
- pytorch/torchtune/tests/torchtune/modules/_export by @gmagogsfm in https://github.com/pytorch/torchtune/pull/2179
- update torchtune version by @felipemello1 in https://github.com/pytorch/torchtune/pull/2195
- [metric_logging][wandb] Fix wandb metric logger config save path by @akashc1 in https://github.com/pytorch/torchtune/pull/2196
- Add evaluation file for code_llama2 model by @ReemaAlzaid in https://github.com/pytorch/torchtune/pull/2209
- Adds message_transform link from SFTDataset docstring to docs by @thomasjpfan in https://github.com/pytorch/torchtune/pull/2219
- Change `alpaca_dataset` `train_on_input` doc to match default value by @mirceamironenco in https://github.com/pytorch/torchtune/pull/2227
- Set default value for 'subset' parameter in `the_cauldron_dataset` by @Ankur-singh in https://github.com/pytorch/torchtune/pull/2228
- Add eval config for QWEN2_5 model using 0.5B variant by @Ankur-singh in https://github.com/pytorch/torchtune/pull/2230
- T5 Encoder by @calvinpelletier in https://github.com/pytorch/torchtune/pull/2069
- Migrate distributed state dict API by @mori360 in https://github.com/pytorch/torchtune/pull/2138
- Flux Autoencoder by @calvinpelletier in https://github.com/pytorch/torchtune/pull/2098
- Fix gradient scaling to account for world_size normalization by @mirceamironenco in https://github.com/pytorch/torchtune/pull/2172
- [Small fix] Update CUDA version in README by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2242
- Adds `clip_grad_norm` to all recipe configs that support it by @thomasjpfan in https://github.com/pytorch/torchtune/pull/2220
- llama 3.1 has correct `max_seq_len` for all versions by @akashc1 in https://github.com/pytorch/torchtune/pull/2203
- Log grad norm aggregated over all ranks, not just rank zero by @ebsmothers in https://github.com/pytorch/torchtune/pull/2248
- Remove example inputs from `aoti_compile_and_package` by @angelayi in https://github.com/pytorch/torchtune/pull/2244
- Fix issue #2243, update the document to show correct usage by @insop in https://github.com/pytorch/torchtune/pull/2252
- [EZ] Fix config bug where interpolation happens too early by @EugenHotaj in https://github.com/pytorch/torchtune/pull/2236
- Small formatting fix by @krammnic in https://github.com/pytorch/torchtune/pull/2256
- Multi-tile support in vision rope by @RdoubleA in https://github.com/pytorch/torchtune/pull/2247
- Add AlpacaToMessages to message transforms doc page by @AndrewMead10 in https://github.com/pytorch/torchtune/pull/2265
- Add a "division by zero" check in chunked loss handling in kd_losses.py by @insop in https://github.com/pytorch/torchtune/pull/2239
- Fixing docstring linter by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2163
- PPO Performance Improvements by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2066
- Add Ascend NPU as a backend for single device recipes by @Nicorgi in https://github.com/pytorch/torchtune/pull/2234
- Fix tests due to upgrade to cuda126 by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2260
- Fix a bug in set float32 precision by @Nicorgi in https://github.com/pytorch/torchtune/pull/2271
- Construct EarlyFusion's `encoder_token_ids` on correct device by @ebsmothers in https://github.com/pytorch/torchtune/pull/2276
- Sample packing for ConcatDataset by @ebsmothers in https://github.com/pytorch/torchtune/pull/2278
- Added Distributed (Tensor Parallel) Inference Recipe by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2245
- Logging resolved config by @Ankur-singh in https://github.com/pytorch/torchtune/pull/2274
- Removing `SimPOLoss` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2290
- Proper prefix handling in EarlyFusion sd hooks by @ebsmothers in https://github.com/pytorch/torchtune/pull/2291
- Remove deprecated components for 0.6.0 by @RdoubleA in https://github.com/pytorch/torchtune/pull/2293
- Update the e2e flow tutorial to fix errors of generate by @iseeyuan in https://github.com/pytorch/torchtune/pull/2251
- profiling ops on xpu by @songhappy in https://github.com/pytorch/torchtune/pull/2249
- Refactored modules/tokenizers to be a subdir of modules/transforms by @Ankur-singh in https://github.com/pytorch/torchtune/pull/2231
- Update model builders by @Ankur-singh in https://github.com/pytorch/torchtune/pull/2282
- [EZ] Only log deprecation warning on rank zero by @RdoubleA in https://github.com/pytorch/torchtune/pull/2308
- [ez] Add output_dir field to a couple configs by @ebsmothers in https://github.com/pytorch/torchtune/pull/2309
- Disable DSD and fix bitsandbytes test by @RdoubleA in https://github.com/pytorch/torchtune/pull/2314
- fix state dict hook for early fusion models by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2317
- Adding reverse and symmetric KLD losses by @insop in https://github.com/pytorch/torchtune/pull/2094
- [WIP] 'tune cat' command for pretty printing configuration files by @Ankur-singh in https://github.com/pytorch/torchtune/pull/2298
- Use checkout@v4 / upload@v4 for docs build by @joecummings in https://github.com/pytorch/torchtune/pull/2322
- Fix stop tokens in PPO by @RedTachyon in https://github.com/pytorch/torchtune/pull/2304
- Update PT pin for modules/_export by @Jack-Khuu in https://github.com/pytorch/torchtune/pull/2336
- Update to proper EOS ids for Qwen2 and Qwen2.5 by @joecummings in https://github.com/pytorch/torchtune/pull/2342
- Multinode support in torchtune by @joecummings in https://github.com/pytorch/torchtune/pull/2301
- [Bug Fix]Disable DSD for saving ckpt by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2346
- Update README for multinode by @joecummings in https://github.com/pytorch/torchtune/pull/2348
- added `tie_word_embeddings` to llama3_2 models by @jingzhaoou in https://github.com/pytorch/torchtune/pull/2331
- Fix saving adapter weights after disabling DSD by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2351
- Remove "ft-" prefix from checkpoint shards. by @EugenHotaj in https://github.com/pytorch/torchtune/pull/2354
- Full DPO Distributed by @sam-pi in https://github.com/pytorch/torchtune/pull/2275
- [Fix Test] Fix failed generation test by pining pytorch nightlies by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2362
- TP + FSDP distributed training (full finetuning) by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2330
- Add max-autotune try/except if flex attn breaks by @felipemello1 in https://github.com/pytorch/torchtune/pull/2357
- readme updates for full DPO distributed recipe by @ebsmothers in https://github.com/pytorch/torchtune/pull/2363
- Fix Qwen config by @acisseJZhong in https://github.com/pytorch/torchtune/pull/2377
- feat: Added `cfg.cudnn_deterministic_mode` flag by @bogdansalyp in https://github.com/pytorch/torchtune/pull/2367
- Add Phi4 by @krammnic in https://github.com/pytorch/torchtune/pull/2197
- Add tests and implementation for disabling dropout layers in models by @Ankur-singh in https://github.com/pytorch/torchtune/pull/2378
- nit: Phi4 to readme by @krammnic in https://github.com/pytorch/torchtune/pull/2383
- Implements MLFlowLogger by @nathan-az in https://github.com/pytorch/torchtune/pull/2365
- 'ft-' prefix occurrence removal by @rajuptvs in https://github.com/pytorch/torchtune/pull/2385
- check if log_dir is not none by @felipemello1 in https://github.com/pytorch/torchtune/pull/2389
- HF tokenizers: initial base tokenizer support by @ebsmothers in https://github.com/pytorch/torchtune/pull/2350
- Simplify README and prominently display recipes by @joecummings in https://github.com/pytorch/torchtune/pull/2349
- Renamed `parallelize_plan` to `tensor_parallel_plan` by @pbontrager in https://github.com/pytorch/torchtune/pull/2387
- Fix `optimizer_in_backward` at loading `opt_state_dict` in distributed recipes by @mori360 in https://github.com/pytorch/torchtune/pull/2390
- Add core dependency on stable torchdata (#2408) by @pbontrager in https://github.com/pytorch/torchtune/pull/2509
New Contributors
- @saumishr made their first contribution in https://github.com/pytorch/torchtune/pull/2006
- @andrewkho made their first contribution in https://github.com/pytorch/torchtune/pull/1929
- @EugenHotaj made their first contribution in https://github.com/pytorch/torchtune/pull/2164
- @ReemaAlzaid made their first contribution in https://github.com/pytorch/torchtune/pull/2186
- @psoulos made their first contribution in https://github.com/pytorch/torchtune/pull/2187
- @gmagogsfm made their first contribution in https://github.com/pytorch/torchtune/pull/2179
- @akashc1 made their first contribution in https://github.com/pytorch/torchtune/pull/2196
- @acisseJZhong made their first contribution in https://github.com/pytorch/torchtune/pull/2242
- @angelayi made their first contribution in https://github.com/pytorch/torchtune/pull/2244
- @insop made their first contribution in https://github.com/pytorch/torchtune/pull/2252
- @AndrewMead10 made their first contribution in https://github.com/pytorch/torchtune/pull/2265
- @Nicorgi made their first contribution in https://github.com/pytorch/torchtune/pull/2234
- @jingzhaoou made their first contribution in https://github.com/pytorch/torchtune/pull/2331
- @sam-pi made their first contribution in https://github.com/pytorch/torchtune/pull/2275
- @bogdansalyp made their first contribution in https://github.com/pytorch/torchtune/pull/2367
- @nathan-az made their first contribution in https://github.com/pytorch/torchtune/pull/2365
- @rajuptvs made their first contribution in https://github.com/pytorch/torchtune/pull/2385
Full Changelog: https://github.com/pytorch/torchtune/compare/v0.5.0...v0.6.0
Published by pbontrager 11 months ago
torchtune - v0.5.0
Highlights
We are releasing torchtune v0.5.0 with lots of exciting new features! This includes Kaggle integration, a QAT + LoRA training recipe, improved integrations with Hugging Face and vLLM, Gemma2 models, a recipe enabling finetuning for LayerSkip via early exit, and support for NPU devices.
Kaggle integration (#2002)
torchtune is proud to announce our integration with Kaggle! You can now finetune all your favorite models using torchtune in Kaggle notebooks with Kaggle model hub integration. Download a model from the Kaggle Hub, finetune it on your dataset with any torchtune recipe, then upload your best checkpoint back to the Kaggle Hub to share with the community. Check out our example Kaggle notebook here to get started!
QAT + LoRA training recipe (#1931)
If you've seen the Llama 3.2 quantized models, you may know that they were trained using quantization-aware training with LoRA adapters. This is an effective way to maintain good model performance when you need to quantize for on-device inference. Now you can train your own quant-friendly LoRA models in torchtune with our QAT + LoRA recipe!
To finetune Llama 3.2 3B with QAT + LoRA, you can run:
```
# Download Llama 3.2 3B
tune download meta-llama/Llama-3.2-3B-Instruct --ignore-patterns "original/consolidated.00.pth"

# Finetune on two devices
tune run --nproc_per_node 2 qat_lora_finetune_distributed --config llama3_2/3B_qat_lora
```
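Under the hood, QAT "fake-quantizes" the frozen base weights during training (quantize, then immediately dequantize) so the model learns to tolerate quantization error while the LoRA adapters train in full precision. A simplified symmetric int8 fake-quantization sketch, for intuition only (this is not torchtune's actual implementation):

```python
def fake_quantize(weights, num_bits=8):
    """Symmetric per-tensor fake quantization: round onto an int grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0
    quantized = [round(w / scale) for w in weights]  # integer codes in [-qmax, qmax]
    return [q * scale for q in quantized]            # back to float for the forward pass

w = [0.5, -1.27, 0.003, 1.0]
w_fq = fake_quantize(w)
# Values stay close to the originals; training against w_fq teaches the model
# to be robust to the rounding error introduced by real int8 inference.
assert all(abs(a - b) <= max(abs(x) for x in w) / 127 for a, b in zip(w, w_fq))
```

At export time the weights are actually quantized, and the model has already adapted to the rounding it will see in deployment.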
Improved Hugging Face and vLLM integration (#2074)
We heard your feedback, and we're happy to say that it's now easier than ever to load your torchtune models into Hugging Face or vLLM! It's as simple as:
```python
from transformers import AutoModelForCausalLM

trained_model_path = "/path/to/my/torchtune/checkpoint"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=trained_model_path,
)
```
See the full examples in our docs: Hugging Face, vLLM
Gemma 2 models (#1835)
We now support models from the Gemma 2 family! This includes the 2B, 9B, and 27B sizes, with recipes for full, LoRA, and QLoRA finetuning on one or more devices. For example, you can finetune Gemma 2 27B with QLoRA by running:
```
# Download Gemma 2 27B
tune download google/gemma-2-27b --ignore-patterns "gemma-2-27b.gguf"

# Finetune on a single GPU
tune run lora_finetune_single_device --config gemma2/27B_qlora_single_device
```
A huge thanks to @Optimox for landing these models!
Early exit training recipe (#1076)
LayerSkip is an end-to-end solution to speed up LLM inference. By combining layer dropout with an appropriate dropout schedule and using an early exit loss during training, you can increase the accuracy of early exit at inference time. You can use our early exit config to reproduce experiments from LayerSkip, LayerDrop, and other papers.
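Conceptually, early exit training attaches the shared LM head to intermediate layers and folds their losses into the training objective. A minimal sketch of a weighted early-exit loss with linearly increasing weights (the layer dropout and curriculum schedules from the paper are omitted, and the names are illustrative):

```python
def early_exit_loss(per_layer_losses, scale=1.0):
    """Combine losses computed at each exit layer, weighting deeper layers more.

    per_layer_losses[i] is the LM loss from decoding with the shared head at
    layer i; the last entry is the usual final-layer loss.
    """
    num_layers = len(per_layer_losses)
    # Linearly increasing weights favor deeper exits, normalized to sum to 1.
    weights = [scale * (i + 1) for i in range(num_layers)]
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, per_layer_losses)) / total

# Deeper exits dominate the objective, but shallow exits still receive signal,
# which is what makes them accurate enough to exit early at inference time.
loss = early_exit_loss([4.0, 3.0, 2.5, 2.0])
assert 2.0 < loss < 4.0
```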
You can try torchtune's early exit recipe by running the following:
```
# Download Llama2 7B
tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf

# Finetune with early exit on four devices
tune run --nnodes 1 --nproc_per_node 4 dev/early_exit_finetune_distributed --config recipes/dev/7B_full_early_exit.yaml
```
NPU support (#1826)
We are excited to share that torchtune can now be used on Ascend NPU devices! All your favorite single-device recipes can be run as-is, with support for distributed recipes coming later. A huge thanks to @noemotiovon for their work to enable this!
What's Changed
- nit: Correct compile_loss return type hint by @bradhilton in https://github.com/pytorch/torchtune/pull/1940
- Fix grad accum + FSDP CPU offload, pass None via CLI by @ebsmothers in https://github.com/pytorch/torchtune/pull/1941
- QAT tutorial nit by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1945
- A more encompassing fix for offloading + ac by @janeyx99 in https://github.com/pytorch/torchtune/pull/1936
- Add Qwen2.5 to live docs by @RdoubleA in https://github.com/pytorch/torchtune/pull/1949
- [Bug] model_type argument as str for checkpoints classes by @smujjiga in https://github.com/pytorch/torchtune/pull/1946
- llama3.2 90b config updates + nits by @RdoubleA in https://github.com/pytorch/torchtune/pull/1950
- Add Ascend NPU as a backend by @noemotiovon in https://github.com/pytorch/torchtune/pull/1826
- fix missing key by @felipemello1 in https://github.com/pytorch/torchtune/pull/1952
- update memory optimization tutorial by @felipemello1 in https://github.com/pytorch/torchtune/pull/1948
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/1954
- add expandable segment to integration tests by @felipemello1 in https://github.com/pytorch/torchtune/pull/1963
- Fix check in `load_from_full_state_dict` for modified state dicts by @RylanC24 in https://github.com/pytorch/torchtune/pull/1967
- Update torchtune generation to be more flexible by @RylanC24 in https://github.com/pytorch/torchtune/pull/1970
- feat: add gemma2b variants by @Optimox in https://github.com/pytorch/torchtune/pull/1835
- typo by @felipemello1 in https://github.com/pytorch/torchtune/pull/1972
- Update QAT: add grad clipping, torch.compile, collate fn by @andrewor14 in https://github.com/pytorch/torchtune/pull/1854
- VQA Documentation by @calvinpelletier in https://github.com/pytorch/torchtune/pull/1974
- Convert all non-rgb images to rgb by @vancoykendall in https://github.com/pytorch/torchtune/pull/1976
- Early fusion multimodal models by @RdoubleA in https://github.com/pytorch/torchtune/pull/1904
- Refactor Recipe State Dict Code by @pbontrager in https://github.com/pytorch/torchtune/pull/1964
- Update KV Cache to use `num_kv_heads` instead of `num_heads` by @mirceamironenco in https://github.com/pytorch/torchtune/pull/1961
- Migrate to `epochs: 1` in all configs by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1981
- Make sure CLIP resized pos_embed is contiguous by @gau-nernst in https://github.com/pytorch/torchtune/pull/1986
- Add `**quantization_kwargs` to `FrozenNF4Linear` and `LoRALinear` and `DoRALinear` by @joecummings in https://github.com/pytorch/torchtune/pull/1987
- Enables Python 3.13 for nightly builds by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1988
- DOC Fixes custom message transform example by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1983
- Pass quantization_kwargs to CLIP builders by @joecummings in https://github.com/pytorch/torchtune/pull/1994
- Adding MM eval tests / attention bugfixes by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1989
- Update Qwen2.5 configs by @joecummings in https://github.com/pytorch/torchtune/pull/1999
- nit: Fix/add some type annotations by @bradhilton in https://github.com/pytorch/torchtune/pull/1982
- Fixing `special_tokens` arg in `Llama3VisionTransform` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2000
- Recent updates to the README by @joecummings in https://github.com/pytorch/torchtune/pull/1979
- Bump version to 0.5.0 by @joecummings in https://github.com/pytorch/torchtune/pull/2009
- gemma2 had wrong path to scheduler by @felipemello1 in https://github.com/pytorch/torchtune/pull/2013
- Create _export directory in torchtune by @Jack-Khuu in https://github.com/pytorch/torchtune/pull/2011
- torchrun defaults for concurrent distributed training jobs by @ebsmothers in https://github.com/pytorch/torchtune/pull/2015
- Remove unused FSDP components by @ebsmothers in https://github.com/pytorch/torchtune/pull/2016
- 2D RoPE + CLIP updates by @RdoubleA in https://github.com/pytorch/torchtune/pull/1973
- Some KD recipe cleanup by @ebsmothers in https://github.com/pytorch/torchtune/pull/2020
- Remove `lr_scheduler` requirement in `lora_dpo_single_device` by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1991
- chore: remove PyTorch 2.5.0 checks by @JP-sDEV in https://github.com/pytorch/torchtune/pull/1877
- Make tokenize tests readable by @krammnic in https://github.com/pytorch/torchtune/pull/1868
- add flags to readme by @felipemello1 in https://github.com/pytorch/torchtune/pull/2003
- Support for unsharded parameters in state_dict APIs by @RdoubleA in https://github.com/pytorch/torchtune/pull/2023
- [WIP] Reducing eval vision tests runtime by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2022
- log rank zero everywhere by @RdoubleA in https://github.com/pytorch/torchtune/pull/2030
- Add LR Scheduler to full finetune distributed by @parthsarthi03 in https://github.com/pytorch/torchtune/pull/2017
- Fix Qlora/lora for 3.2 vision by @felipemello1 in https://github.com/pytorch/torchtune/pull/2028
- CLIP Text Encoder by @calvinpelletier in https://github.com/pytorch/torchtune/pull/1969
- feat(cli): allow users to download models from Kaggle by @KeijiBranshi in https://github.com/pytorch/torchtune/pull/2002
- remove default to ignore safetensors by @felipemello1 in https://github.com/pytorch/torchtune/pull/2042
- Remove deprecated `TiedEmbeddingTransformerDecoder` by @EmilyIsCoding in https://github.com/pytorch/torchtune/pull/2047
- Use hf transfer as default by @felipemello1 in https://github.com/pytorch/torchtune/pull/2046
- Fix issue in loading mixed precision vocab pruned models during torchtune generation for evaluation by @ifed-ucsd in https://github.com/pytorch/torchtune/pull/2043
- [export] Add exportable attention and kv cache by @larryliu0820 in https://github.com/pytorch/torchtune/pull/2049
- Switch to PyTorch's built-in RMSNorm by @calvinpelletier in https://github.com/pytorch/torchtune/pull/2054
- [export] Add exportable position embedding by @larryliu0820 in https://github.com/pytorch/torchtune/pull/2068
- MM Docs nits by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2067
- Add support for QAT + LoRA by @andrewor14 in https://github.com/pytorch/torchtune/pull/1931
- Add ability to shard custom layers for DPO and LoRA distributed by @joecummings in https://github.com/pytorch/torchtune/pull/2072
- [ez] remove stale pytorch version check by @ebsmothers in https://github.com/pytorch/torchtune/pull/2075
- Fail early with `packed=True` on MM datasets. by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2080
- Error message on `packed=True` for stack exchange dataset by @joecummings in https://github.com/pytorch/torchtune/pull/2079
- Fix nightly tests for `qat_lora_finetune_distributed` by @andrewor14 in https://github.com/pytorch/torchtune/pull/2085
- Update `build_linux_wheels.yaml` - Pass test-infra input params by @atalman in https://github.com/pytorch/torchtune/pull/2086
- DPO Activation Offloading by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2087
- Deprecate `SimPOLoss` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2063
- DPO Recipe Doc by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2091
- initial commit by @songhappy in https://github.com/pytorch/torchtune/pull/1953
- Vector Quantized Embeddings by @RdoubleA in https://github.com/pytorch/torchtune/pull/2040
- Fix bug in loading multimodal datasets and update tests accordingly by @joecummings in https://github.com/pytorch/torchtune/pull/2110
- Set gloo process group for FSDP with CPU offload by @ebsmothers in https://github.com/pytorch/torchtune/pull/2108
- Llama 3.3 70B by @pbontrager in https://github.com/pytorch/torchtune/pull/2124
- Llama 3.3 readme updates by @ebsmothers in https://github.com/pytorch/torchtune/pull/2125
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/2107
- Reduce logging output for distributed KD by @joecummings in https://github.com/pytorch/torchtune/pull/2120
- Support Early Exit Loss and/or Layer Dropout by @mostafaelhoushi in https://github.com/pytorch/torchtune/pull/1076
- Update checkpointing directory -> using vLLM and from_pretrained by @felipemello1 in https://github.com/pytorch/torchtune/pull/2074
- pass correct arg by @felipemello1 in https://github.com/pytorch/torchtune/pull/2127
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/2128
- fix `qat_lora_test` by @felipemello1 in https://github.com/pytorch/torchtune/pull/2131
- guard ckpt imports by @felipemello1 in https://github.com/pytorch/torchtune/pull/2133
- [bug fix] add parents=True by @felipemello1 in https://github.com/pytorch/torchtune/pull/2136
- [bug fix] re-add model by @felipemello1 in https://github.com/pytorch/torchtune/pull/2135
- Update save sizes into GiB by @joecummings in https://github.com/pytorch/torchtune/pull/2143
- [bug fix] remove config download when source is kaggle by @felipemello1 in https://github.com/pytorch/torchtune/pull/2144
- [fix] remove "with_suffix" by @felipemello1 in https://github.com/pytorch/torchtune/pull/2146
- DoRA fixes by @ebsmothers in https://github.com/pytorch/torchtune/pull/2139
- [Fix] Llama 3.2 Vision decoder_trainable flag fixed by @pbontrager in https://github.com/pytorch/torchtune/pull/2150
New Contributors
- @bradhilton made their first contribution in https://github.com/pytorch/torchtune/pull/1940
- @smujjiga made their first contribution in https://github.com/pytorch/torchtune/pull/1946
- @noemotiovon made their first contribution in https://github.com/pytorch/torchtune/pull/1826
- @RylanC24 made their first contribution in https://github.com/pytorch/torchtune/pull/1967
- @vancoykendall made their first contribution in https://github.com/pytorch/torchtune/pull/1976
- @Jack-Khuu made their first contribution in https://github.com/pytorch/torchtune/pull/2011
- @JP-sDEV made their first contribution in https://github.com/pytorch/torchtune/pull/1877
- @KeijiBranshi made their first contribution in https://github.com/pytorch/torchtune/pull/2002
- @EmilyIsCoding made their first contribution in https://github.com/pytorch/torchtune/pull/2047
- @ifed-ucsd made their first contribution in https://github.com/pytorch/torchtune/pull/2043
- @larryliu0820 made their first contribution in https://github.com/pytorch/torchtune/pull/2049
- @atalman made their first contribution in https://github.com/pytorch/torchtune/pull/2086
- @songhappy made their first contribution in https://github.com/pytorch/torchtune/pull/1953
- @mostafaelhoushi made their first contribution in https://github.com/pytorch/torchtune/pull/1076
Full Changelog: https://github.com/pytorch/torchtune/compare/v0.4.0...v0.5.0
Published by ebsmothers about 1 year ago
torchtune - v0.4.0
Highlights
Today we release v0.4.0 of torchtune with some exciting new additions! Some notable ones include full support for activation offloading, recipes for Llama3.2V 90B and QLoRA variants, new documentation, and Qwen2.5 models!
Activation offloading (#1443, #1645, #1847)
Activation offloading is a memory-saving technique that asynchronously moves checkpointed activations to the CPU while they are not in use. Right before the GPU needs the activations for the microbatch's backward pass, this functionality prefetches them back from the CPU. Enabling it is as easy as setting the following options in your config:
```yaml
enable_activation_checkpointing: True
enable_activation_offloading: True
```
In experiments with Llama3 8B, activation offloading used roughly 24% less memory while inflicting a performance slowdown of under 1%.
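The bookkeeping can be pictured with a toy, pure-Python model: offload each checkpointed activation to a CPU store during the forward pass, then prefetch it back in reverse layer order just before the backward pass needs it. In practice this uses pinned host memory and CUDA streams so the copies overlap with compute; everything below is an illustrative simplification:

```python
class OffloadManager:
    """Toy stand-in for activation offloading bookkeeping (no GPUs involved)."""

    def __init__(self):
        self.cpu_store = {}                    # stand-in for pinned CPU memory

    def offload(self, layer_id, activation):
        # Device-to-host copy in reality, issued asynchronously on a side stream.
        self.cpu_store[layer_id] = activation

    def prefetch(self, layer_id):
        # Host-to-device copy in reality, issued ahead of the backward step.
        return self.cpu_store.pop(layer_id)

mgr = OffloadManager()
for layer in range(3):                         # forward: save and offload
    mgr.offload(layer, f"activation_{layer}")

# Backward visits layers in reverse; each activation is prefetched just in time.
recovered = [mgr.prefetch(layer) for layer in reversed(range(3))]
assert recovered == ["activation_2", "activation_1", "activation_0"]
assert not mgr.cpu_store                       # nothing left resident
```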
Llama3.2V 90B with QLoRA (#1880, #1726)
We added model builders and configs for the 90B version of Llama3.2V, which outperforms the 11B version of the model across common benchmarks. Because this model size is larger, we also added the ability to run the model using QLoRA and FSDP2.
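QLoRA keeps the frozen base weights in 4-bit precision (NF4 in practice) while the LoRA adapters train in higher precision. A simplified blockwise absmax 4-bit quantization sketch, using a uniform grid rather than true NF4, purely for illustration:

```python
def quantize_4bit_blockwise(weights, block_size=4):
    """Quantize each block to 4-bit signed codes (-7..7), storing one scale per block."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0        # absmax scale for this block
        codes = [round(w / scale * 7) for w in block]    # 15 uniform levels
        blocks.append((scale, codes))
    return blocks

def dequantize(blocks):
    return [code / 7 * scale for scale, codes in blocks for code in codes]

w = [0.1, -0.4, 0.15, 0.05, 2.0, -0.9, 0.5, 0.25]
blocks = quantize_4bit_blockwise(w)
restored = dequantize(blocks)

# Blockwise scales keep the error proportional to each block's magnitude, so
# small weights aren't crushed by a single large outlier elsewhere: each value
# lands within half a quantization step of its block's grid.
for i, (a, b) in enumerate(zip(w, restored)):
    scale = blocks[i // 4][0]
    assert abs(a - b) <= scale / 14 + 1e-12
```

Storing one scale per small block (plus 4-bit codes) is what lets a 90B model's frozen weights fit in far less memory while the LoRA adapters carry the trainable capacity.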
```bash
# Download the model first
tune download meta-llama/Llama-3.2-90B-Vision-Instruct --ignore-patterns "original/consolidated*"

# Run with e.g. 4 GPUs
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora
```
Qwen2.5 model family has landed (#1863)
We added builders for Qwen2.5, the cutting-edge models from the Qwen family of models! In their own words "Compared to Qwen2, Qwen2.5 has acquired significantly more knowledge (MMLU: 85+) and has greatly improved capabilities in coding (HumanEval 85+) and mathematics (MATH 80+)."
Get started with the models easily:
```bash
tune download Qwen/Qwen2.5-1.5B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/1.5B_lora_single_device
```
New documentation on using custom recipes, configs, and components (#1910)
We heard your feedback and wrote up a simple page on how to customize configs, recipes, and individual components! Check it out here
What's Changed
- Fix PackedDataset bug for `seq_len > 2 * max_seq_len` setting. by @mirceamironenco in https://github.com/pytorch/torchtune/pull/1697
- Bump version 0.3.1 by @joecummings in https://github.com/pytorch/torchtune/pull/1720
- Add error propagation to distributed run. by @mirceamironenco in https://github.com/pytorch/torchtune/pull/1719
- Update fusion layer counting logic for Llama 3.2 weight conversion by @ebsmothers in https://github.com/pytorch/torchtune/pull/1722
- Resizable image positional embeddings by @felipemello1 in https://github.com/pytorch/torchtune/pull/1695
- Unpin numpy by @ringohoffman in https://github.com/pytorch/torchtune/pull/1728
- Add HF Checkpoint Format Support for Llama Vision by @pbontrager in https://github.com/pytorch/torchtune/pull/1727
- config changes by @felipemello1 in https://github.com/pytorch/torchtune/pull/1733
- Fix custom imports for both distributed and single device by @RdoubleA in https://github.com/pytorch/torchtune/pull/1731
- Pin urllib3<2.0.0 to fix eleuther eval errors by @RdoubleA in https://github.com/pytorch/torchtune/pull/1738
- Fixing recompiles in KV-cache + compile by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1663
- Fix CLIP pos embedding interpolation to work on DTensors by @ebsmothers in https://github.com/pytorch/torchtune/pull/1739
- Bump version to 0.4.0 by @RdoubleA in https://github.com/pytorch/torchtune/pull/1748
- [Feat] Activation offloading for distributed lora recipe by @Jackmin801 in https://github.com/pytorch/torchtune/pull/1645
- Add LR Scheduler to single device full finetune by @user074 in https://github.com/pytorch/torchtune/pull/1350
- Custom recipes use slash path by @RdoubleA in https://github.com/pytorch/torchtune/pull/1760
- Adds repr to Message by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1757
- Fix save adapter weights only by @ebsmothers in https://github.com/pytorch/torchtune/pull/1764
- Set drop_last to always True by @RdoubleA in https://github.com/pytorch/torchtune/pull/1761
- Remove nonexistent flag for acc offloading in memory_optimizations.rst by @janeyx99 in https://github.com/pytorch/torchtune/pull/1772
- [BUGFIX] Adding sequence truncation to `max_seq_length` in eval recipe by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1773
- Add ROCm "support" by @joecummings in https://github.com/pytorch/torchtune/pull/1765
- [BUG] Include system prompt in Phi3 by default by @joecummings in https://github.com/pytorch/torchtune/pull/1778
- Fixing quantization in eval recipe by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1777
- Delete deprecated ChatDataset and InstructDataset by @joecummings in https://github.com/pytorch/torchtune/pull/1781
- Add split argument to required builders and set it default value to "train" by @krammnic in https://github.com/pytorch/torchtune/pull/1783
- Fix quantization with generate by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1784
- Fix typo in multimodal_datasets.rst by @krammnic in https://github.com/pytorch/torchtune/pull/1787
- Make AlpacaToMessage public. by @krammnic in https://github.com/pytorch/torchtune/pull/1785
- Fix misleading attn_dropout docstring by @ebsmothers in https://github.com/pytorch/torchtune/pull/1792
- Add filter_fn to all generic dataset classes and builders API by @krammnic in https://github.com/pytorch/torchtune/pull/1789
- Set dropout in SDPA to 0.0 when not in training mode by @ebsmothers in https://github.com/pytorch/torchtune/pull/1803
- Skip entire header for llama3 decode by @RdoubleA in https://github.com/pytorch/torchtune/pull/1656
- Remove unused bsz variable by @zhangtemplar in https://github.com/pytorch/torchtune/pull/1805
- Adding `max_seq_length` to vision eval config by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1802
- Add check that there is no PackedDataset while building ConcatDataset by @krammnic in https://github.com/pytorch/torchtune/pull/1796
- Add possibility to pack in _wikitext.py by @krammnic in https://github.com/pytorch/torchtune/pull/1807
- Add evaluation configs under qwen2 dir by @joecummings in https://github.com/pytorch/torchtune/pull/1809
- Fix eos_token problem in all required models by @krammnic in https://github.com/pytorch/torchtune/pull/1806
- Deprecating `TiedEmbeddingTransformerDecoder` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1815
- Torchao version check changes/BC import of TensorCoreTiledLayout by @ebsmothers in https://github.com/pytorch/torchtune/pull/1812
- 1810 move gemma evaluation by @malinjawi in https://github.com/pytorch/torchtune/pull/1819
- Consistent type checks for prepend and append tags. by @krammnic in https://github.com/pytorch/torchtune/pull/1824
- Move schedulers to training from modules. by @krammnic in https://github.com/pytorch/torchtune/pull/1801
- Update EleutherAI Eval Harness to v0.4.5 by @joecummings in https://github.com/pytorch/torchtune/pull/1800
- 1810 Add evaluation configs under phi3 dir by @Harthi7 in https://github.com/pytorch/torchtune/pull/1822
- Create CITATION.cff by @joecummings in https://github.com/pytorch/torchtune/pull/1756
- fixed error message for GatedRepoError by @DawiAlotaibi in https://github.com/pytorch/torchtune/pull/1832
- 1810 Move mistral evaluation by @Yousof-kayal in https://github.com/pytorch/torchtune/pull/1829
- More consistent trace names. by @krammnic in https://github.com/pytorch/torchtune/pull/1825
- fbcode using TensorCoreLayout by @jerryzh168 in https://github.com/pytorch/torchtune/pull/1834
- Remove `pad_max_tiles` in CLIP by @pbontrager in https://github.com/pytorch/torchtune/pull/1836
- Remove `pad_max_tiles` in CLIP inference by @lucylq in https://github.com/pytorch/torchtune/pull/1853
- Add `vqa_dataset`, update docs by @krammnic in https://github.com/pytorch/torchtune/pull/1820
- Add offloading tests and fix obscure edge case by @janeyx99 in https://github.com/pytorch/torchtune/pull/1860
- Toggling KV-caches by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1763
- Cacheing doc nits by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1876
- LoRA typo fix + bias=True by @felipemello1 in https://github.com/pytorch/torchtune/pull/1881
- Correct `torchao` check for `TensorCoreTiledLayout` by @joecummings in https://github.com/pytorch/torchtune/pull/1886
- Kd_loss avg over tokens by @moussaKam in https://github.com/pytorch/torchtune/pull/1885
- Support Optimizer-in-the-backward by @mori360 in https://github.com/pytorch/torchtune/pull/1833
- Remove deprecated `GemmaTransformerDecoder` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1892
- Add PromptTemplate examples by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1891
- Temporarily disable building Python 3.13 version of torchtune by @joecummings in https://github.com/pytorch/torchtune/pull/1896
- Block on Python 3.13 version by @joecummings in https://github.com/pytorch/torchtune/pull/1898
- [bug] fix sharding multimodal by @felipemello1 in https://github.com/pytorch/torchtune/pull/1889
- QLoRA with bias + Llama 3.2 Vision QLoRA configs by @ebsmothers in https://github.com/pytorch/torchtune/pull/1726
- Block on Python 3.13 version by @joecummings in https://github.com/pytorch/torchtune/pull/1899
- Normalize CE loss by total number of (non-padding) tokens by @ebsmothers in https://github.com/pytorch/torchtune/pull/1875
- nit: remove (nightly) in recipes by @krammnic in https://github.com/pytorch/torchtune/pull/1882
- Expose `packed: False`, set `log_peak_memory_stats: True`, set `compile: False` by @krammnic in https://github.com/pytorch/torchtune/pull/1872
- Remove ChatFormat, InstructTemplate, old message converters by @RdoubleA in https://github.com/pytorch/torchtune/pull/1895
- Make TensorCoreTiledLayout import more robust by @andrewor14 in https://github.com/pytorch/torchtune/pull/1912
- [ez] Fix README download example by @RdoubleA in https://github.com/pytorch/torchtune/pull/1915
- [docs] Custom components page by @RdoubleA in https://github.com/pytorch/torchtune/pull/1910
- Update imports after QAT was moved out of prototype by @andrewor14 in https://github.com/pytorch/torchtune/pull/1883
- Updating memory optimization overview by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1916
- Patch github link in torchtune docs header by @ebsmothers in https://github.com/pytorch/torchtune/pull/1914
- Llama 3.2 Vision - 90B by @felipemello1 in https://github.com/pytorch/torchtune/pull/1880
- Fixing DoRA docs, adding to mem opt tutorial by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1918
- Add KD distributed recipe by @lindawangg in https://github.com/pytorch/torchtune/pull/1631
- add missing doc by @felipemello1 in https://github.com/pytorch/torchtune/pull/1924
- [FIX] MM Eval Mask Sizes by @pbontrager in https://github.com/pytorch/torchtune/pull/1920
- Activation offloading for full finetuning + fix tied embedding by @felipemello1 in https://github.com/pytorch/torchtune/pull/1847
- Qwen2.5 by @calvinpelletier in https://github.com/pytorch/torchtune/pull/1863
- Restore backward after each batch for grad accum by @ebsmothers in https://github.com/pytorch/torchtune/pull/1917
- Fix lora single device fine tune checkpoint saving & nan loss when use_dora=True by @mirceamironenco in https://github.com/pytorch/torchtune/pull/1909
New Contributors
- @ringohoffman made their first contribution in https://github.com/pytorch/torchtune/pull/1728
- @Jackmin801 made their first contribution in https://github.com/pytorch/torchtune/pull/1645
- @user074 made their first contribution in https://github.com/pytorch/torchtune/pull/1350
- @krammnic made their first contribution in https://github.com/pytorch/torchtune/pull/1783
- @zhangtemplar made their first contribution in https://github.com/pytorch/torchtune/pull/1805
- @malinjawi made their first contribution in https://github.com/pytorch/torchtune/pull/1819
- @Harthi7 made their first contribution in https://github.com/pytorch/torchtune/pull/1822
- @DawiAlotaibi made their first contribution in https://github.com/pytorch/torchtune/pull/1832
- @Yousof-kayal made their first contribution in https://github.com/pytorch/torchtune/pull/1829
- @moussaKam made their first contribution in https://github.com/pytorch/torchtune/pull/1885
- @mori360 made their first contribution in https://github.com/pytorch/torchtune/pull/1833
Full Changelog: https://github.com/pytorch/torchtune/compare/v0.3.1...v0.4.0
Published by joecummings over 1 year ago
torchtune - v0.3.1 (Llama 3.2 Vision patch)
Overview
We've added full support for Llama 3.2 shortly after it was announced, including full/LoRA fine-tuning of the Llama3.2-1B and Llama3.2-3B base and instruct text models and the Llama3.2-11B-Vision base and instruct models. This means we now support the full end-to-end development of VLMs - fine-tuning, inference, and eval! We've also included a lot more goodies in a few short weeks:
- Llama 3.2 1B/3B/11B Vision configs for full/LoRA fine-tuning
- Updated recipes to support VLMs
- Multimodal eval via EleutherAI
- Support for torch.compile for VLMs
- Revamped generation utilities for multimodal support + batched inference for text only
- New knowledge distillation recipe with configs for Llama3.2 and Qwen2
- Llama 3.1 405B QLoRA fine-tuning on 8xA100s
- MPS support (beta) - you can now use torchtune on Mac!
New Features
Models
- QLoRA with Llama 3.1 405B (#1232)
- Llama 3.2 (#1679, #1688, #1661)
Multimodal
- Update recipes for multimodal support (#1548, #1628)
- Multimodal eval via EleutherAI (#1669, #1660)
- Multimodal compile support (#1670)
- Exportable multimodal models (#1541)
Generation
- Revamped generate recipe with multimodal support (#1559, #1563, #1674, #1686)
- Batched inference for text-only models (#1424, #1449, #1603, #1622)
Knowledge Distillation
- Add single device KD recipe and configs for Llama 3.2, Qwen2 (#1539, #1690)
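Knowledge distillation trains a smaller student model to match a larger teacher's output distribution. As a rough illustration of the idea behind the recipe (a minimal sketch, not torchtune's actual implementation, which operates on batched tensors), a forward-KL distillation loss over one token's logits might look like:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits):
    """Forward KL(teacher || student) for a single token position."""
    p = softmax(teacher_logits)   # teacher distribution (the target)
    q = softmax(student_logits)   # student distribution
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

# The loss is zero when the student matches the teacher exactly,
# and positive otherwise.
matched = kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this per-token loss is averaged over all non-padding positions in the batch (see the later "Kd_loss avg over tokens" fix).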
Memory and Performance
- Compile FFT FSDP (#1573)
- Apply rope on k earlier for efficiency (#1558)
- Streaming offloading in (q)lora single device (#1443)
Quantization
- Update quantization to use tensor subclasses (#1403)
- Add int4 weight-only QAT flow targeting tinygemm kernel (#1570)
RLHF
- Adding generic preference dataset builder (#1623)
Miscellaneous
- Add drop_last to dataloader (#1654)
- Add `low_cpu_ram` config to qlora (#1580)
- MPS support (#1706)
Documentation
- nits in memory optimizations doc (#1585)
- Tokenizer and prompt template docs (#1567)
- Latexifying IPOLoss docs (#1589)
- modules doc updates (#1588)
- More doc nits (#1611)
- update docs (#1602)
- Update llama3 chat tutorial (#1608)
- Instruct and chat datasets docs (#1571)
- Preference dataset docs (#1636)
- Messages and message transforms docs (#1574)
- Readme Updates (#1664)
- Model transform docs (#1665)
- Multimodal dataset builder + docs (#1667)
- Datasets overview docs (#1668)
- Update README.md (#1676)
- Readme updates for Llama 3.2 (#1680)
- Add 3.2 models to README (#1683)
- Knowledge distillation tutorial (#1698)
- Text completion dataset docs (#1696)
Quality-of-Life Improvements
- Set possible resolutions to debug, not info (#1560)
- Remove TiedEmbeddingTransformerDecoder from Qwen (#1547)
- Make Gemma use regular TransformerDecoder (#1553)
- Llama 3.1: instantiate pos embedding only once (#1554)
- Run unit tests against PyTorch nightlies as part of our nightly CI (#1569)
- Support load_dataset kwargs in other dataset builders (#1584)
- Add fused=True to Adam, except PagedAdam (#1575)
- Move RLHF out of modules (#1591)
- Make logger only log on rank0 for Phi3 loading errors (#1599)
- Move rlhf tests out of modules (#1592)
- Update PR template (#1614)
- Update `get_unmasked_sequence_lengths` example for release (#1613)
- Remove IPO loss + small fixes (#1615)
- Fix dora configs (#1618)
- Remove unused var in generate (#1612)
- remove deprecated message (#1619)
- Fix qwen2 config (#1620)
- Proper names for dataset types (#1625)
- Make `q` optional in `sample` (#1637)
- Rename `JSONToMessages` to `OpenAIToMessages` (#1643)
- Update gemma to ignore gguf (#1655)
- Add Pillow >= 9.4 requirement (#1671)
- guard import (#1684)
- add upgrade to pip command (#1687)
- Do not run CI on forked repos (#1681)
Bug Fixes
- Fix flex attention test (#1568)
- Add `eom_id` to Llama3 Tokenizer (#1586)
- Only merge model weights in LoRA recipe when `save_adapter_weights_only=False` (#1476)
- Hotfix eval recipe (#1594)
- Fix typo in PPO recipe (#1607)
- Fix `lora_dpo_distributed` recipe (#1609)
- Fixes for MM Masking and Collation (#1601)
- delete duplicate LoRA dropout fields in DPO configs (#1583)
- Fix tune download command in PPO config (#1593)
- Fix tune run not identifying custom components (#1617)
- Fix compile error in `get_causal_mask_from_padding_mask` (#1627)
- Fix eval recipe bug for group tasks (#1642)
- Fix basic tokenizer no special tokens (#1640)
- Add BlockMask to `batch_to_device` (#1651)
- Fix PACK_TYPE import in collate (#1659)
- Fix `llava_instruct_dataset` (#1658)
- convert rgba to rgb (#1678)
New Contributors (auto-generated by GitHub)
- @dvorjackz made their first contribution (#1558)
Full Changelog: https://github.com/pytorch/torchtune/compare/v0.3.0...v0.3.1
Published by RdoubleA over 1 year ago
torchtune - v0.3.0
Overview
We haven’t had a new release for a little while now, so there is a lot in this one. Some highlights include FSDP2 recipes for full finetune and LoRA(/QLoRA), support for DoRA fine-tuning, a PPO recipe for RLHF, Qwen2 models of various sizes, a ton of improvements to memory and performance (try our recipes with torch compile! try our sample packing with flex attention!), and Comet ML integration. For the full set of perf and memory improvements, we recommend installing with the PyTorch nightlies.
New Features
Here are highlights of some of our new features in 0.3.0.
Recipes
- Full finetune FSDP2 recipe (#1287)
- LoRA FSDP2 recipe with faster training than FSDP1 (#1517)
- RLHF with PPO (#1005)
- DoRA (#1115)
- SimPO (#1223)
Models
- Qwen2 0.5B, 1.5B, 7B model (#1143, #1247)
- Flamingo model components (#1357)
- CLIP encoder and vision transform (#1127)
Perf, memory, and quantization
- Per-layer compile: 90% faster compile time and 75% faster training time (#1419)
- Sample packing with flex attention: 80% faster training time with compile vs unpacked (#1193)
- Chunked cross-entropy to reduce peak memory (#1390)
- Make KV cache optional (#1207)
- Option to save adapter checkpoint only (#1220)
- Delete logits before bwd, saving ~4 GB (#1235)
- Quantize linears without LoRA applied to NF4 (#1119)
- Compile model and loss (#1296, #1319)
- Speed up QLoRA initialization (#1294)
- Set LoRA dropout to 0.0 to save memory (#1492)
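On chunked cross-entropy: the trick is to compute the loss over smaller chunks of token positions rather than materializing everything at once, which lowers peak memory without changing the result. A pure-Python sketch of the principle (torchtune's version works on tensors and also avoids upcasting all logits simultaneously):

```python
import math

def token_ce(logits, target):
    """Cross-entropy for one token: -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def chunked_ce(all_logits, targets, chunk_size=2):
    """Accumulate per-token losses chunk by chunk instead of all at once."""
    total, count = 0.0, 0
    for start in range(0, len(targets), chunk_size):
        for logits, tgt in zip(all_logits[start:start + chunk_size],
                               targets[start:start + chunk_size]):
            total += token_ce(logits, tgt)
            count += 1
    return total / count

logits = [[2.0, 0.5, 0.1], [0.1, 3.0, 0.2], [1.0, 1.0, 4.0]]
targets = [0, 1, 2]
# Chunking changes peak memory, not the answer: any chunk size gives the same mean loss.
assert abs(chunked_ce(logits, targets, 2) - chunked_ce(logits, targets, 3)) < 1e-12
```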
Data/Datasets
- Multimodal datasets: The Cauldron and LLaVA-Instruct-150K (#1158)
- Multimodal collater (#1156)
- Tokenizer redesign for better model-specific feature support (#1082)
- Create general SFTDataset combining instruct and chat (#1234)
- Interleaved image support in tokenizers (#1138)
- Image transforms for CLIP encoder (#1084)
- Vision cross-attention mask transform (#1141)
- Support images in messages (#1504)
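Sample packing (mentioned under perf above) concatenates multiple short examples into one fixed-length sequence so fewer tokens are wasted on padding. A toy greedy packer conveys the idea; this is illustrative only, since torchtune's PackedDataset also records per-sample boundaries so attention masks keep samples from attending to each other:

```python
def greedy_pack(samples, max_seq_len):
    """Greedily concatenate token lists into packs of at most max_seq_len tokens."""
    packs, current = [], []
    for tokens in samples:
        # Start a new pack when the next sample would overflow this one.
        if current and len(current) + len(tokens) > max_seq_len:
            packs.append(current)
            current = []
        # Samples longer than max_seq_len would need truncation or splitting;
        # this sketch assumes each sample fits.
        current.extend(tokens)
    if current:
        packs.append(current)
    return packs

samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packs = greedy_pack(samples, max_seq_len=6)
# Two packs: [1,2,3,4,5] and [6,7,8,9,10] - no padding tokens needed.
```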
Miscellaneous
- Deep fusion modules (#1338)
- CometLogger integration (#1221)
- Add profiler to full finetune recipes (#1288)
- Support memory viz tool through the profiler (#1382, #1384)
- Add RSO loss (#1197)
- Add support for non-incremental decoding (#973)
- Move utils directory to training (#1432, #1519, …)
- Add bf16 dtype support on CPU (#1218)
- Add grad norm logging (#1451)
Documentation
- QAT tutorial (#1105)
- Recipe docs pages and memory optimizations tutorial (#1230)
- Add download commands to model API docs (#1167)
- Updates to utils API docs (#1170)
Bug Fixes
- Prevent pad ids, special tokens displaying in generate (#1211)
- Reverting Gemma checkpoint logic causing missing head weight (#1168)
- Fix compile on PyTorch 2.4 (#1512)
- Fix Llama 3.1 RoPE init for compile (#1544)
- Fix checkpoint load for FSDP2 with CPU offload (#1495)
- Add missing quantization to Llama 3.1 layers (#1485)
- Fix accuracy number parsing in Eleuther eval test (#1135)
- Allow adding custom system prompt to messages (#1366)
- Cast DictConfig -> dict in instantiate (#1450)
New Contributors (Auto generated by Github)
- @sanchitintel made their first contribution in https://github.com/pytorch/torchtune/pull/1218
- @lulmer made their first contribution in https://github.com/pytorch/torchtune/pull/1134
- @stsouko made their first contribution in https://github.com/pytorch/torchtune/pull/1238
- @spider-man-tm made their first contribution in https://github.com/pytorch/torchtune/pull/1220
- @winglian made their first contribution in https://github.com/pytorch/torchtune/pull/1119
- @fyabc made their first contribution in https://github.com/pytorch/torchtune/pull/1143
- @mreso made their first contribution in https://github.com/pytorch/torchtune/pull/1274
- @gau-nernst made their first contribution in https://github.com/pytorch/torchtune/pull/1288
- @lucylq made their first contribution in https://github.com/pytorch/torchtune/pull/1269
- @dzheng256 made their first contribution in https://github.com/pytorch/torchtune/pull/1221
- @ChinoUkaegbu made their first contribution in https://github.com/pytorch/torchtune/pull/1310
- @janeyx99 made their first contribution in https://github.com/pytorch/torchtune/pull/1382
- @Gasoonjia made their first contribution in https://github.com/pytorch/torchtune/pull/1385
- @shivance made their first contribution in https://github.com/pytorch/torchtune/pull/1417
- @yf225 made their first contribution in https://github.com/pytorch/torchtune/pull/1419
- @thomasjpfan made their first contribution in https://github.com/pytorch/torchtune/pull/1363
- @AnuravModak made their first contribution in https://github.com/pytorch/torchtune/pull/1429
- @lindawangg made their first contribution in https://github.com/pytorch/torchtune/pull/1451
- @andrewldesousa made their first contribution in https://github.com/pytorch/torchtune/pull/1470
- @mirceamironenco made their first contribution in https://github.com/pytorch/torchtune/pull/1523
- @mikaylagawarecki made their first contribution in https://github.com/pytorch/torchtune/pull/1315
Published by ebsmothers over 1 year ago
torchtune - v0.2.1 (llama3.1 patch)
Overview
This patch includes support for fine-tuning Llama3.1 with torchtune as well as various improvements to the library.
New Features & Improvements
Models
- Added support for Llama3.1 (#1208)
Modules
- Tokenizer refactor to improve the extensibility of our tokenizer components (#1082)
Published by joecummings over 1 year ago
torchtune - v0.2.0
Overview
It’s been a while since we’ve done a release, and we have a ton of cool new features in the torchtune library, including distributed QLoRA support, new models, sample packing, and more! Check out #new-contributors for an exhaustive list of new contributors to the repo.
Enjoy the new release and happy tuning!
New Features
Here are some highlights of our new features in v0.2.0.
Recipes
- We added support for QLoRA with FSDP2! This means users can now run 70B+ models on multiple GPUs. We provide example configs for Llama2 7B and 70B sizes. Note: this currently requires you to install PyTorch nightlies to access the FSDP2 methods. (#909)
- Also by leveraging FSDP2, we see a 12% speedup in tokens/sec and a 3.2x speedup in model init over FSDP1 with LoRA (#855)
- We added support for other variants of the Meta-Llama3 recipes including:
- 70B with LoRA (#802)
- 70B full finetune (#993)
- 8B memory-efficient full finetune, which saves 46% peak memory over the previous version (#990)
- We introduce a quantization-aware training (QAT) recipe. Training with QAT shows significant improvement in model quality if you plan on quantizing your model post-training. (#980)
- torchtune made updates to the eval recipe including:
- Batched inference for faster eval (#947)
- Support for free generation tasks in EleutherAI Eval Harness (#975)
- Support for custom eval configs (#1055)
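The QAT recipe works by simulating low-precision quantization during training so the model learns weights that survive post-training quantization. The core "fake quantize" step (quantize, then immediately dequantize, carrying the rounding error through training) can be sketched in plain Python; the real recipe uses torchao's int8 dynamic activation / int4 weight scheme, and this toy version is only an illustration:

```python
def fake_quantize(weights, num_bits=4):
    """Round weights to a symmetric num_bits grid, then map back to floats."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Quantize: round to the nearest integer level, clamped to the representable range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    # Dequantize: back to floats; the rounding error is what the model trains through.
    return [qi * scale for qi in q]

w = [0.12, -0.7, 0.33, 0.05]
wq = fake_quantize(w)
# wq stays within half a quantization step of w, but only takes values on the int4 grid.
```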
Models
- Phi-3 Mini-4K-Instruct from Microsoft (#876)
- Gemma 7B from Google (#971)
- Code Llama2: 7B, 13B, and 70B sizes from Meta (#847)
- @salman designed and implemented reward modeling for Mistral models (#840, #991)
Perf, memory, and quantization
- We made improvements to our FSDP + Llama3 recipe, resulting in 13% more savings in allocated memory for the 8B model. (#865)
- Added Int8 per token dynamic activation + int4 per axis grouped weight (8da4w) quantization (#884)
Data/Datasets
- We added support for a widely requested feature - sample packing! This feature drastically speeds up model training - e.g. 2X faster with the alpaca dataset. (#875, #1109)
- In addition to our instruct tuning, we now also support continued pretraining and include several example datasets like wikitext and CNN DailyMail. (#868)
- Users can now train on multiple datasets using concat datasets (#889)
- We now support OpenAI conversation style data (#890)
Miscellaneous
- @jeromeku added a much more advanced profiler so users can understand the exact bottlenecks in their LLM training. (#1089)
- We made several metric logging improvements:
- Log tokens/sec, per-step logging, configurable memory logging (#831)
- Better formatting for stdout memory logs (#817)
- Users can now save models in a safetensor format. (#1096)
- Updated activation checkpointing to support selective layer and selective op activation checkpointing (#785)
- We worked with the Hugging Face team to provide support for loading adapter weights fine tuned via torchtune directly into the PEFT library. (#933)
Documentation
- We wrote a new tutorial for fine-tuning Llama3 with chat data (#823) and revamped the datasets tutorial (#994)
- Looooooooong overdue, but we added proper documentation for the tune CLI (#1052)
- Improved contributing guide (#896)
Bug Fixes
- @optimox found and fixed a bug to ensure that LoRA dropout was correctly applied (#996)
- Fixed a broken link for Llama3 tutorial in #805
- Fixed Gemma model generation (#1016)
- Bug workaround: to download CNN DailyMail, launch a single-device recipe first; once it’s downloaded, you can use the dataset with distributed recipes.
New Contributors
- @supernovae made their first contribution in https://github.com/pytorch/torchtune/pull/803
- @eltociear made their first contribution in https://github.com/pytorch/torchtune/pull/814
- @Carolinabanana made their first contribution in https://github.com/pytorch/torchtune/pull/810
- @musab-mk made their first contribution in https://github.com/pytorch/torchtune/pull/818
- @apthagowda97 made their first contribution in https://github.com/pytorch/torchtune/pull/816
- @lessw2020 made their first contribution in https://github.com/pytorch/torchtune/pull/785
- @weifengpy made their first contribution in https://github.com/pytorch/torchtune/pull/843
- @musabgultekin made their first contribution in https://github.com/pytorch/torchtune/pull/857
- @xingyaoww made their first contribution in https://github.com/pytorch/torchtune/pull/890
- @vmoens made their first contribution in https://github.com/pytorch/torchtune/pull/902
- @andrewor14 made their first contribution in https://github.com/pytorch/torchtune/pull/884
- @kunal-mansukhani made their first contribution in https://github.com/pytorch/torchtune/pull/926
- @EvilFreelancer made their first contribution in https://github.com/pytorch/torchtune/pull/889
- @water-vapor made their first contribution in https://github.com/pytorch/torchtune/pull/950
- @Optimox made their first contribution in https://github.com/pytorch/torchtune/pull/995
- @tambulkar made their first contribution in https://github.com/pytorch/torchtune/pull/1011
- @christobill made their first contribution in https://github.com/pytorch/torchtune/pull/1004
- @j-dominguez9 made their first contribution in https://github.com/pytorch/torchtune/pull/1056
- @andyl98 made their first contribution in https://github.com/pytorch/torchtune/pull/1061
- @hmosousa made their first contribution in https://github.com/pytorch/torchtune/pull/1065
- @yasser-sulaiman made their first contribution in https://github.com/pytorch/torchtune/pull/1055
- @parthsarthi03 made their first contribution in https://github.com/pytorch/torchtune/pull/1081
- @mdeff made their first contribution in https://github.com/pytorch/torchtune/pull/1086
- @jeffrey-fong made their first contribution in https://github.com/pytorch/torchtune/pull/1096
- @jeromeku made their first contribution in https://github.com/pytorch/torchtune/pull/1089
- @man-shar made their first contribution in https://github.com/pytorch/torchtune/pull/1126
Full Changelog: https://github.com/pytorch/torchtune/compare/v0.1.1...v0.2.0
Published by pbontrager over 1 year ago
torchtune - v0.1.1 (llama3 patch)
Overview
This patch includes support for fine-tuning Llama3 with torchtune as well as various improvements to the library.
New Features & Improvements
Recipes
- Added configuration for Llama2 13B QLoRA (#779)
- Added support for Llama2 70B LoRA (#788)
Models
- Added support for Llama3 (#793)
Utils
- Improvements to Weights & Biases logger (#772, #777)
Documentation
- Added Llama3 tutorial (#793)
- Updated E2E tutorial with instructions for uploading to the Hugging Face Hub (#773)
- Updates to the README (#775, #778, #786)
- Added instructions for installing torchtune nightly (#792)
Published by joecummings almost 2 years ago
torchtune - torchtune v0.1.0 (first release)
Overview
We are excited to announce the release of torchtune v0.1.0! torchtune is a PyTorch library for easily authoring, fine-tuning and experimenting with LLMs. The library emphasizes 4 key aspects:
- Simplicity and Extensibility. Native-PyTorch, componentized design and easy-to-reuse abstractions
- Correctness. High bar on proving the correctness of components and recipes
- Stability. PyTorch just works. So should torchtune
- Democratizing LLM fine-tuning. Works out-of-the-box on both consumer and professional hardware setups
torchtune is tested with the latest stable PyTorch release (2.2.2) as well as the preview nightly version.
New Features
Here are a few highlights of new features from this release.
Recipes
- Added support for running a LoRA finetune using a single GPU (#454)
- Added support for running a QLoRA finetune using a single GPU (#478)
- Added support for running a LoRA finetune using multiple GPUs with FSDP (#454, #266)
- Added support for running a full finetune using a single GPU (#482)
- Added support for running a full finetune using multiple GPUs with FSDP (#251, #482)
- Added WIP support for DPO (#645)
- Integrated with EleutherAI Eval Harness for an evaluation recipe (#549)
- Added support for quantization through integration with torchao (#632)
- Added support for single-GPU inference (#619)
- Created a config parsing system to interact with recipes through YAML and the command line (#406, #456, #468)
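The config system reads a recipe's settings from YAML and lets any field be overridden on the command line with dotted key=value pairs. A simplified sketch of how such overrides fold into a nested config dict (the names and behavior here are illustrative, not torchtune's actual parser):

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' style CLI overrides to a nested config dict."""
    for item in overrides:
        key, _, raw = item.partition("=")
        parts = key.split(".")
        node = config
        # Walk (creating as needed) down to the parent of the final key.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Values stay strings in this sketch; a real parser would coerce types.
        node[parts[-1]] = raw
    return config

cfg = {"optimizer": {"lr": "3e-4"}, "epochs": "1"}
apply_overrides(cfg, ["optimizer.lr=1e-5", "epochs=3"])
# cfg is now {"optimizer": {"lr": "1e-5"}, "epochs": "3"}
```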
Models
- Added support for Llama2 7B (#70, #137) and 13B (#571)
- Added support for Mistral 7B (#571)
- Added support for Gemma WIP
Datasets
- Added support for instruction and chat-style datasets (#752, #624)
- Included example implementations of datasets (#303, #116, #407, #541, #576, #645)
- Integrated with Hugging Face Datasets (#70)
Utils
- Integrated with Weights & Biases for metric logging (#162, #660)
- Created a checkpointer to handle model files from HF and Meta (#442)
- Added a tune CLI tool (#396)
Documentation
In addition to documenting torchtune’s public facing APIs, we include several new tutorials and “deep-dives” in our documentation.
- Added LoRA tutorial (#368)
- Added “End-to-End Workflow with torchtune” tutorial (#690)
- Added datasets tutorial (#735)
- Added QLoRA tutorial (#693)
- Added deep-dive on the checkpointer (#674)
- Added deep-dive on configs (#311)
- Added deep-dive on recipes (#316)
- Added deep-dive on Weights & Biases integration (#660)
Community Contributions
This release of torchtune features some amazing work from the community:
- Gemma 2B model from @solitude-alive (#630)
- DPO finetuning recipe from @yechenzhi (#645)
- Weights & Biases updates from @tcapelle (#660)
Published by joecummings almost 2 years ago