Recent Releases of composer

composer - v0.32.1

What's Changed

  • Removed extraneous usage of fsdp_config.load_monolith_rank0_only since that's unreliable by @rithwik-db in https://github.com/mosaicml/composer/pull/3901
  • Fixed automicrobatching issue for FSDP1 by @rithwik-db in https://github.com/mosaicml/composer/pull/3909
  • reverted mlflow upgrade due to slowdowns by @ethantang-db in https://github.com/mosaicml/composer/pull/3910

Full Changelog: https://github.com/mosaicml/composer/compare/v0.32.0...v0.32.1

- Python
Published by ethantang-db 11 months ago

composer - v0.32.0

What's Changed

  • Update FSDP checkpointing test to use UC Volumes and updated dockerfile for new composer version by @rithwik-db in https://github.com/mosaicml/composer/pull/3865
  • Using cu128 instead of cu126 for pr-gpu and daily tests by @rithwik-db in https://github.com/mosaicml/composer/pull/3867
  • Refactored auto-microbatching hook handles for FSDP by @rithwik-db in https://github.com/mosaicml/composer/pull/3843
  • Removed most s3 bucket based tests (replaced with UC Volumes) by @rithwik-db in https://github.com/mosaicml/composer/pull/3869
  • Supporting Mixed Init on FSDP2 by @rithwik-db in https://github.com/mosaicml/composer/pull/3872
  • Documentation Improvements: Clarify Explanations in Gated Linear Units and Squeeze-Excite README Files by @leopardracer in https://github.com/mosaicml/composer/pull/3875
  • Mlflow move to cpu by @dakinggg in https://github.com/mosaicml/composer/pull/3878
  • FSDP2 mixed init fixes by @rithwik-db in https://github.com/mosaicml/composer/pull/3882
  • Remove sklearn dep by @dakinggg in https://github.com/mosaicml/composer/pull/3883
  • Monolithic checkpointing by @rithwik-db in https://github.com/mosaicml/composer/pull/3876
  • Update docs conf.py copyright to 2025 by @jacobfulano in https://github.com/mosaicml/composer/pull/3751
  • Fix Typos in Comments for activation_monitor.py and mlperf.py by @kilavvy in https://github.com/mosaicml/composer/pull/3877
  • Mixed Precision for FSDP2 by @rithwik-db in https://github.com/mosaicml/composer/pull/3884
  • Add h200 to flops dict by @dakinggg in https://github.com/mosaicml/composer/pull/3889
  • updated fsdp2 config by @rithwik-db in https://github.com/mosaicml/composer/pull/3896
  • Supporting peft for FSDP2 by @rithwik-db in https://github.com/mosaicml/composer/pull/3897

New Contributors

  • @leopardracer made their first contribution in https://github.com/mosaicml/composer/pull/3875
  • @kilavvy made their first contribution in https://github.com/mosaicml/composer/pull/3877

Full Changelog: https://github.com/mosaicml/composer/compare/v0.31.0...v0.32.0

- Python
Published by ethantang-db 12 months ago

composer - v0.31.0

What's New

1. PyTorch 2.7.0 Compatibility (https://github.com/mosaicml/composer/pull/3850)

We've added support for PyTorch 2.7.0 and created a Dockerfile to support PyTorch 2.7.0 + CUDA 12.8. The current Composer image supports PyTorch 2.7.0 + CUDA 12.6.3.

2. Experimental FSDP2 support has been added to Trainer (https://github.com/mosaicml/composer/pull/3852)

Experimental FSDP2 support was added to Trainer with:

  • auto_wrap based on _fsdp_wrap_fn and/or _fsdp_wrap attributes within the model (https://github.com/mosaicml/composer/pull/3826)
  • Activation checkpointing and CPU offloading (https://github.com/mosaicml/composer/pull/3832)
  • Meta initialization (https://github.com/mosaicml/composer/pull/3852)
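The auto_wrap rule above can be illustrated with a small, torch-free sketch. It assumes that an explicit `_fsdp_wrap` attribute on a submodule takes precedence over a model-level `fsdp_wrap_fn(module) -> bool`; the `Module` class, `should_wrap`, and `modules_to_wrap` are hypothetical stand-ins, not Composer's implementation.

```python
from __future__ import annotations

from typing import Callable, Optional


class Module:
    """Hypothetical stand-in for a torch.nn.Module tree (no torch required)."""

    def __init__(self, name: str, children: Optional[list[Module]] = None):
        self.name = name
        self.children = children or []

    def modules(self):
        # Depth-first traversal, parent before children, like nn.Module.modules().
        yield self
        for child in self.children:
            yield from child.modules()


def should_wrap(model: Module, module: Module) -> bool:
    # Assumed precedence: an explicit `_fsdp_wrap` attribute on the submodule wins.
    if hasattr(module, "_fsdp_wrap"):
        return bool(module._fsdp_wrap)
    # Otherwise fall back to the model-level `fsdp_wrap_fn`, if one is defined.
    wrap_fn: Optional[Callable[[Module], bool]] = getattr(model, "fsdp_wrap_fn", None)
    if callable(wrap_fn):
        return bool(wrap_fn(module))
    return False


def modules_to_wrap(model: Module) -> list[str]:
    # The root model itself is excluded; only submodules are candidates here.
    return [m.name for m in model.modules() if m is not model and should_wrap(model, m)]


embed = Module("embed")
embed._fsdp_wrap = False  # explicit opt-out overrides fsdp_wrap_fn
model = Module("model", [embed] + [Module(f"block{i}") for i in range(3)])
model.fsdp_wrap_fn = lambda m: m.name.startswith("block")
print(modules_to_wrap(model))  # ['block0', 'block1', 'block2']
```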

Note: Not all features are supported yet (e.g. auto-microbatching, monolithic checkpointing).

Usage:

Set the environment variable FSDP_VERSION=2 and configure FSDP2 through parallelism_config as desired. The full set of available attributes can be found here.
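A minimal sketch of opting in, assuming the keys inside `'fsdp'` are named `activation_checkpointing` and `activation_cpu_offload` (these names are assumptions; check them against the FSDP2 config reference). `enable_fsdp2` is a hypothetical helper, and the Trainer call is left as a comment so the snippet runs without Composer installed.

```python
from __future__ import annotations

import os


def enable_fsdp2(activation_checkpointing: bool = False,
                 activation_cpu_offload: bool = False) -> dict:
    """Hypothetical helper: opt in to experimental FSDP2 and build parallelism_config."""
    # FSDP_VERSION=2 comes from the release notes. Setting it before the Trainer is
    # created is the safe choice; exactly when Composer reads it is an assumption here.
    os.environ["FSDP_VERSION"] = "2"
    return {
        "fsdp": {
            # Assumed key names for the two features listed above.
            "activation_checkpointing": activation_checkpointing,
            "activation_cpu_offload": activation_cpu_offload,
        },
    }


parallelism_config = enable_fsdp2(activation_checkpointing=True)
# trainer = Trainer(model=..., parallelism_config=parallelism_config, ...)
```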

Bug Fixes

  • Resolve a memory hang issue in Mlflow monitor process (https://github.com/mosaicml/composer/pull/3830)

What's Changed

  • Bump Composer 0.31.0.dev0 by @KuuCi in https://github.com/mosaicml/composer/pull/3808
  • Update Checkpoint Back-Compatibility Test by @KuuCi in https://github.com/mosaicml/composer/pull/3810
  • Extend docker build matrix to add an entry for pytorch2.6+cu126 by @sirejdua-db in https://github.com/mosaicml/composer/pull/3805
  • Bump databricks-sdk from 0.47.0 to 0.49.0 by @dependabot in https://github.com/mosaicml/composer/pull/3814
  • Bump pypandoc from 1.14 to 1.15 by @dependabot in https://github.com/mosaicml/composer/pull/3813
  • Update google-cloud-storage requirement from <3.0,>=2.0.0 to >=2.0.0,<4.0 by @dependabot in https://github.com/mosaicml/composer/pull/3812
  • Update setuptools version by @irenedea in https://github.com/mosaicml/composer/pull/3816
  • Kickstart FSDP2 by @bowenyang008 in https://github.com/mosaicml/composer/pull/3806
  • Remove network calls to HF in CI by @dakinggg in https://github.com/mosaicml/composer/pull/3817
  • Update psutil requirement from <7,>=5.8.0 to >=5.8.0,<8 by @dependabot in https://github.com/mosaicml/composer/pull/3818
  • [FSDP2] Init FSDP2 based checkpointing by @bowenyang008 in https://github.com/mosaicml/composer/pull/3824
  • Update torchmetrics requirement from <1.6.1,>=1.0 to >=1.0,<1.7.2 by @dependabot in https://github.com/mosaicml/composer/pull/3829
  • Bump coverage[toml] from 7.6.8 to 7.8.0 by @dependabot in https://github.com/mosaicml/composer/pull/3827
  • Bump yamllint from 1.35.1 to 1.37.0 by @dependabot in https://github.com/mosaicml/composer/pull/3820
  • Update numpy requirement from <2.2.0,>=1.21.5 to >=1.21.5,<2.3.0 by @dependabot in https://github.com/mosaicml/composer/pull/3828
  • Update optimizer params for fsdp2 by @rithwik-db in https://github.com/mosaicml/composer/pull/3822
  • Change Mlflow monitor process from fork to spawn to reduce memory usage by @dakinggg in https://github.com/mosaicml/composer/pull/3830
  • Ignore mlflow warning in test by @dakinggg in https://github.com/mosaicml/composer/pull/3831
  • Bump HF hub version by @dakinggg in https://github.com/mosaicml/composer/pull/3839
  • Bump databricks-sdk from 0.49.0 to 0.50.0 by @dependabot in https://github.com/mosaicml/composer/pull/3834
  • Update transformers requirement from !=4.34.0,<4.51,>=4.11 to >=4.11,!=4.34.0,<4.52 by @dependabot in https://github.com/mosaicml/composer/pull/3838
  • Eliminate dead code before torch version 2.4 by @bowenyang008 in https://github.com/mosaicml/composer/pull/3833
  • Support submodule wrapping for FSDP2 according to model definition (with _fsdp_wrap and fsdp_wrap_fn) by @rithwik-db in https://github.com/mosaicml/composer/pull/3826
  • Activation Checkpointing and Offloading for FSDP2 by @rithwik-db in https://github.com/mosaicml/composer/pull/3832
  • Pin EFA installer version by @dakinggg in https://github.com/mosaicml/composer/pull/3842
  • Add two legacy torch images to the container build matrix by @asfandyarq in https://github.com/mosaicml/composer/pull/3841
  • Bump yamllint from 1.37.0 to 1.37.1 by @dependabot in https://github.com/mosaicml/composer/pull/3845
  • Update packaging requirement from <24.3,>=21.3.0 to >=21.3.0,<25.1 by @dependabot in https://github.com/mosaicml/composer/pull/3846
  • Bump cryptography from 44.0.0 to 44.0.3 by @dependabot in https://github.com/mosaicml/composer/pull/3848
  • Upgrade yapf version by @dakinggg in https://github.com/mosaicml/composer/pull/3840
  • Bump ipython from 8.11.0 to 8.36.0 by @dependabot in https://github.com/mosaicml/composer/pull/3847
  • Update huggingface-hub requirement from <0.31,>=0.21.2 to >=0.21.2,<0.32 by @dependabot in https://github.com/mosaicml/composer/pull/3851
  • Update EFA installer version by @dakinggg in https://github.com/mosaicml/composer/pull/3844
  • Fix typos by @omahs in https://github.com/mosaicml/composer/pull/3853
  • Integrate FSDP2 wrapper into Trainer by @bowenyang008 in https://github.com/mosaicml/composer/pull/3852
  • Deprecate code eval utils by @dakinggg in https://github.com/mosaicml/composer/pull/3854
  • FSDP2 time and verbose logging by @bowenyang008 in https://github.com/mosaicml/composer/pull/3856
  • Fix RDMA installation by @dakinggg in https://github.com/mosaicml/composer/pull/3857
  • Update ci-testing version to latest by @dakinggg in https://github.com/mosaicml/composer/pull/3859
  • Updating composer to support Torch 2.7 by @rithwik-db in https://github.com/mosaicml/composer/pull/3850
  • Cleanup version gating pre-2.6.0 by @rithwik-db in https://github.com/mosaicml/composer/pull/3863

New Contributors

  • @sirejdua-db made their first contribution in https://github.com/mosaicml/composer/pull/3805
  • @asfandyarq made their first contribution in https://github.com/mosaicml/composer/pull/3841
  • @omahs made their first contribution in https://github.com/mosaicml/composer/pull/3853

Full Changelog: https://github.com/mosaicml/composer/compare/v0.30.0...v0.31.0

- Python
Published by rithwik-db about 1 year ago

composer - v0.30.0

What's New

1. Python 3.12 Bump (https://github.com/mosaicml/composer/pull/3783)

We've added support for Python 3.12 and deprecated Python 3.9 support.

What's Changed

  • Updated test_fsdp_load_old_checkpoint with 0.29.0 by @rithwik-db in https://github.com/mosaicml/composer/pull/3771
  • Mlflow rocm error by @KuuCi in https://github.com/mosaicml/composer/pull/3775
  • Update docker to have FA==2.7.4.post1 by @KuuCi in https://github.com/mosaicml/composer/pull/3772
  • [GRT-3415] Remove dead code for peft logging by @bowenyang008 in https://github.com/mosaicml/composer/pull/3777
  • Patch MLflow .trash directories by @KuuCi in https://github.com/mosaicml/composer/pull/3778
  • Remove TE ONNX Export Context to Enable TE FusedAttention on AMD Hardware by @jjuvonen-amd in https://github.com/mosaicml/composer/pull/3779
  • Update Makefile to use WORLD_SIZE by @irenedea in https://github.com/mosaicml/composer/pull/3781
  • Bump gitpython from 3.1.43 to 3.1.44 by @dependabot in https://github.com/mosaicml/composer/pull/3785
  • deprecate gcs test by @ethantang-db in https://github.com/mosaicml/composer/pull/3791
  • Update mosaicml-cli requirement from <0.7,>=0.5.25 to >=0.5.25,<0.8 by @dependabot in https://github.com/mosaicml/composer/pull/3742
  • Bump databricks-sdk from 0.44.1 to 0.47.0 by @dependabot in https://github.com/mosaicml/composer/pull/3786
  • deprecate ghcr by @KevDevSha in https://github.com/mosaicml/composer/pull/3790
  • Bump transformers by @dakinggg in https://github.com/mosaicml/composer/pull/3793
  • Bump Python 3.12 by @KuuCi in https://github.com/mosaicml/composer/pull/3783
  • Fix checkpoint loading in Pytorch 2.6.0 for ckpts exported before Pytorch 2.1.0 by @ethantang-db in https://github.com/mosaicml/composer/pull/3792
  • Update huggingface-hub requirement from <0.27,>=0.21.2 to >=0.21.2,<0.30 by @dependabot in https://github.com/mosaicml/composer/pull/3795
  • Update pytest-httpserver requirement from <1.1,>=1.0.4 to >=1.0.4,<1.2 by @dependabot in https://github.com/mosaicml/composer/pull/3796
  • Update scikit-learn requirement from <1.6,>=1.2.0 to >=1.2.0,<1.7 by @dependabot in https://github.com/mosaicml/composer/pull/3799
  • Bump Release Ref 0.3.3 by @KuuCi in https://github.com/mosaicml/composer/pull/3804
  • Remove huggyllama fixture by @dakinggg in https://github.com/mosaicml/composer/pull/3807
  • Fix release docker with 3.10 by @KuuCi in https://github.com/mosaicml/composer/pull/3809

New Contributors

  • @bowenyang008 made their first contribution in https://github.com/mosaicml/composer/pull/3777
  • @jjuvonen-amd made their first contribution in https://github.com/mosaicml/composer/pull/3779

Full Changelog: https://github.com/mosaicml/composer/compare/v0.29.0...v0.30.0

- Python
Published by KuuCi about 1 year ago

composer - v0.29.0

Deprecations

1. device_transforms param in DataSpec has been deprecated (https://github.com/mosaicml/composer/pull/3770)

Composer no longer supports the device_transforms parameter in DataSpec. Instead, DataSpec supports batch_transforms for batch-level transformations on the CPU and microbatch_transforms for microbatch-level transformations on the target device.
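The ordering described above can be sketched in plain Python: the batch-level transform runs once on the whole batch, the batch is split, and the microbatch-level transform runs on each slice. Lists of floats stand in for tensors, and `run_step`, `normalize`, and `cast_on_device` are hypothetical; the sketch does not reproduce DataSpec itself.

```python
from __future__ import annotations

from typing import Callable

Batch = list[float]


def run_step(batch: Batch,
             microbatch_size: int,
             batch_transforms: Callable[[Batch], Batch],
             microbatch_transforms: Callable[[Batch], Batch]) -> list[Batch]:
    # 1. Batch-level transform, applied once to the full batch (CPU side).
    batch = batch_transforms(batch)
    # 2. Split the transformed batch into microbatches.
    microbatches = [batch[i:i + microbatch_size]
                    for i in range(0, len(batch), microbatch_size)]
    # 3. Microbatch-level transform, applied per microbatch (target-device side).
    return [microbatch_transforms(mb) for mb in microbatches]


def normalize(batch: Batch) -> Batch:
    # Example CPU-side preprocessing: scale pixel values into [0, 1].
    return [x / 255.0 for x in batch]


def cast_on_device(microbatch: Batch) -> Batch:
    # Stand-in for a device-side step such as a dtype cast.
    return [round(x, 3) for x in microbatch]


result = run_step([0, 51, 102, 153, 204, 255], 2, normalize, cast_on_device)
```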

What's Changed

  • Add checkpoint BC tests for 0.27.0 and 0.28.0 by @snarayan21 in https://github.com/mosaicml/composer/pull/3735
  • Address sklearn device issues by @snarayan21 in https://github.com/mosaicml/composer/pull/3748
  • Update FAQ with hf-transfer info by @KuuCi in https://github.com/mosaicml/composer/pull/3745
  • Fix MLFlow logger CI error by ignoring UserWarning by @j316chuck in https://github.com/mosaicml/composer/pull/3758
  • Bump ci to v0.3.3 by @j316chuck in https://github.com/mosaicml/composer/pull/3759
  • Fix order of arguments to loss by @gsganden in https://github.com/mosaicml/composer/pull/3754
  • fix: make JSONTraceHandler.batch_end robust to /tmp/ being on diff mount to dest by @thundergolfer in https://github.com/mosaicml/composer/pull/3766
  • Bump pytorch to 2.6.0 by @rithwik-db in https://github.com/mosaicml/composer/pull/3763
  • Bump databricks-sdk from 0.38.0 to 0.44.1 by @dependabot in https://github.com/mosaicml/composer/pull/3765
  • Version bump to v0.30.0.dev0 by @rithwik-db in https://github.com/mosaicml/composer/pull/3770

New Contributors

  • @gsganden made their first contribution in https://github.com/mosaicml/composer/pull/3754
  • @thundergolfer made their first contribution in https://github.com/mosaicml/composer/pull/3766
  • @rithwik-db made their first contribution in https://github.com/mosaicml/composer/pull/3763

Full Changelog: https://github.com/mosaicml/composer/compare/v0.28.0...v0.29.0

- Python
Published by rithwik-db over 1 year ago

composer - v0.28.0

Deprecations

1. DeepSpeed Deprecation (https://github.com/mosaicml/composer/pull/3732)

Composer no longer supports the DeepSpeed deep learning library. Support has shifted exclusively to PyTorch-native solutions such as FSDP and DDP. To continue using DeepSpeed, please use Composer v0.27.0 or earlier.

What's Changed

  • Fix composer gpu daily test to use torch 2.5.1 by @j316chuck in https://github.com/mosaicml/composer/pull/3712
  • Bump coverage[toml] from 7.6.4 to 7.6.7 by @dependabot in https://github.com/mosaicml/composer/pull/3713
  • Update torchmetrics requirement from <1.5.3,>=1.0 to >=1.0,<1.6.1 by @dependabot in https://github.com/mosaicml/composer/pull/3714
  • Bump ubuntu 22.04 + fix CI mlflow tests by @KuuCi in https://github.com/mosaicml/composer/pull/3716
  • Bump databricks-sdk from 0.36.0 to 0.37.0 by @dependabot in https://github.com/mosaicml/composer/pull/3715
  • Bump mosaicml/pytorch images to use new mosaicml/pytorch images with updated ubuntu 22.04 by @KuuCi in https://github.com/mosaicml/composer/pull/3718
  • migrated all possible assets from GCP to repo by @ethantang-db in https://github.com/mosaicml/composer/pull/3717
  • Bump databricks-sdk from 0.37.0 to 0.38.0 by @dependabot in https://github.com/mosaicml/composer/pull/3720
  • Bump coverage[toml] from 7.6.7 to 7.6.8 by @dependabot in https://github.com/mosaicml/composer/pull/3721
  • Expose DistributedSampler RNG seed argument by @janEbert in https://github.com/mosaicml/composer/pull/3724
  • Fix netifaces install in Dockerfile by @j316chuck in https://github.com/mosaicml/composer/pull/3726
  • Update protobuf requirement from <5.29 to <5.30 by @dependabot in https://github.com/mosaicml/composer/pull/3728
  • Bump cryptography from 43.0.3 to 44.0.0 by @dependabot in https://github.com/mosaicml/composer/pull/3731
  • Speed up CI tests :) by @KuuCi in https://github.com/mosaicml/composer/pull/3727
  • Remove deepspeed completely by @snarayan21 in https://github.com/mosaicml/composer/pull/3732
  • Fix daily test failures by @snarayan21 in https://github.com/mosaicml/composer/pull/3733
  • Version bump to v0.29.0.dev0 by @snarayan21 in https://github.com/mosaicml/composer/pull/3734

New Contributors

  • @janEbert made their first contribution in https://github.com/mosaicml/composer/pull/3724

Full Changelog: https://github.com/mosaicml/composer/compare/v0.27.0...v0.28.0

- Python
Published by snarayan21 over 1 year ago

composer - v0.27.0

What's New

1. Torch 2.5.1 Compatibility (https://github.com/mosaicml/composer/pull/3701)

We've added support for torch 2.5.1, including checkpointing bug fixes from PyTorch.

2. Add batch/microbatch transforms (https://github.com/mosaicml/composer/pull/3703)

Sped up device transformations by applying batch transforms on the CPU and microbatch transforms on the GPU.

Deprecations and Breaking Changes

1. MLFlow Metrics Deduplication (https://github.com/mosaicml/composer/pull/3678)

We added a metric de-duplication feature to the MLflow logger in Composer. A metric whose value is unchanged since the last step is not logged again unless a specific condition is met; by default, that condition is reaching a 100th multiple of duplicated metric steps. This reduces redundant entries in logging storage while still sampling unchanged metrics at regular intervals.

Example: MlflowLogger(..., log_duplicated_metric_every_n_steps=100)
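The de-duplication rule can be sketched as follows. This is not the MlflowLogger implementation: it assumes the condition is "the step is a multiple of `log_duplicated_metric_every_n_steps`", whereas the real logger may count duplicated steps differently. `DedupMetricLogger` is hypothetical, and the `logged` list stands in for the call to the MLflow backend.

```python
from __future__ import annotations


class DedupMetricLogger:
    """Hypothetical sketch: skip unchanged metrics except on every n-th step."""

    def __init__(self, log_duplicated_metric_every_n_steps: int = 100):
        self.every_n = log_duplicated_metric_every_n_steps
        self._last: dict[str, float] = {}
        self.logged: list[tuple[int, str, float]] = []  # stand-in for the backend call

    def log_metrics(self, metrics: dict[str, float], step: int) -> None:
        for name, value in metrics.items():
            unchanged = self._last.get(name) == value
            if unchanged and step % self.every_n != 0:
                continue  # duplicate value on a non-"heartbeat" step: skip it
            self.logged.append((step, name, value))
            self._last[name] = value


logger = DedupMetricLogger(log_duplicated_metric_every_n_steps=100)
for step in range(1, 201):
    logger.log_metrics({"lr": 1e-3}, step=step)
print([s for s, _, _ in logger.logged])  # [1, 100, 200]
```

A changed value is always logged immediately; only repeated values are thinned out.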

What's Changed

  • Metrics dedup for MLflow logger by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3678
  • Bump databricks-sdk from 0.33.0 to 0.36.0 by @dependabot in https://github.com/mosaicml/composer/pull/3686
  • Update pillow requirement from <11,>=10.3.0 to >=10.3.0,<12 by @dependabot in https://github.com/mosaicml/composer/pull/3684
  • Lower min torchmetrics version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3691
  • Private link error handling by @nancyhung in https://github.com/mosaicml/composer/pull/3689
  • Update checkpoint tests to use new version 0.26.0 by @irenedea in https://github.com/mosaicml/composer/pull/3683
  • Bump coverage[toml] from 7.6.3 to 7.6.4 by @dependabot in https://github.com/mosaicml/composer/pull/3694
  • Pin checkpoint state dict flattening patch by @b-chu in https://github.com/mosaicml/composer/pull/3700
  • Torch bump to 2.5.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3701
  • Fix typo in trainer doc by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/3702
  • Update packaging requirement from <24.2,>=21.3.0 to >=21.3.0,<24.3 by @dependabot in https://github.com/mosaicml/composer/pull/3707
  • Update torchmetrics requirement from <1.4.1,>=1.0 to >=1.0,<1.5.3 by @dependabot in https://github.com/mosaicml/composer/pull/3706
  • Add batch/microbatch transforms by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3703
  • Bump version to 0.28.0.dev0 by @j316chuck in https://github.com/mosaicml/composer/pull/3709
  • Add torch 2.5.1 composer tests by @j316chuck in https://github.com/mosaicml/composer/pull/3710

Full Changelog: https://github.com/mosaicml/composer/compare/v0.26.1...v0.27.0

- Python
Published by j316chuck over 1 year ago

composer - v0.26.1

What's Changed

  • Private link error handling by @nancyhung in https://github.com/mosaicml/composer/pull/3689

Full Changelog: https://github.com/mosaicml/composer/compare/v0.26.0...v0.26.1

- Python
Published by dakinggg over 1 year ago

composer - v0.26.0

What's New

1. Torch 2.5.0 Compatibility (https://github.com/mosaicml/composer/pull/3609)

We've added support for torch 2.5.0, including necessary patches to Torch.

Deprecations and Breaking Changes

1. FSDP Configuration Changes (#3681)

We no longer support passing fsdp_config and fsdp_auto_wrap directly to Trainer.

If you'd like to specify an FSDP config and configure FSDP auto-wrapping, you should use parallelism_config:

    from composer import Trainer

    trainer = Trainer(
        parallelism_config={
            'fsdp': {
                'auto_wrap': True,
                # ... other FSDP options
            },
        },
    )
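To migrate code that still passes the removed keyword arguments, the old values can be folded into a parallelism_config dict. `to_parallelism_config` is a hypothetical helper, and it assumes the old fsdp_auto_wrap flag maps directly onto the `'auto_wrap'` key shown above; `'sharding_strategy'` is used only as an example FSDP option.

```python
from __future__ import annotations

from typing import Any, Optional


def to_parallelism_config(fsdp_config: Optional[dict[str, Any]] = None,
                          fsdp_auto_wrap: Optional[bool] = None) -> dict[str, Any]:
    """Hypothetical helper: fold the removed Trainer kwargs into parallelism_config."""
    fsdp = dict(fsdp_config or {})  # copy so the caller's dict is not mutated
    if fsdp_auto_wrap is not None:
        fsdp["auto_wrap"] = fsdp_auto_wrap  # assumed 1:1 mapping of the old flag
    return {"fsdp": fsdp}


# Before: Trainer(..., fsdp_config={'sharding_strategy': 'FULL_SHARD'}, fsdp_auto_wrap=True)
new_config = to_parallelism_config({"sharding_strategy": "FULL_SHARD"}, fsdp_auto_wrap=True)
print(new_config)  # {'fsdp': {'sharding_strategy': 'FULL_SHARD', 'auto_wrap': True}}
# After: Trainer(..., parallelism_config=new_config)
```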

2. Removal of PyTorch Legacy Sharded Checkpoint Support (#3631)

PyTorch briefly used a different sharded checkpoint format than the current one, which was quickly deprecated by PyTorch. We have removed support for this format. We initially removed support for saving in this format in https://github.com/mosaicml/composer/pull/2262, and the original feature was added in https://github.com/mosaicml/composer/pull/1902. Please reach out if you have concerns or need help converting your checkpoints to the new format.

What's Changed

  • Add backward compatibility checkpoint tests for v0.25.0 by @dakinggg in https://github.com/mosaicml/composer/pull/3635
  • Don't use TP when tensor_parallel_degree is 1 by @eitanturok in https://github.com/mosaicml/composer/pull/3636
  • Update huggingface-hub requirement from <0.25,>=0.21.2 to >=0.21.2,<0.26 by @dependabot in https://github.com/mosaicml/composer/pull/3637
  • Update transformers requirement from !=4.34.0,<4.45,>=4.11 to >=4.11,!=4.34.0,<4.46 by @dependabot in https://github.com/mosaicml/composer/pull/3638
  • Bump databricks-sdk from 0.32.0 to 0.33.0 by @dependabot in https://github.com/mosaicml/composer/pull/3639
  • Remove Legacy Checkpointing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3631
  • Surface UC permission error by @b-chu in https://github.com/mosaicml/composer/pull/3642
  • Tensor Parallelism Tests by @eitanturok in https://github.com/mosaicml/composer/pull/3620
  • Switch to log.info for deterministic mode by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3643
  • Update pre-commit requirement from <4,>=3.4.0 to >=3.4.0,<5 by @dependabot in https://github.com/mosaicml/composer/pull/3645
  • Update peft requirement from <0.13,>=0.10.0 to >=0.10.0,<0.14 by @dependabot in https://github.com/mosaicml/composer/pull/3646
  • Create callback to load checkpoint by @irenedea in https://github.com/mosaicml/composer/pull/3641
  • Bump jupyter from 1.0.0 to 1.1.1 by @dependabot in https://github.com/mosaicml/composer/pull/3595
  • Fix DB SDK Import by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3648
  • Bump coverage[toml] from 7.6.0 to 7.6.3 by @dependabot in https://github.com/mosaicml/composer/pull/3651
  • Bump pypandoc from 1.13 to 1.14 by @dependabot in https://github.com/mosaicml/composer/pull/3652
  • Replace list with Sequence by @KuuCi in https://github.com/mosaicml/composer/pull/3654
  • Add better error handling for non-rank 0 during Monolithic Checkpoint Loading by @j316chuck in https://github.com/mosaicml/composer/pull/3647
  • Raising a better warning if train or eval did not process any data. by @ethantang-db in https://github.com/mosaicml/composer/pull/3656
  • Fix Logo by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/3659
  • Update huggingface-hub requirement from <0.26,>=0.21.2 to >=0.21.2,<0.27 by @dependabot in https://github.com/mosaicml/composer/pull/3668
  • Bump cryptography from 42.0.8 to 43.0.3 by @dependabot in https://github.com/mosaicml/composer/pull/3667
  • Bump pytorch to 2.5.0 by @b-chu in https://github.com/mosaicml/composer/pull/3663
  • Don't overwrite sys.excepthook in mlflow logger by @dakinggg in https://github.com/mosaicml/composer/pull/3675
  • Fix pull request target by @b-chu in https://github.com/mosaicml/composer/pull/3676
  • Use a temp path to save local checkpoints for remote save path by @irenedea in https://github.com/mosaicml/composer/pull/3673
  • Loss gen tokens by @dakinggg in https://github.com/mosaicml/composer/pull/3677
  • Refactor maybe_create_object_store_from_uri by @irenedea in https://github.com/mosaicml/composer/pull/3679
  • Don't error if some batch slice has no loss generating tokens by @dakinggg in https://github.com/mosaicml/composer/pull/3682
  • Bump version to 0.27.0.dev0 by @irenedea in https://github.com/mosaicml/composer/pull/3681

New Contributors

  • @ethantang-db made their first contribution in https://github.com/mosaicml/composer/pull/3656

Full Changelog: https://github.com/mosaicml/composer/compare/v0.25.0...v0.26.0

- Python
Published by irenedea over 1 year ago

composer - v0.25.0

What's New

1. Torch 2.4.1 Compatibility (#3609)

We've added support for torch 2.4.1, including necessary patches to Torch.

Deprecations and Breaking Changes

1. Microbatch device movement (#3567)

Instead of moving the entire batch to device at once, we now move each microbatch to device. This saves memory for large inputs, e.g. multimodal data, when training with many microbatches.
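A back-of-the-envelope sketch of why this saves memory. It counts only input bytes resident on the accelerator and assumes each microbatch is released before the next one is moved; activations, weights, and allocator behavior are ignored. `peak_input_bytes` and the 8 MB-per-sample figure are hypothetical.

```python
def peak_input_bytes(batch_size: int, bytes_per_sample: int,
                     num_microbatches: int, per_microbatch: bool) -> int:
    """Input bytes resident on the accelerator at once (simplified model)."""
    if per_microbatch:
        microbatch_size = -(-batch_size // num_microbatches)  # ceiling division
        return microbatch_size * bytes_per_sample
    return batch_size * bytes_per_sample


MB = 1024 ** 2
whole = peak_input_bytes(64, 8 * MB, 8, per_microbatch=False)  # old: full batch moved at once
each = peak_input_bytes(64, 8 * MB, 8, per_microbatch=True)    # new: one microbatch at a time
print(whole // MB, each // MB)  # 512 64
```

The saving scales with the number of microbatches, which is why large multimodal inputs benefit most.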

This change may affect callbacks that run operations on the batch and therefore need it on an accelerator ahead of time, such as the two callbacks changed in this PR. We expect few callbacks to fall into this category, so we anticipate this change will be relatively safe.

2. DeepSpeed deprecation version (#3634)

We have updated the Composer version in which we will remove support for DeepSpeed to 0.27.0. Please reach out on GitHub if you have any concerns about this.

3. PyTorch legacy sharded checkpoint format

PyTorch briefly used a different sharded checkpoint format than the current one, which was quickly deprecated by PyTorch. We have continued to support loading legacy format checkpoints for a while, but we will likely be removing support for this format entirely in an upcoming release. We initially removed support for saving in this format in https://github.com/mosaicml/composer/pull/2262, and the original feature was added in https://github.com/mosaicml/composer/pull/1902. Please reach out if you have concerns or need help converting your checkpoints to the new format.

What's Changed

  • Set dev version back to 0.25.0.dev0 by @snarayan21 in https://github.com/mosaicml/composer/pull/3582
  • Microbatch Device Movement by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3567
  • Init Dist Default None by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3585
  • Explicit None Check in get_device by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3586
  • Update protobuf requirement from <5.28 to <5.29 by @dependabot in https://github.com/mosaicml/composer/pull/3591
  • Bump databricks-sdk from 0.30.0 to 0.31.1 by @dependabot in https://github.com/mosaicml/composer/pull/3592
  • Update ci-testing to 0.2.2 by @dakinggg in https://github.com/mosaicml/composer/pull/3590
  • Bump Mellanox Tools by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3597
  • Roll back ci-testing for dailies by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3598
  • Revert driver changes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3599
  • Remove step in log_image for MLFlow by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3601
  • Reduce system metrics logging frequency by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3604
  • Bump databricks-sdk from 0.31.1 to 0.32.0 by @dependabot in https://github.com/mosaicml/composer/pull/3608
  • torch2.4.1 by @bigning in https://github.com/mosaicml/composer/pull/3609
  • Test with torch2.4.1 image by @bigning in https://github.com/mosaicml/composer/pull/3610
  • fix 2.4.1 test by @bigning in https://github.com/mosaicml/composer/pull/3612
  • Remove tensor option for globalexception_occured by @irenedea in https://github.com/mosaicml/composer/pull/3611
  • Update error message for overwrite to be more user friendly by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3619
  • Update wandb requirement from <0.18,>=0.13.2 to >=0.13.2,<0.19 by @dependabot in https://github.com/mosaicml/composer/pull/3615
  • Fix RNG key checking by @dakinggg in https://github.com/mosaicml/composer/pull/3623
  • Update datasets requirement from <3,>=2.4 to >=2.4,<4 by @dependabot in https://github.com/mosaicml/composer/pull/3626
  • Disable exceptions for MosaicML Logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3627
  • Fix CPU dailies by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3628
  • fix 2.4.1ckpt by @bigning in https://github.com/mosaicml/composer/pull/3629
  • More checkpoint debug logs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3632
  • Lower DeepSpeed deprecation version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3634
  • Bump version 25 by @dakinggg in https://github.com/mosaicml/composer/pull/3633

Full Changelog: https://github.com/mosaicml/composer/compare/v0.24.1...v0.25.0

- Python
Published by dakinggg almost 2 years ago

composer - v0.24.1

Bug Fixes

1. Disallow passing device_mesh to FSDPConfig (#3580)

Explicitly errors if device_mesh is passed to FSDPConfig. This completes the deprecation from v0.24.0 and also addresses cases where a user specified a device mesh but it was ignored, leading to training with the incorrect parallelism style (e.g., using FSDP instead of HSDP).
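The behavior change amounts to failing loudly instead of silently ignoring a field. A minimal sketch of that kind of check on a plain dict follows; `validate_fsdp_config` and its error message are hypothetical, not Composer's actual code.

```python
from __future__ import annotations

from typing import Any


def validate_fsdp_config(fsdp_config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical check: reject a device mesh rather than silently ignoring it."""
    if fsdp_config.get("device_mesh") is not None:
        # Silently dropping the mesh could turn intended HSDP into plain FSDP.
        raise ValueError(
            "device_mesh is no longer accepted in the FSDP config; "
            "describe the parallelism layout with the supported config fields instead."
        )
    return fsdp_config


try:
    validate_fsdp_config({"sharding_strategy": "HYBRID_SHARD", "device_mesh": [2, 8]})
except ValueError as err:
    print(f"rejected: {err}")
```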

What's Changed

  • Bump main version to 0.25.0.dev0 by @snarayan21 in https://github.com/mosaicml/composer/pull/3573
  • update daily by @KevDevSha in https://github.com/mosaicml/composer/pull/3572
  • Bump pandoc from 2.3 to 2.4 by @dependabot in https://github.com/mosaicml/composer/pull/3575
  • Update transformers requirement from !=4.34.0,<4.44,>=4.11 to >=4.11,!=4.34.0,<4.45 by @dependabot in https://github.com/mosaicml/composer/pull/3574
  • Checkpoint backwards compatibility tests for v0.24.0 by @snarayan21 in https://github.com/mosaicml/composer/pull/3579
  • Error if device mesh specified in fsdp config by @snarayan21 in https://github.com/mosaicml/composer/pull/3580
  • Bump version to 0.24.1. by @snarayan21 in https://github.com/mosaicml/composer/pull/3581

Full Changelog: https://github.com/mosaicml/composer/compare/v0.24.0...v0.24.1

- Python
Published by snarayan21 almost 2 years ago

composer - v0.24.0

What's New

1. Torch 2.4 Compatibility (#3542, #3549, #3553, #3552, #3565)

Composer now supports Torch 2.4! We are tracking a few checkpointing-related issues in the latest PyTorch that we have raised with the PyTorch team:

  • [PyTorch Issue] Distributed checkpointing using PyTorch DCP has issues with stateless optimizers, e.g. SGD. We recommend using composer.optim.DecoupledSGDW as a workaround.
  • [PyTorch Issue] Distributed checkpointing using PyTorch DCP broke backwards compatibility. We have patched this using the following planner, but this may break custom planner loading.

2. New checkpointing APIs (#3447, #3474, #3488, #3452)

We've added new checkpointing APIs to download, upload, and load / save, so that checkpointing is usable outside of a Trainer object. We will be fully migrating to these new APIs in the next minor release.

3. Improved Auto-microbatching (#3510, #3522)

We've fixed deadlocks with auto-microbatching with FSDP, bringing throughput in line with manually setting the microbatch size. This is achieved through enabling sync hooks wherever a training run might OOM to find the correct microbatch size, and disabling these hooks for the rest of training.
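The search itself can be sketched as "halve the microbatch size after each out-of-memory error until a step succeeds, then keep that size". This single-process sketch omits the distributed part (all ranks agreeing on the size, and the sync hooks that are enabled only while searching). `SimulatedOOM`, `find_microbatch_size`, and the 12-sample memory limit are hypothetical.

```python
from __future__ import annotations

from typing import Callable


class SimulatedOOM(RuntimeError):
    """Stand-in for a CUDA out-of-memory error."""


def find_microbatch_size(start: int, run_step: Callable[[int], None]) -> int:
    """Halve the microbatch size after each OOM until one step succeeds."""
    size = start
    while size >= 1:
        try:
            run_step(size)  # attempt one training step at this microbatch size
            return size     # success: reuse this size for the rest of training
        except SimulatedOOM:
            size //= 2      # back off and retry with a smaller microbatch
    raise RuntimeError("even a microbatch size of 1 does not fit in memory")


def step(size: int) -> None:
    if size > 12:  # pretend anything above 12 samples runs out of memory
        raise SimulatedOOM(size)


print(find_microbatch_size(64, step))  # 8
```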

Bug Fixes

1. Fix checkpoint symlink uploads (#3376)

Ensures that checkpoint files are uploaded before the symlink file, fixing errors with missing or incomplete checkpoints.
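The fix comes down to an ordering guarantee: every checkpoint file is uploaded before the symlink that points at them, so a reader never follows a symlink to a partially uploaded checkpoint. A minimal sketch, assuming `upload` blocks until each file is complete (asynchronous uploads would need to be awaited before the symlink goes up); the file names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def upload_checkpoint(checkpoint_files: list[str], symlink_file: str,
                      upload: Callable[[str], None]) -> None:
    """Upload all checkpoint files first, and only then the symlink to them."""
    for path in checkpoint_files:
        upload(path)  # assumed to block until this file is fully uploaded
    upload(symlink_file)  # published last, so it never points at missing files


order: list[str] = []
upload_checkpoint(["ep1/__0_0.distcp", "ep1/__1_0.distcp", "ep1/.metadata"],
                  "latest-rank0.pt.symlink", order.append)
print(order[-1])  # latest-rank0.pt.symlink
```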

2. Optimizer tracks same parameters after FSDP wrapping (#3502)

When only a subset of parameters should be tracked by the optimizer, FSDP wrapping no longer interferes with that selection.
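One way to keep the optimizer's parameter subset stable across wrapping is to record parameter names before wrapping and re-select by name afterwards. This is an illustrative approach, not necessarily the one taken in #3502. It assumes names survive wrapping apart from the `_fsdp_wrapped_module.` prefix FSDP inserts; strings stand in for parameter tensors, and `clean_name` and `remap_optimizer_params` are hypothetical.

```python
from __future__ import annotations

# Prefix FSDP inserts into parameter names when wrapping (assumed to be the only change).
WRAPPER_PREFIXES = ("_fsdp_wrapped_module.",)


def clean_name(name: str) -> str:
    for prefix in WRAPPER_PREFIXES:
        name = name.replace(prefix, "")
    return name


def remap_optimizer_params(tracked_before: set[str],
                           params_after: dict[str, object]) -> list[object]:
    """Return post-wrapping parameters that the optimizer tracked before wrapping."""
    return [p for name, p in params_after.items() if clean_name(name) in tracked_before]


# Before wrapping, the optimizer only tracked the head (e.g. the backbone is frozen).
tracked = {"head.weight", "head.bias"}
after = {
    "_fsdp_wrapped_module.backbone.weight": "W_backbone",
    "_fsdp_wrapped_module.head.weight": "W_head",
    "_fsdp_wrapped_module.head.bias": "b_head",
}
print(remap_optimizer_params(tracked, after))  # ['W_head', 'b_head']
```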

What's Changed

  • Bump ipykernel from 6.29.2 to 6.29.5 by @dependabot in https://github.com/mosaicml/composer/pull/3459
  • Update torchmetrics requirement from <1.3.3,>=0.10.0 to >=1.4.0.post0,<1.4.1 by @dependabot in https://github.com/mosaicml/composer/pull/3460
  • [Checkpoint] Fix symlink issue where symlink file uploaded before checkpoint files upload by @bigning in https://github.com/mosaicml/composer/pull/3376
  • Bump databricks-sdk from 0.28.0 to 0.29.0 by @dependabot in https://github.com/mosaicml/composer/pull/3456
  • Remove Log Exception by @jjanezhang in https://github.com/mosaicml/composer/pull/3464
  • Corrected docs for MFU in SpeedMonitor by @JackZ-db in https://github.com/mosaicml/composer/pull/3469
  • [checkpoint v2] Download api by @bigning in https://github.com/mosaicml/composer/pull/3447
  • Upload api by @bigning in https://github.com/mosaicml/composer/pull/3474
  • [Checkpoint V2] Upload API by @bigning in https://github.com/mosaicml/composer/pull/3488
  • Load api by @eracah in https://github.com/mosaicml/composer/pull/3452
  • Add helpful comment explaining HSDP initialization seeding by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3470
  • Add fit start to mosaicmllogger by @ethanma-db in https://github.com/mosaicml/composer/pull/3467
  • Remove OOM-Driven FSDP Deadlocks and Increase Throughput of Automicrobatching by @JackZ-db in https://github.com/mosaicml/composer/pull/3510
  • Move hooks and fsdp modules onto state rather than trainer by @JackZ-db in https://github.com/mosaicml/composer/pull/3522
  • Bump coverage[toml] from 7.5.4 to 7.6.0 by @dependabot in https://github.com/mosaicml/composer/pull/3471
  • revert a wip PR by @bigning in https://github.com/mosaicml/composer/pull/3475
  • Change FP8 Eval to default to activation dtype by @j316chuck in https://github.com/mosaicml/composer/pull/3454
  • Get a shared file system safe signal file name by @dakinggg in https://github.com/mosaicml/composer/pull/3485
  • Bumping flash attention version to v2.6.2 by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3489
  • Bump to Pytorch 2.4 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3542
  • Add Torch 2.4 Tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3549
  • Fix torch 2.4 images for tests by @snarayan21 in https://github.com/mosaicml/composer/pull/3553
  • Fix torch 2.4 tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3552
  • Fix bug when subset of model parameters is passed into optimizer with FSDP by @sashaDoubov in https://github.com/mosaicml/composer/pull/3502
  • Correctly process parallelism_config['tp'] when it's a dict by @snarayan21 in https://github.com/mosaicml/composer/pull/3434
  • [torch2.4] Fix sharded checkpointing backward compatibility issue by @bigning in https://github.com/mosaicml/composer/pull/3565
  • [fix-daily] Use composer get_model_state_dict instead of torch's by @eracah in https://github.com/mosaicml/composer/pull/3492
  • Load Microbatches instead of Entire Batches to GPU by @JackZ-db in https://github.com/mosaicml/composer/pull/3487
  • Make Pytest log in color in Github Action by @eitanturok in https://github.com/mosaicml/composer/pull/3505
  • Revert "Load Microbatches instead of Entire Batches to GPU " by @JackZ-db in https://github.com/mosaicml/composer/pull/3508
  • Bump transformers version by @dakinggg in https://github.com/mosaicml/composer/pull/3511
  • Fix FSDP Config Validation by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3530
  • Add FSDP input validation for use_orig_params and activation_cpu_offload flag by @j316chuck in https://github.com/mosaicml/composer/pull/3515
  • Fix checkpoint events by @b-chu in https://github.com/mosaicml/composer/pull/3468
  • Patch conf.py for readthedocs sphinx injection deprecation. by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3491
  • save load path in state and pass to mosaicmllogger by @ethanma-db in https://github.com/mosaicml/composer/pull/3506
  • Disable gcs azure daily test by @bigning in https://github.com/mosaicml/composer/pull/3514
  • Update huggingface-hub requirement from <0.24,>=0.21.2 to >=0.21.2,<0.25 by @dependabot in https://github.com/mosaicml/composer/pull/3481
  • restore version on dev by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/3451
  • Deprecate deepspeed by @dakinggg in https://github.com/mosaicml/composer/pull/3512
  • Update importlib-metadata requirement from <7,>=5.0.0 to >=5.0.0,<9 by @dependabot in https://github.com/mosaicml/composer/pull/3519
  • Update peft requirement from <0.12,>=0.10.0 to >=0.10.0,<0.13 by @dependabot in https://github.com/mosaicml/composer/pull/3518
  • Use gloo as part of DeviceGPU's process group backend by @snarayan21 in https://github.com/mosaicml/composer/pull/3509
  • Add a monitor of mlflow logger so that it sets run status as failed if main thread exits unexpectedly by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3449
  • Revert "Use gloo as part of DeviceGPU's process group backend (#3509)" by @snarayan21 in https://github.com/mosaicml/composer/pull/3523
  • Fix autoresume docstring (save_overwrite) by @eracah in https://github.com/mosaicml/composer/pull/3526
  • Unpin pip by @dakinggg in https://github.com/mosaicml/composer/pull/3524
  • hasattr check for Wandb 0.17.6 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3531
  • Remove dev on github workflows by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3536
  • Remove dev branch in GPU workflows by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3539
  • restore google cloud object store test by @bigning in https://github.com/mosaicml/composer/pull/3538
  • Update moto[s3] requirement from <5,>=4.0.1 to >=4.0.1,<6 by @dependabot in https://github.com/mosaicml/composer/pull/3516
  • use s3 boto3 Adaptive retry as default retry mode by @bigning in https://github.com/mosaicml/composer/pull/3543
  • Use python 3.11 in GAs by @eitanturok in https://github.com/mosaicml/composer/pull/3529
  • Implement ruff rules enforcing pep 585 by @snarayan21 in https://github.com/mosaicml/composer/pull/3551
  • Update numpy requirement from <2.1.0,>=1.21.5 to >=1.21.5,<2.2.0 by @dependabot in https://github.com/mosaicml/composer/pull/3556
  • Bump databricks-sdk from 0.29.0 to 0.30.0 by @dependabot in https://github.com/mosaicml/composer/pull/3559
  • Update Optim to DecoupledSGD in Notebooks by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3554
  • Remove lambda code eval testing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3560
  • Restore Azure Tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3561
  • Remove tokens for to_next_epoch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3562
  • Change iteration timestamp for old checkpoints by @b-chu in https://github.com/mosaicml/composer/pull/3563
  • Fix typo in composer_collect_env by @dakinggg in https://github.com/mosaicml/composer/pull/3566
  • Add default value to get_device() by @coryMosaicML in https://github.com/mosaicml/composer/pull/3568
  • add ghcr and update build matrix generator by @KevDevSha in https://github.com/mosaicml/composer/pull/3465
  • Bump aws_ofi_nccl to 1.11.0 by @willgleich in https://github.com/mosaicml/composer/pull/3569
  • allow listed runners by @KevDevSha in https://github.com/mosaicml/composer/pull/3486
  • fix runner linux-ubuntu > ubuntu-latest by @KevDevSha in https://github.com/mosaicml/composer/pull/3571
  • Bump version to v0.24.0 + deprecations by @snarayan21 in https://github.com/mosaicml/composer/pull/3570

New Contributors

  • @ethanma-db made their first contribution in https://github.com/mosaicml/composer/pull/3467
  • @KevDevSha made their first contribution in https://github.com/mosaicml/composer/pull/3465

Full Changelog: https://github.com/mosaicml/composer/compare/v0.23.5...v0.24.0

- Python
Published by snarayan21 almost 2 years ago

composer - v0.23.5

What's New

1. Variable length dataloaders (#3416)

Adds support for dataloaders with rank-dependent lengths. The solution terminates iteration for dataloaders on all ranks when the first dataloader finishes.
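The termination rule can be simulated in a single process. This sketch assumes every rank steps in lockstep and stops as soon as any rank's dataloader is exhausted. In a real DDP job, the `any(...)` check would be a collective, such as an all-reduce of per-rank "done" flags. `run_lockstep` is a hypothetical name. Stopping all ranks together matters because a rank that kept iterating would block in gradient synchronization that the finished rank never joins.

```python
from typing import Dict, List, Optional


def run_lockstep(rank_batches: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Step all ranks together; stop every rank once any rank runs out."""
    iterators = {rank: iter(batches) for rank, batches in rank_batches.items()}
    consumed: Dict[int, List[str]] = {rank: [] for rank in rank_batches}
    while True:
        next_batches: Dict[int, Optional[str]] = {
            rank: next(it, None) for rank, it in iterators.items()
        }
        # Stand-in for an all-reduce of per-rank "exhausted" flags.
        if any(batch is None for batch in next_batches.values()):
            break
        for rank, batch in next_batches.items():
            consumed[rank].append(batch)
    return consumed


result = run_lockstep({0: ["a0", "a1", "a2"], 1: ["b0", "b1"]})
print(result)  # {0: ['a0', 'a1'], 1: ['b0', 'b1']}
```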

Bug Fixes

1. Remove close flush for mosaicml logger (#3446)

Previously, the MosaicML Logger sporadically raised an error while the Python interpreter was shutting down. This happened because it tried to flush data on Event.CLOSE using futures, and new futures cannot be scheduled at that point. Now, on Event.CLOSE the logger only blocks until existing data uploads finish and does not schedule new futures.
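The pattern is to wait on uploads already in flight and submit nothing new. Below is a hedged sketch using `concurrent.futures`. `MiniLogger` is hypothetical and only illustrates the idea; it is not the MosaicML logger's actual code. For context, `ThreadPoolExecutor.submit` raises `RuntimeError` once the interpreter has begun shutting down, which is the failure mode described above.

```python
import concurrent.futures
from typing import List


class MiniLogger:
    """Hypothetical logger that uploads records on a background thread pool."""

    def __init__(self) -> None:
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
        self._futures: List[concurrent.futures.Future] = []
        self.sent: List[dict] = []

    def log(self, data: dict) -> None:
        # Normal operation: schedule the upload as a future.
        self._futures.append(self._executor.submit(self.sent.append, data))

    def close(self) -> None:
        # At close, only block on uploads that are already in flight.
        # No new futures are submitted, since that can fail at shutdown.
        concurrent.futures.wait(self._futures)
        self._executor.shutdown(wait=True)


logger = MiniLogger()
logger.log({"loss": 0.5})
logger.log({"loss": 0.4})
logger.close()
print(len(logger.sent))  # 2
```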

What's Changed

  • Update numpy requirement from <1.27.0,>=1.21.5 to >=1.21.5,<2.1.0 by @dependabot in https://github.com/mosaicml/composer/pull/3406
  • Restore dev version by @karan6181 in https://github.com/mosaicml/composer/pull/3417
  • Save checkpoint to disk for API with new save layout by @eracah in https://github.com/mosaicml/composer/pull/3399
  • Patch PyTorch 2.3.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3419
  • Fixes some typing issues by @dakinggg in https://github.com/mosaicml/composer/pull/3418
  • Fix style by @b-chu in https://github.com/mosaicml/composer/pull/3420
  • Bump coverage[toml] from 7.5.3 to 7.5.4 by @dependabot in https://github.com/mosaicml/composer/pull/3422
  • Update psutil requirement from <6,>=5.8.0 to >=5.8.0,<7 by @dependabot in https://github.com/mosaicml/composer/pull/3424
  • Add support for variable length dataloaders in DDP by @JAEarly in https://github.com/mosaicml/composer/pull/3416
  • Hsdp + MoE CI tests by @KuuCi in https://github.com/mosaicml/composer/pull/3378
  • Bumping MLflow version to 2.14.1 by @JackZ-db in https://github.com/mosaicml/composer/pull/3425
  • Skip HSDP + TP pytests that require torch 2.3 or above by @KuuCi in https://github.com/mosaicml/composer/pull/3426
  • Remove CodeQL workflow by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3429
  • Remove save overwrite by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3431
  • Fixes to TP Docs by @snarayan21 in https://github.com/mosaicml/composer/pull/3430
  • Lower the system metrics logging frequency to reduce MLflow server's load by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3436
  • Update paramiko requirement from <3,>=2.11.0 to >=3.4.0,<4 by @dependabot in https://github.com/mosaicml/composer/pull/3439
  • Bump CI testing version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3433
  • Fix docstring for EVAL_AFTER_ALL/EVAL_BEFORE_ALL by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3445
  • Remove close flush for mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3446
  • Remove MosaicMLLambdaEvalClient by @aspfohl in https://github.com/mosaicml/composer/pull/3432
  • Relax hf hub pin by @dakinggg in https://github.com/mosaicml/composer/pull/3435
  • Pytest skip 2 by @KuuCi in https://github.com/mosaicml/composer/pull/3448
  • bump version v0.23.5 by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/3450

Full Changelog: https://github.com/mosaicml/composer/compare/v0.23.4...v0.23.5

- Python
Published by XiaohanZhangCMU almost 2 years ago

composer - v0.23.4

Bug Fixes

1. Patch PyTorch 2.3.1 (https://github.com/mosaicml/composer/pull/3419)

Fixes a missing import when monkeypatching device mesh functions in PyTorch 2.3.1. This import is necessary for MoE training.

Full Changelog: https://github.com/mosaicml/composer/compare/v0.23.3...v0.23.4

- Python
Published by mvpatel2000 about 2 years ago

composer - v0.23.3

New Features

1. Update MLflow logger to use the new API with a time dimension to view images in MLflow (#3286)

We've enhanced the MLflow logger's log_image function to use the new API with time-dimension support, enabling images to be viewed in MLflow.

2. Add logging buffer time to MLflow logger (#3401)

We've added the logging_buffer_seconds argument to the MLflow logger, which specifies how many seconds to buffer before sending logs to the MLflow tracking server.
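Time-based buffering can be sketched as follows. `BufferedSender` is hypothetical. The injectable `clock` exists only so the example runs deterministically. Only the parameter name `logging_buffer_seconds` comes from the notes above; the flushing rule is an assumption for illustration.

```python
import time
from typing import Callable, List, Optional


class BufferedSender:
    """Hypothetical buffer that sends at most once every `logging_buffer_seconds`."""

    def __init__(self, logging_buffer_seconds: float,
                 send: Callable[[List[dict]], None],
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.logging_buffer_seconds = logging_buffer_seconds
        self._send = send
        self._clock = clock or time.monotonic
        self._buffer: List[dict] = []
        self._last_flush = self._clock()

    def log(self, record: dict) -> None:
        self._buffer.append(record)
        if self._clock() - self._last_flush >= self.logging_buffer_seconds:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._send(self._buffer)  # one request carries many records
            self._buffer = []
        self._last_flush = self._clock()


now = [0.0]
batches: List[List[dict]] = []
sender = BufferedSender(10, batches.append, clock=lambda: now[0])
sender.log({"step": 1})
now[0] = 5.0
sender.log({"step": 2})  # still buffered
now[0] = 11.0
sender.log({"step": 3})  # 11s since last flush: one request with 3 records
print(batches)  # [[{'step': 1}, {'step': 2}, {'step': 3}]]
```

Buffering trades a small delay in seeing metrics for fewer requests to the tracking server.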

Bug Fixes

1. Only require databricks-sdk when on the Databricks platform (#3389)

Previously, the MLflow logger always imported databricks-sdk. Now, the SDK is only required when running on the Databricks platform and using Databricks secrets to access managed MLflow.

2. Skip extra dataset state load during job resumption (#3393)

Previously, when loading a checkpoint with a train_dataloader, the dataset_state was loaded first. If train_dataloader was set again afterward, load_state_dict was called with None. Now, the train_dataloader setter checks for this case and skips the redundant load.
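The following is a minimal sketch of the guard. It assumes a hypothetical `ResumableDataset` with a `load_state_dict` method and a hypothetical `TrainerState` that holds the pending `dataset_state` and clears it once applied. It only illustrates why the setter checks for `None` before loading. It is not Composer's `State` class.

```python
from typing import Any, Dict, List, Optional


class ResumableDataset:
    """Hypothetical dataset that records every state it is asked to load."""

    def __init__(self) -> None:
        self.loaded: List[Optional[Dict[str, Any]]] = []

    def load_state_dict(self, state: Optional[Dict[str, Any]]) -> None:
        self.loaded.append(state)


class TrainerState:
    """Hypothetical state object; `dataset_state` is consumed on first use."""

    def __init__(self, dataset_state: Optional[Dict[str, Any]]) -> None:
        self.dataset_state = dataset_state
        self._train_dataloader: Optional[ResumableDataset] = None

    @property
    def train_dataloader(self) -> Optional[ResumableDataset]:
        return self._train_dataloader

    @train_dataloader.setter
    def train_dataloader(self, dataset: ResumableDataset) -> None:
        self._train_dataloader = dataset
        # Skip the load when no state is pending (e.g. it was already
        # applied) instead of calling load_state_dict(None).
        if self.dataset_state is not None:
            dataset.load_state_dict(self.dataset_state)
            self.dataset_state = None


ds = ResumableDataset()
state = TrainerState({"sample_in_epoch": 128})
state.train_dataloader = ds  # loads the checkpointed state once
state.train_dataloader = ds  # set again: the redundant load is skipped
print(ds.loaded)  # [{'sample_in_epoch': 128}]
```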

3. Fix auto-microbatching on CUDA 12.4 (#3400)

In CUDA 12.4, the out-of-memory error message changed to "CUDA error: out of memory". Previously, when using device_train_microbatch_size="auto", our logic only checked for the hardcoded message "CUDA out of memory". Now, we check for both "CUDA out of memory" and "CUDA error: out of memory".
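The check reduces to matching either message. Below is a hedged sketch. The helper name `is_cuda_oom` is hypothetical; the two substrings are the ones quoted above.

```python
def is_cuda_oom(error: BaseException) -> bool:
    """Return True if `error` looks like a CUDA OOM under either message format."""
    message = str(error)
    return "CUDA out of memory" in message or "CUDA error: out of memory" in message


print(is_cuda_oom(RuntimeError("CUDA error: out of memory")))  # True
print(is_cuda_oom(RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")))  # True
print(is_cuda_oom(RuntimeError("CUDA error: device-side assert triggered")))  # False
```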

4. Fix MLflow logging to Databricks workspace file paths that start with the /Shared/ prefix (#3410)

Previously, on the Databricks platform, MLflow logging prepended /Users/ to every user-provided logging path that did not already specify it, including paths starting with /Shared/. This was incorrect because /Shared/ denotes a shared workspace. Now, the /Users/ prefix is skipped for paths starting with /Shared/.
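The following is a minimal sketch of the corrected rule. It assumes the prepended prefix has the form `/Users/<username>/`. `resolve_experiment_path` is a hypothetical helper, and it only handles relative experiment names and paths already rooted at `/Users/` or `/Shared/`.

```python
def resolve_experiment_path(path: str, username: str) -> str:
    """Prefix a relative experiment name with /Users/<username>/.

    Paths already rooted at /Users/ or /Shared/ are returned unchanged;
    the /Shared/ case is the one this fix stops rewriting.
    """
    if path.startswith(("/Users/", "/Shared/")):
        return path
    return f"/Users/{username}/{path}"


print(resolve_experiment_path("my-first-project", "xxx.yyy@zzz.com"))
# /Users/xxx.yyy@zzz.com/my-first-project
print(resolve_experiment_path("/Shared/team-project", "xxx.yyy@zzz.com"))
# /Shared/team-project
```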

What's Changed

  • Bump CI from 0.0.7 to 0.0.8 by @KuuCi in https://github.com/mosaicml/composer/pull/3383
  • Fix backward compatibility caused by missing eval metrics class by @bigning in https://github.com/mosaicml/composer/pull/3385
  • Bump version v0.23.2 by @bigning in https://github.com/mosaicml/composer/pull/3386
  • Restore dev version by @bigning in https://github.com/mosaicml/composer/pull/3388
  • Only requires databricks-sdk when inside the Databricks platform by @antoinebrl in https://github.com/mosaicml/composer/pull/3389
  • Update packaging requirement from <24.1,>=21.3.0 to >=21.3.0,<24.2 by @dependabot in https://github.com/mosaicml/composer/pull/3392
  • Bump cryptography from 42.0.6 to 42.0.8 by @dependabot in https://github.com/mosaicml/composer/pull/3391
  • Skip extra dataset state load by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3393
  • Remove FSDP restriction from PyTorch 1.13 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3395
  • Check for 'CUDA error: out of memory' when auto-microbatching by @JAEarly in https://github.com/mosaicml/composer/pull/3400
  • Add tokens to iterations by @b-chu in https://github.com/mosaicml/composer/pull/3374
  • Busy wait utils in dist by @dakinggg in https://github.com/mosaicml/composer/pull/3396
  • Add buffering time to mlflow logger by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3401
  • Add missing import for PyTorch 2.3.1 device mesh slicing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3402
  • Add pynvml to mlflow dep group by @dakinggg in https://github.com/mosaicml/composer/pull/3404
  • min/max flagging added to system_metrics_monitor with only non-redundant, necessary gpu metrics logged by @JackZ-db in https://github.com/mosaicml/composer/pull/3373
  • Simplify launcher world size parsing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3398
  • Optionally use flash-attn's CE loss for metrics by @snarayan21 in https://github.com/mosaicml/composer/pull/3394
  • log image fix by @jessechancy in https://github.com/mosaicml/composer/pull/3286
  • [ckpt-rewr] Save state dict API by @eracah in https://github.com/mosaicml/composer/pull/3372
  • Revert "Optionally use flash-attn's CE loss for metrics (#3394)" by @snarayan21 in https://github.com/mosaicml/composer/pull/3408
  • CPU tests image fix by @snarayan21 in https://github.com/mosaicml/composer/pull/3409
  • Add setter for epoch in iteration by @b-chu in https://github.com/mosaicml/composer/pull/3407
  • Move pillow dep as required by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3412
  • fixing mlflow logging to Databricks workspace file paths with /Shared/ prefix by @JackZ-db in https://github.com/mosaicml/composer/pull/3410
  • Bump version v0.23.3 by @karan6181 in https://github.com/mosaicml/composer/pull/3414

New Contributors

  • @JackZ-db made their first contribution in https://github.com/mosaicml/composer/pull/3373

Full Changelog: https://github.com/mosaicml/composer/compare/v0.23.2...v0.23.3

- Python
Published by karan6181 about 2 years ago

composer - v0.23.2

Bug Fixes

  • Fix backward compatibility issue caused by missing eval metrics class

What's Changed

  • Fix backward compatibility issue caused by missing eval metrics class by @bigning in https://github.com/mosaicml/composer/pull/3385

Full Changelog: https://github.com/mosaicml/composer/compare/v0.23.1...release/v0.23.2

- Python
Published by bigning about 2 years ago

composer - v0.23.1

What's New

1. PyTorch 2.3.1 Upgrade

Composer now supports PyTorch 2.3.1.

What's Changed

  • Torch 2.3.1 Upgrade by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3367
  • Fix monkeypatch imports by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3375
  • Remove unnecessary state dict and load_state_dict functions by @eracah in https://github.com/mosaicml/composer/pull/3361
  • Adding checkpoint backwards compatibility tests after 0.23.0 release by @bigning in https://github.com/mosaicml/composer/pull/3377
  • prepare_fsdp_module documentation fix by @KuuCi in https://github.com/mosaicml/composer/pull/3379
  • Composer version bump to v0.23.1 by @snarayan21 in https://github.com/mosaicml/composer/pull/3380
  • Clear caplog and use as context manager in test_logging by @snarayan21 in https://github.com/mosaicml/composer/pull/3382

Full Changelog: https://github.com/mosaicml/composer/compare/v0.23.0...v0.23.1

- Python
Published by mvpatel2000 about 2 years ago

composer - v0.23.0

What's New

1. Parallelism V2 + Tensor Parallel (#3335)

Composer now supports PyTorch's implementation of tensor parallelism. As part of this, we've revamped and simplified how Composer does distributed training. Previously, Composer accepted a fsdp_config attribute in the Trainer:

```
trainer = Trainer(model, fsdp_config = {'sharding_strategy': 'FULL_SHARD'})
```

As we generalize to more forms of parallelism, we've deprecated fsdp_config in favor of parallelism_config:

```
trainer = Trainer(
    model = model,
    ...
    parallelism_config = {
        'fsdp': {
            'sharding_strategy': 'FULL_SHARD',
            'data_parallel_shard_degree': 2,  # Size of shard dimension
            'data_parallel_replicate_degree': 2,  # Size of replicate dimension
        },
        'tp_config': {
            'tensor_parallel_degree': 2,  # Size of TP dimension
            'layer_plan': ...  # describes how to TP layers
        }
    }
)
```

As part of this change, we now default to using DTensor for parallelism with PyTorch FSDP. PyTorch has deprecated ShardedTensor, so this migrates to the new backend, which avoids various checkpointing bugs.

See the docs for tensor parallel for more information. Note that tensor parallel is still experimental and may be subject to breaking API changes. Some checkpointing features may also not work with this form of parallelism.
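With the example degrees above, the three parallelism dimensions form a device mesh. The following sketch shows the arithmetic. It assumes, as PyTorch's `init_device_mesh` requires, that the product of the mesh dimensions equals the world size. It also assumes a row-major `(replicate, shard, tp)` rank layout. `mesh_shape` and `rank_coords` are hypothetical helpers and are not part of Composer.

```python
from typing import Tuple


def mesh_shape(world_size: int, replicate: int, shard: int, tp: int) -> Tuple[int, int, int]:
    """Check that the three degrees tile the world size exactly."""
    product = replicate * shard * tp
    if product != world_size:
        raise ValueError(
            f"{replicate} x {shard} x {tp} = {product} ranks, but world size is {world_size}"
        )
    return (replicate, shard, tp)


def rank_coords(rank: int, shape: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Map a global rank to (replicate, shard, tp) indices in a row-major mesh."""
    _, shard, tp = shape
    replicate_idx, rest = divmod(rank, shard * tp)
    shard_idx, tp_idx = divmod(rest, tp)
    return (replicate_idx, shard_idx, tp_idx)


shape = mesh_shape(world_size=8, replicate=2, shard=2, tp=2)
print(shape)                  # (2, 2, 2)
print(rank_coords(5, shape))  # (1, 0, 1)
```

Under these assumptions, the example config needs 8 GPUs. Ranks 4 and 5 differ only in their TP index, so they would share one tensor-parallel group.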

2. MLflow API Simplification

Previously, the MLFlow logger required a tracking URI and an absolute user path when using MLFlow with Databricks:

```
mlflow_logger = MLFlowLogger(
    tracking_uri = 'databricks',
    experiment_name = '/Users/xxx.yyy@zzz.com/my-first-project/'
)

trainer = Trainer(
    model = model,
    ...
    loggers = mlflow_logger,
)
```

Now, if you are using Databricks secrets as an environment variable, Composer will autopopulate tracking_uri and the experiment_name prefix:

```
trainer = Trainer(
    model = model,
    ...
    loggers = MLFlowLogger(experiment_name='my-first-project'),
)
```

3. Wallclock Save Interval

Composer now supports setting a save interval in wallclock time:

```
trainer = Trainer(
    model = model,
    ...
    save_interval='30m',
)
```

Note that most durations, such as max_duration, do not accept wallclock time. The initial version of this feature is limited to a subset of time features, such as save_interval.
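One way to picture a wallclock interval such as `'30m'` is to parse it into seconds and save whenever that much time has elapsed since the last save. `parse_wallclock` and `should_save` are hypothetical helpers. The accepted suffixes (`s`, `m`, `h`) are an assumption and are not Composer's documented format.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_wallclock(interval: str) -> int:
    """Convert strings like '30m', '90s', or '2h' to seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", interval.strip())
    if match is None:
        raise ValueError(f"Unrecognized wallclock interval: {interval!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def should_save(elapsed_since_last_save: float, interval: str) -> bool:
    # Save once at least `interval` of wallclock time has passed.
    return elapsed_since_last_save >= parse_wallclock(interval)


print(parse_wallclock("30m"))      # 1800
print(should_save(1799.0, "30m"))  # False
print(should_save(1800.0, "30m"))  # True
```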

Bug Fixes

  • Don't close the engine if it's already closed in https://github.com/mosaicml/composer/pull/3143
  • Fix HF tests with Pin in https://github.com/mosaicml/composer/pull/3248
  • Fix backwards compatibility tests in https://github.com/mosaicml/composer/pull/3252
  • Fix unexpected remote checkpointing downloading in https://github.com/mosaicml/composer/pull/3271
  • Fix HSDP with ShardDegree < 8 in https://github.com/mosaicml/composer/pull/3313

What's Changed

  • Remove CPU offload for DDP/single-gpu by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3242
  • Adding more checkpoint backwards compatibility tests by @snarayan21 in https://github.com/mosaicml/composer/pull/3244
  • Don't close the engine if it's already closed by @dakinggg in https://github.com/mosaicml/composer/pull/3143
  • Replace evaluator.dataloader.device_eval_batch_size with evaluator.device_eval_microbatch_size by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3247
  • Fix HF tests with Pin by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3248
  • Remove ICL metrics by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3243
  • Add offset and length arguments for checkpoint validation functions by @irenedea in https://github.com/mosaicml/composer/pull/3246
  • Fix backwards compatibility tests, raise error for torch version mismatch by @snarayan21 in https://github.com/mosaicml/composer/pull/3252
  • Bump cryptography from 41.0.5 to 42.0.6 by @dependabot in https://github.com/mosaicml/composer/pull/3256
  • Bump databricks-sdk from 0.25.1 to 0.27.0 by @dependabot in https://github.com/mosaicml/composer/pull/3257
  • Improve GCS Object Store by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3251
  • add retry to gcs.upload_file by @bigning in https://github.com/mosaicml/composer/pull/3232
  • Add unit test support for full state dict + load_weights_only and save_weights_only by @eracah in https://github.com/mosaicml/composer/pull/3260
  • will/bump_aws_ofi_nccl by @willgleich in https://github.com/mosaicml/composer/pull/3253
  • Fix daily GCS tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3268
  • Fix: SAM not working with FSDP/DeepSpeed and LR scheduler. by @Joqsan in https://github.com/mosaicml/composer/pull/3259
  • Add upload timeout patch to mlflow on azure by @dakinggg in https://github.com/mosaicml/composer/pull/3265
  • Add option to stagger uploads based on local rank by @dakinggg in https://github.com/mosaicml/composer/pull/3275
  • explicit close by @dakinggg in https://github.com/mosaicml/composer/pull/3276
  • Update NCCL_ASYNC_ERROR_HANDLING env variable by @priba in https://github.com/mosaicml/composer/pull/3267
  • new dist_cp save planner to fix issue that each rank needs to download all checkpoint files by @bigning in https://github.com/mosaicml/composer/pull/3271
  • Bump to torch 2.2.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3283
  • Fix UCObjectStore.list_objects by @dakinggg in https://github.com/mosaicml/composer/pull/3284
  • Update peft version by @dakinggg in https://github.com/mosaicml/composer/pull/3287
  • replace load_fsdp_monolith_ with load_monolith_ by @milocress in https://github.com/mosaicml/composer/pull/3288
  • Return PyTorch Latest by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3290
  • Fix daily tests by filtering a warning by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3291
  • remove orig_params check by @milocress in https://github.com/mosaicml/composer/pull/2981
  • [ckpt-rewr] Get Model State Dict Util Function by @eracah in https://github.com/mosaicml/composer/pull/3250
  • Skip compression check with symlink files by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3300
  • Monkeypatch Device Mesh ND Slicing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3302
  • Bump coverage[toml] from 7.4.4 to 7.5.1 by @dependabot in https://github.com/mosaicml/composer/pull/3305
  • Bump databricks-sdk from 0.27.0 to 0.27.1 by @dependabot in https://github.com/mosaicml/composer/pull/3306
  • Update transformers requirement from !=4.34.0,<4.41,>=4.11 to >=4.11,!=4.34.0,<4.42 by @dependabot in https://github.com/mosaicml/composer/pull/3307
  • Allow overwrite on upload retry in remote uploader downloader by @irenedea in https://github.com/mosaicml/composer/pull/3310
  • Update platform references by @aspfohl in https://github.com/mosaicml/composer/pull/3304
  • Fix cometml unit tests by @j316chuck in https://github.com/mosaicml/composer/pull/3314
  • Fix HSDP with ShardDegree < 8 by @bigning in https://github.com/mosaicml/composer/pull/3313
  • Update docstring for get_model_state_dict by @eracah in https://github.com/mosaicml/composer/pull/3318
  • Tensor Parallelism Integration by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3269
  • Bugfixes to FSDP + TP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3323
  • Wct save interval by @KuuCi in https://github.com/mosaicml/composer/pull/3264
  • Wrap ChunkedEncodingError from UCObjectStore by @irenedea in https://github.com/mosaicml/composer/pull/3321
  • Add checkpoint events to mosaicml logger by @b-chu in https://github.com/mosaicml/composer/pull/3316
  • Bump timeout to fix daily tests by @j316chuck in https://github.com/mosaicml/composer/pull/3325
  • Fix FSDP ckpt by filtering User Warning by @j316chuck in https://github.com/mosaicml/composer/pull/3327
  • Revert TP integration by @dakinggg in https://github.com/mosaicml/composer/pull/3328
  • Bump databricks-sdk from 0.27.1 to 0.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/3331
  • Bump sphinxcontrib-katex from 0.9.6 to 0.9.10 by @dependabot in https://github.com/mosaicml/composer/pull/3333
  • Update peft requirement from <0.11,>=0.10.0 to >=0.10.0,<0.12 by @dependabot in https://github.com/mosaicml/composer/pull/3332
  • Bump coverage[toml] from 7.5.1 to 7.5.2 by @dependabot in https://github.com/mosaicml/composer/pull/3330
  • Update protobuf requirement from <5.27 to <5.28 by @dependabot in https://github.com/mosaicml/composer/pull/3329
  • Improving memory snapshot by @cli99 in https://github.com/mosaicml/composer/pull/3315
  • Add A10 to speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3336
  • change ComposerModel output type by @hyenal in https://github.com/mosaicml/composer/pull/3341
  • Remove evaluator state by @snarayan21 in https://github.com/mosaicml/composer/pull/3339
  • [ckpt-rewr] Generate Metadata State Dict API by @eracah in https://github.com/mosaicml/composer/pull/3311
  • Tensor Parallelism v2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3335
  • Migrate Type Hints for PEP 585 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3344
  • [checkpoint v2] add remote uploader class by @bigning in https://github.com/mosaicml/composer/pull/3303
  • Raise errors on all ranks for checkpoint download failures by @irenedea in https://github.com/mosaicml/composer/pull/3345
  • Add return type annotation when __init__ doesn't take any argument by @antoinebrl in https://github.com/mosaicml/composer/pull/3347
  • [ckpt-rewr] Get Optim State Dict Util API by @eracah in https://github.com/mosaicml/composer/pull/3299
  • Fix type check issue with device train microbatch size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3349
  • Add torch distributed checkpointing monkeypatches to enable TE checkpointing for extra_state attribute by @j316chuck in https://github.com/mosaicml/composer/pull/3298
  • Bump coverage[toml] from 7.5.2 to 7.5.3 by @dependabot in https://github.com/mosaicml/composer/pull/3353
  • Update wandb requirement from <0.17,>=0.13.2 to >=0.13.2,<0.18 by @dependabot in https://github.com/mosaicml/composer/pull/3352
  • Optional CheckpointSaver instantiation inside the Trainer by @antoinebrl in https://github.com/mosaicml/composer/pull/3334
  • MLFlow better experiment defaults by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3356
  • Rename metadata keys by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3354
  • Dataclasses for ParallelismConfig by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3346
  • Upgrade Mofed with apt by @willgleich in https://github.com/mosaicml/composer/pull/3340
  • Multi gpu ci test by @KuuCi in https://github.com/mosaicml/composer/pull/3312
  • Autoresume Validation with Max Duration by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3358
  • Deprecate and bump version to 0.23.0 by @bigning in https://github.com/mosaicml/composer/pull/3359

New Contributors

  • @Joqsan made their first contribution in https://github.com/mosaicml/composer/pull/3259

Full Changelog: https://github.com/mosaicml/composer/compare/v0.22.0...v0.23.0

- Python
Published by bigning about 2 years ago

composer - v0.22.0

What's New

🔥 Support for PyTorch v2.3.0

Composer now supports the recently-released PyTorch version 2.3.0! Please raise any issues with us so we can address them.

Bug Fixes

  • Fixing checks for device microbatch size for sequence parallelism in #3200
  • Fixing token logging in #3206
  • Search for run name in MLFlowLogger in #3215
  • Fix FQN names with activation checkpointing in #3210
  • Strict weight matching for checkpoint loading in #3219

What's Changed

  • Bump transformers by @dakinggg in https://github.com/mosaicml/composer/pull/3197
  • Add deprecation warnings for ICL datasets/helper functions/metrics by @bmosaicml in https://github.com/mosaicml/composer/pull/3125
  • Bump traitlets from 5.14.2 to 5.14.3 by @dependabot in https://github.com/mosaicml/composer/pull/3204
  • Raise LR schedule warnings only when necessary by @snarayan21 in https://github.com/mosaicml/composer/pull/3207
  • Add torch 2.3 support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3209
  • Add torch 2.3 CI/CD by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3211
  • Fix daily test images by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3212
  • Try FAv2 2.5.7 from source by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3213
  • Update tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3217
  • Fix torch 2.3 GPU tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3218
  • Use flash-attn 2.5.8 with no build isolation in docker images by @snarayan21 in https://github.com/mosaicml/composer/pull/3224
  • Add a torch.cuda.empty_cache() in utils.save_checkpoint by @bfontain in https://github.com/mosaicml/composer/pull/3216
  • Require 2 steps for GS object store by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3228
  • Add rename_metrics to Mlflow logger by @hanlint in https://github.com/mosaicml/composer/pull/3225
  • Fix daily tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3229
  • Change precision for daily tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3231
  • Create new Mlflow run by default and introduce run_group by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3208
  • Fix daily test pt 4 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3233
  • Deprecate and bump version to 0.22 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3230
  • Fix daily tests v5 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3234
  • Fix daily v6 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3235
  • fix daily tests v7 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3236
  • Raise the daily test timeout by @dakinggg in https://github.com/mosaicml/composer/pull/3241
  • Accelerate GPU tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3237
  • Make sharded checkpoint loading backwards-compatible by @snarayan21 in https://github.com/mosaicml/composer/pull/3240

Full Changelog: https://github.com/mosaicml/composer/compare/v0.21.3...v0.22.0

- Python
Published by snarayan21 about 2 years ago

composer - v0.21.3

Bug Fixes

1. Increased Robustness to Checkpoint Loading

We've patched several edge cases in loading sharded checkpoints, especially with DTensors, which should decrease memory usage when loading checkpoints. We've also hardened retry logic against cloud object store failures, which improves robustness to transient network issues.
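Retry hardening of this kind commonly follows an exponential-backoff pattern. The following is a generic sketch, not Composer's code. `with_retries` is hypothetical, and `ConnectionError` stands in for whatever transient error an object store client raises. The `sleep` function is injectable so the example runs instantly.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying transient ConnectionErrors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")  # loop never ran


delays: List[float] = []
calls = {"n": 0}


def flaky_download() -> str:
    # Fails twice, then succeeds, like a transient network blip.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "checkpoint bytes"


print(with_retries(flaky_download, sleep=delays.append))  # checkpoint bytes
print(delays)  # [1.0, 2.0]
```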

What's Changed

  • Raise daily test timeout by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3172
  • fix remote file naming by @cli99 in https://github.com/mosaicml/composer/pull/3173
  • [fix] DTensor + SHARD_GRAD_OP + use_orig_params by @bigning in https://github.com/mosaicml/composer/pull/3175
  • Bump db sdk by @dakinggg in https://github.com/mosaicml/composer/pull/3176
  • Build latest pytorch nightly images by @dakinggg in https://github.com/mosaicml/composer/pull/3179
  • Add FP8 TransformerEngine activation checkpointing by @cli99 in https://github.com/mosaicml/composer/pull/3156
  • Enabling the computation of validation loss and other metrics when using sequence parallelism by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3183
  • Update mosaic_fsdp_utils.py by @vchiley in https://github.com/mosaicml/composer/pull/3185
  • Fix the FSDP.optim_state_dict_to_load OOM by @bigning in https://github.com/mosaicml/composer/pull/3184
  • Revert "Update mosaic_fsdp_utils.py" by @vchiley in https://github.com/mosaicml/composer/pull/3187
  • Bump databricks-sdk from 0.24.0 to 0.25.1 by @dependabot in https://github.com/mosaicml/composer/pull/3190
  • Add version tag to local builds by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3188
  • Update NeptuneLogger by @AleksanderWWW in https://github.com/mosaicml/composer/pull/3165
  • Filter neptune warning in doctests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3195
  • Removal of metrics deepcopy before computing the metrics by @gregjauvion in https://github.com/mosaicml/composer/pull/3180
  • Fix MLFlow Tag Name for Resumption by @KuuCi in https://github.com/mosaicml/composer/pull/3194
  • Fix mistral gating by @dakinggg in https://github.com/mosaicml/composer/pull/3199
  • Bump version to 0.21.3 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3198

New Contributors

  • @gregjauvion made their first contribution in https://github.com/mosaicml/composer/pull/3180

Full Changelog: https://github.com/mosaicml/composer/compare/v0.21.2...v0.21.3

- Python
Published by mvpatel2000 about 2 years ago

composer - v0.21.2

Bug Fixes

1. Enable torch 2.2.2 (#3161)

Composer currently monkeypatches PyTorch for nightly versions in order to fix upstream bugs. With the release of torch 2.2.2, these monkeypatches were mistakenly applied to the stable release due to incorrect gating on imports. This release fixes the gating, enabling torch 2.2.2.
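The notes don't show the gating code itself. As an illustration only, the sketch below separates nightly builds (versions carrying a .dev segment, such as 2.3.0.dev20240301) from stable releases such as 2.2.2, so that nightly-only patches are skipped on stable torch. The function names and the targeted 2.3.0 series are hypothetical; Composer's actual check (#3155, #3161) may be written differently.

```python
import re


def is_nightly(version: str) -> bool:
    """True for pre-release builds such as '2.3.0.dev20240301+cu121'."""
    public = version.split("+", 1)[0]  # drop the local build suffix, e.g. '+cu121'
    return re.search(r"\.dev\d*$", public) is not None


def base_version(version: str) -> tuple:
    """Parse the leading 'X.Y.Z' of a version string into a comparable tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def should_apply_nightly_patches(version: str, patched_series: tuple = (2, 3, 0)) -> bool:
    """Hypothetical gate: patch only nightlies of the targeted series, never stable releases."""
    return is_nightly(version) and base_version(version) == patched_series
```

Checking the pre-release marker explicitly, rather than only comparing version numbers, is what keeps a stable release like 2.2.2 from being mistaken for a build that needs the patches.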

2. MPS Metric Computation on CPU (#3105)

Due to bugs in computing torchmetrics on Mac (MPS) devices, Composer moves metric computation onto the CPU. Previously, the batch was not properly moved to the CPU along with the model outputs; this release fixes that.

Thank you to @hyenal for this contribution!
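The fix in #3105 is about making sure the batch, and not just the outputs, reaches the CPU before metrics are updated. A framework-agnostic sketch of that idea follows; move_to_cpu and prepare_for_metrics are hypothetical helpers, and anything exposing a .to() method (such as a torch.Tensor) is treated as a tensor.

```python
from typing import Any, Tuple


def move_to_cpu(obj: Any) -> Any:
    """Recursively move tensor-like leaves (objects with a .to() method) of a nested batch to CPU."""
    if isinstance(obj, dict):
        return {key: move_to_cpu(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [move_to_cpu(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(move_to_cpu(item) for item in obj)
    if hasattr(obj, "to"):
        return obj.to("cpu")
    return obj  # leave non-tensor leaves (ints, strings, None) untouched


def prepare_for_metrics(device_type: str, outputs: Any, batch: Any) -> Tuple[Any, Any]:
    """On MPS, move both model outputs and the batch to CPU before updating metrics."""
    if device_type == "mps":
        return move_to_cpu(outputs), move_to_cpu(batch)
    return outputs, batch
```

Moving only the outputs would leave targets inside the batch on the MPS device, so the metric update would still see mixed devices.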

3. Batch Sampler Support (#3124)

Composer now supports dataloaders that specify a batch sampler; previously, specifying one in the dataloader resulted in an error.

Thank you to @Ghelfi for this contribution!
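Per #3124, set_epoch is now called on Dataloader.batch_sampler when it is defined. The sketch below shows the general pattern of forwarding the epoch to whichever sampler supports it; set_dataloader_epoch is a hypothetical helper, not Composer's code. Samplers such as torch's DistributedSampler use the epoch to reshuffle deterministically, so skipping the call repeats the same order every epoch.

```python
from typing import Any


def set_dataloader_epoch(dataloader: Any, epoch: int) -> bool:
    """Forward the epoch to the first sampler on the dataloader that supports set_epoch.

    Returns True if a sampler was updated (hypothetical helper for illustration).
    """
    for attr in ("sampler", "batch_sampler"):
        sampler = getattr(dataloader, attr, None)
        if sampler is not None and hasattr(sampler, "set_epoch"):
            sampler.set_epoch(epoch)
            return True
    return False
```

When a user passes a custom batch_sampler, torch's DataLoader still creates a default sampler that has no set_epoch, which is why the helper falls through to batch_sampler.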

What's Changed

  • Make codequality callable by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3133
  • Explicitly print checkpoint downloading exception by @bigning in https://github.com/mosaicml/composer/pull/3131
  • Change release actions by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3136
  • Passing rank and num_replicas to dist.get_sampler by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3137
  • Fix broadcast by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3138
  • Compressor fixes by @mbway in https://github.com/mosaicml/composer/pull/3142
  • In case of MPS device also copy batch to CPU by @hyenal in https://github.com/mosaicml/composer/pull/3105
  • Composer object store download retry by @bigning in https://github.com/mosaicml/composer/pull/3140
  • Bump databricks-sdk from 0.22.0 to 0.23.0 by @dependabot in https://github.com/mosaicml/composer/pull/3144
  • Update transformers requirement from !=4.34.0,<4.39,>=4.11 to >=4.11,!=4.34.0,<4.40 by @dependabot in https://github.com/mosaicml/composer/pull/3148
  • Update protobuf requirement from <3.21 to <5.27 by @dependabot in https://github.com/mosaicml/composer/pull/3147
  • Bump traitlets from 5.14.1 to 5.14.2 by @dependabot in https://github.com/mosaicml/composer/pull/3145
  • Bump to 0.21 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3150
  • Fixing sequence parallel error conditions and adding type float for microbatch_size in typehints by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3139
  • Fix torch monkeypatch version check by @dakinggg in https://github.com/mosaicml/composer/pull/3155
  • Update torchmetrics requirement from <1.3.2,>=0.10.0 to >=0.10.0,<1.3.3 by @dependabot in https://github.com/mosaicml/composer/pull/3157
  • Bump gitpython from 3.1.42 to 3.1.43 by @dependabot in https://github.com/mosaicml/composer/pull/3160
  • Prevent crash if signal handler cannot be set by @mbway in https://github.com/mosaicml/composer/pull/3152
  • Pin pillow for code quality workflow by @dakinggg in https://github.com/mosaicml/composer/pull/3162
  • Fix torch version check by @dakinggg in https://github.com/mosaicml/composer/pull/3161
  • add more retry to checkpoint downloading by @bigning in https://github.com/mosaicml/composer/pull/3164
  • Append to gpu rank log files instead of throwing error by @jjanezhang in https://github.com/mosaicml/composer/pull/3166
  • Call set_epoch on Dataloader.batch_sampler if defined by @Ghelfi in https://github.com/mosaicml/composer/pull/3124
  • Bump version to 0.21.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3168

New Contributors

  • @hyenal made their first contribution in https://github.com/mosaicml/composer/pull/3105
  • @Ghelfi made their first contribution in https://github.com/mosaicml/composer/pull/3124

Full Changelog: https://github.com/mosaicml/composer/compare/v0.21.1...v0.21.2

- Python
Published by mvpatel2000 about 2 years ago

composer - v0.21.1

Bug Fixes

1. Fix to HSDP checkpoint loading

The previous release broke checkpoint loading when using HSDP with multiple replicas. This patch release fixes it.

What's Changed

  • Fix broadcast by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3138

Full Changelog: https://github.com/mosaicml/composer/compare/v0.21.0...v0.21.1

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.21.0

What's New

1. Aggregate Memory Monitoring (#3042)

The MemoryMonitor callback now supports aggregating memory statistics across nodes. Summary statistics for a run's memory usage across the cluster can dramatically help debug straggler nodes or non-homogeneous workloads. The memory monitor can aggregate and log combined values at a user-specified frequency.

Example:

```python
from composer import Trainer
from composer.callbacks import MemoryMonitor

# model, train_dataloader, and optimizer are assumed to be defined elsewhere
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    max_duration="1ep",
    callbacks=[
        MemoryMonitor(
            dist_aggregate_batch_interval=10,  # aggregate every 10 batches
        )
    ],
)
```
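Conceptually, aggregation gathers each rank's readings and reduces them to cluster-wide summaries. The sketch below simulates that on a plain list of per-rank dictionaries. It is not MemoryMonitor's implementation (which has to communicate across processes), and the metric names and the min/max/avg key format are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def should_aggregate(batch_idx: int, interval: int) -> bool:
    """Aggregate on every interval-th batch (batch_idx counted from 1)."""
    return interval > 0 and batch_idx % interval == 0


def aggregate_memory_stats(per_rank: List[Dict[str, float]]) -> Dict[str, float]:
    """Reduce per-rank memory readings to cluster-wide min/max/avg per metric."""
    summary: Dict[str, float] = {}
    for key in per_rank[0]:
        values = [stats[key] for stats in per_rank]
        summary[f"{key}/min"] = min(values)
        summary[f"{key}/max"] = max(values)
        summary[f"{key}/avg"] = mean(values)
    return summary
```

With readings of 10, 14, and 12 GB on three ranks, the summary reports a min of 10, a max of 14, and an average of 12; a wide min/max gap is what flags a straggler or an uneven workload.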

2. Advanced Compression Options (#3118)

Large model checkpoints can be expensive to store and transfer. In this release, we've upgraded our compression support to accept several new formats, which offer better trade-offs between compression ratio and compression time; compression is performed with command-line tools. To use compression, append a compression extension to your checkpoint filename. We now support the following extensions:

  • bz2
  • gz
  • lz4
  • lzma
  • lzo
  • xz
  • zst

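Example:

```python
from composer import Trainer

# model, train_dataloader, and optimizer are assumed to be defined elsewhere.
# Checkpoints are only saved when save_folder is set; the .lz4 suffix selects lz4 compression.
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    max_duration="1ep",
    save_folder="checkpoints",
    save_filename="ep{epoch}-ba{batch}-rank{rank}.pt.lz4",
)
```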

Thank you to @mbway for adding this support!
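To make the filename-suffix mechanism concrete, here is a sketch that maps a checkpoint's final suffix to a command-line compressor. The mapping, the flags (each tool's -c option writes compressed output to stdout), and the function names are assumptions for illustration; Composer's own lookup may be organized differently.

```python
import shutil
from pathlib import PurePath
from typing import List, Optional

# Hypothetical mapping from checkpoint suffix to a CLI compressor that writes to stdout.
COMPRESSORS = {
    ".bz2": ["bzip2", "-c"],
    ".gz": ["gzip", "-c"],
    ".lz4": ["lz4", "-c"],
    ".lzma": ["lzma", "-c"],
    ".lzo": ["lzop", "-c"],
    ".xz": ["xz", "-c"],
    ".zst": ["zstd", "-c"],
}


def compressor_for(filename: str) -> Optional[List[str]]:
    """Return the compression command implied by the filename's last suffix, or None."""
    return COMPRESSORS.get(PurePath(filename).suffix)


def compressor_available(filename: str) -> bool:
    """Check that the required CLI tool is installed on PATH."""
    command = compressor_for(filename)
    return command is not None and shutil.which(command[0]) is not None
```

Only the last suffix is inspected, so "ep1-ba100-rank0.pt.lz4" selects lz4 while a plain ".pt" file is saved uncompressed.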

What's Changed

  • Rename composer_run_name tag to run_name when logging to MLflow by @jerrychen109 in https://github.com/mosaicml/composer/pull/3040
  • enable aggregate mem monitoring by @vchiley in https://github.com/mosaicml/composer/pull/3042
  • Bump junitparser from 3.1.1 to 3.1.2 by @dependabot in https://github.com/mosaicml/composer/pull/3056
  • Add SHARDGRADOP to device mesh error check by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3058
  • Add torch 2.2.1 support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3059
  • Use testing repo actions for linting by @b-chu in https://github.com/mosaicml/composer/pull/3060
  • Link autoresume docs back to watchdog by @aspfohl in https://github.com/mosaicml/composer/pull/3052
  • Deprecate get_state and remove deprecations by @b-chu in https://github.com/mosaicml/composer/pull/3017
  • Bump version to 0.20.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3061
  • Remove s3_bucket pytest cli flag by @b-chu in https://github.com/mosaicml/composer/pull/3064
  • Remove s3_bucket flag from gpu test by @b-chu in https://github.com/mosaicml/composer/pull/3065
  • Clean Up OOM Observer Remote Uploader Download path by @j316chuck in https://github.com/mosaicml/composer/pull/3070
  • Fix daily test for iteration by @b-chu in https://github.com/mosaicml/composer/pull/3068
  • Remove "generation_length" in favor of "generation_kwargs" by @maxisawesome in https://github.com/mosaicml/composer/pull/3014
  • Bump packaging by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3072
  • Use ci-testing repo for CPU and GPU tests by @b-chu in https://github.com/mosaicml/composer/pull/3062
  • Add new torch monkeypatches to Composer by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3063
  • Add initial support for neuron devices by @bfontain in https://github.com/mosaicml/composer/pull/3049
  • Stripping whitespaces as default for QATask ICL eval by @ksreenivasan in https://github.com/mosaicml/composer/pull/3073
  • Add ICL base class to all by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3079
  • pass prelimiter into ALL ICL datasets by @eitanturok in https://github.com/mosaicml/composer/pull/3069
  • Bump sentencepiece from 0.1.99 to 0.2.0 by @dependabot in https://github.com/mosaicml/composer/pull/3083
  • Add Iteration related Events to callbacks by @b-chu in https://github.com/mosaicml/composer/pull/3077
  • Add Iteration related Events by @b-chu in https://github.com/mosaicml/composer/pull/3076
  • Bump CI/CD to v3 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3086
  • Add docstring to iteration_length by @b-chu in https://github.com/mosaicml/composer/pull/3088
  • Check FSDP module has device_mesh before getting it by @eracah in https://github.com/mosaicml/composer/pull/3091
  • Bump minor version in base image by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3092
  • Enforce async logging flush in mlflow logger at post_close call by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3093
  • Warning log to info log by @aspfohl in https://github.com/mosaicml/composer/pull/3096
  • Bump transformers by @dakinggg in https://github.com/mosaicml/composer/pull/3095
  • Change style for splitting on commas by @b-chu in https://github.com/mosaicml/composer/pull/3078
  • Remove slash by @b-chu in https://github.com/mosaicml/composer/pull/3098
  • Allowing for fractional number of samples per rank by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3075
  • Output eval logging (batch level) by @maxisawesome in https://github.com/mosaicml/composer/pull/2977
  • Replace errors with warnings for eval args by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3100
  • Ability to load sharded checkpoints with remote symlink load_path by @eracah in https://github.com/mosaicml/composer/pull/3097
  • Improvements to NeptuneLogger by @AleksanderWWW in https://github.com/mosaicml/composer/pull/3085
  • Revert "Improvements to NeptuneLogger" by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3111
  • Bump mlflow min pin by @dakinggg in https://github.com/mosaicml/composer/pull/3110
  • Fix rounding issue in interval calculation by @dakinggg in https://github.com/mosaicml/composer/pull/3109
  • Bump coverage[toml] from 7.4.1 to 7.4.3 by @dependabot in https://github.com/mosaicml/composer/pull/3102
  • Uses v0.0.4 of ci-testing by @b-chu in https://github.com/mosaicml/composer/pull/3112
  • Add versioned deprecation warning by @irenedea in https://github.com/mosaicml/composer/pull/2984
  • Update Flash Attention to 2.5.5 by @Skylion007 in https://github.com/mosaicml/composer/pull/3113
  • Setting the max duration to current timestamp in the same units as cu… by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3090
  • Making default_split_batch public by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3116
  • Adding log exception to Mosaic Logger by @jjanezhang in https://github.com/mosaicml/composer/pull/3089
  • Add checks to schedulers by @b-chu in https://github.com/mosaicml/composer/pull/3115
  • Removed default attrs from exception class in the attrs dict by @jjanezhang in https://github.com/mosaicml/composer/pull/3126
  • Bump coverage[toml] from 7.4.3 to 7.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/3121
  • Refactor initialization by @Practicinginhell in https://github.com/mosaicml/composer/pull/3127
  • Bump databricks sdk version by @dakinggg in https://github.com/mosaicml/composer/pull/3128
  • Update packaging requirement from <23.3,>=21.3.0 to >=21.3.0,<24.1 by @dependabot in https://github.com/mosaicml/composer/pull/3122
  • Remove rng from save_weights_only ckpt by @eracah in https://github.com/mosaicml/composer/pull/3129
  • More compression options by @mbway in https://github.com/mosaicml/composer/pull/3118
  • Only broadcast distcp files by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3130
  • Bump version to 0.21 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3132

New Contributors

  • @ksreenivasan made their first contribution in https://github.com/mosaicml/composer/pull/3073
  • @eitanturok made their first contribution in https://github.com/mosaicml/composer/pull/3069
  • @Practicinginhell made their first contribution in https://github.com/mosaicml/composer/pull/3127
  • @mbway made their first contribution in https://github.com/mosaicml/composer/pull/3118

Full Changelog: https://github.com/mosaicml/composer/compare/v0.20.1...v0.21.0

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.20.1

What's New

1. Torch 2.2.1 Support

Composer now supports torch 2.2.1! We've raised the pin to allow the latest torch, and we've upstreamed all torch monkeypatches so Composer can run out of the box with the latest and greatest torch features.

What's Changed

  • Add torch 2.2.1 support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3059
  • Bump version to 0.20.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3061

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.20.0

What's New

1. New Neptune Logger

Composer now supports logging training data to neptune.ai using the NeptuneLogger. To get started:

```python
from composer.loggers import NeptuneLogger

neptune_project = 'test_project'
neptune_api_token = 'test_token'

neptune_logger = NeptuneLogger(
    project=neptune_project,
    api_token=neptune_api_token,
    rank_zero_only=False,
    mode='debug',
    upload_artifacts=True,
)
```

We also have an example project demonstrating all the awesome things you can do with this integration!

Additional information on the NeptuneLogger can be found in the docs.

2. OOM observer callback with memory visualizations

Composer now has an OOM observer callback. When a model runs out of memory, this callback produces a trace of memory allocations, which can be critical for designing strategies to reduce memory usage.

Example:

```python
from composer import Trainer
from composer.callbacks import OOMObserver

# constructing trainer object with this callback
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    optimizers=optimizer,
    max_duration="1ep",
    callbacks=[
        OOMObserver(
            folder="traces",
            overwrite=True,
            filename="rank{rank}_oom",
            remote_file_name="oci://bucket_name/{run_name}/oom_traces/rank{rank}_oom",
        )
    ],
)
```

OOM Visualization: [screenshot of the memory visualization not reproduced here]

3. Log all gpu rank stdout/err to MosaicML platform

Composer has expanded its integration with the MosaicML platform. Now, the stdout/stderr of every GPU rank can be viewed with MCLI logs, enabling more comprehensive analysis of jobs.

Example commands:

mcli logs <run-name> --node x --gpu x

Note that this defaults to node rank 0 if --node is not provided.

The logs of any global GPU rank can also be found with:

mcli logs <run-name> --global-gpu-rank x

Bug Fixes

  • Only save RNG on rank 0 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2998
  • [Auto-microbatch fix] FSDP reshard and cleanup after OOM to fix the cuda memory leak by @bigning in https://github.com/mosaicml/composer/pull/3030
  • Fix skip_first for profiler during resumption by @bigning in https://github.com/mosaicml/composer/pull/2986
  • Race condition fix in checkpoint loading util by @jessechancy in https://github.com/mosaicml/composer/pull/3001

What's Changed

  • Remove .ci folder and move FILE_HEADER and CODEOWNERS by @irenedea in https://github.com/mosaicml/composer/pull/2957
  • Modify UCObjectStore.list_objects to lists all files recursively by @irenedea in https://github.com/mosaicml/composer/pull/2959
  • Refactor MemorySnapshot by @cli99 in https://github.com/mosaicml/composer/pull/2960
  • Log all gpu rank stdout/err to MosaicML platform by @jjanezhang in https://github.com/mosaicml/composer/pull/2839
  • Add Torch 2.2 tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2970
  • Memory snapshot dump pickle by @cli99 in https://github.com/mosaicml/composer/pull/2968
  • Neptune logger by @AleksanderWWW in https://github.com/mosaicml/composer/pull/2447
  • Fix torch pins in tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2973
  • Add a register_model_with_run_id api to MLflowLogger by @dakinggg in https://github.com/mosaicml/composer/pull/2967
  • Remove bespoke codeowners by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2971
  • Add a BEFORE_LOAD event by @snarayan21 in https://github.com/mosaicml/composer/pull/2974
  • More torch 2.2 fixes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2975
  • Adding the step argument to logger.log_table by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/2961
  • Fix daily tests for torch 2.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2980
  • Format load_path with name by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2978
  • Bump to 0.19.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2979
  • Fix UC object store bugfix by @nancyhung in https://github.com/mosaicml/composer/pull/2982
  • [Bugfix][UC] Add back the full object path by @nancyhung in https://github.com/mosaicml/composer/pull/2988
  • Minor cleanup of UC get_object_size by @dakinggg in https://github.com/mosaicml/composer/pull/2989
  • Pin UC to earlier version by @dakinggg in https://github.com/mosaicml/composer/pull/2990
  • Revert "fix skip_first for resumption" by @bigning in https://github.com/mosaicml/composer/pull/2991
  • Broadcast files for HSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2914
  • Bump ipykernel from 6.29.0 to 6.29.2 by @dependabot in https://github.com/mosaicml/composer/pull/2994
  • Bump yamllint from 1.33.0 to 1.34.0 by @dependabot in https://github.com/mosaicml/composer/pull/2995
  • Refactor update_metric by @maxisawesome in https://github.com/mosaicml/composer/pull/2965
  • Add azure integration test by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2996
  • Fix Profiler schedule skip_first by @bigning in https://github.com/mosaicml/composer/pull/2992
  • Remove planner validation by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2985
  • Fix load for non-HSDP device mesh by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2997
  • Update NCCL arg since torch deprecated old one by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3000
  • Add bias argument to LPLN by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2999
  • Revert "Add bias argument to LPLN" by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3003
  • Revert "Update NCCL arg since torch deprecated old one" by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3004
  • Add torch 2.3 image for aws cluster by @j316chuck in https://github.com/mosaicml/composer/pull/3002
  • Patch torch 2.3 aws naming by @j316chuck in https://github.com/mosaicml/composer/pull/3006
  • Add debug log before training loop starts by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3005
  • Deprecate ffcv code by @j316chuck in https://github.com/mosaicml/composer/pull/3007
  • Remove log for mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3008
  • [EASY] Always log 1st batch when resuming training by @bigning in https://github.com/mosaicml/composer/pull/3009
  • Use reusable actions for linting by @b-chu in https://github.com/mosaicml/composer/pull/2948
  • Make CodeEval respect device_eval_batch_size by @josejg in https://github.com/mosaicml/composer/pull/2969
  • Use Mosaic constant for GPU file prefix by @jjanezhang in https://github.com/mosaicml/composer/pull/3018
  • Fall back to normal logging when gpu prefix is not present by @jjanezhang in https://github.com/mosaicml/composer/pull/3020
  • Revert "Use reusable actions for linting" to fix CI/CD by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3023
  • Change to pull_request_target by @b-chu in https://github.com/mosaicml/composer/pull/3025
  • Bump gitpython from 3.1.41 to 3.1.42 by @dependabot in https://github.com/mosaicml/composer/pull/3031
  • Bump yamllint from 1.34.0 to 1.35.1 by @dependabot in https://github.com/mosaicml/composer/pull/3034
  • Update torchmetrics requirement from <1.3.1,>=0.10.0 to >=0.10.0,<1.3.2 by @dependabot in https://github.com/mosaicml/composer/pull/3035
  • Bump pypandoc from 1.12 to 1.13 by @dependabot in https://github.com/mosaicml/composer/pull/3033
  • Add tensorboard images support by @Menduist in https://github.com/mosaicml/composer/pull/3021
  • Add sorted to logs for checkpoint broadcast by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3036
  • Friendlier device mesh error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3039
  • Upgrade to python3.11 for torch nightly by @j316chuck in https://github.com/mosaicml/composer/pull/3038
  • Download symlink once by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3043
  • Add min size to OCI download by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3044
  • Lint fix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3045
  • Revert "Change to pull_request_target " by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3047
  • Bump composer version 0.19.2 by @j316chuck in https://github.com/mosaicml/composer/pull/3048
  • Update XLA support by @bfontain in https://github.com/mosaicml/composer/pull/2964
  • Bump composer version 0.20.0 by @j316chuck in https://github.com/mosaicml/composer/pull/3051
  • Update ruff. Fix PLE & LOG lints by @Skylion007 in https://github.com/mosaicml/composer/pull/3050

New Contributors

  • @AleksanderWWW made their first contribution in https://github.com/mosaicml/composer/pull/2447
  • @ShashankMosaicML made their first contribution in https://github.com/mosaicml/composer/pull/2961
  • @nancyhung made their first contribution in https://github.com/mosaicml/composer/pull/2982
  • @bigning made their first contribution in https://github.com/mosaicml/composer/pull/2986
  • @jessechancy made their first contribution in https://github.com/mosaicml/composer/pull/3001
  • @josejg made their first contribution in https://github.com/mosaicml/composer/pull/2969
  • @Menduist made their first contribution in https://github.com/mosaicml/composer/pull/3021
  • @bfontain made their first contribution in https://github.com/mosaicml/composer/pull/2964

Full Changelog: https://github.com/mosaicml/composer/compare/v0.19.1...v0.20.0

- Python
Published by j316chuck over 2 years ago

composer - v0.19.1

What's New

1. New Event: BEFORE_LOAD (#2974)

Composer now has a new event, Event.BEFORE_LOAD, which lets users modify state before a model is loaded. This is particularly useful for accessing attributes that may not exist at Event.INIT, such as the dataloader state.
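A toy simulation of why BEFORE_LOAD helps: here the dataloader is attached after INIT, so a callback can see it at BEFORE_LOAD but not at INIT. The four-event order, the dispatch-by-method-name helper, and the class names are simplifications for illustration, not Composer's Engine or Callback classes.

```python
from typing import Callable, Dict, List

# Simplified event order around checkpoint loading (an assumption for illustration).
EVENT_ORDER = ["init", "before_load", "after_load", "fit_start"]


class ToyCallback:
    """Dispatches each event to a method named after it; missing handlers are no-ops."""

    def run_event(self, event: str, state: Dict) -> None:
        handler: Callable[[Dict], None] = getattr(self, event, lambda state: None)
        handler(state)


class InspectDataloader(ToyCallback):
    def init(self, state: Dict) -> None:
        state["seen_at_init"] = state.get("dataloader")

    def before_load(self, state: Dict) -> None:
        # The dataloader exists by now, so it can be inspected or adjusted
        # before checkpoint state is restored.
        state["seen_at_before_load"] = state.get("dataloader")


def run_events(callbacks: List[ToyCallback], state: Dict) -> None:
    for event in EVENT_ORDER:
        if event == "before_load":
            state["dataloader"] = "train_dataloader"  # stand-in: attached after INIT
        for cb in callbacks:
            cb.run_event(event, state)
```

Running InspectDataloader through run_events records None at INIT and the dataloader at BEFORE_LOAD.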

2. Registering model in MLFlow with run id (#2967)

The MLFlow logger now has register_model_with_run_id, which allows users to register a model based on its run_id. Unlike other ways of registering the model, this preserves the link to the MLflow run.

What's Changed

  • before_load event added https://github.com/mosaicml/composer/pull/2974
  • Add a register_model_with_run_id api to MLflowLogger https://github.com/mosaicml/composer/pull/2967

Full Changelog: https://github.com/mosaicml/composer/compare/v0.19.0...v0.19.1

- Python
Published by milocress over 2 years ago

composer - v0.19.0

What's New

1. Improved DTensor Support

Composer now supports elastic saving and loading of DTensors at various mesh sizes.
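Elastic loading means a checkpoint written by N ranks can be read back by M ranks. The pure-Python sketch below illustrates the idea with a flat list standing in for a parameter. The contiguous split rule and the function names are assumptions; real DTensor checkpointing (via torch.distributed.checkpoint) uses its own chunking and, roughly speaking, has each new rank read only the overlapping slices it needs rather than rebuilding the full tensor.

```python
from typing import List


def shard_bounds(length: int, world_size: int) -> List[range]:
    """Split [0, length) into world_size contiguous chunks; earlier ranks absorb the remainder."""
    base, extra = divmod(length, world_size)
    bounds, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        bounds.append(range(start, start + size))
        start += size
    return bounds


def save_shards(flat: List[float], world_size: int) -> List[List[float]]:
    """Each rank writes only its own slice of the flattened parameter."""
    return [flat[r.start:r.stop] for r in shard_bounds(len(flat), world_size)]


def load_shards(shards: List[List[float]], new_world_size: int) -> List[List[float]]:
    """Reassemble the saved slices, then re-split them for a different number of ranks."""
    full = [x for shard in shards for x in shard]
    return save_shards(full, new_world_size)
```

A 10-element parameter saved on 4 ranks is stored as slices of 3, 3, 2, and 2 elements; loading it on 3 ranks yields slices of 4, 3, and 3 with the original values intact.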

2. Checkpoint Saving and Loading from Databricks MLFlow

Composer now supports saving and loading checkpoints to Databricks-managed MLFlow.

```python
from composer import Trainer
from composer.loggers import MLFlowLogger

composer_model = MyComposerModel(...)  # your own ComposerModel subclass

trainer = Trainer(
    model=composer_model,
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    # ... other Trainer arguments
)
```
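The paths above are templates: the release example implies that Composer fills the {mlflow_experiment_id} and {mlflow_run_id} placeholders from the attached MLFlowLogger. The sketch below shows only the substitution itself with Python's str.format; the helper name and the IDs are hypothetical.

```python
# Hypothetical helper and IDs for illustration; Composer performs its own substitution.
MLFLOW_ARTIFACTS = "dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts"


def resolve_mlflow_path(template: str, experiment_id: str, run_id: str) -> str:
    """Substitute the MLflow experiment and run IDs into a checkpoint path template."""
    return template.format(mlflow_experiment_id=experiment_id, mlflow_run_id=run_id)
```

For experiment "1234" and run "abcd", the template resolves to dbfs:/databricks/mlflow-tracking/1234/abcd/artifacts, which is why saving and loading can share one template string.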

3. Better Communication Computation Overlap in FSDP

Composer now has improved communication/computation overlap in its FSDP code, which should improve MFU (model FLOPs utilization) across several architectures.

4. Python3.11 + Torch2.2 Support

Composer adds initial support for Python 3.11 and torch 2.2.

5. PEFT LoRA

PEFT LoRA is now supported in the HuggingFaceModel class.

6. Refactored Evaluation

in_context_learning_evaluation.py has a new design with cleaner abstractions and easier interfaces to work with.

7. Azure Checkpointing

Composer now supports saving your model in Azure.

8. MLFlow Checkpointing

Composer now supports saving your model in MLFlow.

Bug Fixes

  • Fix MLFlowLogger test by @ngcgarcia in https://github.com/mosaicml/composer/pull/2912
  • Fix bug with CoT early stopping and LLama2 tokenizer by @bmosaicml in https://github.com/mosaicml/composer/pull/2902
  • Fix split_batch bug with empty generation_kwargs by @maxisawesome in https://github.com/mosaicml/composer/pull/2913
  • Only load RNG keys that exist by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2901
  • Fix daily tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2891
  • Fix seed for FSDP wrap by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2833
  • Fix load_ignore_keys with rng by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2803
  • Fix mosaicml logger on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2816
  • Fix torch profiler error on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2818
  • Fix import for daily test by @snarayan21 in https://github.com/mosaicml/composer/pull/2826
  • Fix how single value tensors are logged by @aspfohl in https://github.com/mosaicml/composer/pull/2831
  • Fix torch bump by @j316chuck in https://github.com/mosaicml/composer/pull/2855
  • Fix MPS with sequence loss by @JAEarly in https://github.com/mosaicml/composer/pull/2834

What's Changed

  • Bump transformers version by @dakinggg in https://github.com/mosaicml/composer/pull/2781
  • Bump sphinxext-opengraph from 0.9.0 to 0.9.1 by @dependabot in https://github.com/mosaicml/composer/pull/2784
  • Bump coverage[toml] from 7.3.0 to 7.3.3 by @dependabot in https://github.com/mosaicml/composer/pull/2783
  • Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 by @dependabot in https://github.com/mosaicml/composer/pull/2785
  • [UCVolumes] Rely on databricks-sdk auth for the right requirements by @panchalhp-db in https://github.com/mosaicml/composer/pull/2789
  • Enable system metrics in mosaic mlflow logger by @chenmoneygithub in https://github.com/mosaicml/composer/pull/2775
  • Update parse_uri by @irenedea in https://github.com/mosaicml/composer/pull/2787
  • default to no torch profiler memory timeline by @cli99 in https://github.com/mosaicml/composer/pull/2790
  • Add eot token to ICL generate kwargs by @bmosaicml in https://github.com/mosaicml/composer/pull/2782
  • Add nightly image for torch 2.2.0-12-20-23 by @j316chuck in https://github.com/mosaicml/composer/pull/2791
  • Add torch nightly 12-13 by @j316chuck in https://github.com/mosaicml/composer/pull/2792
  • Add process group as arg to FSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2794
  • Bump coverage[toml] from 7.3.3 to 7.3.4 by @dependabot in https://github.com/mosaicml/composer/pull/2798
  • Bump ipykernel from 6.26.0 to 6.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/2806
  • Bump junitparser from 3.1.0 to 3.1.1 by @dependabot in https://github.com/mosaicml/composer/pull/2805
  • Bump pytest from 7.4.3 to 7.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/2807
  • Avoid futures on close for MosaicML logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2804
  • Require sync module states with HSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2812
  • Better communication computation overlap by @snarayan21 in https://github.com/mosaicml/composer/pull/2811
  • Improve error message for speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2801
  • Bump torch version -- DO NOT RELEASE by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2814
  • Bump torchvision for nightly by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2815
  • Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. by @snarayan21 in https://github.com/mosaicml/composer/pull/2817
  • Bump traitlets from 5.13.0 to 5.14.1 by @dependabot in https://github.com/mosaicml/composer/pull/2822
  • All unshard streams wait on computation every step by @snarayan21 in https://github.com/mosaicml/composer/pull/2823
  • Add encoding=utf-8 by @dakinggg in https://github.com/mosaicml/composer/pull/2824
  • [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore by @jerrychen109 in https://github.com/mosaicml/composer/pull/2802
  • Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2827
  • checkpoint saver tracks all checkpoints/intervals in state by @aspfohl in https://github.com/mosaicml/composer/pull/2819
  • code-quality timeout update by @aspfohl in https://github.com/mosaicml/composer/pull/2830
  • Adds DTensor Support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2821
  • Remove duplicate checkpoint verifications by @eracah in https://github.com/mosaicml/composer/pull/2828
  • Remove fsdp patch for comm overlap by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2836
  • Allow hsdp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2838
  • Bump torch 2.1.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2840
  • Upgrade pyright to 1.1.310 by @b-chu in https://github.com/mosaicml/composer/pull/2841
  • [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow by @jerrychen109 in https://github.com/mosaicml/composer/pull/2810
  • update nightly to torch 2.3 by @j316chuck in https://github.com/mosaicml/composer/pull/2842
  • Pin sphinxcontrib applehelp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2854
  • Torch 2.3 patch by @dakinggg in https://github.com/mosaicml/composer/pull/2849
  • Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 by @dependabot in https://github.com/mosaicml/composer/pull/2866
  • Rewrite to use individual state functions by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2860
  • Add custom stopping criteria to ICL generate tasks by @bmosaicml in https://github.com/mosaicml/composer/pull/2800
  • Add save_ignore_keys by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2868
  • Remove log debug by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2871
  • Update monkeypatch to put barrier in optim load by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2874
  • Remove toml by @b-chu in https://github.com/mosaicml/composer/pull/2872
  • Update license by @b-chu in https://github.com/mosaicml/composer/pull/2875
  • Add ignore_metrics field to the MLflow logger by @ngcgarcia in https://github.com/mosaicml/composer/pull/2869
  • Convert print to log.info by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2876
  • Bump version to 0.18.0 by @irenedea in https://github.com/mosaicml/composer/pull/2877
  • Removed commented-out unshard streams patching. by @snarayan21 in https://github.com/mosaicml/composer/pull/2873
  • Make code quality workflow reusable by @b-chu in https://github.com/mosaicml/composer/pull/2878
  • Bump gitpython from 3.1.40 to 3.1.41 by @dependabot in https://github.com/mosaicml/composer/pull/2885
  • Bump torchmetrics by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2890
  • Bump transformers to 4.37 by @dakinggg in https://github.com/mosaicml/composer/pull/2894
  • Azure checkpointing support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2893
  • Pass PG into checkpoint load and load rng with state_dict by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2897
  • Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2899
  • Bump version to 0.18.1 by @b-chu in https://github.com/mosaicml/composer/pull/2905
  • Refactor in_context_learning_evaluation.py by @maxisawesome in https://github.com/mosaicml/composer/pull/2713
  • Fix FP8 checkpoint resumption with onnx export flag by @j316chuck in https://github.com/mosaicml/composer/pull/2907
  • Add Python 3.11 + FA 2.5.0 + Torch 2.3.0 Image by @KuuCi in https://github.com/mosaicml/composer/pull/2898
  • Add yamllint to pre commit by @b-chu in https://github.com/mosaicml/composer/pull/2909
  • Add ignore_hyperparameters to MLFlowLogger by @ngcgarcia in https://github.com/mosaicml/composer/pull/2908
  • Bump coverage[toml] from 7.3.4 to 7.4.1 by @dependabot in https://github.com/mosaicml/composer/pull/2915
  • Add checkpoint test for 0.18.1 by @b-chu in https://github.com/mosaicml/composer/pull/2906
  • Integrate PEFT LoRA with HuggingFaceModel by @dakinggg in https://github.com/mosaicml/composer/pull/2829

New Contributors

  • @jerrychen109 made their first contribution in https://github.com/mosaicml/composer/pull/2802
  • @JAEarly made their first contribution in https://github.com/mosaicml/composer/pull/2834
  • @maxisawesome made their first contribution in https://github.com/mosaicml/composer/pull/2713

Full Changelog: https://github.com/mosaicml/composer/compare/v0.17.2...v0.19.0

- Python
Published by j316chuck over 2 years ago

composer - v0.18.2

Bug Fixes

  • Fix lp layernorm weight by @snarayan21 in https://github.com/mosaicml/composer/pull/2954

What's Changed

  • Fix lp layernorm weight by @snarayan21 in https://github.com/mosaicml/composer/pull/2954
  • Bump version to 0.18.2 by @b-chu

Full Changelog: https://github.com/mosaicml/composer/compare/v0.18.1...v0.18.2

- Python
Published by b-chu over 2 years ago

composer - v0.18.1

Bug Fixes

  • Fix MPS with sequence loss by @JAEarly in https://github.com/mosaicml/composer/pull/2834
  • Fix daily tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2891
  • Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2899
  • Only load RNG keys that exist by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2901

What's Changed

  • Bump version to 0.18.0 by @irenedea in https://github.com/mosaicml/composer/pull/2877
  • Removed commented-out unshard streams patching. by @snarayan21 in https://github.com/mosaicml/composer/pull/2873
  • Make code quality workflow reusable by @b-chu in https://github.com/mosaicml/composer/pull/2878
  • Bump gitpython from 3.1.40 to 3.1.41 by @dependabot in https://github.com/mosaicml/composer/pull/2885
  • Fix MPS with sequence loss by @JAEarly in https://github.com/mosaicml/composer/pull/2834
  • Bump torchmetrics by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2890
  • Fix daily tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2891
  • Bump transformers to 4.37 by @dakinggg in https://github.com/mosaicml/composer/pull/2894
  • Azure checkpointing support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2893
  • Pass PG into checkpoint load and load rng with state_dict by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2897
  • Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2899
  • Only load RNG keys that exist by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2901
  • Bump version to 0.18.1 by @b-chu in https://github.com/mosaicml/composer/pull/2905

New Contributors

  • @JAEarly made their first contribution in https://github.com/mosaicml/composer/pull/2834

Full Changelog: https://github.com/mosaicml/composer/compare/v0.18.0...v0.18.1

- Python
Published by b-chu over 2 years ago

composer - v0.18.0

This release has been yanked; please skip directly to Composer v0.18.1.

New Features

1. Improved DTensor Support

Composer now supports elastic saving and loading of DTensors at various mesh sizes.

2. Checkpoint Saving and Loading from Databricks MLFlow

Composer now supports saving and loading checkpoints to Databricks-managed MLFlow.

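```python
from composer import Trainer
from composer.loggers import MLFlowLogger

composer_model = MyComposerModel(...)  # your own ComposerModel subclass

trainer = Trainer(
    model=composer_model,
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),  # configure for your Databricks workspace
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    # ... other Trainer arguments
)
```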
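The save_folder and load_path above contain {mlflow_experiment_id} and {mlflow_run_id} placeholders. As we understand it, Composer fills these in from the active MLflow run. The sketch below shows only the substitution step, using plain str.format. resolve_artifact_path and the example IDs are hypothetical and are not part of Composer's API.

```python
# Hypothetical helper, NOT Composer's API: illustrates filling the
# MLflow placeholders in a checkpoint path template.
MLFLOW_ARTIFACT_TEMPLATE = (
    'dbfs:/databricks/mlflow-tracking/'
    '{mlflow_experiment_id}/{mlflow_run_id}/artifacts'
)


def resolve_artifact_path(template: str, experiment_id: str, run_id: str) -> str:
    """Substitute an experiment ID and run ID into a path template."""
    return template.format(
        mlflow_experiment_id=experiment_id,
        mlflow_run_id=run_id,
    )


if __name__ == '__main__':
    # The IDs below are made up for illustration.
    print(resolve_artifact_path(MLFLOW_ARTIFACT_TEMPLATE, '1234', 'abcd'))
```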

Bug Fixes

  • Fix load_ignore_keys with rng by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2803
  • Fix mosaicml logger on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2816
  • Fix torch profiler error on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2818
  • Fix import for daily test by @snarayan21 in https://github.com/mosaicml/composer/pull/2826
  • [S] Fix how single value tensors are logged by @aspfohl in https://github.com/mosaicml/composer/pull/2831

Deprecations

  • Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2827

What's Changed

  • Bump transformers version by @dakinggg in https://github.com/mosaicml/composer/pull/2781
  • Bump sphinxext-opengraph from 0.9.0 to 0.9.1 by @dependabot in https://github.com/mosaicml/composer/pull/2784
  • Bump coverage[toml] from 7.3.0 to 7.3.3 by @dependabot in https://github.com/mosaicml/composer/pull/2783
  • Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 by @dependabot in https://github.com/mosaicml/composer/pull/2785
  • [UCVolumes] Rely on databricks-sdk auth for the right requirements by @panchalhp-db in https://github.com/mosaicml/composer/pull/2789
  • Enable system metrics in mosaic mlflow logger by @chenmoneygithub in https://github.com/mosaicml/composer/pull/2775
  • Update parse_uri by @irenedea in https://github.com/mosaicml/composer/pull/2787
  • default to no torch profiler memory timeline by @cli99 in https://github.com/mosaicml/composer/pull/2790
  • Add eot token to ICL generate kwargs by @bmosaicml in https://github.com/mosaicml/composer/pull/2782
  • Add nightly image for torch 2.2.0-12-20-23 by @j316chuck in https://github.com/mosaicml/composer/pull/2791
  • Add torch nightly 12-13 by @j316chuck in https://github.com/mosaicml/composer/pull/2792
  • Add process group as arg to FSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2794
  • Bump coverage[toml] from 7.3.3 to 7.3.4 by @dependabot in https://github.com/mosaicml/composer/pull/2798
  • Fix load_ignore_keys with rng by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2803
  • Bump ipykernel from 6.26.0 to 6.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/2806
  • Bump junitparser from 3.1.0 to 3.1.1 by @dependabot in https://github.com/mosaicml/composer/pull/2805
  • Bump pytest from 7.4.3 to 7.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/2807
  • Avoid futures on close for MosaicML logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2804
  • Require sync module states with HSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2812
  • Better communication computation overlap by @snarayan21 in https://github.com/mosaicml/composer/pull/2811
  • Improve error message for speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2801
  • Bump torch version -- DO NOT RELEASE by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2814
  • Bump torchvision for nightly by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2815
  • Fix mosaicml logger on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2816
  • Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. by @snarayan21 in https://github.com/mosaicml/composer/pull/2817
  • Fix torch profiler error on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2818
  • Bump traitlets from 5.13.0 to 5.14.1 by @dependabot in https://github.com/mosaicml/composer/pull/2822
  • All unshard streams wait on computation every step by @snarayan21 in https://github.com/mosaicml/composer/pull/2823
  • Add encoding=utf-8 by @dakinggg in https://github.com/mosaicml/composer/pull/2824
  • Fix import for daily test by @snarayan21 in https://github.com/mosaicml/composer/pull/2826
  • [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore by @jerrychen109 in https://github.com/mosaicml/composer/pull/2802
  • Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2827
  • checkpoint saver tracks all checkpoints/intervals in state by @aspfohl in https://github.com/mosaicml/composer/pull/2819
  • code-quality timeout update by @aspfohl in https://github.com/mosaicml/composer/pull/2830
  • [S] Fix how single value tensors are logged by @aspfohl in https://github.com/mosaicml/composer/pull/2831
  • Adds DTensor Support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2821
  • Remove duplicate checkpoint verifications by @eracah in https://github.com/mosaicml/composer/pull/2828
  • Fix seed for FSDP wrap by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2833
  • Remove fsdp patch for comm overlap by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2836
  • Allow hsdp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2838
  • Bump torch 2.1.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2840
  • Upgrade pyright to 1.1.310 by @b-chu in https://github.com/mosaicml/composer/pull/2841
  • [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow by @jerrychen109 in https://github.com/mosaicml/composer/pull/2810
  • update nightly to torch 2.3 by @j316chuck in https://github.com/mosaicml/composer/pull/2842
  • Pin sphinxcontrib applehelp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2854
  • Fix torch bump by @j316chuck in https://github.com/mosaicml/composer/pull/2855
  • Torch 2.3 patch by @dakinggg in https://github.com/mosaicml/composer/pull/2849
  • Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 by @dependabot in https://github.com/mosaicml/composer/pull/2866
  • Rewrite to use individual state functions by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2860
  • Add custom stopping criteria to ICL generate tasks by @bmosaicml in https://github.com/mosaicml/composer/pull/2800
  • Add save_ignore_keys by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2868
  • Remove log debug by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2871
  • Update monkeypatch to put barrier in optim load by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2874
  • Remove toml by @b-chu in https://github.com/mosaicml/composer/pull/2872
  • Update license by @b-chu in https://github.com/mosaicml/composer/pull/2875
  • Add ignore_metrics field to the MLflow logger by @ngcgarcia in https://github.com/mosaicml/composer/pull/2869
  • Convert print to log.info by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2876

New Contributors

  • @jerrychen109 made their first contribution in https://github.com/mosaicml/composer/pull/2802

Full Changelog: https://github.com/mosaicml/composer/compare/v0.17.2...v0.18.0

- Python
Published by b-chu over 2 years ago

composer - v0.18.0

What's New

1. Improved DTensor Support (#2821)

Enables elastic saving and loading of DTensors at various mesh sizes.

2. MLFlow Upload and Download (#2802, #2810)

Artifacts, such as checkpoints, can now be logged to Databricks-managed MLFlow.

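```python
from composer import Trainer
from composer.loggers import MLFlowLogger

composer_model = MyComposerModel(n_layers=3)  # your own ComposerModel subclass

trainer = Trainer(
    model=composer_model,
    max_duration='4ba',
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),  # configure for your Databricks workspace
    # ... other Trainer arguments
)
```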

Deprecations

  • Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2827

Bug Fixes

  • Fix load_ignore_keys with rng by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2803
  • Fix torch profiler error on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2818
  • Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. by @snarayan21 in https://github.com/mosaicml/composer/pull/2817
  • Remove duplicate checkpoint verifications by @eracah in https://github.com/mosaicml/composer/pull/2828
  • [S] Fix how single value tensors are logged by @aspfohl in https://github.com/mosaicml/composer/pull/2831
  • default to no torch profiler memory timeline by @cli99 in https://github.com/mosaicml/composer/pull/2790

What's Changed

  • Bump transformers version by @dakinggg in https://github.com/mosaicml/composer/pull/2781
  • Bump sphinxext-opengraph from 0.9.0 to 0.9.1 by @dependabot in https://github.com/mosaicml/composer/pull/2784
  • Bump coverage[toml] from 7.3.0 to 7.3.3 by @dependabot in https://github.com/mosaicml/composer/pull/2783
  • Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 by @dependabot in https://github.com/mosaicml/composer/pull/2785
  • [UCVolumes] Rely on databricks-sdk auth for the right requirements by @panchalhp-db in https://github.com/mosaicml/composer/pull/2789
  • Enable system metrics in mosaic mlflow logger by @chenmoneygithub in https://github.com/mosaicml/composer/pull/2775
  • Update parse_uri by @irenedea in https://github.com/mosaicml/composer/pull/2787
  • default to no torch profiler memory timeline by @cli99 in https://github.com/mosaicml/composer/pull/2790
  • Add eot token to ICL generate kwargs by @bmosaicml in https://github.com/mosaicml/composer/pull/2782
  • Add nightly image for torch 2.2.0-12-20-23 by @j316chuck in https://github.com/mosaicml/composer/pull/2791
  • Add torch nightly 12-13 by @j316chuck in https://github.com/mosaicml/composer/pull/2792
  • Add process group as arg to FSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2794
  • Bump coverage[toml] from 7.3.3 to 7.3.4 by @dependabot in https://github.com/mosaicml/composer/pull/2798
  • Fix load_ignore_keys with rng by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2803
  • Bump ipykernel from 6.26.0 to 6.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/2806
  • Bump junitparser from 3.1.0 to 3.1.1 by @dependabot in https://github.com/mosaicml/composer/pull/2805
  • Bump pytest from 7.4.3 to 7.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/2807
  • Avoid futures on close for MosaicML logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2804
  • Require sync module states with HSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2812
  • Better communication computation overlap by @snarayan21 in https://github.com/mosaicml/composer/pull/2811
  • Improve error message for speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2801
  • Bump torch version -- DO NOT RELEASE by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2814
  • Bump torchvision for nightly by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2815
  • Fix mosaicml logger on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2816
  • Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. by @snarayan21 in https://github.com/mosaicml/composer/pull/2817
  • Fix torch profiler error on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2818
  • Bump traitlets from 5.13.0 to 5.14.1 by @dependabot in https://github.com/mosaicml/composer/pull/2822
  • All unshard streams wait on computation every step by @snarayan21 in https://github.com/mosaicml/composer/pull/2823
  • Add encoding=utf-8 by @dakinggg in https://github.com/mosaicml/composer/pull/2824
  • Fix import for daily test by @snarayan21 in https://github.com/mosaicml/composer/pull/2826
  • [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore by @jerrychen109 in https://github.com/mosaicml/composer/pull/2802
  • Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2827
  • checkpoint saver tracks all checkpoints/intervals in state by @aspfohl in https://github.com/mosaicml/composer/pull/2819
  • code-quality timeout update by @aspfohl in https://github.com/mosaicml/composer/pull/2830
  • [S] Fix how single value tensors are logged by @aspfohl in https://github.com/mosaicml/composer/pull/2831
  • Adds DTensor Support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2821
  • Remove duplicate checkpoint verifications by @eracah in https://github.com/mosaicml/composer/pull/2828
  • Fix seed for FSDP wrap by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2833
  • Remove fsdp patch for comm overlap by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2836
  • Allow hsdp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2838
  • Bump torch 2.1.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2840
  • Upgrade pyright to 1.1.310 by @b-chu in https://github.com/mosaicml/composer/pull/2841
  • [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow by @jerrychen109 in https://github.com/mosaicml/composer/pull/2810
  • update nightly to torch 2.3 by @j316chuck in https://github.com/mosaicml/composer/pull/2842
  • Pin sphinxcontrib applehelp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2854
  • Fix torch bump by @j316chuck in https://github.com/mosaicml/composer/pull/2855
  • Torch 2.3 patch by @dakinggg in https://github.com/mosaicml/composer/pull/2849
  • Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 by @dependabot in https://github.com/mosaicml/composer/pull/2866
  • Rewrite to use individual state functions by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2860
  • Add custom stopping criteria to ICL generate tasks by @bmosaicml in https://github.com/mosaicml/composer/pull/2800
  • Add save_ignore_keys by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2868
  • Remove log debug by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2871
  • Update monkeypatch to put barrier in optim load by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2874
  • Remove toml by @b-chu in https://github.com/mosaicml/composer/pull/2872
  • Update license by @b-chu in https://github.com/mosaicml/composer/pull/2875
  • Add ignore_metrics field to the MLflow logger by @ngcgarcia in https://github.com/mosaicml/composer/pull/2869
  • Convert print to log.info by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2876

New Contributors

  • @jerrychen109 made their first contribution in https://github.com/mosaicml/composer/pull/2802

Full Changelog: https://github.com/mosaicml/composer/compare/v0.17.2...v0.18.0

- Python
Published by irenedea over 2 years ago

composer - v0.17.2

New Features

1. Torch 2.1.1 Support

Composer now supports torch 2.1.1! This new release primarily fixes several small bugs that we had previously monkeypatched in Composer.

2. Faster OCI Upload/Download

Composer now supports multi-part upload/download to OCI, which should speed up object store transfers.
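Multi-part download generally works by splitting an object into byte ranges, fetching those ranges concurrently, and joining them back in order. Below is a minimal, self-contained sketch of that general technique. It assumes an in-memory bytes object stands in for the remote object. split_ranges, fetch_range and parallel_download are hypothetical names, not Composer's OCI object store code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size: int, part_size: int) -> list[tuple[int, int]]:
    """Return [start, end) byte ranges covering total_size in part_size chunks."""
    return [(start, min(start + part_size, total_size))
            for start in range(0, total_size, part_size)]


def fetch_range(blob: bytes, byte_range: tuple[int, int]) -> bytes:
    """Stand-in for a ranged GET request against an object store."""
    start, end = byte_range
    return blob[start:end]


def parallel_download(blob: bytes, part_size: int, max_workers: int = 4) -> bytes:
    """Fetch all parts concurrently and reassemble them in order."""
    ranges = split_ranges(len(blob), part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the parts can be joined directly.
        parts = list(pool.map(lambda r: fetch_range(blob, r), ranges))
    return b''.join(parts)
```

A real client would issue HTTP range requests and typically write each part to its offset in a file. Keeping the parts in memory here just keeps the sketch self-contained.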

3. Memory Profiling

We've expanded the torch profiler integration to support memory profiling. Now, when the profiler is enabled, you will get a trace showing how memory utilization on your GPUs is broken down by component.

Bug Fixes

1. FSDP Initialization with Meta

Previously, our FSDP integration had a bug when initializing weights with device=meta: parameters could be re-initialized more than once, which applied an additional scaling. This has now been fixed, so the choice of device and distributed strategy should no longer affect how weights are initialized.
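One way extra scaling like this can arise, consistent with the fix in #2765 ("Fix FSDP param_init_fn to not reinit parameters multiple times"), is an init function that rescales weights in place. If that function is called once per wrapped module and also recurses into submodules, inner modules get rescaled repeatedly. The toy below is a hypothetical sketch, not Composer or PyTorch code. ToyModule, buggy_init, make_fixed_init and wrap_and_init are invented for illustration, and the in-place multiply stands in for whatever scaling a real init applies.

```python
class ToyModule:
    """Minimal module tree with a single scalar 'weight'."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.weight = 1.0

    def modules(self):
        # Yield this module and all descendants, similar to nn.Module.modules().
        yield self
        for child in self.children:
            yield from child.modules()


SCALE = 0.5  # stand-in for an init-time rescaling factor


def buggy_init(module):
    # Rescales the module and every descendant on each call.
    for m in module.modules():
        m.weight *= SCALE


def make_fixed_init():
    initialized = set()

    def fixed_init(module):
        # Skips anything that an earlier call already rescaled.
        for m in module.modules():
            if id(m) not in initialized:
                m.weight *= SCALE
                initialized.add(id(m))

    return fixed_init


def wrap_and_init(init_fn):
    """Build root -> block and call init_fn once per wrapped module, inner first."""
    block = ToyModule('block')
    root = ToyModule('root', [block])
    for wrapped in (block, root):
        init_fn(wrapped)
    return root, block
```

With buggy_init, the inner block ends up scaled twice (weight 0.25), while the root is scaled once (0.5). Tracking which modules were already initialized makes the init idempotent, so both end at 0.5.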

What's Changed

  • Override NVIDIA environment variable for CUDA 12.1 images by @bandish-shah in https://github.com/mosaicml/composer/pull/2742
  • Add NVIDIA_REQUIRE_CUDA_OVERRIDE env variable to Composer and Torch nightly Docker images by @bandish-shah in https://github.com/mosaicml/composer/pull/2744
  • Remove duplicated for loop in lr_monitor.py by @priba in https://github.com/mosaicml/composer/pull/2738
  • Fix console logger for small datasets. by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2746
  • Add metadata logging for wandb by @jjanezhang in https://github.com/mosaicml/composer/pull/2747
  • Ignore load ignore keys by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2748
  • Bump torch to 2.1.1 version by @j316chuck in https://github.com/mosaicml/composer/pull/2717
  • Add more info when run doesnt complete by @aspfohl in https://github.com/mosaicml/composer/pull/2751
  • Lower sequence generation length on code gen to be dependent on max canonical solution length by @bmosaicml in https://github.com/mosaicml/composer/pull/2682
  • Remove flatten params by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2761
  • Fix GPU tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2767
  • Fix GPU v2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2768
  • Use time.tokens for speedmonitor instead of dataset length by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2762
  • Remove BreakEpochException by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2759
  • time to clean up time parsing 😉 by @aspfohl in https://github.com/mosaicml/composer/pull/2770
  • Upgrade RunConfig compute specification by @aspfohl in https://github.com/mosaicml/composer/pull/2772
  • Use async logging in MLflowLogger by @chenmoneygithub in https://github.com/mosaicml/composer/pull/2693
  • Fix FSDP param_init_fn to not reinit parameters multiple times by @dakinggg in https://github.com/mosaicml/composer/pull/2765
  • Gate FSDP param init test on torch 2.1 by @dakinggg in https://github.com/mosaicml/composer/pull/2774
  • Parallelize OCI multipart download by @coryMosaicML in https://github.com/mosaicml/composer/pull/2750
  • [UCVolumes] Add support for list API by @panchalhp-db in https://github.com/mosaicml/composer/pull/2769
  • Add the memory timeline profiling support through the PyTorch profiler. by @cli99 in https://github.com/mosaicml/composer/pull/2771
  • Improve torch memory profiling arguments processing by @cli99 in https://github.com/mosaicml/composer/pull/2777
  • Bump aws of nccl version and enable aws platform support by @willgleich in https://github.com/mosaicml/composer/pull/2776
  • Extend checkpoint loading to accept a validation function by @irenedea in https://github.com/mosaicml/composer/pull/2726
  • Fix checkpoint validation tests for torch 1.13 by @irenedea in https://github.com/mosaicml/composer/pull/2779
  • Bump version to 0.17.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2780

New Contributors

  • @chenmoneygithub made their first contribution in https://github.com/mosaicml/composer/pull/2693

Full Changelog: https://github.com/mosaicml/composer/compare/v0.17.1...v0.17.2

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.17.1

Bug Fixes

1. MosaicML Logger Robustness (https://github.com/mosaicml/composer/pull/2728)

We've improved the MosaicML logger to be more robust to faulty serialization.
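A common way to make a logger robust to faulty serialization is to test each value with json.dumps before sending it, as the PR title "Check serialization for JSON" suggests. The following is a minimal sketch of that idea. filter_json_serializable is hypothetical, and dropping bad entries (rather than, say, converting them to strings) is an assumption, not necessarily what #2728 does.

```python
import json
from typing import Any


def filter_json_serializable(metadata: dict[str, Any]) -> dict[str, Any]:
    """Keep only entries whose values can be encoded by json.dumps."""
    safe = {}
    for key, value in metadata.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            # Skip values such as sets or arbitrary objects instead of
            # letting a single bad value crash the logger.
            continue
        safe[key] = value
    return safe
```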

What's Changed

  • Add train finished run event by @jjanezhang in https://github.com/mosaicml/composer/pull/2714
  • Override nvidia env var for 11.8 by @dakinggg in https://github.com/mosaicml/composer/pull/2722
  • Update file exists checkpointing error messages to be more helpful by @irenedea in https://github.com/mosaicml/composer/pull/2668
  • [S] Add tag support to MLFlowLogger by @aspfohl in https://github.com/mosaicml/composer/pull/2716
  • Use raise ... from e to preserve stack trace by @irenedea in https://github.com/mosaicml/composer/pull/2725
  • add 0.17 to bcompat tests by @eracah in https://github.com/mosaicml/composer/pull/2723
  • Add support for canned ACL environment variable by @nik-mosaic in https://github.com/mosaicml/composer/pull/2729
  • Check serialization for JSON in mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2728
  • Fix profiler issue by @j316chuck in https://github.com/mosaicml/composer/pull/2735
  • Fix activation cpu offloading by @cli99 in https://github.com/mosaicml/composer/pull/2724
  • Bump version 0.17.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2741

Full Changelog: https://github.com/mosaicml/composer/compare/v0.17.0...v0.17.1

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.17.0

What's New

1. Hybrid Sharded Data Parallel (HSDP) Integration (#2648)

Composer now supports Hybrid Sharded Data Parallel (HSDP), where a model is both sharded and replicated across blocks of controllable size. By default, this will shard a model within a node and replicate across nodes, but Composer will accept a tuple of process groups to specify custom shard/replicate sizes. This can be specified in the FSDP config.

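```python
from composer import Trainer

composer_model = MyComposerModel(n_layers=3)  # your own ComposerModel subclass

fsdp_config = {
    'sharding_strategy': 'HYBRID_SHARD',
}

trainer = Trainer(
    model=composer_model,
    max_duration='4ba',
    fsdp_config=fsdp_config,
    # ... other Trainer arguments
)
```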

Within each shard block, HYBRID_SHARD applies FULL_SHARD to the model, whereas _HYBRID_SHARD_ZERO2 applies SHARD_GRAD_OP.
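To make the default layout concrete (shard within a node, replicate across nodes), the helper below computes which global ranks would form each shard group and each replicate group. It assumes ranks are numbered contiguously per node. hsdp_groups is a hypothetical helper, not a Composer or PyTorch function.

```python
def hsdp_groups(world_size: int, ranks_per_node: int) -> tuple[list[list[int]], list[list[int]]]:
    """Sketch of the default HSDP layout: shard within a node, replicate across nodes."""
    if world_size % ranks_per_node != 0:
        raise ValueError('world_size must be a multiple of ranks_per_node')
    num_nodes = world_size // ranks_per_node
    # Each shard group is one node's ranks; the model is split across them.
    shard_groups = [list(range(n * ranks_per_node, (n + 1) * ranks_per_node))
                    for n in range(num_nodes)]
    # Each replicate group holds the same local rank on every node. Those ranks
    # own identical shards and keep them in sync with each other.
    replicate_groups = [[n * ranks_per_node + local for n in range(num_nodes)]
                        for local in range(ranks_per_node)]
    return shard_groups, replicate_groups
```

For example, with 2 nodes of 4 GPUs, hsdp_groups(8, 4) gives shard groups [[0, 1, 2, 3], [4, 5, 6, 7]] and replicate groups [[0, 4], [1, 5], [2, 6], [3, 7]]. Passing custom process groups, as described above, amounts to choosing a different split.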

2. Train Loss NaN Monitor (#2704)

Composer has a new callback that raises an error if your loss becomes NaN. This helps avoid wasting compute when a training run diverges or fails for numerical reasons.

```python
from composer import Trainer
from composer.callbacks import NaNMonitor

composer_model = MyComposerModel(n_layers=3)  # your own ComposerModel subclass

trainer = Trainer(
    model=composer_model,
    max_duration='4ba',
    callbacks=NaNMonitor(),
    # ... other Trainer arguments
)
```
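The core check behind a NaN monitor fits in a few lines. ToyNaNMonitor below is a hypothetical stand-in, not Composer's NaNMonitor. It handles both a scalar loss and a dict of losses (the dict case was addressed for the real callback in #2712). The RuntimeError it raises is a choice made for this sketch.

```python
import math
from collections.abc import Mapping
from typing import Union

Loss = Union[float, Mapping[str, float]]


class ToyNaNMonitor:
    """Hypothetical stand-in for a NaN-checking callback (not Composer's class)."""

    def check(self, loss: Loss) -> None:
        # Loss dicts are checked entry by entry; scalars are checked directly.
        items = loss.items() if isinstance(loss, Mapping) else [('loss', loss)]
        for name, value in items:
            if math.isnan(value):
                # Stop early instead of spending compute on a diverged run.
                raise RuntimeError(f'Train loss {name!r} contains a NaN.')
```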

Bug Fixes

  • Fix MPS with dict loss by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2706
  • Squelch Memory Monitor warnings if device=meta by @hanlint in https://github.com/mosaicml/composer/pull/2529
  • Switch mosaicml logger to use futures to enable better error handling by @j316chuck in https://github.com/mosaicml/composer/pull/2702

What's Changed

  • Add partial state dict functionality for FSDP by @b-chu in https://github.com/mosaicml/composer/pull/2637
  • Update monai requirement from <1.3,>=0.9.1 to >=0.9.1,<1.4 by @dependabot in https://github.com/mosaicml/composer/pull/2643
  • Bump pytest-codeblocks from 0.16.1 to 0.17.0 by @dependabot in https://github.com/mosaicml/composer/pull/2645
  • Remove checkpoint on close by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2646
  • Update latest to 2.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2650
  • HSDP Support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2648
  • Log profile averages by @j316chuck in https://github.com/mosaicml/composer/pull/2647
  • Daily API key by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2655
  • Add automatic remote uploader downloader for composer profiler by @j316chuck in https://github.com/mosaicml/composer/pull/2653
  • Update the AWS_OFI_NCCL version and add in the MPI HWLOC install by @willgleich in https://github.com/mosaicml/composer/pull/2651
  • Fix GCP tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2658
  • Allow no eval_loader when eval is disabled by @b-chu in https://github.com/mosaicml/composer/pull/2657
  • Gate HSDP by torch 2.1.0 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2656
  • Fix FSDP arg default to match torch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2660
  • Bump pypandoc from 1.11 to 1.12 by @dependabot in https://github.com/mosaicml/composer/pull/2664
  • Bump vit-pytorch from 0.35.8 to 1.6.1 by @dependabot in https://github.com/mosaicml/composer/pull/2662
  • Upgrade to transformers 4.34.1 by @dakinggg in https://github.com/mosaicml/composer/pull/2635
  • Update docker readme by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2669
  • Add script to validate remote object store paths by @irenedea in https://github.com/mosaicml/composer/pull/2667
  • Torch 2.1 Resumption Support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2665
  • Bump gitpython from 3.1.37 to 3.1.40 by @dependabot in https://github.com/mosaicml/composer/pull/2663
  • Fix dist by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2670
  • Add torch nightly for torch 2.2.0 10-24 by @j316chuck in https://github.com/mosaicml/composer/pull/2671
  • Adding Model Data Init and Training Progress to MosaicMLLogger by @jjanezhang in https://github.com/mosaicml/composer/pull/2633
  • Bump pytest from 7.4.2 to 7.4.3 by @dependabot in https://github.com/mosaicml/composer/pull/2678
  • Bump sphinxext-opengraph from 0.8.2 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/2677
  • Bump traitlets from 5.10.0 to 5.12.0 by @dependabot in https://github.com/mosaicml/composer/pull/2674
  • Bump cryptography from 41.0.4 to 41.0.5 by @dependabot in https://github.com/mosaicml/composer/pull/2675
  • Secure Code Eval changes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2679
  • Lazy validation of code eval metric by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2681
  • Upgrade transformers to 4.35 by @dakinggg in https://github.com/mosaicml/composer/pull/2684
  • Bump traitlets from 5.12.0 to 5.13.0 by @dependabot in https://github.com/mosaicml/composer/pull/2687
  • Bump ipykernel from 6.25.2 to 6.26.0 by @dependabot in https://github.com/mosaicml/composer/pull/2686
  • Add Kwargs to upload_object by @nik-mosaic in https://github.com/mosaicml/composer/pull/2692
  • Add version number to composer metadata logs by @j316chuck in https://github.com/mosaicml/composer/pull/2565
  • Add distributed barrier test fixture to ensure pytest cleans up resources properly by @j316chuck in https://github.com/mosaicml/composer/pull/2694
  • Properly handle empty metric_names passed to Trainer.filter_metrics by @irenedea in https://github.com/mosaicml/composer/pull/2700
  • Train loss NaN checking callback by @coryMosaicML in https://github.com/mosaicml/composer/pull/2704
  • Adding logging and force flushing for run events by @jjanezhang in https://github.com/mosaicml/composer/pull/2703
  • [daily-test fix] Add rank 0 gating to test_elastic_resumption state dict comparison by @eracah in https://github.com/mosaicml/composer/pull/2705
  • Fix MPS with dict loss by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2706
  • Update types to follow PEP 585 by @b-chu in https://github.com/mosaicml/composer/pull/2697
  • Bump yamllint from 1.32.0 to 1.33.0 by @dependabot in https://github.com/mosaicml/composer/pull/2708
  • Update wandb requirement from <0.16,>=0.13.2 to >=0.13.2,<0.17 by @dependabot in https://github.com/mosaicml/composer/pull/2709
  • Squelch Memory Monitor warnings if device=meta by @hanlint in https://github.com/mosaicml/composer/pull/2529
  • Fix NaN monitor for loss dicts. by @coryMosaicML in https://github.com/mosaicml/composer/pull/2712
  • Switch mosaicml logger to use futures to enable better error handling by @j316chuck in https://github.com/mosaicml/composer/pull/2702
  • Fetching arguments for FSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2710
  • Bump version to 0.17 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2711

New Contributors

  • @willgleich made their first contribution in https://github.com/mosaicml/composer/pull/2651
  • @jjanezhang made their first contribution in https://github.com/mosaicml/composer/pull/2633

Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.4...v0.17.0

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.16.4

What's New

1. Torch 2.1 Support

Composer officially supports PyTorch 2.1! We support several new features from 2.1, including CustomPolicy, which enables granular wrapping with FSDP.

What's Changed

  • Add 0.16 checkpoint to backwards compatibility tests by @eracah in https://github.com/mosaicml/composer/pull/2567
  • Updating FSDP monkeypatch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2571
  • Add Databricks UC Volume Object Store by @panchalhp-db in https://github.com/mosaicml/composer/pull/2548
  • Fix pytest disk space OOM issue by adding tmp_path_retention_policy=None by @j316chuck in https://github.com/mosaicml/composer/pull/2583
  • Change daily nightly test version by @j316chuck in https://github.com/mosaicml/composer/pull/2596
  • Add save and register wrappers to mlflow logger by @dakinggg in https://github.com/mosaicml/composer/pull/2579
  • Missing () for or in auto microbatching gate by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2574
  • Simplify FSDP Gradient Clipping by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2586
  • Use FSDP CustomPolicy to support custom kwargs passed to different wrapped modules by @cli99 in https://github.com/mosaicml/composer/pull/2585
  • Free outputs callback by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2598
  • Merge branch 'dev' into spr/dev/458c4e36 by @b-chu in https://github.com/mosaicml/composer/pull/2595
  • Fix a bug when batch type is dict and one of the values is the list by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2599
  • Readme update by @ejyuen in https://github.com/mosaicml/composer/pull/2581
  • Add chain of thought eval by @bmosaicml in https://github.com/mosaicml/composer/pull/2466
  • Add torch 2.1.0 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2602
  • Change pr cpu and pr gpu test docker images by @j316chuck in https://github.com/mosaicml/composer/pull/2611
  • Change the tokenizer json file to read binary by @dakinggg in https://github.com/mosaicml/composer/pull/2608
  • [Docs] MLflow casing by @aspfohl in https://github.com/mosaicml/composer/pull/2609
  • Call generate callback at end of training by @aspfohl in https://github.com/mosaicml/composer/pull/2607
  • Refactor save interval and eval interval to share code by @dakinggg in https://github.com/mosaicml/composer/pull/2600
  • Deprecate many datasets and models by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2605
  • Clean up gpu tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2612
  • Remove apex test by @j316chuck in https://github.com/mosaicml/composer/pull/2616
  • Patch default precision by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2628
  • Add logging for generate callbacks by @aspfohl in https://github.com/mosaicml/composer/pull/2630
  • Expose input_names and output_names when exporting to ONNX by @antoinebrl in https://github.com/mosaicml/composer/pull/2601
  • Bump version to 0.16.4 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2627

New Contributors

  • @panchalhp-db made their first contribution in https://github.com/mosaicml/composer/pull/2548
  • @cli99 made their first contribution in https://github.com/mosaicml/composer/pull/2585

Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.3...v0.16.4

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.16.3

What's New

1. Add pass@k for HumanEval

HumanEval now supports pass@k. We also support first-class integration with the MosaicML platform for secure code evaluation.
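For reference, the widely used unbiased pass@k estimator, published alongside HumanEval, works as follows. For a problem with n sampled completions of which c pass the unit tests, the estimate is 1 - C(n-c, k) / C(n, k), averaged over problems.

The sketch below implements that formula in plain Python. It illustrates the metric only and is not Composer's implementation. The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of sampled completions
    c: number of completions that pass the unit tests
    k: number of attempts being scored
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing completion.
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing completion.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 10 samples of which 3 pass, pass@1 is 1 - 7/10 = 0.3.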

2. log_model with MLFlow

The MLFlow integration now supports log_model at the end of the run.

What's Changed

  • Update checkpoint.py by @b-chu in https://github.com/mosaicml/composer/pull/2540
  • Add log image to mlflow by @eracah in https://github.com/mosaicml/composer/pull/2416
  • Log runtime estimator units by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2542
  • Bump traitlets from 5.9.0 to 5.10.0 by @dependabot in https://github.com/mosaicml/composer/pull/2547
  • Bump gitpython from 3.1.35 to 3.1.36 by @dependabot in https://github.com/mosaicml/composer/pull/2546
  • Bump ipykernel from 6.25.1 to 6.25.2 by @dependabot in https://github.com/mosaicml/composer/pull/2544
  • Add providers param to ONNX Session in tests by @nik-mosaic in https://github.com/mosaicml/composer/pull/2553
  • Bump flash attn by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2551
  • Remove pin by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2554
  • Change filter to include pull_request_target by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2557
  • Downgrade nightly to previous version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2556
  • MCLI Code Eval by @rishab-partha in https://github.com/mosaicml/composer/pull/2479
  • Bump cryptography from 41.0.3 to 41.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/2559
  • Bump gitpython from 3.1.36 to 3.1.37 by @dependabot in https://github.com/mosaicml/composer/pull/2560
  • Update numpy requirement from <1.26.0,>=1.21.5 to >=1.21.5,<1.27.0 by @dependabot in https://github.com/mosaicml/composer/pull/2561
  • Update support for HumanEval by @mcarbin in https://github.com/mosaicml/composer/pull/2550
  • Add log_model to MLFlowLogger by @dakinggg in https://github.com/mosaicml/composer/pull/2541
  • Bump version to 0.16.3 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2566

New Contributors

  • @mcarbin made their first contribution in https://github.com/mosaicml/composer/pull/2550

Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.2...v0.16.3

- Python
Published by mvpatel2000 over 2 years ago

composer - v0.16.2

What's New

1. PyTorch Nightly Support

Composer now supports PyTorch Nightly and CUDA 12! Along with new Docker images based on nightly PyTorch versions and release candidates, we've updated our PyTorch monkeypatches to support the latest version of PyTorch. These monkeypatches add functionality for finer-grained FSDP wrapping and fix bugs related to sharded checkpoints. We are in the process of upstreaming these changes into PyTorch.

Bug Fixes

1. MosaicML Logger Robustness

The MosaicML logger is now robust to platform timeouts and other errors. It can also be disabled by setting the environment variable MOSAICML_PLATFORM to 'False' when training on the MosaicML platform.

2. GCS Integration

GCS authentication is now supported with HMAC keys, patching a bug in the previous implementation.

3. Optimizer Monitor Norm Calculation (https://github.com/mosaicml/composer/pull/2531)

Previously, the optimizer monitor incorrectly reduced norms across GPUs. It now correctly computes norms in a distributed setting.
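When a tensor is sharded across GPUs, its global L2 norm cannot be obtained by adding or averaging the per-shard norms. Instead, each rank contributes its squared local norm, the squares are summed, and the square root is taken at the end.

The single-process sketch below illustrates that, using plain lists as stand-ins for shards. It is not Composer's OptimizerMonitor code, and the helper names are hypothetical.

The sketch also walks metric names in sorted order. The linked PR's title indicates the fix involved sorting metric keys, presumably so that every rank reduces the same metric at the same position.

```python
import math


def l2_norm(values: list[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


def global_norms(per_rank_shards: list[dict[str, list[float]]]) -> dict[str, float]:
    """Combine per-rank shards of each named tensor into global L2 norms.

    per_rank_shards[r][name] holds rank r's shard of the tensor `name`.
    """
    # One deterministic key order on every rank, so position i in a stacked
    # reduction buffer refers to the same metric everywhere.
    names = sorted(per_rank_shards[0])
    # Sum the squared local norms (what a sum all-reduce would compute).
    squared_sums = [
        sum(l2_norm(shards[name]) ** 2 for shards in per_rank_shards)
        for name in names
    ]
    return {name: math.sqrt(s) for name, s in zip(names, squared_sums)}
```

For example, consider the tensor [3, 4, 0, 12] split as [3, 4] and [0, 12]. The local norms are 5 and 12, so naively summing them gives 17. The correct global norm is sqrt(25 + 144) = 13.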

What's Changed

  • fix: when there is no train_metrics, do not checkpoint by @furkanbiten in https://github.com/mosaicml/composer/pull/2502
  • Remove metric saving by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2514
  • Fix daily tests by removing gpu marker by @j316chuck in https://github.com/mosaicml/composer/pull/2515
  • Refactor mosaic_fsdp.py by @b-chu in https://github.com/mosaicml/composer/pull/2506
  • Disable slack notifications for PRs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2517
  • Add custom sharding to ChunkShardingSpec by @b-chu in https://github.com/mosaicml/composer/pull/2507
  • Update nightly docker image to torch nightly 09-03-23 by @j316chuck in https://github.com/mosaicml/composer/pull/2518
  • Update pre-commit in setup.py by @b-chu in https://github.com/mosaicml/composer/pull/2522
  • Add FSDP custom wrap with torch 2.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2460
  • Fix GCSObjectStore bug where hmac keys auth doesn't work by @eracah in https://github.com/mosaicml/composer/pull/2519
  • Bump gitpython from 3.1.34 to 3.1.35 by @dependabot in https://github.com/mosaicml/composer/pull/2525
  • Bump pytest from 7.4.0 to 7.4.2 by @dependabot in https://github.com/mosaicml/composer/pull/2523
  • Upgrade to MLFlow version 2.5.0 by @ngcgarcia in https://github.com/mosaicml/composer/pull/2528
  • Disable cifar daily test by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2527
  • Mosaicml logger robustness improvements by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2530
  • Fix metrics keys sort in DecoupledAdamW for OptimizerMonitor FSDP metric aggregation by @m1kol in https://github.com/mosaicml/composer/pull/2531
  • Fix github actions for GCS integration testing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2532
  • Fix GCS tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2535
  • Change cast for mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2538
  • Bump Version to 0.16.2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2537
  • Bump transformers version by @dakinggg in https://github.com/mosaicml/composer/pull/2539

New Contributors

  • @ngcgarcia made their first contribution in https://github.com/mosaicml/composer/pull/2528
  • @m1kol made their first contribution in https://github.com/mosaicml/composer/pull/2531

Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.1...v0.16.2

- Python
Published by mvpatel2000 almost 3 years ago

composer - v0.16.1

New Features

1. HPU (Habana Gaudi) Support (https://github.com/mosaicml/composer/pull/2444)

Composer now supports Habana Gaudi chips! To use HPUs, specify device='hpu' when constructing the Trainer:

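```python
from composer import Trainer

# MyComposerModel stands in for your own ComposerModel subclass.
composer_model = MyComposerModel(n_layers=3)

trainer = Trainer(
    model=composer_model,
    device='hpu',
    # ... other Trainer arguments
)
```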

2. Generate Callback (https://github.com/mosaicml/composer/pull/2449)

We've added a new callback which runs generate on a language model at a given frequency to visualize outputs:

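```python
from composer import Trainer
from composer.callbacks import Generate

# MyComposerModel stands in for your own ComposerModel subclass.
composer_model = MyComposerModel(n_layers=3)
generate_callback = Generate(prompts=['How good is my model?'], interval='5ba')

trainer = Trainer(
    model=composer_model,
    callbacks=generate_callback,
    # ... other Trainer arguments
)
```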
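The `interval='5ba'` argument uses Composer's time-string notation. The number is a count and the suffix names the unit: `ba` for batches, `ep` for epochs. With `interval='5ba'`, the callback should fire roughly every 5 batches.

The toy parser below only illustrates how a batch-based interval maps to firing points. It handles only the `ba` unit, is not Composer's `Time` parser, and its helper names are hypothetical.

```python
import re


def parse_batch_interval(timestring: str) -> int:
    """Parse a batch-count time string such as '5ba' (toy version, 'ba' only)."""
    match = re.fullmatch(r"(\d+)ba", timestring.strip())
    if match is None:
        raise ValueError(f"expected a batch time string like '5ba', got {timestring!r}")
    count = int(match.group(1))
    if count <= 0:
        raise ValueError("interval must be positive")
    return count


def firing_batches(interval: str, total_batches: int) -> list[int]:
    """1-indexed batches at which an '<n>ba' interval would trigger."""
    every = parse_batch_interval(interval)
    return list(range(every, total_batches + 1, every))
```

For instance, `firing_batches('5ba', 12)` gives `[5, 10]`.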

Bug Fixes

1. Checkpoint Fixes

Elastic sharded checkpointing now disables torchmetrics saving to avoid issues with torchmetrics tensors being sharded. Additionally, checkpointing now falls back on the old path, which does not convert torchmetrics tensors to numpy. Checkpointing also no longer materializes the optimizer state when saving weights only.

2. MLFlow Performance Improvements

The MLFlow integration has significant performance improvements, covering both logging frequency and the system metrics collected.

What's Changed

  • Hpu support by @vivekgoe in https://github.com/mosaicml/composer/pull/2444
  • Change input_ids to a kwarg in HuggingFaceModel.generate by @dakinggg in https://github.com/mosaicml/composer/pull/2459
  • Add log_table by @irenedea in https://github.com/mosaicml/composer/pull/2437
  • Enable composer to work with torch nightly builds, torch 2.1.0, and cuda 12.1. by @j316chuck in https://github.com/mosaicml/composer/pull/2463
  • Materialize only model state_dict in memory for `save_weights_only` by @eracah in https://github.com/mosaicml/composer/pull/2450
  • Improve performance of MLflow logging by @dbczumar in https://github.com/mosaicml/composer/pull/2442
  • Fail fast if scheduler warmup and max duration are incompatible by @dakinggg in https://github.com/mosaicml/composer/pull/2458
  • Add nightly docker image by @j316chuck in https://github.com/mosaicml/composer/pull/2452
  • Fix local eval by @rishab-partha in https://github.com/mosaicml/composer/pull/2465
  • Add torch 2.1.0 args for github release-docker workflow by @j316chuck in https://github.com/mosaicml/composer/pull/2470
  • Log system metrics on each event by @prithvikannan in https://github.com/mosaicml/composer/pull/2412
  • Fix torch 2.1.0 docker tag by @j316chuck in https://github.com/mosaicml/composer/pull/2472
  • Upstream Generate Callback by @irenedea in https://github.com/mosaicml/composer/pull/2449
  • Bump torch nightly docker image by @j316chuck in https://github.com/mosaicml/composer/pull/2476
  • Test pytorch 2.1.0 docker images on ci/cd by @j316chuck in https://github.com/mosaicml/composer/pull/2469
  • Fix huggingface tokenizer loading for slow tokenizers by @dakinggg in https://github.com/mosaicml/composer/pull/2483
  • Deprecate Fused LayerNorm by @nik-mosaic in https://github.com/mosaicml/composer/pull/2475
  • Transformers upgrade by @dakinggg in https://github.com/mosaicml/composer/pull/2489
  • Update RTD build config with build.os by @bandish-shah in https://github.com/mosaicml/composer/pull/2490
  • Upgrade torch docker version and tests by @j316chuck in https://github.com/mosaicml/composer/pull/2488
  • upgrade node by @j316chuck in https://github.com/mosaicml/composer/pull/2492
  • Gating tying modules w/ FSDP for torch 2.0 by @bcui19 in https://github.com/mosaicml/composer/pull/2467
  • Removing min_params by @bcui19 in https://github.com/mosaicml/composer/pull/2494
  • Fix torchmetrics backwards compatibility issue by @eracah in https://github.com/mosaicml/composer/pull/2468
  • Adding some fixes to FSDP tests by @bcui19 in https://github.com/mosaicml/composer/pull/2495
  • Fail count on mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2496
  • Remove PR curve metrics from backward compatibility test and skip torch 1.13 by @eracah in https://github.com/mosaicml/composer/pull/2497
  • filter warning by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2500
  • Bump version to 0.16.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2498
  • Skip metrics in state dict by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2501
  • Add peak memory stats by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2504
  • Fix sharded ckpt by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2505
  • Bump gitpython from 3.1.31 to 3.1.34 by @dependabot in https://github.com/mosaicml/composer/pull/2509
  • Annotate torch_prof_remote_file_name as Optional by @srstevenson in https://github.com/mosaicml/composer/pull/2512

New Contributors

  • @vivekgoe made their first contribution in https://github.com/mosaicml/composer/pull/2444
  • @irenedea made their first contribution in https://github.com/mosaicml/composer/pull/2437
  • @j316chuck made their first contribution in https://github.com/mosaicml/composer/pull/2463
  • @dbczumar made their first contribution in https://github.com/mosaicml/composer/pull/2442

Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.0...v0.16.1

- Python
Published by mvpatel2000 almost 3 years ago

composer - v0.16.0

What's New

1. New Events (#2264)

Composer now has the events EVAL_BEFORE_ALL and EVAL_AFTER_ALL, which let users control logging of bespoke evaluation information across all evaluators.
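The toy event loop below pictures how these events differ from the per-evaluator events. `EVAL_BEFORE_ALL` and `EVAL_AFTER_ALL` bracket the whole evaluation pass once, while `EVAL_START`/`EVAL_END` repeat for each evaluator.

This is a simplified model written for illustration, not Composer's engine or Callback API. The function, class, and evaluator labels are hypothetical.

```python
def run_evaluation(evaluator_labels: list[str]) -> list[str]:
    """Return the sequence of events fired for one evaluation pass (toy model)."""
    events = ["EVAL_BEFORE_ALL"]  # fired once, before any evaluator runs
    for label in evaluator_labels:
        events.append(f"EVAL_START:{label}")  # fired once per evaluator
        events.append(f"EVAL_END:{label}")
    events.append("EVAL_AFTER_ALL")  # fired once, after every evaluator has finished
    return events


class CrossEvaluatorSummary:
    """Hypothetical hook that aggregates information across all evaluators."""

    def __init__(self) -> None:
        self.finished = []
        self.summary = None

    def handle(self, event: str) -> None:
        if event == "EVAL_BEFORE_ALL":
            self.finished.clear()
        elif event.startswith("EVAL_END:"):
            self.finished.append(event.split(":", 1)[1])
        elif event == "EVAL_AFTER_ALL":
            # A single place to log a summary that spans every evaluator.
            self.summary = f"{len(self.finished)} evaluators: {', '.join(self.finished)}"


hook = CrossEvaluatorSummary()
for event in run_evaluation(["lm_task", "qa_task"]):
    hook.handle(event)
print(hook.summary)  # 2 evaluators: lm_task, qa_task
```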

2. Elastic Sharded Checkpointing

Traditionally, checkpoints are stored as giant monoliths. For large model training, moving the entire model to one node may be infeasible, and writing one large file from one node may be slow. Composer now supports elastic sharded checkpoints with FSDP, where every rank writes a single shard of the checkpoint. This checkpointing strategy is elastic: even if you resume on a different number of GPUs, Composer handles the resumption. To enable sharded checkpointing, specify 'state_dict_type': 'sharded' in the FSDP config:

```python
from composer import Trainer

# MyComposerModel stands in for your own ComposerModel subclass.
composer_model = MyComposerModel(n_layers=3)

fsdp_config = {
    'sharding_strategy': 'FULL_SHARD',
    'state_dict_type': 'sharded',
    # Save each set of checkpoint shards to a unique folder named after the batch.
    'sharded_ckpt_prefix_dir': 'ba{batch}-shards',
}

trainer = Trainer(
    model=composer_model,
    max_duration='4ba',
    fsdp_config=fsdp_config,
    save_folder='checkpoints',
    save_interval='2ba',
    # ... other Trainer arguments
)
```

See the docs for more information on how to integrate this with your project.
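What makes this checkpointing elastic is that a checkpoint written as N shards can be reloaded on M ranks. The toy sketch below shows the bookkeeping on a flat list of parameter values. Each saving rank writes one contiguous shard; on load, the shards are reassembled and re-split for the new world size.

Treat it as a conceptual sketch with hypothetical helper names. It is not how Composer stores shards, and a real implementation avoids materializing the full tensor on any single rank.

```python
def shard_bounds(total: int, world_size: int) -> list[tuple[int, int]]:
    """Split `total` elements into `world_size` contiguous, near-equal ranges."""
    base, extra = divmod(total, world_size)
    bounds, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def save_shards(flat_params: list[float], world_size: int) -> list[list[float]]:
    """Each rank 'writes' only its own slice of the flat parameter list."""
    return [flat_params[lo:hi] for lo, hi in shard_bounds(len(flat_params), world_size)]


def load_shards(saved: list[list[float]], new_world_size: int) -> list[list[float]]:
    """Reassemble the saved shards and re-split them for a new world size."""
    full = [value for shard in saved for value in shard]
    return save_shards(full, new_world_size)
```

For example, 10 values saved on 4 ranks produce shards of sizes 3, 3, 2, 2. Reloading on 3 ranks re-splits them into sizes 4, 3, 3 with no values lost.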

Bug Fixes

  • Fixes runtime estimator when using multiple evaluators in https://github.com/mosaicml/composer/pull/2331
  • Fix autoresume docs link in https://github.com/mosaicml/composer/pull/2332
  • Use Enum value when logging hyper-parameters in https://github.com/mosaicml/composer/pull/2386
  • Fix GCSObjectStore to match function signatures of other object stores in https://github.com/mosaicml/composer/pull/2445
  • Cast to float32 before numpy() to avoid bf16 errors in https://github.com/mosaicml/composer/pull/2441

What's Changed

  • Update numpy requirement from <1.25.0,>=1.21.5 to >=1.21.5,<1.26.0 by @dependabot in https://github.com/mosaicml/composer/pull/2316
  • Bump ipykernel from 6.23.1 to 6.23.2 by @dependabot in https://github.com/mosaicml/composer/pull/2317
  • Bump sphinxcontrib-katex from 0.9.5 to 0.9.6 by @dependabot in https://github.com/mosaicml/composer/pull/2319
  • Pin Apex by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2322
  • CodeQL on PRs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2323
  • Add secrets check as part of pre-commit by @karan6181 in https://github.com/mosaicml/composer/pull/2324
  • Update local rank 0 to be elastic by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2321
  • Bump pytest from 7.3.1 to 7.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/2330
  • Bump ipykernel from 6.23.2 to 6.23.3 by @dependabot in https://github.com/mosaicml/composer/pull/2329
  • Auto add mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2325
  • Add precision config arg for FP8 by @julian-q in https://github.com/mosaicml/composer/pull/2335
  • Fixes daily test failures with respect to autoadd mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2339
  • In-line group to avoid OOM by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2320
  • Set offload_to_cpu True for state_dict_type=sharded by @eracah in https://github.com/mosaicml/composer/pull/2338
  • Update version to 15.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2341
  • Fix mapi mocking by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2342
  • Change gpu timeout by @rishab-partha in https://github.com/mosaicml/composer/pull/2343
  • Fix test_fsdp_load_old_checkpoint test to fix daily tests by @eracah in https://github.com/mosaicml/composer/pull/2347
  • Add spaces between sentences in eval label warning by @srstevenson in https://github.com/mosaicml/composer/pull/2327
  • Avoid overwriting seed==0 by @tbenthompson in https://github.com/mosaicml/composer/pull/2352
  • Small Documentation Typo Fixes by @sarthak-314 in https://github.com/mosaicml/composer/pull/2349
  • Fix wandb error with autoresume issue by @eracah in https://github.com/mosaicml/composer/pull/2353
  • Bump ipykernel from 6.23.3 to 6.24.0 by @dependabot in https://github.com/mosaicml/composer/pull/2360
  • raise min mcli by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2362
  • Add node rank to signal files by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2363
  • Move pydantic pin to deepspeed by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2366
  • Batch log metrics calls in speed_monitor.py by @prithvikannan in https://github.com/mosaicml/composer/pull/2367
  • Read Composer run name env var by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2372
  • Fix typing for args in streaming by @dakinggg in https://github.com/mosaicml/composer/pull/2373
  • Add distributed sync during wait_for_workers to avoid timeout for large checkpoints by @dakinggg in https://github.com/mosaicml/composer/pull/2368
  • Update torchmetrics requirement from <0.12,>=0.10.0 to >=0.10.0,<1.1 by @dependabot in https://github.com/mosaicml/composer/pull/2358
  • Add code eval dataset and metric by @rishab-partha in https://github.com/mosaicml/composer/pull/2301
  • Isolate env var in unit tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2379
  • Add extra steps for space free up by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/2382
  • regex changed in time.py by @megha95 in https://github.com/mosaicml/composer/pull/2378
  • Support no param models by making optimizer optional by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2374
  • pin identify version to resolve codequality failures by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/2391
  • Add ls to object stores by @dakinggg in https://github.com/mosaicml/composer/pull/2376
  • Change transformers by @rishab-partha in https://github.com/mosaicml/composer/pull/2383
  • Respect MLFLow experiment environment variable by @aspfohl in https://github.com/mosaicml/composer/pull/2377
  • Change code eval apikey by @rishab-partha in https://github.com/mosaicml/composer/pull/2394
  • Moves pytest-cpu slack notifications to issues from helpdesk by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2398
  • Add code eval docs by @rishab-partha in https://github.com/mosaicml/composer/pull/2397
  • fixed pre-commit issues with modifications to pretty-format-json args. by @snarayan21 in https://github.com/mosaicml/composer/pull/2392
  • Fix LOCAL_WORLD_SIZE in pytest by @rishab-partha in https://github.com/mosaicml/composer/pull/2407
  • Add code eval secrets to workflows by @rishab-partha in https://github.com/mosaicml/composer/pull/2399
  • Enable Elastic Sharded Checkpointing by @eracah in https://github.com/mosaicml/composer/pull/2262
  • Remove compute_on_step from MAP by @priba in https://github.com/mosaicml/composer/pull/2390
  • Save metadata and integration when save_weights_only is set by @eracah in https://github.com/mosaicml/composer/pull/2396
  • remove unused Trainer docstring arg load_fsdp_monolith_rank0_only by @eracah in https://github.com/mosaicml/composer/pull/2408
  • torch2.0.1 custom auto wrap by @vchiley in https://github.com/mosaicml/composer/pull/2400
  • Add ruff pre-commit by @Skylion007 in https://github.com/mosaicml/composer/pull/2414
  • Switch google cloud backend from libcloud to google cloud storage API by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/2340
  • Updates GPU test timeout to use mcloud flag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2420
  • Add a EVAL_STANDALONE_START and EVAL_STANDALONE_END events and change RUD to not wait_for_workers every eval by @dakinggg in https://github.com/mosaicml/composer/pull/2418
  • Throttle optimizer monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2419
  • Adding extra condition to avoid running eval_train_metrics by @furkanbiten in https://github.com/mosaicml/composer/pull/2411
  • fp8 on Ada by @dskhudia in https://github.com/mosaicml/composer/pull/2424
  • Bump coverage[toml] from 7.2.7 to 7.3.0 by @dependabot in https://github.com/mosaicml/composer/pull/2432
  • Bump cryptography from 38.0.4 to 41.0.3 by @dependabot in https://github.com/mosaicml/composer/pull/2436
  • Bump ipykernel from 6.24.0 to 6.25.1 by @dependabot in https://github.com/mosaicml/composer/pull/2434
  • Multilingual compatibility and batching for Code Evaluation by @rishab-partha in https://github.com/mosaicml/composer/pull/2410
  • Update max duration on tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2429
  • Update timeout by @rishab-partha in https://github.com/mosaicml/composer/pull/2438
  • add dist.barrier to rotate_checkpoints by @eracah in https://github.com/mosaicml/composer/pull/2440
  • Bump version to 0.16 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2439
  • Fix notebooks by @rishab-partha in https://github.com/mosaicml/composer/pull/2446
  • Fix notebooks v2 by @rishab-partha in https://github.com/mosaicml/composer/pull/2448

New Contributors

  • @eltociear made their first contribution in https://github.com/mosaicml/composer/pull/2333
  • @antoinebrl made their first contribution in https://github.com/mosaicml/composer/pull/2334
  • @julian-q made their first contribution in https://github.com/mosaicml/composer/pull/2335
  • @srstevenson made their first contribution in https://github.com/mosaicml/composer/pull/2327
  • @tbenthompson made their first contribution in https://github.com/mosaicml/composer/pull/2352
  • @sarthak-314 made their first contribution in https://github.com/mosaicml/composer/pull/2349
  • @prithvikannan made their first contribution in https://github.com/mosaicml/composer/pull/2367
  • @XiaohanZhangCMU made their first contribution in https://github.com/mosaicml/composer/pull/2382
  • @megha95 made their first contribution in https://github.com/mosaicml/composer/pull/2378
  • @snarayan21 made their first contribution in https://github.com/mosaicml/composer/pull/2392
  • @priba made their first contribution in https://github.com/mosaicml/composer/pull/2390
  • @Skylion007 made their first contribution in https://github.com/mosaicml/composer/pull/2414
  • @furkanbiten made their first contribution in https://github.com/mosaicml/composer/pull/2411

Full Changelog: https://github.com/mosaicml/composer/compare/v0.15.0...v0.16.0

- Python
Published by mvpatel2000 almost 3 years ago

composer - v0.15.1

Bug Fixes

This is a patch release that mainly fixes a bug related to autoresume and changes the default to offload_to_cpu=True for sharded checkpoints with PyTorch version >2.

What's Changed

  • Fixes daily test failures with respect to autoadd mosaicml logger by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2339
  • Set offload_to_cpu True for state_dict_type=sharded by @eracah in https://github.com/mosaicml/composer/pull/2338
  • Update version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2341
  • Fix MAPI mocking by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2342
  • Change GPU timeout by @rishab-partha in https://github.com/mosaicml/composer/pull/2343
  • Add cpu call by @eracah in https://github.com/mosaicml/composer/pull/2347
  • Add spaces between sentences in eval label warning by @srstevenson in https://github.com/mosaicml/composer/pull/2327
  • Avoid overwriting seed=0 by @tbenthompson in https://github.com/mosaicml/composer/pull/2352
  • Small documentation typo fixes by @sarthak-314 in https://github.com/mosaicml/composer/pull/2349
  • Fix wandb error with autoresume issue by @eracah in https://github.com/mosaicml/composer/pull/2353

Full Changelog: https://github.com/mosaicml/composer/compare/v0.15.0...v0.15.1

- Python
Published by dakinggg almost 3 years ago

composer - v0.15.0

🚀 Composer v0.15.0

What's New

1. Exact Eval (https://github.com/mosaicml/composer/pull/2218)

Composer now supports exact evaluation! Evaluation now gives exactly the same results regardless of the number of GPUs, because any duplicated samples (added as padding so the data splits evenly across ranks) are removed from the dataloader.
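Duplicates arise because a distributed sampler that does not drop the last partial batch pads the dataset by repeating samples until it divides evenly across ranks. The sketch below models that padding with a round-robin split, as PyTorch's `DistributedSampler` does with `shuffle=False` and `drop_last=False`. It then shows how excluding the padded positions leaves each sample counted exactly once.

The helper names are hypothetical, and this is not Composer's implementation.

```python
import math


def padded_rank_indices(num_samples: int, world_size: int) -> list[list[int]]:
    """Round-robin split with wrap-around padding so every rank gets equal work.

    Models the simple case where the padding is no longer than the dataset.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    indices = list(range(num_samples))
    indices += indices[: total - num_samples]  # repeat leading samples as padding
    return [indices[rank:total:world_size] for rank in range(world_size)]


def exact_rank_indices(num_samples: int, world_size: int) -> list[list[int]]:
    """Same split, but drop positions that exist only because of padding."""
    exact = []
    for rank, idxs in enumerate(padded_rank_indices(num_samples, world_size)):
        # The j-th index on this rank sits at global position rank + j * world_size.
        exact.append([idx for j, idx in enumerate(idxs) if rank + j * world_size < num_samples])
    return exact
```

With 10 samples on 4 GPUs, `padded_rank_indices(10, 4)` yields `[[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]`, so samples 0 and 1 would be scored twice. `exact_rank_indices(10, 4)` yields `[[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]`.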

2. Monolithic Checkpoint Loading (https://github.com/mosaicml/composer/pull/2288)

When training large models, loading the model and optimizer on every rank can use up all the system memory. With FSDP, Composer can now load the model and optimizer on only rank 0 and broadcast it to all other ranks. To enable:

```python
from composer import Trainer

# Construct Trainer
trainer = Trainer(
    # ... model, dataloaders, and other Trainer arguments
    fsdp_config={
        'load_monolith_rank0_only': True,
    },
)

# Train!
trainer.fit()
```

Also make sure that the model on rank 0 is on CPU or GPU (as opposed to the meta device).
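One common way to meet this requirement is to build the model on a real device on rank 0 and on PyTorch's `meta` device on all other ranks, since those ranks receive their weights from the rank-0 broadcast. That pattern is an assumption here, not something these notes prescribe.

The helper below only picks the device string. It reads the `RANK` environment variable set by `torchrun`-style launchers, and its name is hypothetical.

```python
import os
from typing import Optional


def init_device_for_rank(rank: Optional[int] = None, rank0_device: str = "cpu") -> str:
    """Pick where to build the model when only rank 0 loads the checkpoint.

    Rank 0 materializes real weights (CPU or GPU) so it can load the monolithic
    checkpoint; other ranks use 'meta' to avoid allocating memory for weights
    they will receive via broadcast.
    """
    if rank is None:
        rank = int(os.environ.get("RANK", "0"))
    return rank0_device if rank == 0 else "meta"
```

You could pass the returned string to whatever your model constructor uses to choose its device.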

3. Spin Dataloaders

By default, when resuming, Composer spins dataloaders to the current timestamp (skipping samples that were already trained on) to ensure deterministic resumption. However, dataloader spinning can be very slow, so Trainer now has a new flag to disable spinning if determinism is not required. To disable spinning:

```python
from composer import Trainer

# Construct Trainer
trainer = Trainer(
    # ... model, dataloaders, and other Trainer arguments
    spin_dataloaders=False,
)

# Train!
trainer.fit()
```
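Conceptually, spinning re-iterates the dataloader and discards batches that were trained on before the checkpoint. It therefore costs roughly as much data loading as the skipped part of the epoch.

The toy sketch below contrasts the two behaviors on a list of batches. `resume_batches` is hypothetical and only illustrates what `spin_dataloaders` trades off, not how Composer implements it.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def resume_batches(dataloader: Iterable[T], batches_done: int, spin: bool) -> List[T]:
    """Return the batches seen after resuming mid-epoch (toy model).

    With spin=True, the first `batches_done` batches are consumed and discarded,
    so training continues exactly where it stopped. With spin=False, iteration
    restarts from the top of the epoch, so already-seen batches repeat unless
    the dataloader restores its own position.
    """
    iterator = iter(dataloader)
    if spin:
        # Consume and throw away the batches trained on before the checkpoint.
        for _ in islice(iterator, batches_done):
            pass
    return list(iterator)
```

For example, resuming after 3 of 5 batches gives `['b3', 'b4']` with spinning and all 5 batches again without it.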

Deprecations

  • HealthChecker is now deprecated and will be removed in v0.17.0

Bug Fixes

  • Add support for saving HF info in state dict when using DDP by @dakinggg in https://github.com/mosaicml/composer/pull/2206
  • Change state dict loading default to strict by @dakinggg in https://github.com/mosaicml/composer/pull/2216
  • CE loss vs CE metric equivalence by @dakinggg in https://github.com/mosaicml/composer/pull/2241
  • Move sharded checkpoints into their own intermediate prefix folder by @eracah in https://github.com/mosaicml/composer/pull/2205
  • Fix typo depricated -> deprecated by @eracah in https://github.com/mosaicml/composer/pull/2270
  • Spin dataloader arg by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2267
  • Confirming the output variable has two dimensions before confirming the shape of the second element. by @jimmiemunyi in https://github.com/mosaicml/composer/pull/2275
  • Add loss_dict keyword to closure lambda function by @Landanjs in https://github.com/mosaicml/composer/pull/1952
  • Strip spacing icl by @bmosaicml in https://github.com/mosaicml/composer/pull/2306

What's Changed

  • Update FFCV by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2197
  • Add support for saving HF info in state dict when using DDP by @dakinggg in https://github.com/mosaicml/composer/pull/2206
  • Bump junitparser from 3.0.0 to 3.1.0 by @dependabot in https://github.com/mosaicml/composer/pull/2212
  • Bump sentencepiece from 0.1.98 to 0.1.99 by @dependabot in https://github.com/mosaicml/composer/pull/2208
  • Add docs for Checkpointing with Cloudflare R2 by @eracah in https://github.com/mosaicml/composer/pull/2215
  • Working slack link by @growlix in https://github.com/mosaicml/composer/pull/2217
  • Change state dict loading default to strict by @dakinggg in https://github.com/mosaicml/composer/pull/2216
  • Fix typo in evaluation docs by @dakinggg in https://github.com/mosaicml/composer/pull/2225
  • Clean soft cross entropy by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2227
  • add cmake by @dakinggg in https://github.com/mosaicml/composer/pull/2229
  • Upgrade to mcli0.4, smaller mcli improvements by @aspfohl in https://github.com/mosaicml/composer/pull/2226
  • Bump to torch 2.0.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2235
  • Deprecate healthchecker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2236
  • Update torch 2.0.1 workflows by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2239
  • Log wandb URL to metadata by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2240
  • Bump ipykernel from 6.22.0 to 6.23.1 by @dependabot in https://github.com/mosaicml/composer/pull/2244
  • Update transformers requirement from <4.29,>=4.11 to >=4.11,<4.30 by @dependabot in https://github.com/mosaicml/composer/pull/2245
  • CE loss vs CE metric equivalence by @dakinggg in https://github.com/mosaicml/composer/pull/2241
  • Exact Eval by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2218
  • bump torchmetrics pin by @nik-mosaic in https://github.com/mosaicml/composer/pull/2247
  • Remove deprecated code / torch 1.11 / torch 1.12 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2234
  • Rename backwards_create_graph description by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2248
  • Move sharded checkpoints into their own intermediate prefix folder by @eracah in https://github.com/mosaicml/composer/pull/2205
  • Fix daily tests by fixing test_fsdp_load_old_checkpoint by @eracah in https://github.com/mosaicml/composer/pull/2249
  • Support for multiple optimizer groups in torch 2.0 + FSDP by @sashaDoubov in https://github.com/mosaicml/composer/pull/2230
  • Change AdamW step to a tensor instead of an int by @eracah in https://github.com/mosaicml/composer/pull/2237
  • Update to cuda 11.8 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2250
  • Fix daily tests by adding s3 secrets to daily-gpu tests by @eracah in https://github.com/mosaicml/composer/pull/2254
  • Typo in s3_prefix: epemeral -> ephemeral 🤦‍♂️ by @eracah in https://github.com/mosaicml/composer/pull/2255
  • Bump yamllint from 1.31.0 to 1.32.0 by @dependabot in https://github.com/mosaicml/composer/pull/2256
  • Bump coverage[toml] from 7.2.5 to 7.2.6 by @dependabot in https://github.com/mosaicml/composer/pull/2258
  • Add callbacks for EVAL_BEFORE_ALL and EVAL_AFTER_ALL by @rishab-partha in https://github.com/mosaicml/composer/pull/2264
  • Update torch device naming convention for h100 gpus by @vchiley in https://github.com/mosaicml/composer/pull/2265
  • Fix typo depricated -> deprecated by @eracah in https://github.com/mosaicml/composer/pull/2270
  • alerts for daily tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2272
  • Fix daily tests by patching cupy version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2274
  • Skip ffcv notebook by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2277
  • Spin dataloader arg by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2267
  • Confirming the output variable has two dimensions before confirming the shape of the second element. by @jimmiemunyi in https://github.com/mosaicml/composer/pull/2275
  • Bump coverage[toml] from 7.2.6 to 7.2.7 by @dependabot in https://github.com/mosaicml/composer/pull/2282
  • Patch for tokenizers that have python files in save_pretrained output by @dakinggg in https://github.com/mosaicml/composer/pull/2279
  • fix get_file(overwrite=True) to properly handle pre-existing files by @bmosaicml in https://github.com/mosaicml/composer/pull/2284
  • Fix Checkpointing Docs Link by @rishab-partha in https://github.com/mosaicml/composer/pull/2278
  • Add errors for Mixed Dataloader Eval by @rishab-partha in https://github.com/mosaicml/composer/pull/2269
  • Fix autoresume with slashed directory by @rishab-partha in https://github.com/mosaicml/composer/pull/2287
  • Delete symlinks when not saving checkpoints locally by @rishab-partha in https://github.com/mosaicml/composer/pull/2285
  • fixed adding tokenizer to hf by @KuuCi in https://github.com/mosaicml/composer/pull/2290
  • New Console Logger Test + Discard before Eval by @rishab-partha in https://github.com/mosaicml/composer/pull/2273
  • Enabled kv caching during generate to speed up QA Task by @bmosaicml in https://github.com/mosaicml/composer/pull/2293
  • Update monai requirement from <1.2,>=0.9.1 to >=0.9.1,<1.3 by @dependabot in https://github.com/mosaicml/composer/pull/2298
  • Bump sphinxcontrib-katex from 0.9.4 to 0.9.5 by @dependabot in https://github.com/mosaicml/composer/pull/2296
  • Training Checkpoint Fix by @KuuCi in https://github.com/mosaicml/composer/pull/2294
  • Update transformers requirement from <4.30,>=4.11 to >=4.11,<4.31 by @dependabot in https://github.com/mosaicml/composer/pull/2295
  • Fixed how save_checkpoint_to_save_folder called CheckpointSaver object to save state and logger by @KuuCi in https://github.com/mosaicml/composer/pull/2300
  • Update Slack link in README.md by @ejyuen in https://github.com/mosaicml/composer/pull/2261
  • Change progress bar logger to print all eval metrics by @rishab-partha in https://github.com/mosaicml/composer/pull/2286
  • Add pytest clear cache by @rishab-partha in https://github.com/mosaicml/composer/pull/2305
  • Fix tests for wandb and mlflow loggers by @b-chu in https://github.com/mosaicml/composer/pull/2302
  • Monolithic Loading by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2288
  • Add loss_dict keyword to closure lambda function by @Landanjs in https://github.com/mosaicml/composer/pull/1952
  • Strip spacing icl by @bmosaicml in https://github.com/mosaicml/composer/pull/2306
  • Add additional error with auto microbatching by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2308
  • Group autoresume messages by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2307
  • Move deepspeed enabled to state by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2309
  • Jiggling tests and adding gc collect by @bcui19 in https://github.com/mosaicml/composer/pull/2312
  • Monolithic loading improvements by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2313
  • Update version to 0.15 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2315

New Contributors

  • @aspfohl made their first contribution in https://github.com/mosaicml/composer/pull/2226
  • @sashaDoubov made their first contribution in https://github.com/mosaicml/composer/pull/2230
  • @rishab-partha made their first contribution in https://github.com/mosaicml/composer/pull/2264
  • @jimmiemunyi made their first contribution in https://github.com/mosaicml/composer/pull/2275
  • @KuuCi made their first contribution in https://github.com/mosaicml/composer/pull/2290
  • @b-chu made their first contribution in https://github.com/mosaicml/composer/pull/2302

Full Changelog: https://github.com/mosaicml/composer/compare/v0.14.1...v0.15.0

- Python
Published by mvpatel2000 about 3 years ago

composer - v0.14.1

Bug Fixes

Fixes a bug related to sentencepiece tokenizers and ICL eval.

What's Changed

  • Update docs to remove gradient clipping in events by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2193
  • remove explorer info from readme by @nik-mosaic in https://github.com/mosaicml/composer/pull/2174
  • bugfix sentpiece by @bmosaicml in https://github.com/mosaicml/composer/pull/2198
  • Fix Broken Training Loop Image Link by @eracah in https://github.com/mosaicml/composer/pull/2199
  • Fix broken image link for GLU by @eracah in https://github.com/mosaicml/composer/pull/2201
  • bugfix sentpiece (#2198) by @bmosaicml in https://github.com/mosaicml/composer/pull/2200
  • Bump version to v0.14.1 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2202
  • Pin protobuf by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2203

Full Changelog: https://github.com/mosaicml/composer/compare/v0.14.0...v0.14.1

- Python
Published by mvpatel2000 about 3 years ago

composer - v0.14.0

:rocket: Composer v0.14.0

Composer v0.14.0 is released! Install via pip:

```bash
pip install composer==0.14.0
```

The legacy package name still works via pip:

```bash
pip install mosaicml==0.14.0
```

New Features

  1. 🆕 PyTorch 2.0 Support (#2172)

We're thrilled to announce official support for PyTorch 2.0! We've got all initial unit tests passing and have run through our examples. We've also made some updates to start taking advantage of the great new features.

Initial support also includes:

* Support for `torch.compile`

| Model | Dataset | Without compile `throughput/samples_per_sec` | With compile `throughput/samples_per_sec` | Performance % |
| ------------ | -------- | ----------------------------------------- | -------------------------------------- | ------------- |
| ResNet50 | ImageNet | 5557 | 7424 | 33.60% |
| DeepLab V3 | ADE20K | 81.60 | 98.82 | 21.10% |
| HF BERT | C4 | 3360 | 4259 | 26.75% |
| HF Causal LM | C4 | 50.61 | 103.29 | 100.05% |

  To get started, add the `compile_config` argument to the `Trainer`:
  ```python
    # To use default `torch.compile` config
    trainer = Trainer(
       ...,
       compile_config={},
    )

    # To use custom `torch.compile` config, provide an argument as a dictionary, for example:
    trainer = Trainer(
       ...,
       compile_config={'mode': 'reduce-overhead'},
    )

  ```

  The `Trainer` also accepts pre-compiled models passed via the `model` argument. If the model has already been compiled, any `compile_config` argument is ignored.

  **Note**: We recommend baselining your model with and without `torch.compile`, as there are scenarios where enabling compile yields no throughput improvement and some where it causes a regression.
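The baselining recommendation above can be sketched as a small timing harness. This is a minimal, framework-agnostic sketch: `measure_throughput` and `percent_change` are hypothetical helpers, not Composer APIs, and in practice you would pass one training step of the eager model and one of the `torch.compile`d model. `percent_change` uses the same formula as the "Performance %" column in the table above (for example, ResNet50 going from 5557 to 7424 samples/sec is a 33.60% gain).

```python
import time
from typing import Callable


def measure_throughput(step_fn: Callable[[], object], samples_per_step: int,
                       num_steps: int = 20, warmup_steps: int = 3) -> float:
    """Return samples/sec for `step_fn`, excluding warmup steps.

    Warmup matters for torch.compile, whose first calls include compile time.
    """
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return samples_per_step * num_steps / elapsed


def percent_change(baseline: float, candidate: float) -> float:
    """Throughput change of `candidate` relative to `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical stand-in workloads; replace with eager and compiled train steps.
    eager = measure_throughput(lambda: sum(range(20_000)), samples_per_step=32)
    compiled = measure_throughput(lambda: sum(range(10_000)), samples_per_step=32)
    print(f"throughput change: {percent_change(eager, compiled):+.2f}%")
```

A negative result is the regression case the note warns about, so it is worth checking before enabling compile by default.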
  • PyTorch 2.0 Docker Images

    We've added the following new official MosaicML Docker Images with PyTorch 2.0 support:

    | Linux Distro | Flavor | PyTorch Version | CUDA Version | Python Version | Docker Tags |
    | ------------ | ------ | --------------- | ------------ | -------------- | ----------- |
    | Ubuntu 20.04 | Base | 2.0.0 | 11.7.1 (Infiniband) | 3.10 | `mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04` |
    | Ubuntu 20.04 | Base | 2.0.0 | 11.7.1 (EFA) | 3.10 | `mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04-aws` |
    | Ubuntu 20.04 | Base | 2.0.0 | cpu | 3.10 | `mosaicml/pytorch:2.0.0_cpu-python3.10-ubuntu20.04` |
    | Ubuntu 20.04 | Vision | 2.0.0 | 11.7.1 (Infiniband) | 3.10 | `mosaicml/pytorch_vision:2.0.0_cu117-python3.10-ubuntu20.04` |
    | Ubuntu 20.04 | Vision | 2.0.0 | cpu | 3.10 | `mosaicml/pytorch_vision:2.0.0_cpu-python3.10-ubuntu20.04` |

  2. 🦾 New Callbacks

    • Activation monitor (#2066)

    Monitors activations in the network. Every `interval` batches, it attaches a forward hook and logs the max, average, L2 norm, and kurtosis of the input and output activations. To enable:

    ```python
    from composer import Trainer
    from composer.callbacks import ActivationMonitor

    # Construct Trainer
    trainer = Trainer(
        ...,
        callbacks=[ActivationMonitor()],
    )

    # Train!
    trainer.fit()
    ```
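To make the four logged statistics concrete, here is a minimal pure-Python sketch computing them for one flattened activation. `activation_stats` is a hypothetical helper, and it assumes the standard (non-excess) definition of kurtosis: the fourth central moment divided by the squared variance. The `ActivationMonitor` callback computes its statistics on tensors inside a forward hook, and its exact definitions may differ.

```python
import math
from typing import Dict, Sequence


def activation_stats(values: Sequence[float]) -> Dict[str, float]:
    """Max, mean, L2 norm, and kurtosis of a flattened activation."""
    n = len(values)
    mean = sum(values) / n
    # Population central moments used for kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "max": max(values),
        "average": mean,
        "l2_norm": math.sqrt(sum(v * v for v in values)),
        # Non-excess kurtosis: about 3 for Gaussian data; large values flag outliers.
        "kurtosis": m4 / (m2 ** 2) if m2 > 0 else float("nan"),
    }


print(activation_stats([1.0, 2.0, 3.0, 4.0]))
```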

  • Slack Logger (#2133)

    You can now send custom training metrics to Slack! To enable:

    ```python
    from composer import Trainer
    from composer.loggers import SlackLogger

    trainer = Trainer(
        ...,
        loggers=[
            SlackLogger(
                log_interval="10ba",  # or 1ep, 2ep
                include_keys=["algorithm_traces", "loss"],
                formatter_func=(lambda data, **kwargs: [
                    {
                        "type": "section",
                        "text": {"type": "mrkdwn", "text": f"*{k}:* {v}"},
                    }
                    for k, v in data.items()
                ]),
            ),
        ],
    )

    trainer.fit()
    ```

    Please see PR #2133 for additional details.
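Because `formatter_func` is just a callable that turns the logged metrics into a list of Slack Block Kit blocks, it can also be written as a named function and checked before training starts. This sketch assumes, as in the lambda in the example above, that the metrics dictionary arrives as the first argument and anything else as keyword arguments; `format_metrics` is a hypothetical name. In Slack's mrkdwn, `*text*` renders as bold.

```python
from typing import Any, Dict, List


def format_metrics(data: Dict[str, Any], **kwargs: Any) -> List[Dict[str, Any]]:
    """Build one Slack 'section' block per logged metric (hypothetical helper)."""
    return [
        {
            "type": "section",
            # Bold key followed by its value, using Slack mrkdwn syntax.
            "text": {"type": "mrkdwn", "text": f"*{k}:* {v}"},
        }
        for k, v in data.items()
    ]


blocks = format_metrics({"loss/train/total": 0.25})
print(blocks)
```

It could then be passed as `SlackLogger(formatter_func=format_metrics, ...)` in place of the lambda.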

API changes

  • The `grad_accum` argument has been removed from `Trainer`; use `device_train_microbatch_size` instead (#2040)
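When migrating, the old and new settings are related by simple arithmetic. A minimal sketch, assuming the global train batch size is split evenly across devices: each device processes `train_batch_size / num_devices` samples per step, and `grad_accum` used to split that further into equal microbatches. `grad_accum_to_microbatch_size` is a hypothetical helper, not part of Composer.

```python
def grad_accum_to_microbatch_size(train_batch_size: int, num_devices: int,
                                  grad_accum: int) -> int:
    """Translate an old `grad_accum` value into `device_train_microbatch_size`."""
    per_device_batch, rem = divmod(train_batch_size, num_devices)
    if rem:
        raise ValueError("train_batch_size must be divisible by num_devices")
    microbatch, rem = divmod(per_device_batch, grad_accum)
    if rem:
        raise ValueError("per-device batch size must be divisible by grad_accum")
    return microbatch


# 256 samples on 8 GPUs with grad_accum=4: 32 per device, microbatches of 8.
print(grad_accum_to_microbatch_size(256, 8, 4))  # → 8
```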

Deprecations

  • We no longer support PyTorch 1.11 and 1.12 due to security vulnerabilities. New features will not be tested against these versions.

Bug Fixes

  • Eval subset num batches bug fix (#2028)
  • Protect for missing slack_sdk import (#2031)
  • Adjust HuggingFaceModel token embedding resizing to only occur when necessary (#2027)
  • Update FSDP meta weight tying tests to include precision testing (#2050)
  • Backward Compat with Torchmetrics (#2046)
  • Busy wait for local rank 0 download to avoid timeout on large file download (#2054)
  • Fix OCIObjectStore save_overwrite=False bug (#2053)
  • Busy wait so that non local rank zeros don't timeout while local rank zero downloads a monolithic checkpoint (#2071)
  • Skip extra downloads when not using a format string (#2073)
  • fix name_or_path usage in HF save/load usage (#2075)
  • Fix EMA resumption issue with calling trainer.eval() before trainer.fit() (#2088)
  • Patch EMA with FSDP (#2091)
  • Updating gradient clipping to be torch 2.0 compatible (#2089)
  • Adding checks for weight tying s.t. we don't think None attributes are weight tied (#2103)
  • gate the extra forward call specifically for fsdp (#2102)
  • Allow user to set ONNX opset version when Exporting for Inference (#2101)
  • Runtime estimator (#2124)
  • Use state_dict Torchmetrics Serialization (#2116)
  • Fix filelock in checkpoint download (#2184)

What's Changed

  • Eval subset num batches bug fix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2028
  • Protect for missing slack_sdk import by @hanlint in https://github.com/mosaicml/composer/pull/2031
  • switch code quality workflow to dev target and smoketest by @dakinggg in https://github.com/mosaicml/composer/pull/2032
  • Generate composer PyPi package by @bandish-shah in https://github.com/mosaicml/composer/pull/2034
  • HealthChecker should only send test message on global rank zero by @hanlint in https://github.com/mosaicml/composer/pull/2035
  • Bump version to 0.13.1 by @bandish-shah in https://github.com/mosaicml/composer/pull/2033
  • Use follow in mcp script by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2022
  • Bump pytest from 7.2.1 to 7.2.2 by @dependabot in https://github.com/mosaicml/composer/pull/2039
  • Bump pypandoc from 1.10 to 1.11 by @dependabot in https://github.com/mosaicml/composer/pull/2038
  • Adds a PR guidelines section to contributing.md by @dakinggg in https://github.com/mosaicml/composer/pull/1993
  • Adjust HuggingFaceModel token embedding resizing to only occur when necessary by @dakinggg in https://github.com/mosaicml/composer/pull/2027
  • Remove deprecated code by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2026
  • test and fix composer package name usage in composer_collect_env by @dakinggg in https://github.com/mosaicml/composer/pull/2049
  • Log nodename information in composer by @eracah in https://github.com/mosaicml/composer/pull/2043
  • Update FSDP meta weight tying tests to include precision testing by @bcui19 in https://github.com/mosaicml/composer/pull/2050
  • Backward Compat with Torchmetrics by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2046
  • update fsdp mixed precision by @vchiley in https://github.com/mosaicml/composer/pull/2047
  • Checkpoints Simplified by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2041
  • Add composer PyPI package tests to daily workflow by @bandish-shah in https://github.com/mosaicml/composer/pull/2052
  • Delete composer package GPU workflow by @dakinggg in https://github.com/mosaicml/composer/pull/2055
  • Revert "Checkpoints Simplified (#2041)" by @dakinggg in https://github.com/mosaicml/composer/pull/2056
  • Raise error if attempting to export FSDP model by @hanlint in https://github.com/mosaicml/composer/pull/2051
  • Busy wait for local rank 0 download to avoid timeout on large file download by @dakinggg in https://github.com/mosaicml/composer/pull/2054
  • Fix OCIObjectStore save_overwrite=False bug by @eracah in https://github.com/mosaicml/composer/pull/2053
  • Update docs with non-rank zero logs instructions by @hanlint in https://github.com/mosaicml/composer/pull/2058
  • Pin torchmetrics by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2065
  • Add NO_REENTRANT activation checkpointing by @bmosaicml in https://github.com/mosaicml/composer/pull/2042
  • Allow LPLayerNorm and LPGroupNorm to support self.bias or self.weight = None by @abhi-mosaic in https://github.com/mosaicml/composer/pull/2044
  • Checkpoints Simplified by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2059
  • Add device and dtype back to LPLayerNorm by @abhi-mosaic in https://github.com/mosaicml/composer/pull/2067
  • Revert "Checkpoints Simplified (#2059)" by @dakinggg in https://github.com/mosaicml/composer/pull/2070
  • Busy wait so that non local rank zeros don't timeout while local rank zero downloads a monolithic checkpoint by @dakinggg in https://github.com/mosaicml/composer/pull/2071
  • Add support + test for autoresume with FSDP sharded checkpoints by @dakinggg in https://github.com/mosaicml/composer/pull/2072
  • Skip extra downloads when not using a format string by @dakinggg in https://github.com/mosaicml/composer/pull/2073
  • Bump version to v0.13.2 by @bandish-shah in https://github.com/mosaicml/composer/pull/2068
  • Pin transformers package to <4.27 by @dakinggg in https://github.com/mosaicml/composer/pull/2076
  • Bump coverage[toml] from 7.2.1 to 7.2.2 by @dependabot in https://github.com/mosaicml/composer/pull/2082
  • Update datasets CODEOWNERS by @dakinggg in https://github.com/mosaicml/composer/pull/2084
  • fix name_or_path usage in HF save/load usage by @dakinggg in https://github.com/mosaicml/composer/pull/2075
  • Remove grad accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2040
  • Add support for ICL QA tasks and generation during evaluation with HuggingFaceModel by @dakinggg in https://github.com/mosaicml/composer/pull/2045
  • make composer fsdp work with latest torch by @dskhudia in https://github.com/mosaicml/composer/pull/2078
  • Fix EMA resumption issue with calling trainer.eval() before trainer.fit() by @coryMosaicML in https://github.com/mosaicml/composer/pull/2088
  • Disable wrapping for fsdp if specified by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2086
  • skip fsdp tests for <1.13 by @dakinggg in https://github.com/mosaicml/composer/pull/2090
  • Patch EMA with FSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2091
  • Update Wandb docs with incorrect default by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2092
  • Fix typo by @nik-mosaic in https://github.com/mosaicml/composer/pull/2098
  • Replace broken explorer link by @nik-mosaic in https://github.com/mosaicml/composer/pull/2099
  • Updating gradient clipping to be torch 2.0 compatible by @bcui19 in https://github.com/mosaicml/composer/pull/2089
  • Adding checks for weight tying s.t. we don't think None attributes are weight tied by @bcui19 in https://github.com/mosaicml/composer/pull/2103
  • gate the extra forward call specifically for fsdp by @dakinggg in https://github.com/mosaicml/composer/pull/2102
  • Allow user to set ONNX opset version when Exporting for Inference by @nik-mosaic in https://github.com/mosaicml/composer/pull/2101
  • Seed the fewshot sampling in the ICL datasets by @dakinggg in https://github.com/mosaicml/composer/pull/2100
  • pin mcp by @dakinggg in https://github.com/mosaicml/composer/pull/2111
  • adjust decoding for eval forward by @dakinggg in https://github.com/mosaicml/composer/pull/2107
  • Add sentencepiece support to HuggingFaceModel by @dakinggg in https://github.com/mosaicml/composer/pull/2093
  • Bump yamllint from 1.28.0 to 1.30.0 by @dependabot in https://github.com/mosaicml/composer/pull/2094
  • update transformers to latest version by @dakinggg in https://github.com/mosaicml/composer/pull/2109
  • Bump version to 0.13.3 by @bandish-shah in https://github.com/mosaicml/composer/pull/2115
  • update numpy by @dakinggg in https://github.com/mosaicml/composer/pull/2108
  • Update Export NLP tests by @nik-mosaic in https://github.com/mosaicml/composer/pull/1904
  • Activation monitor by @bcui19 in https://github.com/mosaicml/composer/pull/2066
  • Relax streaming package version check to major version by @karan6181 in https://github.com/mosaicml/composer/pull/2119
  • Bump to 13.4 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2121
  • Auto Microbatching -- The Final Form by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2117
  • add logic for direct instantiation by @dakinggg in https://github.com/mosaicml/composer/pull/2122
  • Runtime estimator by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2124
  • Fix early stopper docs links by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2126
  • Removes MCLI pin by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2127
  • Bump pytest from 7.2.2 to 7.3.0 by @dependabot in https://github.com/mosaicml/composer/pull/2128
  • Bump nbsphinx from 0.8.12 to 0.9.1 by @dependabot in https://github.com/mosaicml/composer/pull/2129
  • Bump ipykernel from 6.20.1 to 6.22.0 by @dependabot in https://github.com/mosaicml/composer/pull/2130
  • Add batch log interval to optimizer monitor by @dakinggg in https://github.com/mosaicml/composer/pull/2132
  • Flush checkpoint on kill by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2125
  • Bump deepspeed from 0.7.7 to 0.8.3 by @dependabot in https://github.com/mosaicml/composer/pull/2131
  • Add flexibility for FSDP Auto Wrap in Composer by @bcui19 in https://github.com/mosaicml/composer/pull/2134
  • Mcloud logger dest by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2136
  • Better defaults for get_num_tokens_in_batch by @dakinggg in https://github.com/mosaicml/composer/pull/2139
  • Adding sharded grad scaler by @bcui19 in https://github.com/mosaicml/composer/pull/2138
  • Bump pytest from 7.3.0 to 7.3.1 by @dependabot in https://github.com/mosaicml/composer/pull/2144
  • Make sure the timestamps of the checkpoints are the same when loading by @eracah in https://github.com/mosaicml/composer/pull/2146
  • Add torch.compile support for torch 2.0 by @karan6181 in https://github.com/mosaicml/composer/pull/2118
  • Fix broken URLs due to docs site refactor by @bandish-shah in https://github.com/mosaicml/composer/pull/2150
  • Ece icl by @bmosaicml in https://github.com/mosaicml/composer/pull/2135
  • Update wandb requirement from <0.14,>=0.13.2 to >=0.13.2,<0.15 by @dependabot in https://github.com/mosaicml/composer/pull/2097
  • Add support for eval_interval and save_interval in tokens by @dakinggg in https://github.com/mosaicml/composer/pull/2149
  • Upgrade to transformers 4.28 by @dakinggg in https://github.com/mosaicml/composer/pull/2152
  • Add PyTorch 2.0.0 image, deprecate PyTorch 1.10 and 1.11 images by @bandish-shah in https://github.com/mosaicml/composer/pull/2077
  • Log Time Attrs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2155
  • EMA + FSDP support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2157
  • Mvpatel2000/ema fix final by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2158
  • Bump sphinx-copybutton from 0.5.0 to 0.5.2 by @dependabot in https://github.com/mosaicml/composer/pull/2159
  • Bump junitparser from 2.8.0 to 3.0.0 by @dependabot in https://github.com/mosaicml/composer/pull/2160
  • Update wandb requirement from <0.15,>=0.13.2 to >=0.13.2,<0.16 by @dependabot in https://github.com/mosaicml/composer/pull/2161
  • Bump yamllint from 1.30.0 to 1.31.0 by @dependabot in https://github.com/mosaicml/composer/pull/2163
  • Bump sphinxext-opengraph from 0.7.4 to 0.8.2 by @dependabot in https://github.com/mosaicml/composer/pull/2162
  • Bump version to v0.13.5 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2166
  • Icl subcategories by @bmosaicml in https://github.com/mosaicml/composer/pull/2145
  • Add SlackLogger w/ custom formatting to composer/logger by @waiwuc in https://github.com/mosaicml/composer/pull/2133
  • Use state_dict Torchmetrics Serialization by @nik-mosaic in https://github.com/mosaicml/composer/pull/2116
  • Adding in deprecation warning for min_params by @bcui19 in https://github.com/mosaicml/composer/pull/2167
  • Update auto microbatching warning by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2123
  • Add support for torch 2.0 by @dakinggg in https://github.com/mosaicml/composer/pull/2172
  • Fix the daily tests by @dakinggg in https://github.com/mosaicml/composer/pull/2173
  • Fix remote path in daily test by @dakinggg in https://github.com/mosaicml/composer/pull/2177
  • Template icl by @bmosaicml in https://github.com/mosaicml/composer/pull/2137
  • Fix ICL eval for sentencepiece tokenizers by @dakinggg in https://github.com/mosaicml/composer/pull/2178
  • bump flash attention version by @dakinggg in https://github.com/mosaicml/composer/pull/2180
  • Another attempt to fix the daily tests by @dakinggg in https://github.com/mosaicml/composer/pull/2181
  • Skip backward compatible checkpointing test on older torch versions by @dakinggg in https://github.com/mosaicml/composer/pull/2182
  • Fix space continuation issue for few shot ICL by @dakinggg in https://github.com/mosaicml/composer/pull/2183
  • Bump coverage[toml] from 7.2.2 to 7.2.5 by @dependabot in https://github.com/mosaicml/composer/pull/2188
  • Bump sentencepiece from 0.1.97 to 0.1.98 by @dependabot in https://github.com/mosaicml/composer/pull/2186
  • Fix filelock in checkpoint download by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2184
  • Update warning->info for number of tokens by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2192
  • Bump version to 0.14.0 by @bandish-shah in https://github.com/mosaicml/composer/pull/2190

New Contributors

  • @waiwuc made their first contribution in https://github.com/mosaicml/composer/pull/2133

Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.5...v0.14.0

- Python
Published by bandish-shah about 3 years ago

composer - v0.13.5

Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.4...v0.13.5

Add support for EMA + FSDP

- Python
Published by mvpatel2000 about 3 years ago

composer - v0.13.4

Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.3...v0.13.4

Bumps streaming version pin to <1.0

- Python
Published by mvpatel2000 about 3 years ago

composer - v0.13.3

:rocket: Composer v0.13.3

Introducing the composer PyPI package!

Composer v0.13.3 is released!

Composer can now also be installed using the new composer PyPI package via pip:

```bash
pip install composer==0.13.3
```

The legacy package name still works via pip:

```bash
pip install mosaicml==0.13.3
```

Bug Fixes

  • add sentencepiece support by @dakinggg in #2093

What's Changed

  • Bump version to 0.13.3 by @bandish-shah in #2115
  • add missing import by @dakinggg in #2113
  • add sentencepiece support by @dakinggg in #2093
  • Pin mcli version until API change is resolved by @dakinggg in #2111

Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.2...v0.13.3

- Python
Published by bandish-shah about 3 years ago

composer - v0.13.2

:rocket: Composer v0.13.2

Introducing the composer PyPI package!

Composer v0.13.2 is released!

Composer can now also be installed using the new composer PyPI package via pip:

```bash
pip install composer==0.13.2
```

The legacy package name still works via pip:

```bash
pip install mosaicml==0.13.2
```

Bug Fixes

  • test and fix composer package name usage in composer_collect_env (#2049)
  • Backward Compat with Torchmetrics by @mvpatel2000 (#2046)
  • Fix OCIObjectStore save_overwrite=False bug (#2053)
  • busy wait for the rank 0 download (#2071)
  • Skip extra downloads when not using a format string (#2073)

What's Changed

  • Pin transformers package to <4.27 by @dakinggg in #2076
  • Bump version to v0.13.2 (#2068) by @bandish-shah
  • Skip extra downloads when not using a format string by @dakinggg in #2073
  • add support for autoresume + FSDP + sharding by @dakinggg in #2072
  • busy wait for the rank 0 download by @dakinggg in #2071
  • Revert "Checkpoints Simplified (#2059)" by @dakinggg in #2070
  • Add device and dtype back to LPLayerNorm (#2067) by @abhi-mosaic
  • Checkpoints Simplified by @mvpatel2000 in #2059
  • Allow LPLayerNorm and LPGroupNorm to support self.bias or self.weight = None (#2044) by @abhi-mosaic
  • Add NO_REENTRANT activation checkpointing (#2042) by @bmosaicml
  • pin torchmetrics by @mvpatel2000 in #2065
  • Update docs with non-rank zero logs instructions by @hanlint in #2058
  • Fix OCIObjectStore save_overwrite=False bug by @eracah in #2053
  • Busy wait for local rank 0 download to avoid timeout on large file download by @dakinggg in #2054
  • Raise error if attempting to export FSDP model by @hanlint in #2051
  • Revert "Checkpoints Simplified (#2041)" by @dakinggg in #2056
  • Delete composer package GPU workflow by @dakinggg in #2055
  • Add composer PyPI package tests to daily workflow (#2052) by @bandish-shah
  • Checkpoints Simplified by @mvpatel2000 in #2041
  • update fsdp mixed precision by @vchiley in #2047
  • Backward Compat with Torchmetrics by @mvpatel2000 in #2046
  • Update FSDP meta weight tying tests to include precision testing by @bcui19 in #2050
  • Log nodename information in composer by @eracah in #2043
  • test and fix composer package name usage in composer_collect_env by @dakinggg in #2049
  • Adjust how HuggingFaceModel handles embedding resizing by @dakinggg in #2027
  • Adds a PR guidelines section to contributing.md by @dakinggg in #1993
  • Bump pypandoc from 1.10 to 1.11 (#2038) by @dependabot[bot]
  • Bump pytest from 7.2.1 to 7.2.2 (#2039) by @dependabot[bot]
  • Use follow in mcp script by @mvpatel2000 in #2022

Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.1...v0.13.2

- Python
Published by bandish-shah about 3 years ago

composer - v0.13.1

:rocket: Composer v0.13.1

Introducing the composer PyPI package!

Composer v0.13.1 is released!

Composer can now also be installed using the new composer PyPI package via pip:

```bash
pip install composer==0.13.1
```

The legacy package name still works via pip:

```bash
pip install mosaicml==0.13.1
```

Note: The mosaicml==0.13.0 PyPI package was yanked due to minor packaging issues discovered after release. It was re-released as Composer v0.13.1, so these release notes cover both v0.13.0 and v0.13.1.

New Features

  1. 🤙 New and Updated Callbacks
* *New `HealthChecker` Callback (#2002)*

    The callback will log a warning if the GPUs on a given node appear to be in poor health (low utilization). The callback can also be configured to send a Slack message!

    ```python
    from composer import Trainer
    from composer.callbacks import HealthChecker

    # Warn if the utilization spread across GPUs exceeds 10%
    health_checker = HealthChecker(
        threshold=10
    )

    # Construct Trainer
    trainer = Trainer(
        ...,
        callbacks=health_checker,
    )

    # Train!
    trainer.fit()
    ```

* *Updated `MemoryMonitor` to use gigabyte (GB) units (#1940)*

* *New `RuntimeEstimator` Callback (#1991)*

    Estimate the remaining runtime of your job! The callback approximates the remaining time by measuring throughput and comparing it to the number of batches left.

    ```python
    from composer import Trainer
    from composer.callbacks import RuntimeEstimator

    # Construct trainer with RuntimeEstimator callback
    trainer = Trainer(
        ...,
        callbacks=RuntimeEstimator(),
    )

    # Train!
    trainer.fit()
    ```
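The estimate described above amounts to extrapolating observed throughput. A minimal sketch of that arithmetic, with a hypothetical helper name; the real `RuntimeEstimator` callback may handle more cases (for example, time spent in evaluation), which this ignores.

```python
def estimate_remaining_seconds(elapsed_seconds: float, batches_done: int,
                               total_batches: int) -> float:
    """Extrapolate remaining wall-clock time from observed batches/sec."""
    if batches_done <= 0:
        raise ValueError("need at least one completed batch to estimate")
    batches_per_sec = batches_done / elapsed_seconds
    batches_remaining = total_batches - batches_done
    return batches_remaining / batches_per_sec


# 100 of 1000 batches took 50 s (2 batches/s), so 900 batches need 450 s.
print(estimate_remaining_seconds(50.0, 100, 1000))  # → 450.0
```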

* *Updated `SpeedMonitor` throughput metrics (#1987)*

    Expands throughput metrics to cover several units (batches, tokens, FLOPs, and samples), both overall and per device:
    * `throughput/batches_per_sec` and `throughput/device/batches_per_sec`
    * `throughput/tokens_per_sec` and `throughput/device/tokens_per_sec`
    * `throughput/flops_per_sec` and `throughput/device/flops_per_sec`
    * `throughput/device/samples_per_sec`

    Also adds a `throughput/device/mfu` metric that reports per-device MFU (model FLOPs utilization). Enable the `SpeedMonitor` callback as usual to log these new metrics. Please see the [SpeedMonitor](https://docs.mosaicml.com/en/latest/api_reference/generated/composer.callbacks.SpeedMonitor.html#composer.callbacks.SpeedMonitor) documentation for more information.
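As a rough guide to how these metrics relate, the sketch below divides global throughput evenly across devices and computes MFU as achieved FLOPs per second on a device over that device's peak FLOPs per second. `throughput_metrics` and the peak value are illustrative assumptions; `SpeedMonitor`'s own computation may differ in details.

```python
from typing import Dict


def throughput_metrics(batches: int, samples: int, tokens: int, flops: float,
                       seconds: float, world_size: int,
                       peak_flops_per_device: float) -> Dict[str, float]:
    """Global and per-device throughput, plus MFU, over one timing window."""
    metrics = {
        "throughput/batches_per_sec": batches / seconds,
        "throughput/samples_per_sec": samples / seconds,
        "throughput/tokens_per_sec": tokens / seconds,
        "throughput/flops_per_sec": flops / seconds,
    }
    # Per-device values assume work is spread evenly across `world_size` devices.
    for name in list(metrics):
        device_name = name.replace("throughput/", "throughput/device/")
        metrics[device_name] = metrics[name] / world_size
    # MFU: fraction of the device's peak FLOPs/sec actually achieved.
    metrics["throughput/device/mfu"] = (
        metrics["throughput/device/flops_per_sec"] / peak_flops_per_device
    )
    return metrics


# Hypothetical window: 8 devices, 10 s, each sustaining 100 TFLOP/s
# against an assumed 200 TFLOP/s peak, which gives an MFU of 0.5.
m = throughput_metrics(batches=16, samples=4096, tokens=4096 * 2048,
                       flops=8 * 100e12 * 10, seconds=10.0, world_size=8,
                       peak_flops_per_device=200e12)
print(m["throughput/device/mfu"])
```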
  1. ⣿ FSDP Sharded Checkpoints (#1902)

    Users can now specify the state_dict_type in the fsdp_config dictionary to enable sharded checkpoints. For example:

    ```python
    from composer import Trainer

    fsdp_config = {
        'sharding_strategy': 'FULL_SHARD',
        'state_dict_type': 'local',
    }

    trainer = Trainer(
        ...,
        fsdp_config=fsdp_config,
        save_folder='checkpoints',
        save_filename='ba{batch}_rank{rank}.pt',
        save_interval='10ba',
    )
    ```

    Please see the PyTorch FSDP docs and Composer's Distributed Training notes for more information.
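With a `local` state dict, each rank writes its own checkpoint shard, which is why the example's `save_filename` contains a `{rank}` placeholder. The sketch below only illustrates how such a template expands per rank using plain `str.format`; Composer's own filename formatting supports additional placeholders, and `shard_filenames` is a hypothetical helper.

```python
from typing import List


def shard_filenames(template: str, batch: int, world_size: int) -> List[str]:
    """Expand a per-rank checkpoint filename template for every rank."""
    return [template.format(batch=batch, rank=rank) for rank in range(world_size)]


print(shard_filenames("ba{batch}_rank{rank}.pt", batch=10, world_size=4))
# → ['ba10_rank0.pt', 'ba10_rank1.pt', 'ba10_rank2.pt', 'ba10_rank3.pt']
```

Without `{rank}` in the template, every rank would target the same filename.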

  2. 🤗 HuggingFace Improvements

*  Update `HuggingFaceModel` class to support encoder-decoder batches without `decoder_input_ids` (#1950)
*  Allow evaluation metrics to be passed to `HuggingFaceModel` directly (#1971)
*  Add a utility function to load a Composer checkpoint of a `HuggingFaceModel` and write out the expected `config.json` and `pytorch_model.bin` in the HuggingFace pretrained folder (#1974)
  1. 🛟 Nvidia H100 Alpha Support - Added amp_fp8 data type

    In preparation for the H100's arrival, we've added the `amp_fp8` precision type. Currently, setting `amp_fp8` uses `transformer_engine.pytorch.fp8_autocast` as a new precision context. For more details, please see Nvidia's new Transformer Engine and the specific fp8 recipe we use.

    ```python
    from composer import Trainer

    trainer = Trainer(
        ...,
        precision='amp_fp8',
    )
    ```

API changes

  • The torchmetrics package has been upgraded to 0.11.x.

    The `torchmetrics.Accuracy` metric now requires a `task` argument, which can be `binary`, `multiclass`, or `multilabel`. Please see the Torchmetrics Accuracy docs for details.

    Additionally, since specifying `task='multiclass'` also requires the `num_classes` argument, we've updated `ComposerClassifier` to accept a `num_classes` argument. Please see PRs #2017 and #2025 for additional details.

  • Surgery algorithms used in functional form return a value of None (#1543)

Deprecations

  • Deprecate HFCrossEntropy and Perplexity (#1857)
  • Remove Jenkins CI (#1943, #1954)
  • Change Deprecation Warnings to Warnings for specifying ProgressBarLogger and ConsoleLogger to loggers (#1846)

Bug Fixes

  • Fixed an issue introduced in 0.12.1 where HuggingFaceModel crashes if config.return_dict = False (#1948)
  • Refactor EMA to improve memory efficiency (#1941)
  • Make wandb checkpoint logging compatible with wandb model registry (#1973)
  • Fix ICL race conditions (#1978)
  • Update epoch metric name to trainer/epoch (#1986)
  • Reset scaler state (#1999)
  • Sync optimization logger across ranks (#1970)
  • Update Docker images to resolve vulnerability scan issues (#2007)
  • Fix eval duplicate logging issue (#2018)
  • Extend test and patch bug (#2028)
  • Protect for missing slack_sdk import (#2031)

Known Issues

  • Docker Image Security Vulnerability
    • CVE-2022-45907: The mosaicml/pytorch:1.12.1*, mosaicml/pytorch:1.11.0*, mosaicml/pytorch_vision:1.12.1* and mosaicml/pytorch_vision:1.11.0* images are affected and remain supported only for legacy use cases. We recommend upgrading to images with PyTorch >1.13. The affected images will be removed in the next Composer release.

What's Changed

  • Raise error if max duration is in epochs and dataloader is infinite by @dakinggg in https://github.com/mosaicml/composer/pull/1942
  • Bump traitlets from 5.8.0 to 5.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1946
  • Deprecate HFCrossEntropy and Perplexity by @dakinggg in https://github.com/mosaicml/composer/pull/1857
  • Change functional surgery method return values to None by @nik-mosaic in https://github.com/mosaicml/composer/pull/1543
  • Retire Jenkins by @bandish-shah in https://github.com/mosaicml/composer/pull/1943
  • Update MCP GHA Name by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1951
  • update memory monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1940
  • Move ffcv up in test order by @dskhudia in https://github.com/mosaicml/composer/pull/1953
  • Fix memory monitor test by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1957
  • Fix model surgery failure due to functional API change by @nik-mosaic in https://github.com/mosaicml/composer/pull/1949
  • Change how we check for forwards args in models for HF models by @bcui19 in https://github.com/mosaicml/composer/pull/1955
  • add return dict false test and bug fix by @dakinggg in https://github.com/mosaicml/composer/pull/1948
  • remove jenkins ci by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1954
  • add support for enc-dec batches without decoder_input_ids by @dakinggg in https://github.com/mosaicml/composer/pull/1950
  • Refactor EMA to improve memory efficiency by @coryMosaicML in https://github.com/mosaicml/composer/pull/1941
  • Add warning for untrusted checkpoints by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1959
  • permit opt tokenizer by @bmosaicml in https://github.com/mosaicml/composer/pull/1958
  • GHA Docker build flow for PR's by @bandish-shah in https://github.com/mosaicml/composer/pull/1883
  • Update download badge link to pepy by @karan6181 in https://github.com/mosaicml/composer/pull/1966
  • Update python version in setup.py and fixed pypi download badge by @karan6181 in https://github.com/mosaicml/composer/pull/1969
  • allow eval metrics to be passed in to HuggingFaceModel directly by @dakinggg in https://github.com/mosaicml/composer/pull/1971
  • Make wandb checkpoint logging compatible with wandb model registry by @growlix in https://github.com/mosaicml/composer/pull/1973
  • Add support for FP8 on H100 using NVidia's TransformerEngine by @dskhudia in https://github.com/mosaicml/composer/pull/1965
  • Util for writing HuggingFace save_pretrained from a composer checkpoint by @dakinggg in https://github.com/mosaicml/composer/pull/1974
  • Enable sharded checkpoint save and load (support local, sharded, and full state dicts for FSDP) by @eracah in https://github.com/mosaicml/composer/pull/1902
  • Bump custom-inherit from 2.4.0 to 2.4.1 by @dependabot in https://github.com/mosaicml/composer/pull/1981
  • Bump gitpython from 3.1.30 to 3.1.31 by @dependabot in https://github.com/mosaicml/composer/pull/1982
  • Fix ICL race conditions by @dakinggg in https://github.com/mosaicml/composer/pull/1978
  • add map location to huggingface utils by @dakinggg in https://github.com/mosaicml/composer/pull/1980
  • fix log epoch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1986
  • GHA release workflow, refactor PR and Daily workflows by @bandish-shah in https://github.com/mosaicml/composer/pull/1968
  • Remove python-version input from Daily CPU tests by @bandish-shah in https://github.com/mosaicml/composer/pull/1989
  • Add some logic to pass the correct github ref to mcp script by @bandish-shah in https://github.com/mosaicml/composer/pull/1990
  • Fix typo in docstring for eval with missing space by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1992
  • Fix failing sharded_checkpoint tests that fail when pytorch 1.13 is not installed by @eracah in https://github.com/mosaicml/composer/pull/1988
  • Add merge_group event trigger to GHA daily workflow by @bandish-shah in https://github.com/mosaicml/composer/pull/1996
  • Runtime estimator by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1991
  • Reset scaler state by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1999
  • Speed monitor refactor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1987
  • Test hf fsdp by @dakinggg in https://github.com/mosaicml/composer/pull/1972
  • Bug/sync optimization logger across ranks by @bmosaicml in https://github.com/mosaicml/composer/pull/1970
  • Fix optimizer monitor test gating with FSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2000
  • Low precision groupnorm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1976
  • Bump coverage[toml] from 7.1.0 to 7.2.1 by @dependabot in https://github.com/mosaicml/composer/pull/2008
  • Update docs to include runtime estimator by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2009
  • Tag surgery algorithms LPLN and LPGN by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2011
  • Update SpeedMonitor short-description for docs table by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2010
  • Update Low Precision LayerNorm arguments by @nik-mosaic in https://github.com/mosaicml/composer/pull/1994
  • Medical Segmentation Example Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2014
  • Update wallclock logging to default hours by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2005
  • Add HealthChecker Callback by @hanlint in https://github.com/mosaicml/composer/pull/2002
  • Allow FX graph mode post-training dynamic quantisation of BlurConv2d operations. by @BrettRyland in https://github.com/mosaicml/composer/pull/1995
  • Add multi-gpu testing to test_algorithm_resumption by @eracah in https://github.com/mosaicml/composer/pull/2016
  • Add backwards compatible checkpoint loading for EMA by @coryMosaicML in https://github.com/mosaicml/composer/pull/2012
  • fsdp with custom process groups by @vchiley in https://github.com/mosaicml/composer/pull/2006
  • Patch Speed Monitor MFU by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2013
  • Remove runtime estimator state dict by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2015
  • Update Docker images to fix resolve vulnerability scan issues by @bandish-shah in https://github.com/mosaicml/composer/pull/2007
  • Change Deprecation Warnings to Warnings for specifying ProgressBarLogger and ConsoleLogger to loggers by @eracah in https://github.com/mosaicml/composer/pull/1846
  • Fix eval duplicate logging issue by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2018
  • Add workflow_dispatch trigger to pr-docker workflow by @bandish-shah in https://github.com/mosaicml/composer/pull/2019
  • Bump streaming version to less than 0.4.0 by @karan6181 in https://github.com/mosaicml/composer/pull/2020
  • Upgrade ipython installed in Docker images by @bandish-shah in https://github.com/mosaicml/composer/pull/2021
  • Upgrade torchmetrics by @nik-mosaic in https://github.com/mosaicml/composer/pull/2017
  • Complete upgrade of torchmetrics accuracy by @nik-mosaic in https://github.com/mosaicml/composer/pull/2025
  • Bump version to v0.13.0 by @bandish-shah in https://github.com/mosaicml/composer/pull/2024

New Contributors

  • @BrettRyland made their first contribution in https://github.com/mosaicml/composer/pull/1995

Full Changelog: https://github.com/mosaicml/composer/compare/v0.12.1...v0.13.1

- Python
Published by bandish-shah over 3 years ago

composer - v0.13.0

This release has been yanked due to a minor packaging issue; please skip directly to Composer v0.13.1.

What's Changed

  • Raise error if max duration is in epochs and dataloader is infinite by @dakinggg in https://github.com/mosaicml/composer/pull/1942
  • Bump traitlets from 5.8.0 to 5.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1946
  • Deprecate HFCrossEntropy and Perplexity by @dakinggg in https://github.com/mosaicml/composer/pull/1857
  • Change functional surgery method return values to None by @nik-mosaic in https://github.com/mosaicml/composer/pull/1543
  • Retire Jenkins by @bandish-shah in https://github.com/mosaicml/composer/pull/1943
  • Update MCP GHA Name by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1951
  • update memory monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1940
  • Move ffcv up in test order by @dskhudia in https://github.com/mosaicml/composer/pull/1953
  • Fix memory monitor test by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1957
  • Fix model surgery failure due to functional API change by @nik-mosaic in https://github.com/mosaicml/composer/pull/1949
  • Change how we check for forwards args in models for HF models by @bcui19 in https://github.com/mosaicml/composer/pull/1955
  • add return dict false test and bug fix by @dakinggg in https://github.com/mosaicml/composer/pull/1948
  • remove jenkins ci by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1954
  • add support for enc-dec batches without decoder_input_ids by @dakinggg in https://github.com/mosaicml/composer/pull/1950
  • Refactor EMA to improve memory efficiency by @coryMosaicML in https://github.com/mosaicml/composer/pull/1941
  • Add warning for untrusted checkpoints by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1959
  • permit opt tokenizer by @bmosaicml in https://github.com/mosaicml/composer/pull/1958
  • GHA Docker build flow for PR's by @bandish-shah in https://github.com/mosaicml/composer/pull/1883
  • Update download badge link to pepy by @karan6181 in https://github.com/mosaicml/composer/pull/1966
  • Update python version in setup.py and fixed pypi download badge by @karan6181 in https://github.com/mosaicml/composer/pull/1969
  • allow eval metrics to be passed in to HuggingFaceModel directly by @dakinggg in https://github.com/mosaicml/composer/pull/1971
  • Make wandb checkpoint logging compatible with wandb model registry by @growlix in https://github.com/mosaicml/composer/pull/1973
  • Add support for FP8 on H100 using NVidia's TransformerEngine by @dskhudia in https://github.com/mosaicml/composer/pull/1965
  • Util for writing HuggingFace save_pretrained from a composer checkpoint by @dakinggg in https://github.com/mosaicml/composer/pull/1974
  • Enable sharded checkpoint save and load (support local, sharded, and full state dicts for FSDP) by @eracah in https://github.com/mosaicml/composer/pull/1902
  • Bump custom-inherit from 2.4.0 to 2.4.1 by @dependabot in https://github.com/mosaicml/composer/pull/1981
  • Bump gitpython from 3.1.30 to 3.1.31 by @dependabot in https://github.com/mosaicml/composer/pull/1982
  • Fix ICL race conditions by @dakinggg in https://github.com/mosaicml/composer/pull/1978
  • add map location to huggingface utils by @dakinggg in https://github.com/mosaicml/composer/pull/1980
  • fix log epoch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1986
  • GHA release workflow, refactor PR and Daily workflows by @bandish-shah in https://github.com/mosaicml/composer/pull/1968
  • Remove python-version input from Daily CPU tests by @bandish-shah in https://github.com/mosaicml/composer/pull/1989
  • Add some logic to pass the correct github ref to mcp script by @bandish-shah in https://github.com/mosaicml/composer/pull/1990
  • Fix typo in docstring for eval with missing space by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1992
  • Fix failing sharded_checkpoint tests that fail when pytorch 1.13 is not installed by @eracah in https://github.com/mosaicml/composer/pull/1988
  • Add merge_group event trigger to GHA daily workflow by @bandish-shah in https://github.com/mosaicml/composer/pull/1996
  • Runtime estimator by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1991
  • Reset scaler state by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1999
  • Speed monitor refactor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1987
  • Test hf fsdp by @dakinggg in https://github.com/mosaicml/composer/pull/1972
  • Bug/sync optimization logger across ranks by @bmosaicml in https://github.com/mosaicml/composer/pull/1970
  • Fix optimizer monitor test gating with FSDP by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2000
  • Low precision groupnorm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1976
  • Bump coverage[toml] from 7.1.0 to 7.2.1 by @dependabot in https://github.com/mosaicml/composer/pull/2008
  • Update docs to include runtime estimator by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2009
  • Tag surgery algorithms LPLN and LPGN by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2011
  • Update SpeedMonitor short-description for docs table by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2010
  • Update Low Precision LayerNorm arguments by @nik-mosaic in https://github.com/mosaicml/composer/pull/1994
  • Medical Segmentation Example Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2014
  • Update wallclock logging to default hours by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2005
  • Add HealthChecker Callback by @hanlint in https://github.com/mosaicml/composer/pull/2002
  • Allow FX graph mode post-training dynamic quantisation of BlurConv2d operations. by @BrettRyland in https://github.com/mosaicml/composer/pull/1995
  • Add multi-gpu testing to test_algorithm_resumption by @eracah in https://github.com/mosaicml/composer/pull/2016
  • Add backwards compatible checkpoint loading for EMA by @coryMosaicML in https://github.com/mosaicml/composer/pull/2012
  • fsdp with custom process groups by @vchiley in https://github.com/mosaicml/composer/pull/2006
  • Patch Speed Monitor MFU by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2013
  • Remove runtime estimator state dict by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2015
  • Update Docker images to fix resolve vulnerability scan issues by @bandish-shah in https://github.com/mosaicml/composer/pull/2007
  • Change Deprecation Warnings to Warnings for specifying ProgressBarLogger and ConsoleLogger to loggers by @eracah in https://github.com/mosaicml/composer/pull/1846
  • Fix eval duplicate logging issue by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2018
  • Add workflow_dispatch trigger to pr-docker workflow by @bandish-shah in https://github.com/mosaicml/composer/pull/2019
  • Bump streaming version to less than 0.4.0 by @karan6181 in https://github.com/mosaicml/composer/pull/2020
  • Upgrade ipython installed in Docker images by @bandish-shah in https://github.com/mosaicml/composer/pull/2021
  • Upgrade torchmetrics by @nik-mosaic in https://github.com/mosaicml/composer/pull/2017
  • Complete upgrade of torchmetrics accuracy by @nik-mosaic in https://github.com/mosaicml/composer/pull/2025
  • Bump version to v0.13.0 by @bandish-shah in https://github.com/mosaicml/composer/pull/2024

New Contributors

  • @BrettRyland made their first contribution in https://github.com/mosaicml/composer/pull/1995

Full Changelog: https://github.com/mosaicml/composer/compare/v0.12.1...v0.13.0

- Python
Published by bandish-shah over 3 years ago

composer - v0.12.1

🚀 Composer v0.12.1

Composer v0.12.1 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.12.1
```

New Features

  1. 📚 In-Context Learning (#1876)

With Composer and MosaicML Cloud you can now evaluate LLMs on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. Please see our "Blazingly Fast LLM Evaluation for In-Context Learning" blog post for more details!

  1. 💾 Added support for Coreweave Object Storage (#1915)

Coreweave object store is compatible with boto3. Uploading objects to Coreweave object store works almost exactly like writing to S3, except that an `endpoint_url` must be set via the `S3_ENDPOINT_URL` environment variable. For example:

```python
import os

# Set the custom S3-compatible endpoint before creating the Trainer
os.environ['S3_ENDPOINT_URL'] = 'https://object.las1.coreweave.com'

from composer.trainer import Trainer

# Save checkpoints every epoch to s3://mybucket/checkpoints
# (`model` and `train_dataloader` are assumed to be defined already)
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration='10ep',
    save_folder='s3://mybucket/checkpoints',
    save_interval='1ep',
    save_overwrite=True,
    save_filename='ep{epoch}.pt',
    save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
)

trainer.fit()
```

Please see our checkpointing documentation for more details.
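A minimal sketch of the mechanism, assuming only that the endpoint is read from the `S3_ENDPOINT_URL` environment variable as described above. `s3_client_kwargs` is a hypothetical helper (not Composer's code) that builds keyword arguments for `boto3.client('s3', ...)`, which accepts an `endpoint_url` parameter. When the variable is unset, no override is passed and boto3 falls back to its default AWS endpoint. Because the value comes from the environment, it is safest to set it before constructing the `Trainer`.

```python
import os


def s3_client_kwargs() -> dict:
    """Hypothetical helper: build boto3 S3 client kwargs from the environment."""
    kwargs = {}
    endpoint_url = os.environ.get("S3_ENDPOINT_URL")
    if endpoint_url:
        # S3-compatible stores outside AWS, such as Coreweave, need an explicit endpoint.
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


os.environ["S3_ENDPOINT_URL"] = "https://object.las1.coreweave.com"
print(s3_client_kwargs())  # {'endpoint_url': 'https://object.las1.coreweave.com'}
# With boto3 installed: boto3.client("s3", **s3_client_kwargs())
```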

  1. 🪵 Automatic logging of Trainer hparams (#1855)

Hyperparameter arguments passed to the Trainer can now be logged automatically: simply set the Trainer argument `auto_log_hparams=True`.
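One way to capture constructor arguments for logging is to bind them to the function signature with Python's `inspect` module, which also records defaults the caller did not pass explicitly. This is a hypothetical sketch of the idea, not how Composer implements `auto_log_hparams`. `capture_hparams` and `toy_trainer_init` are illustrative names.

```python
import inspect


def capture_hparams(func, *args, **kwargs) -> dict:
    """Hypothetical helper: map each argument (including defaults) to its parameter name."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # include parameters the caller left at their defaults
    return dict(bound.arguments)


def toy_trainer_init(max_duration, precision="fp32", seed=None):
    """Stand-in for a Trainer constructor; only its signature matters here."""


print(capture_hparams(toy_trainer_init, "10ep", precision="amp_bf16"))
# {'max_duration': '10ep', 'precision': 'amp_bf16', 'seed': None}
```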

Bug Fixes

  • Update Docker images to use ‘posix_prefix’ paths (#1854)
  • Disable new notebook in CI (#1875)
  • [Fix] Enable logging of metrics from Callbacks to ConsoleLogging (#1884)
  • Ensure loggers run init event before callbacks in Engine (#1890)
  • Raise an error in FSDP meta tensor initialization if there's no initialization functions, fix associated flaky FSDP test (#1905)
  • Add primitive list support (#1906)
  • Add logic for shifting labels before computing metrics (#1913)
  • Fix misspecified dependency (#1919)
  • Pin setuptools in build requirements (#1926)
  • Pin pip<23 in Docker images (#1936)
  • Fix bug in trainer.eval and add test cases for test_console_logger (#1937)

What's Changed

  • Rename GradMonitor -> OptimizerMonitor; add functionality to log optimizer-specific metrics to assist loss spike investigation by @bmosaicml in https://github.com/mosaicml/composer/pull/1743
  • Add GCS uri support for loading and saving checkpoints by @eracah in https://github.com/mosaicml/composer/pull/1833
  • HF factory function tests by @dakinggg in https://github.com/mosaicml/composer/pull/1832
  • Fix doc issue, Trainer hparam log_to_console defaults to False by @eracah in https://github.com/mosaicml/composer/pull/1840
  • Removed YAHP references from Docs by @bandish-shah in https://github.com/mosaicml/composer/pull/1841
  • Typo by @nguyenhoan1988 in https://github.com/mosaicml/composer/pull/1843
  • Fix source code links in docs by @bandish-shah in https://github.com/mosaicml/composer/pull/1844
  • add importorskip by @dakinggg in https://github.com/mosaicml/composer/pull/1847
  • Update Docker images to use ‘posix_prefix’ paths by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1854
  • Fix typo by @standardAI in https://github.com/mosaicml/composer/pull/1849
  • ConsoleLogger: log first batch and first epoch when using console_log_interval by @eracah in https://github.com/mosaicml/composer/pull/1860
  • Simpler auto log hparams by @eracah in https://github.com/mosaicml/composer/pull/1855
  • Fix typos by @cclauss in https://github.com/mosaicml/composer/pull/1850
  • Bump sphinxext-opengraph from 0.7.3 to 0.7.4 by @dependabot in https://github.com/mosaicml/composer/pull/1851
  • Bump coverage[toml] from 6.5.0 to 7.0.1 by @dependabot in https://github.com/mosaicml/composer/pull/1853
  • Bump traitlets from 5.7.0 to 5.8.0 by @dependabot in https://github.com/mosaicml/composer/pull/1852
  • Bump ipython from 7.32.0 to 8.8.0 by @dependabot in https://github.com/mosaicml/composer/pull/1865
  • Update monai requirement from <0.10,>=0.9.1 to >=0.9.1,<1.2 by @dependabot in https://github.com/mosaicml/composer/pull/1869
  • Bump sphinxcontrib-katex from 0.9.3 to 0.9.4 by @dependabot in https://github.com/mosaicml/composer/pull/1868
  • Bump coverage[toml] from 7.0.1 to 7.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1867
  • Upgrade docker images to torch==1.13.1 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1863
  • add more useful info to state by @dakinggg in https://github.com/mosaicml/composer/pull/1848
  • Feature/lambada evaluator by @bmosaicml in https://github.com/mosaicml/composer/pull/1845
  • multi-node distributed training, submitit & composer integration demo by @YilunKuang in https://github.com/mosaicml/composer/pull/1753
  • Daily tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1870
  • Disable new notebook in CI by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1875
  • Update deepspeed by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1864
  • fix fail fast in daily by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1880
  • Fix getting started docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1878
  • Speed up test_lm_task_evaluation by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1879
  • Fix unprotected import by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1874
  • add ignore_modules to fsdp by @vchiley in https://github.com/mosaicml/composer/pull/1877
  • Change vision image by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1881
  • Fix eval_forward in the ComposerModel ABC by @eracah in https://github.com/mosaicml/composer/pull/1871
  • Fix fsdp weight tying by @bcui19 in https://github.com/mosaicml/composer/pull/1856
  • Bump pytest from 7.2.0 to 7.2.1 by @dependabot in https://github.com/mosaicml/composer/pull/1886
  • Bump ipykernel from 6.19.2 to 6.20.1 by @dependabot in https://github.com/mosaicml/composer/pull/1887
  • Bump gitpython from 3.1.28 to 3.1.30 by @dependabot in https://github.com/mosaicml/composer/pull/1888
  • Update Vision Image in Pytest by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1882
  • Streaming data tests by @dakinggg in https://github.com/mosaicml/composer/pull/1842
  • Add NLP Algorithms Tests by @nik-mosaic in https://github.com/mosaicml/composer/pull/1839
  • rename HF notebook by @dakinggg in https://github.com/mosaicml/composer/pull/1873
  • Ensure loggers run init event before callbacks in Engine by @eracah in https://github.com/mosaicml/composer/pull/1890
  • [Fix] Enable logging of metrics from Callbacks to ConsoleLogging by @eracah in https://github.com/mosaicml/composer/pull/1884
  • Updating how we load metrics in a state_dict so we don't add extra memory overhead by @bcui19 in https://github.com/mosaicml/composer/pull/1892
  • Getting daily tests passing by @dakinggg in https://github.com/mosaicml/composer/pull/1893
  • Bump nbsphinx from 0.8.10 to 0.8.12 by @dependabot in https://github.com/mosaicml/composer/pull/1897
  • Fix docker image by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1894
  • Add primitive list support by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1906
  • Raise an error in FSDP meta tensor initialization if there's no initialization functions, fix associated flaky FSDP test by @bcui19 in https://github.com/mosaicml/composer/pull/1905
  • Gpu Test by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1907
  • Update docker with FFCV fix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1908
  • Restore GPU tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1909
  • Update workflow names by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1910
  • Enable daily gpu tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1911
  • Tweak daily GPU tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1912
  • Daily GPU Tests -- Change to Git Commit by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1914
  • Add logic for shifting labels before computing metrics by @alextrott16 in https://github.com/mosaicml/composer/pull/1913
  • Add coreweave object store support. by @eracah in https://github.com/mosaicml/composer/pull/1915
  • Fixes mis specified dependency by @dakinggg in https://github.com/mosaicml/composer/pull/1919
  • Bump coverage[toml] from 7.0.4 to 7.1.0 by @dependabot in https://github.com/mosaicml/composer/pull/1923
  • Update importlib-metadata requirement from <6,>=5.0.0 to >=5.0.0,<7 by @dependabot in https://github.com/mosaicml/composer/pull/1921
  • pin setuptools in build requirements by @dakinggg in https://github.com/mosaicml/composer/pull/1926
  • Remove synthetic testing infrastructure for HF/NLP by @dakinggg in https://github.com/mosaicml/composer/pull/1895
  • Add upgrade flags to pip installs by @dakinggg in https://github.com/mosaicml/composer/pull/1916
  • Temporarily pin pip to <23 by @dakinggg in https://github.com/mosaicml/composer/pull/1930
  • add link protection by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1927
  • Cleaning up error checking for FSDP sharding strategies with fp32 precision by @bcui19 in https://github.com/mosaicml/composer/pull/1925
  • Fix mcp script to avoid follow by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1932
  • Emit Eval progress in console logging by @eracah in https://github.com/mosaicml/composer/pull/1917
  • Remove Fused LayerNorm deprecation by @nik-mosaic in https://github.com/mosaicml/composer/pull/1931
  • Add EFA Support for Multinode in AWS by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1891
  • remove jenkins gpu tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1933
  • Typo due to stale MCLI docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1934
  • Pin pip<23 in Docker images by @bandish-shah in https://github.com/mosaicml/composer/pull/1936
  • Fix bug in trainer.eval and add test cases for test_console_logger by @eracah in https://github.com/mosaicml/composer/pull/1937
  • Add few shot and multiple choice to ICL evaluation by @bmosaicml in https://github.com/mosaicml/composer/pull/1876
  • Disable test_streaming_datasets in pytest-daily by @bandish-shah in https://github.com/mosaicml/composer/pull/1939

New Contributors

  • @bmosaicml made their first contribution in https://github.com/mosaicml/composer/pull/1743
  • @nguyenhoan1988 made their first contribution in https://github.com/mosaicml/composer/pull/1843
  • @standardAI made their first contribution in https://github.com/mosaicml/composer/pull/1849
  • @cclauss made their first contribution in https://github.com/mosaicml/composer/pull/1850
  • @YilunKuang made their first contribution in https://github.com/mosaicml/composer/pull/1753
  • @vchiley made their first contribution in https://github.com/mosaicml/composer/pull/1877

Full Changelog: https://github.com/mosaicml/composer/compare/v0.12.0...v0.12.1

- Python
Published by bandish-shah over 3 years ago

composer - v0.12.0

🚀 Composer v0.12.0

Composer v0.12.0 is released! Install via pip:

```bash
pip install mosaicml==0.12.0
```

New Features

  1. 🪵 Logging and ObjectStore Enhancements

    There are multiple improvements to our logging and object store support in this release.

- *Image visualization using our `CometMLLogger` ([#1710](https://github.com/mosaicml/composer/pull/1710))*

    We've added support for using our `ImageVisualizer` callback with [CometML](https://www.comet.com/site/) to log images and segmentation masks to CometML.
    ```python
    from composer.callbacks import ImageVisualizer
    from composer.loggers import CometMLLogger
    from composer.trainer import Trainer

    trainer = Trainer(...,
        callbacks=[ImageVisualizer()],
        loggers=[CometMLLogger()]
    )
    ```

- *Added direct support for [Oracle Cloud Infrastructure (OCI)](https://www.oracle.com/cloud/storage/object-storage/) as an `ObjectStore` ([#1774](https://github.com/mosaicml/composer/pull/1774)) and support for Google Cloud Storage (GCS) via URI ([#1833](https://github.com/mosaicml/composer/pull/1833))*

    To use either store, simply set your `save_folder` or `load_path` to a URI beginning with `oci://` or `gs://` to save and load with OCI or GCS, respectively.
    ```python
    from composer.trainer import Trainer

    # Checkpoint saving to Google Cloud Storage.
    trainer = Trainer(
        model=model,
        save_folder="gs://my-bucket/{run_name}/checkpoints",
        run_name='my-run',
        save_interval="1ep",
        save_filename="ep{epoch}.pt",
        save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
        # ... other Trainer arguments
    )

    trainer.fit()
    ```
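    A URI such as the `save_folder` above combines a scheme (which selects the storage backend), a bucket, and a key prefix. The sketch below splits one apart with the standard library. It is an illustration only: `split_object_store_uri` is a hypothetical helper, Composer's own URI handling may differ, and plain `str.format` stands in for Composer's expansion of `{run_name}`.

    ```python
    from urllib.parse import urlparse


    def split_object_store_uri(uri: str) -> tuple:
        """Hypothetical helper: return (scheme, bucket, key prefix) for an object store URI."""
        parsed = urlparse(uri)
        return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


    save_folder = "gs://my-bucket/{run_name}/checkpoints".format(run_name="my-run")
    print(split_object_store_uri(save_folder))  # ('gs', 'my-bucket', 'my-run/checkpoints')
    ```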

- *Added basic support for logging with [MLFlow](https://www.mlflow.org/docs/latest/tracking.html) ([#1795](https://github.com/mosaicml/composer/pull/1795))*

    We've added basic support for using MLFlow to log experiment metrics.
    ```python
    from composer.loggers import MLFlowLogger
    from composer.trainer import Trainer

    mlflow_logger = MLFlowLogger(experiment_name=mlflow_exp_name,
                                 run_name=mlflow_run_name,
                                 tracking_uri=mlflow_uri)
    trainer = Trainer(..., loggers=[mlflow_logger])
    ```

- *Simplified console and progress bar logging ([#1694](https://github.com/mosaicml/composer/pull/1694))*

    To turn off the progress bar, set `progress_bar=False`. To turn on logging directly to the console, set `log_to_console=True`. To control the frequency of logging to console, set `console_log_interval` (e.g. to `1ep` or `1ba`).
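    Interval strings like `1ep` or `1ba` pair a count with a unit. As a rough illustration of how such an interval could decide when to log, here is a hypothetical sketch that handles only those two units. Composer's own time handling is richer and may behave differently at boundaries.

    ```python
    import re


    def parse_interval(interval: str) -> tuple:
        """Hypothetical helper: split e.g. '2ba' into (2, 'ba')."""
        match = re.fullmatch(r"([1-9]\d*)(ba|ep)", interval)
        if match is None:
            raise ValueError(f"unsupported interval: {interval!r}")
        return int(match.group(1)), match.group(2)


    def should_log(interval: str, completed: int, unit: str) -> bool:
        """Log when `completed` batches or epochs (matching `unit`) is a multiple of the interval."""
        every, interval_unit = parse_interval(interval)
        return unit == interval_unit and completed % every == 0


    print([b for b in range(1, 7) if should_log("2ba", b, "ba")])  # [2, 4, 6]
    ```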

- *[`get_file`](https://docs.mosaicml.com/en/latest/api_reference/generated/composer.utils.get_file.html) supports URIs ([#1750](https://github.com/mosaicml/composer/pull/1750))*

    Our `get_file` utility now supports URIs directly (`s3://`, `oci://`, and `gs://`) for downloading files.
  1. 🏃‍♀️ Support for Mid-Epoch Resumption with the latest release of Streaming

    We've added support in Composer for the latest release of our Streaming library. This includes awesome new features like instant mid-epoch resumption and deterministic shuffling, regardless of the number of nodes. See the Streaming release notes for more!

  2. 🚨 New algorithm - GyroDropout!

    Thanks to @jelite for adding a new algorithm, GyroDropout, to Composer! Please see the method card for more details.

  3. 🤗 HuggingFace + Composer improvements

    We've added a new utility to load a 🤗 HuggingFace model and tokenizer out of a Composer checkpoint (#1754), making the pretraining -> finetuning workflow even easier in Composer. Check out the docs for more details, and our example notebook for a full tutorial (#1775)!

  4. 🎓 GradMonitor -> OptimizerMonitor

    We've renamed our GradMonitor callback to OptimizerMonitor and added the ability to track optimizer-specific metrics. Check out the docs for more details, and add it to your code just like any other callback:

    ```python
    from composer.callbacks import OptimizerMonitor
    from composer.trainer import Trainer

    trainer = Trainer(
        ...,
        callbacks=[OptimizerMonitor(log_optimizer_metrics=log_optimizer_metrics)],
    )
    ```

  5. 🐳 New PyTorch and CUDA versions

    We've expanded our library of Docker images with support for PyTorch 1.13 + CUDA 11.7:

    • mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04
    • mosaicml/pytorch:1.13.0_cpu-python3.10-ubuntu20.04

    The mosaicml/pytorch:latest, mosaicml/pytorch:cpu_latest and mosaicml/composer:0.12.0 tags are now built from PyTorch 1.13 based images. Please see our DockerHub repository for additional details.
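
The deterministic shuffling and mid-epoch resumption described in item 1 rest on a simple idea: if the sample order is a pure function of a seed and the epoch, a resumed run can regenerate the same order and skip the samples it already consumed. The sketch below shows only that idea in plain Python. epoch_order and resume_order are hypothetical helpers, not the Streaming library's API or its actual shuffling algorithm, which must also partition samples across nodes and workers.

```python
import random


# Hypothetical helpers for illustration; not the Streaming library's API.
def epoch_order(num_samples: int, seed: int, epoch: int) -> list[int]:
    """Deterministic sample order for one epoch, derived only from (seed, epoch)."""
    rng = random.Random(seed * 1_000_003 + epoch)  # Separate, reproducible stream per epoch
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def resume_order(num_samples: int, seed: int, epoch: int, samples_seen: int) -> list[int]:
    """Samples still to be visited in `epoch` after `samples_seen` were consumed."""
    return epoch_order(num_samples, seed, epoch)[samples_seen:]


full = epoch_order(10, seed=42, epoch=3)
resumed = resume_order(10, seed=42, epoch=3, samples_seen=4)
print(full, resumed)
```

Because nothing about the order depends on run state other than the seed, the epoch, and a sample counter, only that counter needs to be checkpointed to resume mid-epoch.
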

API changes

  1. Replace grad_accum with device_train_microbatch_size (#1749, #1776)

    We're deprecating the grad_accum Trainer argument in favor of the more intuitive device_train_microbatch_size. Instead of thinking about how to divide your specified minibatch into microbatches, simply specify the size of your microbatch. For example, let's say you want to split your minibatch of 2048 into two microbatches of 1024:

    ```python
    from composer import Trainer

    trainer = Trainer(
        ...,
        device_train_microbatch_size=1024,
    )
    ```

    If you want Composer to tune the microbatch for you automatically, enable automatic microbatching as follows:

    ```python
    from composer import Trainer

    trainer = Trainer(
        ...,
        device_train_microbatch_size='auto',
    )
    ```

    The grad_accum argument is still supported but will be deprecated in the next Composer release.

  2. Renamed precisions (#1761)

    We've renamed precision attributes for clarity. The following values have been removed: ['amp', 'fp16', 'bf16'].

    We have added the following values, prefixed with 'amp' to clarify when an Automatic Mixed Precision type is being used: ['amp_fp16', 'amp_bf16'].

    The fp32 precision value remains unchanged.
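
To make the microbatch arithmetic from item 1 concrete: with a minibatch of 2048 and device_train_microbatch_size=1024, each optimizer step runs two forward/backward passes and accumulates gradients before stepping. The sketch below splits a batch into microbatches and, to illustrate the general idea behind 'auto', halves the microbatch size whenever a simulated out-of-memory check fails. split_into_microbatches, find_microbatch_size, and the fits_in_memory callback are hypothetical stand-ins, not Composer internals.

```python
from typing import Callable, Sequence


# Hypothetical helpers for illustration; not Composer's implementation.
def split_into_microbatches(batch: Sequence, microbatch_size: int) -> list:
    """Chunk a minibatch into consecutive microbatches (the last one may be smaller)."""
    return [batch[i:i + microbatch_size] for i in range(0, len(batch), microbatch_size)]


def find_microbatch_size(batch_size: int, fits_in_memory: Callable[[int], bool]) -> int:
    """Start at the full batch and halve on each (simulated) OOM until a step fits."""
    size = batch_size
    while size >= 1:
        if fits_in_memory(size):
            return size
        size //= 2  # Simulated OOM: retry with half the microbatch size
    raise RuntimeError("Even a microbatch of size 1 does not fit in memory")


batch = list(range(2048))
print(len(split_into_microbatches(batch, 1024)))  # 2

# Pretend the device can hold at most 300 samples at once.
print(find_microbatch_size(2048, lambda size: size <= 300))  # 256
```
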

Deprecations

  1. Removed support for YAHP (#1512)
  2. Removed COCO and SSD datasets (#1717)
  3. Fully removed Streaming v1 support, please see the mosaicml/streaming project for our next-gen streaming datasets (#1787)
  4. Deprecated FusedLayerNorm algorithm (#1789)
  5. Fully removed grad_clip_norm training argument, please use the GradientClipping algorithm instead (#1768)
  6. Removed data_fit, data_epoch, and data_batch from Logger (#1826)

Bug Fixes

  • Fix FSDP checkpoint strategy (#1734)
  • Fix gradient clipping with FSDP (#1740)
  • Adds more supported FSDP config flags (sync_module_states, forward_prefetch, limit_all_gathers) (#1794)
  • Allow FULL precision with FSDP (#1796)
  • Fix eval_microbatch modification on EVAL_BEFORE_FORWARD event (#1739)
  • Fix algorithm API backwards compatibility in checkpoints (#1741)
  • Fixes a bad None check preventing setting device_id to 0 (#1767)
  • Unregister engine to make cleaning up memory easier (#1769)
  • Fix issue if metric_names is not a list (#1798)
  • Match implementation for list and tensor batch splitting (#1804)
  • Fixes infinite eval issue (#1815)

What's Changed

  • Update installation constraints for streaming by @karan6181 in https://github.com/mosaicml/composer/pull/1661
  • Update decoupled_weight_decay.md by @jacobfulano in https://github.com/mosaicml/composer/pull/1672
  • Notebooks part 2 by @dakinggg in https://github.com/mosaicml/composer/pull/1659
  • Add trainer arg for engine passes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1673
  • Autoload algorithms by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1658
  • Faster metrics calculations + Fix warnings added by the new version of torchmetrics by @dskhudia in https://github.com/mosaicml/composer/pull/1674
  • Update coolname requirement from <2,>=1.1.0 to >=1.1.0,<3 by @dependabot in https://github.com/mosaicml/composer/pull/1666
  • Bump ipykernel from 6.16.0 to 6.16.1 by @dependabot in https://github.com/mosaicml/composer/pull/1667
  • Bump traitlets from 5.4.0 to 5.5.0 by @dependabot in https://github.com/mosaicml/composer/pull/1668
  • Image viz by @dakinggg in https://github.com/mosaicml/composer/pull/1676
  • Update checks for Gated Linear Units Method by @jacobfulano in https://github.com/mosaicml/composer/pull/1575
  • ADE20k streaming factory method by @Landanjs in https://github.com/mosaicml/composer/pull/1626
  • Deyahpify cifar10 by @growlix in https://github.com/mosaicml/composer/pull/1677
  • Nuke YAHP by @hanlint in https://github.com/mosaicml/composer/pull/1512
  • Imagenet streaming factory method by @codestar12 in https://github.com/mosaicml/composer/pull/1649
  • Bump ipykernel from 6.16.1 to 6.16.2 by @dependabot in https://github.com/mosaicml/composer/pull/1683
  • Bump pytest from 7.1.3 to 7.2.0 by @dependabot in https://github.com/mosaicml/composer/pull/1684
  • Bump pypandoc from 1.9 to 1.10 by @dependabot in https://github.com/mosaicml/composer/pull/1680
  • Update py-cpuinfo requirement from <9,>=8.0.0 to >=8.0.0,<10 by @dependabot in https://github.com/mosaicml/composer/pull/1681
  • Uncomment and clean up algorithms documentation by @growlix in https://github.com/mosaicml/composer/pull/1685
  • Update glu check by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1689
  • fix backwards compatibility by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1693
  • Fix engine pass registration by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1692
  • Add Low Precision LayerNorm by @nik-mosaic in https://github.com/mosaicml/composer/pull/1525
  • Update codeowners by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1691
  • Add nccl env var by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1695
  • Fix eval timestamp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1697
  • Update distributed docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1696
  • Return empty dict if wandb disabled by @dakinggg in https://github.com/mosaicml/composer/pull/1698
  • Autoresume related error messages by @dakinggg in https://github.com/mosaicml/composer/pull/1687
  • Add log_image to wandb, cometml, and LoggerDestination by @eracah in https://github.com/mosaicml/composer/pull/1675
  • Pin PyTorch and supporting package versions by @bandish-shah in https://github.com/mosaicml/composer/pull/1688
  • Add in unit tests for log_image function for CometMLLogger and WandBLogger by @eracah in https://github.com/mosaicml/composer/pull/1701
  • refactor devices by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1699
  • remove as in device by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1704
  • Fix device imports by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1705
  • Fix typing in EMA's moveparamstodevice() by @coryMosaicML in https://github.com/mosaicml/composer/pull/1707
  • Add docs for saving and loading checkpoints with GCS by @eracah in https://github.com/mosaicml/composer/pull/1702
  • Clean up imports by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1700
  • Add rud docs by @eracah in https://github.com/mosaicml/composer/pull/1709
  • Bump cryptography from 38.0.1 to 38.0.3 by @dependabot in https://github.com/mosaicml/composer/pull/1712
  • GHA workflow for code quality checks by @bandish-shah in https://github.com/mosaicml/composer/pull/1719
  • Add support for Path in CheckpointSaver by @cojennin in https://github.com/mosaicml/composer/pull/1721
  • Docs Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1723
  • Bump nbsphinx from 0.8.9 to 0.8.10 by @dependabot in https://github.com/mosaicml/composer/pull/1725
  • Bump sphinx-argparse from 0.3.2 to 0.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1726
  • Simple nlp tests by @dakinggg in https://github.com/mosaicml/composer/pull/1716
  • Build Streaming CIFAR10 Factory Function by @growlix in https://github.com/mosaicml/composer/pull/1729
  • Change build_streaming_cifar10_dataloader() to use v2 by default by @growlix in https://github.com/mosaicml/composer/pull/1730
  • Clear the Optimizer before wrapping with FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1732
  • Add inf eval check by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1733
  • Fix fsdp checkpoint strategy by @bcui19 in https://github.com/mosaicml/composer/pull/1734
  • Assign eval microbatch to self.state.batch by @dakinggg in https://github.com/mosaicml/composer/pull/1739
  • Add masks to wandb_logger.log_image and cometml_logger.log_image and refactor ImageVisualizer to use log_image [WIP] by @eracah in https://github.com/mosaicml/composer/pull/1710
  • Protect backwards compatibility by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1741
  • Add composer version state by @dakinggg in https://github.com/mosaicml/composer/pull/1742
  • Adds auto object store creation to get_file by @dakinggg in https://github.com/mosaicml/composer/pull/1750
  • Log console interval by @eracah in https://github.com/mosaicml/composer/pull/1694
  • Bump sphinxcontrib-katex from 0.9.0 to 0.9.3 by @dependabot in https://github.com/mosaicml/composer/pull/1757
  • Bump pandoc from 2.2 to 2.3 by @dependabot in https://github.com/mosaicml/composer/pull/1756
  • Bump cryptography from 38.0.3 to 38.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1755
  • Add more event tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1762
  • Add python 3.10, pytorch 1.13, cuda 11.7 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1735
  • Add huggingface info to state dict by @dakinggg in https://github.com/mosaicml/composer/pull/1744
  • Global batch size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1746
  • Add device to state by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1765
  • Rename precisions by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1761
  • Device id none by @dakinggg in https://github.com/mosaicml/composer/pull/1767
  • Autoload HuggingFace model/tokenizer by @dakinggg in https://github.com/mosaicml/composer/pull/1754
  • Supporting train_device_microbatch_size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1749
  • Switch flash attention to tag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1766
  • remove grad clip norm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1768
  • unregister engine for memory cleanup by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1769
  • Fix hf tokenizer test for new hf version by @dakinggg in https://github.com/mosaicml/composer/pull/1772
  • Decrease microbatch size if batch size is smaller by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1771
  • remove deprecated code by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1773
  • cache call to cpuinfo by @dakinggg in https://github.com/mosaicml/composer/pull/1778
  • device train microbatch size pt 2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1776
  • Huggingface pretrain + finetune notebook by @dakinggg in https://github.com/mosaicml/composer/pull/1775
  • Bump traitlets from 5.5.0 to 5.6.0 by @dependabot in https://github.com/mosaicml/composer/pull/1781
  • Bump deepspeed from 0.7.5 to 0.7.6 by @dependabot in https://github.com/mosaicml/composer/pull/1780
  • Minor docs fix for deepspeed typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1784
  • Update Auto Microbatching by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1785
  • Adding GyroDropout as an algorithm to Composer by @jelite in https://github.com/mosaicml/composer/pull/1718
  • Add Deprecation warning for Fused LayerNorm by @nik-mosaic in https://github.com/mosaicml/composer/pull/1789
  • Update error msgs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1791
  • Change gyro emoji by @nik-mosaic in https://github.com/mosaicml/composer/pull/1792
  • Speeding up tests by @dakinggg in https://github.com/mosaicml/composer/pull/1779
  • Add durations arg to pytest by @dakinggg in https://github.com/mosaicml/composer/pull/1793
  • Properly implement gradient clipping for FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1740
  • Updating FSDP supported config flags by @bcui19 in https://github.com/mosaicml/composer/pull/1794
  • Remove streaming v1 datasets. by @knighton in https://github.com/mosaicml/composer/pull/1787
  • Remove references to validate in docs by @dakinggg in https://github.com/mosaicml/composer/pull/1800
  • Install latest Git in Docker images by @bandish-shah in https://github.com/mosaicml/composer/pull/1770
  • move to pypi release for flash attn by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1777
  • Check and make sure that metric names is a list of strings by @dakinggg in https://github.com/mosaicml/composer/pull/1798
  • Adding in the possibility of 'None' for MixedPrecision FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1796
  • Updating assertion check for gradient clipping and updating gradient clip tests for FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1802
  • Moving Pytest CPU to GHA by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1790
  • Bump sphinxext-opengraph from 0.6.3 to 0.7.3 by @dependabot in https://github.com/mosaicml/composer/pull/1760
  • Update distributed_training.rst by @lupesko in https://github.com/mosaicml/composer/pull/1731
  • Use streaming v3 by @knighton in https://github.com/mosaicml/composer/pull/1797
  • Bump traitlets from 5.6.0 to 5.7.0 by @dependabot in https://github.com/mosaicml/composer/pull/1806
  • Bump ipykernel from 6.16.2 to 6.19.2 by @dependabot in https://github.com/mosaicml/composer/pull/1810
  • Update packaging requirement from <22,>=21.3.0 to >=21.3.0,<23 by @dependabot in https://github.com/mosaicml/composer/pull/1808
  • match list batch splitting and tensor batch splitting by @dakinggg in https://github.com/mosaicml/composer/pull/1804
  • Add type ignore for onnx import by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1811
  • Remove pip install all from coverage action by @dakinggg in https://github.com/mosaicml/composer/pull/1805
  • Remove coco and ssd by @growlix in https://github.com/mosaicml/composer/pull/1717
  • Rename matrix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1813
  • Add OCI ObjectStore by @eracah in https://github.com/mosaicml/composer/pull/1774
  • Add MLFlowLogger by @eracah in https://github.com/mosaicml/composer/pull/1795
  • Object store docs by @dakinggg in https://github.com/mosaicml/composer/pull/1817
  • fix inf eval by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1815
  • Add fsdp_config to state and add fsdp_config to trainer docstring by @growlix in https://github.com/mosaicml/composer/pull/1821
  • Add SHARP support to docker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1818
  • Testing Infra Cleanup by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1822
  • Remove dead code in dockerfile by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1823
  • Fix Export Docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1824
  • Remove old deprecated logger methods by @eracah in https://github.com/mosaicml/composer/pull/1826
  • NLP metrics tests by @dakinggg in https://github.com/mosaicml/composer/pull/1830
  • Nlp pipeline test by @dakinggg in https://github.com/mosaicml/composer/pull/1828
  • Add tests for uri helper functions by @eracah in https://github.com/mosaicml/composer/pull/1827
  • Add pip targets to installation.rst docs by @eracah in https://github.com/mosaicml/composer/pull/1829

New Contributors

  • @cojennin made their first contribution in https://github.com/mosaicml/composer/pull/1721
  • @jelite made their first contribution in https://github.com/mosaicml/composer/pull/1718

Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.1...v0.12.0

- Python
Published by bandish-shah over 3 years ago

composer - v0.11.1

🚀 Composer v0.11.1

Composer v0.11.1 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.11.1
```

Bug Fixes

  • Fixes for Notebooks (#1659)
  • Documentation updates and fixes (#1685, #1696, #1702, #1709)
  • Addressed warnings and speed improvements for Torchmetrics (#1674)
  • Fixes to Gated Linear Units method (#1575, #1689)
  • Set NCCL_ASYNC_ERROR_HANDLING ENV variable in Composer launcher to enable distributed timeout (#1695)
  • Fix epoch count when eval is called before fit (#1697)
  • Constrain PyTorch package versions to avoid unintended upgrades (#1688)
  • Fix Optimizer state sharding issue with FSDP (#1732)
  • Raise a ValueError if an evaluation dataloader of infinite length is specified

Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.0...v0.11.1

- Python
Published by bandish-shah over 3 years ago

composer - v0.11.0

🚀 Composer v0.11.0

Composer v0.11.0 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.11.0
```

New Features

  1. 🧰 FSDP Beta Support

    Composer now supports PyTorch FSDP! PyTorch FSDP is a strategy for distributed training, similar to PyTorch DDP, that distributes work using data-parallelism only. On top of this, FSDP uses model, gradient, and optimizer sharding to dramatically reduce device memory requirements, and enables users to easily scale and train large models.

    Here's how easy it is to use FSDP with Composer:

    ```python
    import torch.nn as nn
    from composer import Trainer
    from composer.models import ComposerModel

    class Block(nn.Module):
        ...

    # Your custom model
    class Model(nn.Module):
        def __init__(self, n_layers):
            super().__init__()
            self.blocks = nn.ModuleList([
                Block(...) for _ in range(n_layers)
            ])
            self.head = nn.Linear(...)

        def forward(self, inputs):
            ...

        # FSDP Wrap Function: modules for which this returns True are wrapped with FSDP
        def fsdp_wrap_fn(self, module):
            return isinstance(module, Block)

        # Activation Checkpointing Function: modules for which this returns True are checkpointed
        def activation_checkpointing_fn(self, module):
            return isinstance(module, Block)

    # ComposerModel wrapper, used by the Trainer
    # to compute loss, metrics, etc.
    class MyComposerModel(ComposerModel):

        def __init__(self, n_layers):
            super().__init__()
            self.model = Model(n_layers)
            ...

        def forward(self, batch):
            ...

        def eval_forward(self, batch, outputs=None):
            ...

        def loss(self, outputs, batch):
            ...

    # Pass your ComposerModel and fsdp_config into the Trainer
    composer_model = MyComposerModel(n_layers=3)
    fsdp_config = {
        'sharding_strategy': 'FULL_SHARD',
        'min_params': 1e8,
        'cpu_offload': False,  # Not supported yet
        'mixed_precision': 'DEFAULT',
        'backward_prefetch': 'BACKWARD_POST',
        'activation_checkpointing': False,
        'activation_cpu_offload': False,
        'verbose': True,
    }

    trainer = Trainer(
        model=composer_model,
        fsdp_config=fsdp_config,
        # ... other Trainer arguments
    )

    trainer.fit()
    ```

    For more information, please see our FSDP docs.

  2. 🚰 Streaming v0.1

    We've spun off Streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for the PyTorch IterableDataset, enabling users to stream training data from cloud-based object stores. Streaming ships with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.).

    To get started, install the Streaming PyPI package:

    ```bash
    pip install mosaicml-streaming
    ```

    You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:

    ```python
    import torch
    from streaming import Dataset

    dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))
    ```

    For more information, please check out the Streaming docs.

  3. ✔👉 Simplified Checkpointing Interface

    With this release we’ve greatly simplified configuration of loading and saving checkpoints in Composer.

    To save checkpoints to S3, all you need to do is:

    • Set save_folder to the full URI of your save directory (e.g. 's3://my-bucket/{run_name}/checkpoints')
    • Optionally, set save_filename to the pattern you want for your checkpoint file names

    ```python
    from composer.trainer import Trainer

    # Checkpoint saving to S3.
    trainer = Trainer(
        model=model,
        save_folder="s3://my-bucket/{run_name}/checkpoints",
        run_name='my-run',
        save_interval="1ep",
        save_filename="ep{epoch}.pt",
        save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
        # ... other Trainer arguments
    )

    trainer.fit()
    ```

    Likewise, to load checkpoints from S3, all you have to do is:

    • Set load_path to the full URI of your desired checkpoint file (e.g. 's3://my-bucket/my-run/checkpoints/ep13.pt')

    ```python
    from composer.trainer import Trainer

    # Checkpoint loading from S3.
    new_trainer = Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration="10ep",
        load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
    )

    new_trainer.fit()
    ```

    For more information, please see our Checkpointing guide.

  4. 𐄳 Improved Distributed Experience

    We’ve made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.

    For example, let's say we need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only local rank 0 downloads the dataset, while the other ranks wait for it to finish:

    ```python
    import datetime
    from composer.trainer.devices import DeviceGPU
    from composer.utils import dist

    dist.initialize(DeviceGPU(), datetime.timedelta(seconds=30))  # Initialize distributed module

    if dist.get_local_rank() == 0:  # Download dataset on rank zero
        dataset = download_my_dataset()
    dist.barrier()  # All ranks wait until dataset is downloaded

    # Create and train your model!
    ```

    For more information, please check out our Distributed API docs.
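
A back-of-the-envelope calculation shows why the sharding described in item 1 reduces device memory. Assuming fp32 parameters and gradients plus Adam's two fp32 moment buffers, each parameter costs about 16 bytes of model state. Replicating that state on every device (as DDP does) keeps all 16 bytes per parameter on each device, while full sharding divides it across the devices. The estimate ignores activations, buffers, and communication overhead, and model_state_gib is a hypothetical helper.

```python
# Hypothetical estimator for illustration; ignores activations and other overheads.
def model_state_gib(num_params: float, world_size: int, full_shard: bool) -> float:
    """Approximate per-device memory (GiB) for fp32 params + grads + Adam moments."""
    bytes_per_param = 4 + 4 + 8  # parameter + gradient + two Adam moment buffers
    total_bytes = num_params * bytes_per_param
    per_device = total_bytes / world_size if full_shard else total_bytes
    return per_device / 2**30


# A hypothetical 1.3B-parameter model on 8 GPUs.
print(round(model_state_gib(1.3e9, 8, full_shard=False), 1))  # 19.4 (replicated)
print(round(model_state_gib(1.3e9, 8, full_shard=True), 1))   # 2.4 (fully sharded)
```
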
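
In the checkpointing example from item 3, save_folder and save_filename are templates whose {...} placeholders are filled in at save time. Using Python's str.format as a stand-in for Composer's own formatting (which supports more placeholders than these two), the sketch below shows that the ep{epoch}.pt pattern at epoch 13 of my-run resolves to the same URI passed as load_path in the loading example. checkpoint_uri is a hypothetical helper.

```python
# Hypothetical helper for illustration; Composer's own filename formatting is not shown.
def checkpoint_uri(save_folder: str, save_filename: str, run_name: str, epoch: int) -> str:
    """Fill the {run_name} and {epoch} placeholders, then join folder and filename."""
    folder = save_folder.format(run_name=run_name)
    filename = save_filename.format(epoch=epoch)
    return f"{folder.rstrip('/')}/{filename}"


print(checkpoint_uri("s3://my-bucket/{run_name}/checkpoints", "ep{epoch}.pt", "my-run", 13))
# s3://my-bucket/my-run/checkpoints/ep13.pt
```
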
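
The rank-zero-downloads-then-barrier pattern from item 4 can be simulated without GPUs. In this sketch, threads stand in for local ranks and threading.Barrier stands in for dist.barrier(); simulate_ranks and the fake download_my_dataset are hypothetical. Only rank 0 writes the file, and no rank reads it until every rank has reached the barrier, so there is no race on the shared path.

```python
import os
import tempfile
import threading


# Hypothetical simulation for illustration; threads play the role of local ranks.
def simulate_ranks(world_size: int) -> list[str]:
    """Only rank 0 'downloads'; every rank reads the file after the barrier."""
    barrier = threading.Barrier(world_size)  # Stand-in for dist.barrier()
    results = [""] * world_size
    with tempfile.TemporaryDirectory() as data_dir:
        path = os.path.join(data_dir, "dataset.txt")

        def download_my_dataset() -> None:
            # Hypothetical download: write the dataset to shared local storage.
            with open(path, "w") as f:
                f.write("dataset contents")

        def worker(local_rank: int) -> None:
            if local_rank == 0:
                download_my_dataset()  # Single writer, so no race on `path`
            barrier.wait()  # All ranks block here until rank 0 has finished writing
            with open(path) as f:
                results[local_rank] = f.read()

        threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results


print(simulate_ranks(4))
```
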

Bug Fixes

  • fix loss and eval_forward for HF models (#1597)
  • add more robust casting to int for fsdp min_params (#1608)
  • Deepspeed Docs Typo (#1605)
  • Fix mmdet typo (#1618)
  • Blurpool idempotent (#1625)
  • When model is not on meta device, initialization should occur on compute device not CPU (#1623)
  • Auto resumption (#1615)
  • Adjust speed monitor (#1645)
  • Hot fix console logging (#1643)
  • Lazy Logging + pretty print dict for hparams (#1653)
  • Fix many failing notebook tests (#1646)

What's Changed

  • Bump coverage[toml] from 6.4.4 to 6.5.0 by @dependabot in https://github.com/mosaicml/composer/pull/1583
  • Bump furo from 2022.9.15 to 2022.9.29 by @dependabot in https://github.com/mosaicml/composer/pull/1584
  • Add English Wikipedia 2020-01-01 dataset by @knighton in https://github.com/mosaicml/composer/pull/1572
  • Add pull request template by @dakinggg in https://github.com/mosaicml/composer/pull/1588
  • Bump ipykernel from 6.15.3 to 6.16.0 by @dependabot in https://github.com/mosaicml/composer/pull/1587
  • Update importlib-metadata requirement from <5,>=4.11.0 to >=5.0,<6 by @dependabot in https://github.com/mosaicml/composer/pull/1585
  • Bump sphinx-argparse from 0.3.1 to 0.3.2 by @dependabot in https://github.com/mosaicml/composer/pull/1586
  • Add step explicitly to ImageVisualizer logging calls by @dakinggg in https://github.com/mosaicml/composer/pull/1591
  • Image viz test by @dakinggg in https://github.com/mosaicml/composer/pull/1592
  • Remove unused fixture by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1594
  • Fixes RandAugment API by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1596
  • fix loss and eval_forward for HF models by @dskhudia in https://github.com/mosaicml/composer/pull/1597
  • Remove tensorflow-io from setup.py by @eracah in https://github.com/mosaicml/composer/pull/1577
  • Fixes enwiki for the newly processed wiki dataset by @dskhudia in https://github.com/mosaicml/composer/pull/1600
  • Change install to all by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1599
  • Remove log level and should_log_artifact by @dakinggg in https://github.com/mosaicml/composer/pull/1603
  • Add more robust casting to int for fsdp min_params by @dblalock in https://github.com/mosaicml/composer/pull/1608
  • Deepspeed Docs Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1605
  • Object store logger refactor by @dakinggg in https://github.com/mosaicml/composer/pull/1601
  • Bump gitpython from 3.1.27 to 3.1.28 by @dependabot in https://github.com/mosaicml/composer/pull/1609
  • Bump tabulate from 0.8.10 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1610
  • Log the number of GPUs and nodes Composer running on. by @eracah in https://github.com/mosaicml/composer/pull/1604
  • Update MLPerfCallback for v2.1 by @hanlint in https://github.com/mosaicml/composer/pull/1607
  • Remove object store cls by @dakinggg in https://github.com/mosaicml/composer/pull/1606
  • Add LAMB Optimizer by @hanlint in https://github.com/mosaicml/composer/pull/1613
  • Mmdet adapter by @A-Jacobson in https://github.com/mosaicml/composer/pull/1545
  • Fix mmdet typo by @Landanjs in https://github.com/mosaicml/composer/pull/1618
  • update torchmetrics requirement by @hanlint in https://github.com/mosaicml/composer/pull/1620
  • Add distributed sampler error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1598
  • Landan/deeplabv3 ade20k example by @Landanjs in https://github.com/mosaicml/composer/pull/1593
  • Upgrade CodeQL Action to version 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1628
  • Blurpool idempotent by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1625
  • Defaulting streaming dataset version to 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1616
  • Abhi/fsdp bugfix 0 11 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1623
  • Remove warning when master_port is auto selected by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1629
  • Remove unused import by @dakinggg in https://github.com/mosaicml/composer/pull/1630
  • Usability improvements to intitialize_dist() by @growlix in https://github.com/mosaicml/composer/pull/1619
  • Remove Graph in Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1631
  • Auto resumption by @dakinggg in https://github.com/mosaicml/composer/pull/1615
  • add stop method by @hanlint in https://github.com/mosaicml/composer/pull/1627
  • S3 Checkpoint Saving By URI by @eracah in https://github.com/mosaicml/composer/pull/1614
  • S3 Checkpoint loading from URI by @eracah in https://github.com/mosaicml/composer/pull/1624
  • Add mvpatel2000 as codeowner for algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1640
  • Adjust speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1645
  • Adding in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1621
  • Attempt to fix flaky doctest by @dakinggg in https://github.com/mosaicml/composer/pull/1647
  • Fix Missing Underscores in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1648
  • Fixed html path for make host command for docs by @karan6181 in https://github.com/mosaicml/composer/pull/1642
  • Fix hyperparameters logged to console even when progress_bar and log_to_console are False by @eracah in https://github.com/mosaicml/composer/pull/1643
  • Fix ImageNet Example normalization values by @Landanjs in https://github.com/mosaicml/composer/pull/1641
  • Python log level by @dakinggg in https://github.com/mosaicml/composer/pull/1651
  • Changed default logging to WARN for doctests by @eracah in https://github.com/mosaicml/composer/pull/1644
  • Add Event.AFTER_LOAD by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1652
  • Lazy Logging + pretty print dict for hparams by @eracah in https://github.com/mosaicml/composer/pull/1653
  • Fix todo in memory monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1654
  • Tests for Idempotent Surgery by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1639
  • Remove c4 dataset by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1635
  • Update torchmetrics by @hanlint in https://github.com/mosaicml/composer/pull/1656
  • Search index filtered by project by @nqn in https://github.com/mosaicml/composer/pull/1549
  • FSDP Tests by @bcui19 in https://github.com/mosaicml/composer/pull/1650
  • Add composer version to issue template by @dakinggg in https://github.com/mosaicml/composer/pull/1657
  • Fix many failing notebook tests by @dakinggg in https://github.com/mosaicml/composer/pull/1646
  • Re-build the Docker images to resolve pip version error by @bandish-shah in https://github.com/mosaicml/composer/pull/1655

Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.1...v0.11.0

- Python
Published by bandish-shah over 3 years ago

composer - v0.10.1

🚀 Composer v0.10.1

Composer v0.10.1 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.10.1
```

New Features

  1. 𐄷 Weight Standardization

Weight Standardization reparametrizes convolutional weights such that the fan-in dimensions have zero mean and unit standard deviation. This could slightly improve model quality at the expense of 5% lower throughput. This has been used in several papers to train with smaller batch sizes, with normalization layers besides batch norm, and for transfer learning.
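
Concretely, for each output channel the method subtracts the mean and divides by the standard deviation taken over that channel's fan-in (input channels × kernel height × kernel width). The plain-Python sketch below applies that normalization to a weight tensor already flattened to [out_channels][fan_in]. The eps term and the use of the population variance are assumptions made for numerical safety and simplicity, and standardize_weights is not Composer's implementation.

```python
import math


# Hypothetical sketch for illustration; not Composer's WeightStandardization code.
def standardize_weights(weight: list[list[float]], eps: float = 1e-5) -> list[list[float]]:
    """Return weights with zero mean and (near) unit std over each row's fan-in."""
    standardized = []
    for row in weight:  # One row per output channel
        mean = sum(row) / len(row)
        var = sum((w - mean) ** 2 for w in row) / len(row)  # Population variance
        std = math.sqrt(var + eps)  # eps guards against division by zero
        standardized.append([(w - mean) / std for w in row])
    return standardized


# Two output channels, each with fan-in 2 * 1 * 2 = 4 (in_channels * kh * kw).
w = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 8.0]]
for row in standardize_weights(w):
    print([round(x, 3) for x in row])
```
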

Using Weight Standardization with the Composer Trainer:

```python
import composer

# Apply Weight Standardization (when training is initialized)
weight_std = composer.algorithms.WeightStandardization()

# Train with Weight Standardization
trainer = composer.trainer.Trainer(
    ...,
    algorithms=[weight_std],
)
trainer.fit()
```

Using Weight Standardization with the Composer functional interface:

```python
import composer
from torchvision.models import resnet50

my_model = resnet50()

# Apply weight standardization to model
my_model = composer.functional.weight_standardization(my_model)
```

Please see the Weight Standardization Method Card for more details.

Bug Fixes

  • Fix for checkpoints not being saved automatically at the end of a run (#1552)
  • Fix Onnx export for Composer HuggingFaceModels (#1557)
  • Fix for MIoU metric producing NaNs (#1558)
  • CometML logger documentation updates and fixes (#1567, #1570, #1571)
  • WandB image visualizer fix (#1591)

What's Changed

  • Update evaluate_periodically() when eval interval is of type Duration by @karan6181 in https://github.com/mosaicml/composer/pull/1523
  • Quality of life updates to EMA by @coryMosaicML in https://github.com/mosaicml/composer/pull/1524
  • Add ADE20K and COCO v2 dataset behind a version flag by @karan6181 in https://github.com/mosaicml/composer/pull/1528
  • Pinned setuptools version to fix distutils version error by @karan6181 in https://github.com/mosaicml/composer/pull/1536
  • Less strict name formatting by @hanlint in https://github.com/mosaicml/composer/pull/1535
  • Defaulting streaming dataset version to 1 and add a deprecation warning by @karan6181 in https://github.com/mosaicml/composer/pull/1532
  • Changing 'stable' to 'latest' in notebooks in examples by @bcui19 in https://github.com/mosaicml/composer/pull/1534
  • Bump furo from 2022.6.21 to 2022.9.15 by @dependabot in https://github.com/mosaicml/composer/pull/1540
  • Bump fasteners from 0.17.3 to 0.18 by @dependabot in https://github.com/mosaicml/composer/pull/1538
  • Add Pandoc to Docker images, bump version to 2.19.2 by @bandish-shah in https://github.com/mosaicml/composer/pull/1550
  • Removed streaming version 2 from yaml since version 1 is default by @karan6181 in https://github.com/mosaicml/composer/pull/1551
  • Bump ipykernel from 6.15.2 to 6.15.3 by @dependabot in https://github.com/mosaicml/composer/pull/1548
  • Bump yamllint from 1.27.1 to 1.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/1546
  • Bump traitlets from 5.3.0 to 5.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1539
  • Object Store Logger Race Condition + EMA Fix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1552
  • Adding in erroring for when using GradMonitor and DeepSpeed by @bcui19 in https://github.com/mosaicml/composer/pull/1555
  • Bump pypandoc from 1.8.1 to 1.9 by @dependabot in https://github.com/mosaicml/composer/pull/1559
  • Update context to raise error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1561
  • Fix MIoU metric when self.total_union==0 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1558
  • Move dataloader initialize_object to factory methods by @hanlint in https://github.com/mosaicml/composer/pull/1510
  • Weight Standardization method by @Landanjs in https://github.com/mosaicml/composer/pull/1562
  • Update comet links to include query params and point to main site by @dakinggg in https://github.com/mosaicml/composer/pull/1567
  • remove dead line in alibi by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1568
  • GLU Fixes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1564
  • Add FSDP strategy by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1553
  • Comet example by @dakinggg in https://github.com/mosaicml/composer/pull/1570
  • Add missing enabled flag, postclose, and clean up comet ml tests by @dakinggg in https://github.com/mosaicml/composer/pull/1571
  • Consistent Method Card Style by @growlix in https://github.com/mosaicml/composer/pull/1407
  • add missing return in context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1574
  • Remove eval batch split by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1576
  • Fix Onnx Export for Composer HuggingFaceModels by @nik-mosaic in https://github.com/mosaicml/composer/pull/1557
  • Revert checkpoint rename by @hanlint in https://github.com/mosaicml/composer/pull/1579

New Contributors

  • @bcui19 made their first contribution in https://github.com/mosaicml/composer/pull/1534

Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.0...v0.10.1

- Python
Published by bandish-shah over 3 years ago

composer - v0.10.0

🚀 Composer v0.10.0

Composer v0.10.0 is out! This latest release adds support for CometML Experiment tracking, automatic selection of evaluation batch size, API enhancements for Evaluation/Logging/Metrics and a preview of our new streaming datasets repository!

```bash
pip install --upgrade mosaicml==0.10.0
```

New Features

  1. ☄️ Comet Experiment Tracking (#1490)

    We've added support for the popular Comet experiment tracker! To enable, simply create the logger and pass it to the Trainer object at initialization:

    ```python
    from composer import Trainer
    from composer.loggers import CometMLLogger

    cometml_logger = CometMLLogger()

    trainer = Trainer(
        ...,
        loggers=[cometml_logger],
    )
    ```

    Please see our Logging and CometMLLogger docs pages for details on usage.

  2. 🪄 Automatic Evaluation Batch Size Selection (#1417)

    Composer now supports eval_batch_size='auto', which automatically chooses an evaluation batch size that avoids CUDA out-of-memory (OOM) errors! Combined with grad_accum='auto', this lets you run the same code on any hardware with no changes. It also makes it easy to add evaluation to a training script without hand-picking batch sizes to avoid CUDA OOMs.
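
    The general idea behind automatic batch size selection can be sketched as a retry loop. Attempt a batch, and on a CUDA out-of-memory error, halve the size and try again. This is a hypothetical, framework-agnostic illustration and not how Composer implements eval_batch_size='auto'. `find_working_batch_size` and `fake_eval_batch` are made-up names, and the OOM is simulated.

    ```python
    from typing import Callable


    def find_working_batch_size(run_eval_batch: Callable[[int], None], start: int) -> int:
        """Halve the batch size on out-of-memory errors until a batch succeeds."""
        batch_size = start
        while batch_size >= 1:
            try:
                run_eval_batch(batch_size)
                return batch_size
            except RuntimeError as e:
                # PyTorch reports CUDA OOMs as RuntimeErrors containing this text;
                # anything else is a real error and is re-raised.
                if "out of memory" not in str(e):
                    raise
                batch_size //= 2
        raise RuntimeError("Even a batch size of 1 does not fit in memory.")


    def fake_eval_batch(batch_size: int) -> None:
        # Simulated device that can only fit 48 samples at a time.
        if batch_size > 48:
            raise RuntimeError("CUDA out of memory. Tried to allocate ...")


    chosen = find_working_batch_size(fake_eval_batch, start=256)  # 256 -> 128 -> 64 -> 32
    ```

    Halving converges in a logarithmic number of attempts. The cost is that the chosen size may be well below the true maximum (32 here, although 48 would fit).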

  3. 🎯 Evaluation API Changes (#1479)

    The Evaluation API has been updated to be consistent with the Trainer API. If the eval_dataloader was provided to the Trainer during initialization, eval can be invoked without needing to provide anything additional:

    ```python
    trainer = Trainer(
        eval_dataloader=...,
    )
    trainer.eval()
    ```

    Alternatively, the eval_dataloader can be passed directly to the eval() method:

    ```python
    trainer = Trainer(...)
    trainer.eval(
        eval_dataloader=...,
    )
    ```

    The eval_dataloader can be a PyTorch DataLoader or, for multiple metrics, a list of Evaluator objects.

  4. 🪵 Simplified Logging (#1416)

    We've significantly simplified our internal logging interface:

    • Removed LogLevel from the logging interface, since it was a mostly unused feature. Filtering logs is now the responsibility of each logger.
    • For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: log_metrics, log_hyperparameters, and log_artifacts. The previous calls to data_fit, data_epoch, etc. have been removed.
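
    Because filtering now lives in each logger, a destination decides for itself which metrics to keep. The standalone sketch below illustrates that idea. `PrefixFilteringLogger` is a hypothetical class that only mimics the log_metrics and log_hyperparameters method names mentioned above. It does not subclass Composer's actual logger base class, whose exact signatures may differ. The metric key names are also made up.

    ```python
    from typing import Any, Dict, List, Optional, Tuple


    class PrefixFilteringLogger:
        """Hypothetical logger that keeps only metrics whose names start with given prefixes."""

        def __init__(self, prefixes: Tuple[str, ...]) -> None:
            self.prefixes = prefixes
            self.records: List[Tuple[Optional[int], Dict[str, float]]] = []
            self.hparams: Dict[str, Any] = {}

        def log_hyperparameters(self, hyperparameters: Dict[str, Any]) -> None:
            self.hparams.update(hyperparameters)

        def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
            # The logger, not the caller, decides what gets recorded.
            kept = {k: v for k, v in metrics.items() if k.startswith(self.prefixes)}
            if kept:
                self.records.append((step, kept))


    logger = PrefixFilteringLogger(prefixes=("metrics/eval/",))
    logger.log_hyperparameters({"lr": 0.1})
    logger.log_metrics({"loss/train": 0.52, "metrics/eval/Accuracy": 0.91}, step=100)
    logger.log_metrics({"loss/train": 0.48}, step=101)  # nothing matches, so nothing is recorded
    ```
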

  5. 🎯 validate -> eval_forward (#1411, #1419)

    Previously, ComposerModel implemented a validate(batch: Any) -> Tuple[Any, Any] method that returned an (input, target) tuple, and the Trainer handled updating the metrics. In v0.10, control over metric updates is returned to the user.

    Now, models instead implement eval_forward(batch: Any), which returns the outputs of evaluation, and update_metric(batch, outputs, metric), which updates the metric.

    An example implementation for classification can be found in our ComposerClassifier base class:

    ```python
    from typing import Any, Optional

    from torchmetrics import Metric

    # Methods of ComposerClassifier (excerpt)
    def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
        _, targets = batch
        metric.update(outputs, targets)

    def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
        return outputs if outputs is not None else self.forward(batch)
    ```
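
    The framework-free sketch below shows how the two methods split responsibilities. An evaluation loop calls eval_forward once per batch and then update_metric for each metric. `ToyClassifier`, `AccuracyMetric`, and `evaluate` are hypothetical stand-ins that use no PyTorch or torchmetrics. This is a simplified illustration of the contract, not Composer's Trainer code.

    ```python
    from typing import Any, Dict, List, Optional, Sequence, Tuple

    Batch = Tuple[List[List[float]], List[int]]  # (inputs, targets)


    class AccuracyMetric:
        """Minimal stand-in for a torchmetrics-style metric with update/compute."""

        def __init__(self) -> None:
            self.correct = 0
            self.total = 0

        def update(self, preds: List[int], targets: List[int]) -> None:
            self.correct += sum(int(p == t) for p, t in zip(preds, targets))
            self.total += len(targets)

        def compute(self) -> float:
            return self.correct / self.total if self.total else 0.0


    class ToyClassifier:
        def forward(self, batch: Batch) -> List[int]:
            inputs, _ = batch
            # Predict the index of the largest score for each sample (argmax).
            return [max(range(len(x)), key=x.__getitem__) for x in inputs]

        def eval_forward(self, batch: Batch, outputs: Optional[Any] = None) -> Any:
            return outputs if outputs is not None else self.forward(batch)

        def update_metric(self, batch: Batch, outputs: Any, metric: AccuracyMetric) -> None:
            _, targets = batch
            metric.update(outputs, targets)


    def evaluate(model: ToyClassifier, batches: Sequence[Batch],
                 metrics: Dict[str, AccuracyMetric]) -> Dict[str, float]:
        for batch in batches:
            outputs = model.eval_forward(batch)  # the model computes outputs...
            for metric in metrics.values():
                model.update_metric(batch, outputs, metric)  # ...and decides how metrics consume them
        return {name: m.compute() for name, m in metrics.items()}


    batches = [
        ([[0.1, 0.9], [0.8, 0.2]], [1, 0]),  # both predictions correct
        ([[0.3, 0.7], [0.6, 0.4]], [0, 0]),  # one of two correct
    ]
    results = evaluate(ToyClassifier(), batches, {"Accuracy": AccuracyMetric()})
    ```

    The loop never inspects the outputs itself. That is what lets models with unusual output formats (multiple heads, generated sequences) define their own metric updates.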

  6. 🕵️‍♀️ Evaluator changes

    The Evaluator class now stores evaluation metric names instead of metric instances. For example:

    ```python
    from composer.core import Evaluator

    glue_mrpc_task = Evaluator(
        label='glue_mrpc',
        dataloader=mrpc_dataloader,
        metric_names=['BinaryF1Score', 'Accuracy'],
    )
    ```

    These metric names are matched against the metrics returned by the ComposerModel. The metric instances are now stored as deep copies in the State class as state.train_metrics or state.eval_metrics.
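
    The name matching and deep copying described above can be illustrated with a small stdlib sketch. `select_metrics` is a hypothetical helper, `CountMetric` is a toy metric, and raising ValueError for unknown names is an assumption. The point is that each evaluator gets its own independent copies of the model's metrics, selected by name. Updating one evaluator's metrics therefore never changes another's state.

    ```python
    import copy
    from typing import Dict, List


    class CountMetric:
        """Toy metric that just counts how many updates it has seen."""

        def __init__(self) -> None:
            self.updates = 0

        def update(self) -> None:
            self.updates += 1


    def select_metrics(model_metrics: Dict[str, CountMetric],
                       metric_names: List[str]) -> Dict[str, CountMetric]:
        """Match names against the model's metrics and return deep copies."""
        missing = [name for name in metric_names if name not in model_metrics]
        if missing:
            raise ValueError(f"Metric(s) {missing} not provided by the model.")
        return {name: copy.deepcopy(model_metrics[name]) for name in metric_names}


    model_metrics = {"Accuracy": CountMetric(), "BinaryF1Score": CountMetric(), "CrossEntropy": CountMetric()}
    mrpc_metrics = select_metrics(model_metrics, ["BinaryF1Score", "Accuracy"])
    mrpc_metrics["Accuracy"].update()  # only the evaluator's copy changes
    ```
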

  7. 🚧 Streaming Datasets Repository Preview

    We're in the process of splitting streaming datasets out into their own repository! Streaming datasets are a high-performance drop-in replacement for PyTorch IterableDataset objects and let you stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.

  8. ❌ YAHP deprecation

    We are deprecating support for yahp, our hyperparameter configuration tool. Support for it will be removed in the next minor release of Composer. We recommend that users migrate to OmegaConf or Hydra.

Bug Fixes

  • Documentation fixes (#1408, #1422, #1425, #1413, #1432, #1403, #1426, #1396, #1446, #1466, #1443)
  • Upgrade WandB version (#1440)
  • Fix import (#1442)
  • Fix wrong extra deps group (#1449)
  • WandB bug fix (#1488)
  • Reset train metrics every batch (#1496)
  • Fix auto grad accum (#1515)
  • Fix compression file remote download exception handling (#1526)
  • Add Pandoc to Docker images, bump version to 2.19.2 (#1550)

What's Changed

  • current metrics docs by @A-Jacobson in https://github.com/mosaicml/composer/pull/1402
  • merge nlp+hf notebooks by @A-Jacobson in https://github.com/mosaicml/composer/pull/1406
  • Add break epoch exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1415
  • Upgrade to torch 1.12.1 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1409
  • Metrics refactor pt1 by @ishanashastri in https://github.com/mosaicml/composer/pull/1411
  • Use state algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1412
  • Add default ignore index by @moinnadeem in https://github.com/mosaicml/composer/pull/1421
  • Update default hparams for ResNet model card by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1423
  • update colout link in custom speedup notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/1408
  • Clean up prose in key files by @dblalock in https://github.com/mosaicml/composer/pull/1422
  • Relax codeowners by @bandish-shah in https://github.com/mosaicml/composer/pull/1424
  • Fix typo by @Landanjs in https://github.com/mosaicml/composer/pull/1425
  • Fix pre-commit checks failing on fresh checkout of dev by @dblalock in https://github.com/mosaicml/composer/pull/1414
  • Have docs use preferred import paths, not longest import paths by @dblalock in https://github.com/mosaicml/composer/pull/1413
  • Fix missing indent by @Landanjs in https://github.com/mosaicml/composer/pull/1432
  • eval_batch_size=auto by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1417
  • Simplify helper for conflicting files by @hanlint in https://github.com/mosaicml/composer/pull/1427
  • add install from dev instructions by @A-Jacobson in https://github.com/mosaicml/composer/pull/1403
  • Style/tone consistency update for tutorial notebooks by @alextrott16 in https://github.com/mosaicml/composer/pull/1426
  • Dynamic quantization + minor improvements in inference APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1433
  • Upgrade WandB version by @moinnadeem in https://github.com/mosaicml/composer/pull/1440
  • Log multiple losses by @Landanjs in https://github.com/mosaicml/composer/pull/1375
  • Fix attribute by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1442
  • Expand evaluation doc by @alextrott16 in https://github.com/mosaicml/composer/pull/1396
  • Metrics Refactor Part 2 by @ishanashastri in https://github.com/mosaicml/composer/pull/1419
  • Create dependabot.yml by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1448
  • Methods overview fix by @growlix in https://github.com/mosaicml/composer/pull/1446
  • Bump custom-inherit from 2.3.2 to 2.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1451
  • Bump junitparser from 2.4.3 to 2.8.0 by @dependabot in https://github.com/mosaicml/composer/pull/1453
  • Update moto[s3] requirement from <3.2,>=3.1.12 to >=4.0.1,<5 by @dependabot in https://github.com/mosaicml/composer/pull/1450
  • Update monai requirement from <0.9,>=0.8.0 to >=0.9.0,<0.10 by @dependabot in https://github.com/mosaicml/composer/pull/1452
  • Update torch-optimizer requirement from <0.2,>=0.1.0 to >=0.3.0,<0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1454
  • Bump cryptography from 37.0.2 to 37.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1457
  • Bump sphinxext-opengraph from 0.6.1 to 0.6.3 by @dependabot in https://github.com/mosaicml/composer/pull/1458
  • Bump coverage[toml] from 6.3.2 to 6.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/1460
  • Bump nbsphinx from 0.8.8 to 0.8.9 by @dependabot in https://github.com/mosaicml/composer/pull/1459
  • Fix incorrect deps group in streaming requirement by @hanlint in https://github.com/mosaicml/composer/pull/1449
  • Logger Destination Refactor by @eracah in https://github.com/mosaicml/composer/pull/1416
  • Bump sphinx-markdown-tables from 0.0.15 to 0.0.17 by @dependabot in https://github.com/mosaicml/composer/pull/1463
  • Bump traitlets from 5.1.1 to 5.3.0 by @dependabot in https://github.com/mosaicml/composer/pull/1462
  • Bump vit-pytorch from 0.27 to 0.35.8 by @dependabot in https://github.com/mosaicml/composer/pull/1465
  • Bump furo from 2022.3.4 to 2022.6.21 by @dependabot in https://github.com/mosaicml/composer/pull/1467
  • Bump ipykernel from 6.9.2 to 6.15.1 by @dependabot in https://github.com/mosaicml/composer/pull/1470
  • Bump pytest from 7.1.0 to 7.1.2 by @dependabot in https://github.com/mosaicml/composer/pull/1469
  • Bump sphinxcontrib-katex from 0.8.6 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1476
  • Bump tabulate from 0.8.9 to 0.8.10 by @dependabot in https://github.com/mosaicml/composer/pull/1478
  • Bump yamllint from 1.26.3 to 1.27.1 by @dependabot in https://github.com/mosaicml/composer/pull/1481
  • Bump ipykernel from 6.15.1 to 6.15.2 by @dependabot in https://github.com/mosaicml/composer/pull/1482
  • Refactor CheckpointSaver by @hanlint in https://github.com/mosaicml/composer/pull/1428
  • Clean up docs Makefile by @eracah in https://github.com/mosaicml/composer/pull/1466
  • Model surgery info -> debug by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1485
  • Docker image with Flash Attention by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1471
  • Fix WandBLogger bug with inaccurate step count by @eracah in https://github.com/mosaicml/composer/pull/1488
  • Update Eval API by @hanlint in https://github.com/mosaicml/composer/pull/1479
  • Random Names with Fixed Seed by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1487
  • ResNet50 on ImageNet training script example by @Landanjs in https://github.com/mosaicml/composer/pull/1434
  • Remove hparams from test_precision and test_state by @hanlint in https://github.com/mosaicml/composer/pull/1486
  • Clean up save_checkpoint by @hanlint in https://github.com/mosaicml/composer/pull/1484
  • Remove hparams from test_ddp by @hanlint in https://github.com/mosaicml/composer/pull/1489
  • update model token embeddings according to tokenizer len by @ananyahjha93 in https://github.com/mosaicml/composer/pull/1493
  • BERT classifier metrics depend on num_labels by @alextrott16 in https://github.com/mosaicml/composer/pull/1495
  • Reset train metrics every batch by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1496
  • Algolia doc search by @nqn in https://github.com/mosaicml/composer/pull/1443
  • Squelch Engine debug logs by @hanlint in https://github.com/mosaicml/composer/pull/1497
  • Remove TODO by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1499
  • Remove hparams from checkpoint tests by @hanlint in https://github.com/mosaicml/composer/pull/1491
  • [Docs] Training ResNet-50 on AWS tutorial by @bandish-shah in https://github.com/mosaicml/composer/pull/1444
  • Refactor hparams in tests by @hanlint in https://github.com/mosaicml/composer/pull/1498
  • Bump pytest from 7.1.2 to 7.1.3 by @dependabot in https://github.com/mosaicml/composer/pull/1500
  • Improved comments and improved test code by @karan6181 in https://github.com/mosaicml/composer/pull/1502
  • Refactor GLUE fine-tune queuing to improve efficiency and add task-specific seed sweeps by @alextrott16 in https://github.com/mosaicml/composer/pull/1363
  • Raise ValueError for Profiler + Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1504
  • add yahp deprecation warnings by @hanlint in https://github.com/mosaicml/composer/pull/1505
  • Move logic from initialize_object to object store class by @hanlint in https://github.com/mosaicml/composer/pull/1508
  • Fix run name comment by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1509
  • Add CometML Support by @eracah in https://github.com/mosaicml/composer/pull/1490
  • Raise ValueError if missing a surgery algorithm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1506
  • remove datasets from gitignore by @hanlint in https://github.com/mosaicml/composer/pull/1513
  • fix auto grad accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1515
  • Use eval context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1516
  • Update tensorflow-io requirement from <0.27,>=0.26.0 to >=0.26.0,<0.28 by @dependabot in https://github.com/mosaicml/composer/pull/1522
  • Bump cryptography from 37.0.4 to 38.0.1 by @dependabot in https://github.com/mosaicml/composer/pull/1521
  • Fix SAM loss by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1518
  • Fixed remote path in streaming dataloader facesynthetics jupyter notebook by @karan6181 in https://github.com/mosaicml/composer/pull/1519
  • Rework auto grad accum checks by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1517
  • [xs] remove libcloudhparams from test_filehelpers.py by @hanlint in https://github.com/mosaicml/composer/pull/1514
  • Add v2 datasets behind a version flag by @knighton in https://github.com/mosaicml/composer/pull/1507
  • Fix compression file remote download exception handling. by @knighton in https://github.com/mosaicml/composer/pull/1526

New Contributors

  • @ananyahjha93 made their first contribution in https://github.com/mosaicml/composer/pull/1493

Full Changelog: https://github.com/mosaicml/composer/compare/v0.9.0...v0.10.0

- Python
Published by bandish-shah almost 4 years ago

composer - v0.9.0

🚀 Composer v0.9.0

Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌!

```bash
pip install --upgrade mosaicml==0.9.0
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.9.0
```

New Features

  1. 📦 Export for inference APIs

    Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback to allow you to export Composer-trained models for inference, supporting popular formats such as TorchScript and ONNX.

    For example, here’s how to export a model in TorchScript format:

    ```python
    from composer.utils import export_for_inference

    # Invoking export with a trained model
    export_for_inference(model=model, save_format='torchscript', save_path=model_save_path)
    ```

    Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:

    ```python
    from composer import Trainer
    from composer.callbacks import ExportForInferenceCallback

    # Initializing Trainer with the export callback
    callback = ExportForInferenceCallback(save_format='onnx', save_path=model_save_path)
    trainer = Trainer(
        model=model,
        callbacks=callback,
        train_dataloader=dataloader,
        max_duration='10ep',
    )

    # Model will be exported at the end of training
    trainer.fit()
    ```

    Please see our Exporting for Inference notebook for more information.

  2. 📈 ALiBi support for BERT training

    You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.

    ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.

    Example of using ALiBi as an algorithm with the Composer Trainer:

    ```python
    import composer

    # Create an instance of a BERT masked language model
    model = composer.models.create_bert_mlm()

    # Apply ALiBi (when training is initialized)
    alibi = composer.algorithms.Alibi(max_sequence_length=1024)

    # Train with ALiBi
    trainer = composer.trainer.Trainer(
        model=model,
        train_dataloader=train_dataloader,
        algorithms=[alibi],
    )
    trainer.fit()
    ```

    Example using the Composer Functional API:

    ```python
    import composer
    import composer.functional as cf

    # Create an instance of a BERT masked language model
    model = composer.models.create_bert_mlm()

    # Apply ALiBi and expand the model's maximum sequence length to 1024
    cf.apply_alibi(model=model, max_sequence_length=1024)
    ```

    ALiBi can now also be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.
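
    The sketch below gives some intuition for what ALiBi adds to attention. It computes the per-head slopes and the distance-based bias that is added to the attention logits in place of position embeddings. It follows the geometric slope schedule from Press et al. (2021) for a power-of-two number of heads. Using the symmetric distance |i - j| instead of the paper's causal form is an assumption for a bidirectional encoder like BERT. This is an illustration, not Composer's implementation.

    ```python
    from typing import List


    def alibi_slopes(n_heads: int) -> List[float]:
        """Geometric slopes 2^(-8/n), 2^(-16/n), ... for n_heads a power of two."""
        return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]


    def alibi_bias(seq_len: int, slope: float) -> List[List[float]]:
        """Bias for one head: penalize attention linearly with query-key distance."""
        return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


    slopes = alibi_slopes(8)         # 0.5, 0.25, ..., 2**-8
    bias = alibi_bias(3, slopes[0])  # added to that head's (seq_len x seq_len) attention scores
    ```

    The bias depends only on relative distance, so nothing in it is tied to a maximum trained length. This is why a model pre-trained on short sequences can be fine-tuned on longer ones.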

  3. 🧐 Entry point for GLUE tasks pre-training and fine-tuning

    You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.

    Example of launching the entrypoint:

    ```bash
    # This runs pre-training followed by fine-tuning.
    # --training_scheme can take either pretrain, finetune, or all depending on the task!
    python run_glue_trainer.py -f glue_example.yaml --training_scheme all
    ```

    Please see our GLUE entrypoint notebook for more information.

  4. 🤖 TPU support (in beta)

    You can now use Composer to train your models on TPUs! Support is available in beta and currently covers only single-core TPU training. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.

    To use TPUs with Composer, simply specify a tpu device:

    ```python
    import composer

    # Set device to tpu
    trainer = composer.trainer.Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration=train_epochs,
        device='tpu',
    )

    # Run fit
    trainer.fit()
    ```

    Please see our Training with TPUs notebook for more information.

  5. 🍎 Apple Silicon support (beta)

    Leverage Apple Silicon chips to train your models with Composer by providing the device='mps' argument:

    ```python
    trainer = Trainer(
        ...,
        device='mps',
    )
    ```

    We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12 and macOS 12.3+.

    For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.
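
    At runtime, `torch.backends.mps.is_available()` (added in PyTorch 1.12) is the authoritative check. The stdlib-only sketch below instead turns the two version requirements above into a comparison you could run before importing torch. `meets_mps_requirements` and `_version_tuple` are hypothetical helpers. Ignoring local or pre-release suffixes beyond the major.minor digits is a simplifying assumption.

    ```python
    import platform
    from typing import Tuple


    def _version_tuple(version: str, parts: int = 2) -> Tuple[int, ...]:
        """'1.12.1+cu113' -> (1, 12); '12.3' -> (12, 3). Missing parts count as 0."""
        core = version.split("+")[0]
        numbers = []
        for piece in core.split(".")[:parts]:
            digits = "".join(ch for ch in piece if ch.isdigit())
            numbers.append(int(digits) if digits else 0)
        numbers += [0] * (parts - len(numbers))
        return tuple(numbers)


    def meets_mps_requirements(torch_version: str, macos_version: str) -> bool:
        """True if torch >= 1.12 and macOS >= 12.3."""
        return _version_tuple(torch_version) >= (1, 12) and _version_tuple(macos_version) >= (12, 3)


    # On a Mac, platform.mac_ver()[0] is the macOS version string (empty elsewhere).
    current_macos = platform.mac_ver()[0]
    ```
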

  6. 🚧 Contrib repository

    Got a new method idea, or published a paper and want those methods to be easily accessible? We’ve created the mcontrib repository, with a lightweight process to contribute new algorithms. We’re happy to work directly with you to benchmark these methods and eventually “promote” them to Composer for use by end customers.

    Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.

Additional API Changes

  1. 🔢 Passes Module

    The order in which algorithms are run matters significantly during composition. With this release we refactored algorithm passes into their own passes module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
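
    Conceptually, a pass is a function that receives the algorithms scheduled for an event and returns them in the order they should run. The sketch below illustrates that idea with stdlib-only stand-ins. `MiniEngine`, `run_last`, the string algorithm names, and the string event are all hypothetical. They are not the actual Composer Engine or pass signatures (see #1377 for those).

    ```python
    from typing import Callable, List, Sequence

    AlgorithmPass = Callable[[Sequence[str], str], List[str]]


    def run_last(name: str) -> AlgorithmPass:
        """Build a pass that moves the named algorithm to the end of the run order."""

        def _pass(algorithms: Sequence[str], event: str) -> List[str]:
            # A real pass could also branch on `event`; this one ignores it.
            others = [a for a in algorithms if a != name]
            return others + [a for a in algorithms if a == name]

        return _pass


    class MiniEngine:
        def __init__(self, algorithms: Sequence[str]) -> None:
            self.algorithms = list(algorithms)
            self.passes: List[AlgorithmPass] = []

        def register_pass(self, algorithm_pass: AlgorithmPass) -> None:
            self.passes.append(algorithm_pass)

        def order_for(self, event: str) -> List[str]:
            # Apply every registered pass, in registration order, to a copy of the list.
            ordered = list(self.algorithms)
            for algorithm_pass in self.passes:
                ordered = algorithm_pass(ordered, event)
            return ordered


    engine = MiniEngine(["my_custom_algo", "algo_a", "algo_b"])
    engine.register_pass(run_last("my_custom_algo"))
    ```

    Keeping ordering rules as small composable functions, rather than hard-coding them in the engine, is what lets custom algorithms state their own ordering constraints.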

  2. 🗄️ Default Checkpoint Extension

    The CheckpointSaver now defaults to using the *.pt extension for checkpoint filenames. Please see #1370 for more information.

  3. 👁️ Models Refactor

    Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to a factory function. For example ComposerResNet -> composer_resnet.

    ```python
    # before
    from composer.models import ComposerResNet
    model = ComposerResNet(...)

    # after
    from composer.models import composer_resnet
    model = composer_resnet(...)
    ```

    The same refactor has been done for NLP as well, e.g. BERTModel -> create_bert_mlm and create_bert_classification.

    See #1227 (vision) and #1130 (NLP) for more details.

  4. ➕ Misc API Changes

    • `BreakEpochException` has been removed.
    • `state.is_model_deepspeed` has been moved to `composer.utils.is_model_deepspeed`.
    • Helper function `monitored_barrier` has been added to `composer` distributed.

Bug Fixes

  • Add informative error for infer batch size issues (#1401)
  • Fix ImagenetDatasetHparams bug (#1392), resolves #1111
  • Fix hparams error condition checking (#1394)
  • Fix AMP resumption with grad scaler (#1376)
  • Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331
  • Fix default precision (#1369)
  • Fix the profiler on multi-node training (#1358), resolves #1270
  • Retry SFTP on Size Mismatch (#1300)
  • Fix scheduler edge cases (#1350), resolves #1077
  • Fix a race condition in the object store logger (#1328)
  • Fix WandB load from checkpoint (#1326)
  • Fix Notebook Progress Bars (#1313)

Commits

What's Changed

  • Fix DeepSpeed typo in docstring by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1188
  • Move grad_accum logging to every step by @coryMosaicML in https://github.com/mosaicml/composer/pull/1187
  • Update STYLE_GUIDE with details on Documentation by @bandish-shah in https://github.com/mosaicml/composer/pull/1183
  • ProgressBar Units by @hanlint in https://github.com/mosaicml/composer/pull/1190
  • Added Xavier Normal initializer by @vladd-i in https://github.com/mosaicml/composer/pull/1196
  • Updated cost figure by @nqn in https://github.com/mosaicml/composer/pull/1180
  • Remove algorithm yamls by @hanlint in https://github.com/mosaicml/composer/pull/1193
  • Fix the Composer Launch Script for the Composer Dockerimage; Default nproc = torch.cuda.device_count() if not specified via env by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1195
  • Bert model card by @A-Jacobson in https://github.com/mosaicml/composer/pull/1198
  • Add Notes on Early Stopping by @anisehsani in https://github.com/mosaicml/composer/pull/1182
  • Stochastic depth that preserves weights by @Landanjs in https://github.com/mosaicml/composer/pull/1085
  • Adding Gated Linear Units as an algorithm by @moinnadeem in https://github.com/mosaicml/composer/pull/1192
  • A utility to fuse parallel linear layers in FX-traced models by @dskhudia in https://github.com/mosaicml/composer/pull/1189
  • Build+push Composer dockerimages to mosaicml/composer_staging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1197
  • Fix the SFTP Object Store by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1202
  • Bert emoji by @A-Jacobson in https://github.com/mosaicml/composer/pull/1205
  • Adding a constant warmup scheduler by @linden-li in https://github.com/mosaicml/composer/pull/1203
  • Fix multi-GPU conflicts when downloading torchvision datasets by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1201
  • Add caveats about automatic gradient accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1207
  • Remove the composer_train entrypoint; put it back in examples by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1211
  • Fix Composer staging dockerimages by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1210
  • Set SFTP Object Store Private Key Filepath from an Environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1212
  • [xs] Fix progress bars in get_file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1216
  • Cleanup SFTP url parsing for StreamingDataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1217
  • Fix Symlinks on Non-Libcloud Object Stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1209
  • Fix the ObjectStoreLogger with Overwrite=True by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1208
  • Throughput metrics by @linden-li in https://github.com/mosaicml/composer/pull/1215
  • Fix module surgery for training resumptions with optimizers that save state by @dskhudia in https://github.com/mosaicml/composer/pull/1200
  • Update bert-base.yaml by @moinnadeem in https://github.com/mosaicml/composer/pull/1219
  • StreamingDataset: make remote optional, attempt to prettify docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1220
  • Update vision-style StreamingDatasets to subclass VisionDataset by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1223
  • Improve docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1222
  • shardwise zip streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1177
  • updated mosaic logos to composer logos in docs by @ejyuen in https://github.com/mosaicml/composer/pull/1221
  • Add COMPOSER_KNOWN_HOSTS_FILENAME for setting the sftp known hosts file environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1224
  • StreamingDataset: correctly handle exceptions in child download thread. by @knighton in https://github.com/mosaicml/composer/pull/1228
  • hot fix compression 404 by @milocress in https://github.com/mosaicml/composer/pull/1229
  • Treat any dropped SSH/SFTP connection as a transient error by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1225
  • refactor bert and gpt by @A-Jacobson in https://github.com/mosaicml/composer/pull/1130
  • Hotfix for S3 FileNotFoundError by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1233
  • Fix StreamingDataset compression with multi-rank by @milocress in https://github.com/mosaicml/composer/pull/1231
  • Refactor vision models by @Landanjs in https://github.com/mosaicml/composer/pull/1227
  • Update resnet50_medium.yaml by @lupesko in https://github.com/mosaicml/composer/pull/1235
  • Increase default timeout for StreamingC4 to 120s by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1234
  • Add Debug Log Statements; Fix Pyright by @hanlint in https://github.com/mosaicml/composer/pull/1218
  • Hotfix deeplabv3 by @Landanjs in https://github.com/mosaicml/composer/pull/1238
  • Add Tensorboard Logger by @eracah in https://github.com/mosaicml/composer/pull/1194
  • Move the model and optimizers to the device before Event.INIT by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1084
  • Fix bug in streaming iteration/downloading, refactor by @knighton in https://github.com/mosaicml/composer/pull/1239
  • Support sequence of losses in backwards pass by @Landanjs in https://github.com/mosaicml/composer/pull/1240
  • Add device_id param to DeviceGPU by @ishanashastri in https://github.com/mosaicml/composer/pull/1244
  • Update CutMix to work with segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/1230
  • Catching ChannelErrors on SFTP Failures by @moinnadeem in https://github.com/mosaicml/composer/pull/1245
  • Make StreamingDataset compression file easier to write/read by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1246
  • [XS] Updating console progressbar logger to use max_duration units by @moinnadeem in https://github.com/mosaicml/composer/pull/1243
  • Catch botocore ClientError 403 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1249
  • Tensorboard Notebook + Tutorial by @eracah in https://github.com/mosaicml/composer/pull/1250
  • Fix repeated words in event.py by @isaac0804 in https://github.com/mosaicml/composer/pull/1254
  • Make progressive resizing quieter by @coryMosaicML in https://github.com/mosaicml/composer/pull/1255
  • fix typo in example by @xloem in https://github.com/mosaicml/composer/pull/1259
  • Create a new boto3.Session() per S3ObjectStore instance by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1260
  • Fix recipe yamls for v0.8, add testing by @hanlint in https://github.com/mosaicml/composer/pull/1257
  • Automatic Stochastic depth on residual blocks by @dskhudia in https://github.com/mosaicml/composer/pull/1253
  • Sequence length warmup update and tests by @alextrott16 in https://github.com/mosaicml/composer/pull/1199
  • ProgressBarLogger UX Enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1264
  • Update to latest pytorch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1262
  • Add packaging to meta.yaml; add py-cpuinfo max version by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1271
  • Fix Flaky Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1272
  • Add callback for visualizing image inputs and outputs by @coryMosaicML in https://github.com/mosaicml/composer/pull/1266
  • Add scale_warmup argument to schedulers by @hanlint in https://github.com/mosaicml/composer/pull/1268
  • Switch Jenkins to r1z3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1277
  • BERT and C4 updates by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1252
  • Default to allow_tf32=True for GPU Devices by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1275
  • Fix grad accum parsing in hparams by @hanlint in https://github.com/mosaicml/composer/pull/1256
  • Fix issue with doctest format in some docstring examples by @Landanjs in https://github.com/mosaicml/composer/pull/1269
  • Adds S3ObjectStore import to util __init__.py by @codestar12 in https://github.com/mosaicml/composer/pull/1274
  • Add tutorial on exporting for inference by @hanlint in https://github.com/mosaicml/composer/pull/1276
  • HTTPS downloads for streaming datasets by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1258
  • object stores for streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1248
  • Allow object name prefix for S3ObjectStore by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1278
  • Hotfix CO-658 by @milocress in https://github.com/mosaicml/composer/pull/1273
  • Fix S3 remote paths for StreamingDataset download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1280
  • Add combo loss to DeepLabv3+ by @Landanjs in https://github.com/mosaicml/composer/pull/1265
  • Checkpoint backwards compatibility for ProgressBar by @hanlint in https://github.com/mosaicml/composer/pull/1287
  • Add missing callbacks by @hanlint in https://github.com/mosaicml/composer/pull/1286
  • Fix S3 prefix upload/download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1288
  • Fix device inference in module surgery by @hanlint in https://github.com/mosaicml/composer/pull/1290
  • Actual fix to backwards compatibility by @hanlint in https://github.com/mosaicml/composer/pull/1289
  • Bugs in getting_started.ipynb by @rahulvigneswaran in https://github.com/mosaicml/composer/pull/1285
  • Add pytorch 1.12.0 docker image by @linden-li in https://github.com/mosaicml/composer/pull/1247
  • Fix TB Logger + ObjectStore quadratic complexity issue by doing 1 file per flush by @eracah in https://github.com/mosaicml/composer/pull/1283
  • Enable README Doctests with GPUs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1279
  • Fix logging of hparams to object stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1297
  • [xs] Reformat the Composer Version String by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1301
  • Add monitored barrier for autograd accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1295
  • [xs] Notebook Fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1299
  • [xs] Store the Composer version in one place. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1302
  • model export for inference. Functional API by @dskhudia in https://github.com/mosaicml/composer/pull/1294
  • Add a return_outputs flag to predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1307
  • Integration Testing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1305
  • Fix get_file_artifact in the WandBLogger to work on all ranks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1304
  • Add documentation about run_name to Composer by @eracah in https://github.com/mosaicml/composer/pull/1298
  • Enforce FusedLayerNorm is ordered last by @alextrott16 in https://github.com/mosaicml/composer/pull/1309
  • Revert monitored barrier by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1311
  • [xs] Build the Composer Docker Image only on dev branch merges by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1308
  • Fix Notebook Progress Bars by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1313
  • Remove pytest-timeout by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1317
  • [Minor] Inference API parameter name change by @dskhudia in https://github.com/mosaicml/composer/pull/1315
  • Matthew/swa readme by @growlix in https://github.com/mosaicml/composer/pull/1292
  • Enable gloo backend by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1321
  • [xs] Fix pytest test filtering; Bump the minimum pytorch version to 1.10 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1320
  • revert gloo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1324
  • Fix WandB load from checkpoint by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1326
  • ALiBi for BERT and ALiBi testing by @alextrott16 in https://github.com/mosaicml/composer/pull/1267
  • Update HF example with read of model eval accuracy by @lupesko in https://github.com/mosaicml/composer/pull/1332
  • Cleanup API Reference Titles by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1336
  • Fix a race condition in the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1328
  • Auto Grad Accum Change to Warning by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1338
  • Add export for inference callback by @nik-mosaic in https://github.com/mosaicml/composer/pull/1323
  • Add save fine-tune model to HuggingFace example by @lupesko in https://github.com/mosaicml/composer/pull/1333
  • Update DWD optimizers by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1339
  • Cap Numpy Version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1345
  • Update slack link by @hanlint in https://github.com/mosaicml/composer/pull/1344
  • Fix scheduler edge cases by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1350
  • Integration Tests for Object Stores and Loggers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1322
  • Retry SFTP on Size Mismatch by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1300
  • [xs] Restore the dataloader and training properties in predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1352
  • Add Precision Contexts by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1347
  • Update GLU logging strings by @moinnadeem in https://github.com/mosaicml/composer/pull/1348
  • Add domain-specific codeowners by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1354
  • fix marker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1359
  • Fix the profiler on multi-node training by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1358
  • Glue Entrypoint by @ishanashastri in https://github.com/mosaicml/composer/pull/1263
  • Yahp v0.1.3 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1346
  • Move metrics to context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1361
  • Refactor multiple losses to support dictionaries and fix discrepancies by @Landanjs in https://github.com/mosaicml/composer/pull/1349
  • Fix Coverage Reports on Jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1114
  • JSON Schemas by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1371
  • add filename extension by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1370
  • JSON Schemas pt 2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1373
  • Update Export for Inference methods by @nik-mosaic in https://github.com/mosaicml/composer/pull/1355
  • Fix default precision by @A-Jacobson in https://github.com/mosaicml/composer/pull/1369
  • Clean up unused exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1368
  • Revert "Clean up unused exception" by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1378
  • Remove Unused Exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1379
  • Auto Grad Accum Cache Clearing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1380
  • Add ability to register algorithm passes by @hanlint in https://github.com/mosaicml/composer/pull/1377
  • Fix AMP resumption with grad scaler by @hanlint in https://github.com/mosaicml/composer/pull/1376
  • Update CUDA and remove NCCL downgrade from Dockerfile by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1362
  • Add Notes on Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1381
  • Print the microbatch size when using Adaptive Gradient Accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1387
  • Cleaner API reference part 1: references with minimal import paths by @dblalock in https://github.com/mosaicml/composer/pull/1385
  • Add Event.BEFORE_DATALOADER by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1388
  • remove private s3 paths by @A-Jacobson in https://github.com/mosaicml/composer/pull/1389
  • Tutorial on training without Local Storage by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1351
  • [inference] Update export_for_inference notebook with new APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1360
  • Fix resnet warnings criteria by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1395
  • Fix hparams error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1394
  • Add knighton to codeowners for datasets by @knighton in https://github.com/mosaicml/composer/pull/1397
  • Fix ImagenetDatasetHparams bug by @nik-mosaic in https://github.com/mosaicml/composer/pull/1392
  • Decouple GLUE entry point saving and loading logic by @ishanashastri in https://github.com/mosaicml/composer/pull/1390
  • Glue example notebook by @ishanashastri in https://github.com/mosaicml/composer/pull/1383
  • Add informative error for infer batch size issues by @hanlint in https://github.com/mosaicml/composer/pull/1401
  • Only sync batchnorm statistics within a node for deeplab by @Landanjs in https://github.com/mosaicml/composer/pull/1391
  • Update DeepLabv3 pretrained weight interface to work with PyTorch 1.12 by @Landanjs in https://github.com/mosaicml/composer/pull/1399
  • tpu single core by @florescl in https://github.com/mosaicml/composer/pull/1400
  • Add support for Apple M chips by @hanlint in https://github.com/mosaicml/composer/pull/1405
  • [xs] Add mps and tpu device to Trainer docstrings by @hanlint in https://github.com/mosaicml/composer/pull/1410

Full Changelog: https://github.com/mosaicml/composer/compare/v0.8.2...v0.9.0

New Contributors

  • @vladd-i made their first contribution in https://github.com/mosaicml/composer/pull/1196
  • @linden-li made their first contribution in https://github.com/mosaicml/composer/pull/1203
  • @ejyuen made their first contribution in https://github.com/mosaicml/composer/pull/1221
  • @lupesko made their first contribution in https://github.com/mosaicml/composer/pull/1235
  • @isaac0804 made their first contribution in https://github.com/mosaicml/composer/pull/1254
  • @xloem made their first contribution in https://github.com/mosaicml/composer/pull/1259
  • @alextrott16 made their first contribution in https://github.com/mosaicml/composer/pull/1199
  • @codestar12 made their first contribution in https://github.com/mosaicml/composer/pull/1274
  • @rahulvigneswaran made their first contribution in https://github.com/mosaicml/composer/pull/1285
  • @nik-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/1323

- Python
Published by bandish-shah almost 4 years ago

composer - v0.8.2

🚀 Composer v0.8.2

Composer v0.8.2 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.8.2
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.8.2
```

🐛 Bug Fixes

  1. Fixed Notebook Progress Bars in Colab

    Fixes a bug, introduced by #1264, that caused Composer running in Colab notebooks to error out with: UnsupportedOperation: fileno.

    Closes #1312. Fixed in PR #1314.

Changelog

https://github.com/mosaicml/composer/compare/v0.8.1...v0.8.2

- Python
Published by bandish-shah almost 4 years ago

composer - v0.8.1

🚀 Composer v0.8.1

Composer v0.8.1 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.8.1
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.8.1
```

🎁 New Features

  1. 🖼️ Image Visualizer

    The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:

    ```python
    from composer import Trainer
    from composer.callbacks import ImageVisualizer

    # Callback to log 8 training images after every 100 batches
    image_visualizer = ImageVisualizer()

    # Construct trainer
    trainer = Trainer(
        ...,
        callbacks=image_visualizer,
    )

    # Train!
    trainer.fit()
    ```

    Here is an example visualization from the training set of ADE20k:

  2. 📶 TensorBoard Logging

    You can now log metrics and losses from your Composer training runs with TensorBoard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it to the list of loggers in your Trainer object like so:

    ```python
    from composer import Trainer
    from composer.loggers import TensorboardLogger

    tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")

    trainer = Trainer(
        ...,
        # Add your Tensorboard Logger to the trainer here.
        loggers=[tb_logger],
    )

    trainer.fit()
    ```

    For more information, see this tutorial.

  3. 🔙 Multiple Losses

    Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details.

  4. 🌎️ Stream Datasets from HTTP URIs

    You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:

    ```python
    from composer import Trainer
    from composer.datasets.streaming import StreamingDataset
    from torch.utils.data import DataLoader

    # Construct the Dataset
    dataset = StreamingDataset(
        ...,
        remote="https://example.com/dataset/",
    )

    # Construct the DataLoader
    train_dl = DataLoader(dataset)

    # Construct the Trainer
    trainer = Trainer(
        ...,
        train_dataloader=train_dl,
    )

    # Train!
    trainer.fit()
    ```

    For more information on streaming datasets, see this tutorial.

  5. 🏄️ GPU Devices default to TF32 Matmuls

    Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.

    Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details.

  6. 👋 Set the Device ID for GPU Devices

    When instantiating a Trainer, you can now pass a DeviceGPU with an explicit device ID to train on, instead of using the local ID. For example:

    ```python
    from composer import Trainer
    from composer.trainer.devices.device_gpu import DeviceGPU

    # Specify to use GPU 3 to train
    device = DeviceGPU(device_id=3)

    # Construct the Trainer
    trainer = Trainer(
        ...,
        device=device,
    )

    # Train!
    trainer.fit()
    ```

  7. BERT and C4 Updates

    We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.

    We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.

  8. 📂 Set a prefix when using an S3ObjectStore

    When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'.

  9. ⚖️ Scale the Warmup Period of Composer Schedulers

    Added a new flag, scale_warmup, to schedulers that scales the warmup period when a scale schedule ratio is applied. It defaults to False, which preserves the existing behavior. See #1268 for more details.

  10. 🧊 Stochastic Depth on Residual Blocks

    Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.
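
Item 8 above gives the S3ObjectStore key layout as f's3://{self.bucket}/{self.prefix}{object_name}'. The sketch below reproduces only that documented layout, using a hypothetical s3_uri helper (not part of Composer), to make one consequence visible: the prefix is concatenated as-is, so it needs a trailing slash to act like a folder.

```python
def s3_uri(bucket: str, object_name: str, prefix: str = "") -> str:
    """Hypothetical helper mirroring f's3://{bucket}/{prefix}{object_name}'.

    The prefix is glued directly onto the object name, so include a
    trailing slash (e.g. "runs/run-1/") if it should behave like a folder.
    """
    return f"s3://{bucket}/{prefix}{object_name}"


# With a trailing slash, the prefix reads like a subfolder.
print(s3_uri("my-bucket", "ep1.pt", prefix="runs/run-1/"))
# Without one, the prefix and object name run together.
print(s3_uri("my-bucket", "ep1.pt", prefix="runs/run-1"))
```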

🐛 Bug Fixes

  1. Fixed Progress Bars

    Fixed a bug where the Progress Bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.

  2. Fixed S3ObjectStore in Multithreaded Environments

    Fixed a bug where boto3 crashed when creating the default session in multiple threads simultaneously (see https://github.com/boto/boto3/issues/1592). Fixed in #1260.

  3. Retry on ChannelException errors in the SFTPObjectStore

    The SFTPObjectStore now catches transient ChannelException SFTP errors and retries. Fixed in #1245.

  4. Treating S3 Permission Denied Errors as Not Found Errors

    We update our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because of a situation that occurs when a user has no S3 credentials configured, and tries to read from a bucket with public files. For privacy, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.

  5. Fixed Parsing of grad_accum in the TrainerHparams

    Fixes an error where the command line override --grad_accum led to incorrect parsing. Fixed in #1256.

  6. Fixed Example YAML Files

    Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.
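
Bug fix 4 above treats S3 403 errors as "not found". A minimal sketch of that translation follows, assuming only that the caught error carries a botocore-style response dict (response["Error"]["Code"]). StandInClientError, translate_client_error, and read_object are hypothetical names for illustration; this is not Composer's code.

```python
class StandInClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError, which exposes a `response` dict."""

    def __init__(self, code: str):
        super().__init__(f"An error occurred ({code})")
        self.response = {"Error": {"Code": code}}


def translate_client_error(error: Exception, uri: str) -> Exception:
    """Map S3 403/404 errors to FileNotFoundError; return other errors unchanged."""
    code = (getattr(error, "response", None) or {}).get("Error", {}).get("Code")
    if code in ("403", "404"):
        # S3 returns 403 instead of 404 when the caller lacks permission,
        # so both are treated as "object not found".
        return FileNotFoundError(f"Object {uri} not found")
    return error


def read_object(uri: str) -> bytes:
    """Hypothetical reader simulating an unauthenticated read of a public bucket."""
    try:
        raise StandInClientError("403")
    except StandInClientError as e:
        translated = translate_client_error(e, uri)
        if translated is e:
            raise  # not a permission/not-found error: re-raise as-is
        raise translated from e


try:
    read_object("s3://my-bucket/missing.pt")
except FileNotFoundError as e:
    print("treated as missing:", e)
```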

Changelog

https://github.com/mosaicml/composer/compare/v0.8.0...v0.8.1

- Python
Published by bandish-shah almost 4 years ago

composer - v0.8.0

🚀 Composer v0.8.0

Composer v0.8.0 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.8.0
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.8.0
```

New Features

  1. 🤗 HuggingFace ComposerModel

    Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel.

    For example:

    ```python
    import transformers
    from composer import Trainer
    from composer.models import HuggingFaceModel

    # Define the model
    hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

    # Convert it into a ComposerModel
    model = HuggingFaceModel(hf_model)

    # Construct the trainer
    trainer = Trainer(
        ...,
        model=model,
    )

    # Train!
    trainer.fit()
    ```

    For more information, see the example on fine-tuning a pretrained BERT with Composer.

  2. 🫕 Fused Layer Norm

    Fused LayerNorm replaces implementations of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization.

    For example:

    ```python
    from composer.trainer import Trainer
    from composer.algorithms import FusedLayerNorm

    # Initialize the algorithm
    alg = FusedLayerNorm()

    # Construct the trainer
    trainer = Trainer(
        algorithms=alg,
    )

    # Train!
    trainer.fit()
    ```

    See the method card for more information.

  3. 💾 Ignore Checkpoint Parameters

    If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, use the new load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported!

    For example, to restore a checkpoint without the seed:

    ```python
    from composer import Trainer

    trainer = Trainer(
        ...,
        load_path="path/to/my/checkpoint.pt",
        load_ignore_keys=["state/rank_zero_seed", "rng"],
    )
    ```

    See the Trainer API Reference for more information.

  4. 🪣 Object Stores

    Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.

    For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via boto3.

    ```python
    from composer import Trainer
    from composer.loggers import ObjectStoreLogger
    from composer.utils.object_store import S3ObjectStore

    logger = ObjectStoreLogger(
        object_store_cls=S3ObjectStore,
        object_store_kwargs={
            # These arguments will be passed into the S3ObjectStore -- e.g.:
            # object_store = S3ObjectStore(**object_store_kwargs)
            # Refer to the S3ObjectStore class for documentation
            'bucket': 'my-bucket',
        },
    )

    trainer = Trainer(
        ...,
        loggers=logger,
    )

    # Train!
    trainer.fit()
    ```

    See the Object Store API Reference for more information.

  5. 🪨 Artifact Metadata

    Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.
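
The load_ignore_keys feature in item 3 above matches nested checkpoint keys against glob patterns such as "state/rank_zero_seed" or "rng". The sketch below illustrates the idea on a plain nested dict using Python's fnmatch; drop_ignored_keys is a hypothetical helper and the checkpoint contents are made up, so treat this as an illustration of glob-based filtering rather than Composer's actual implementation.

```python
from fnmatch import fnmatch
from typing import Any, Dict, List


def drop_ignored_keys(obj: Dict[str, Any], ignore_globs: List[str], path: str = "") -> Dict[str, Any]:
    """Return a copy of `obj` without entries whose slash-joined path matches any glob.

    Hypothetical helper for illustration; not Composer's implementation.
    """
    kept = {}
    for key, value in obj.items():
        full_path = f"{path}/{key}" if path else key
        if any(fnmatch(full_path, pattern) for pattern in ignore_globs):
            continue  # skip this entry and everything nested under it
        if isinstance(value, dict):
            value = drop_ignored_keys(value, ignore_globs, full_path)
        kept[key] = value
    return kept


# Made-up checkpoint structure, for illustration only.
checkpoint = {
    "state": {"rank_zero_seed": 42, "model": {"weight": [0.1, 0.2]}, "optimizers": {"Adam": {}}},
    "rng": [{"python": 1}],
}

print(drop_ignored_keys(checkpoint, ["state/rank_zero_seed", "rng"]))
print(drop_ignored_keys(checkpoint, ["state/optim*"]))
```

Returning a filtered copy, instead of deleting keys in place, leaves the loaded checkpoint untouched in case it is needed again.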

API Changes

  1. ✂️ Gradient Clipping is now an Algorithm

    To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm:

    For example:

    ```python
    from composer.algorithms import GradientClipping
    from composer.trainer import Trainer

    # Configure gradient clipping
    gradient_clipping = GradientClipping()

    # Configure the trainer
    trainer = Trainer(
        ...,
        algorithms=gradient_clipping,
    )

    # Train!
    trainer.fit()
    ```

    See the method card for more information.

  2. 🕒️ Removed batch_num_samples and batch_num_tokens from the state.

    State properties batch_num_samples and batch_num_tokens have been removed. Instead, use State.timestamp for token and sample tracking.

  3. 🧑‍🤝‍🧑 DDP Sync Strategy

    We changed the default DDP Sync Strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.

  4. 🏃 Moved the run_name into the State

    The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.
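
For readers new to gradient clipping (API change 1 above), here is a minimal pure-Python sketch of norm-based clipping, the common variant in which all gradients are rescaled by one factor so their combined L2 norm does not exceed a threshold. clip_by_global_norm is a hypothetical helper working on plain lists; it does not show which clipping modes or defaults the GradientClipping algorithm uses (see its method card for that).

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Scale all gradients by one factor so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm <= max_norm:
        return [list(grad) for grad in grads]  # already within bounds: copy unchanged
    scale = max_norm / total_norm
    return [[g * scale for g in grad] for grad in grads]


# Two one-element "parameter" gradients with a global norm of 5.0.
clipped = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(clipped)
```

Scaling every gradient by the same factor preserves the overall update direction and only shrinks its magnitude.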

Bug Fixes

  • In the Object Store Logger, added retries for credential validation, and credentials are now validated only on global rank zero. (#1144)
  • Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
  • Fixed how block-wise Stochastic Depth could freeze the trainer. (#1087)
  • Fixed a bug in the MLPerfCallback where sample counts were incorrect on per-sharded datasets. (#1156)

Changelog

https://github.com/mosaicml/composer/compare/v0.7.1...v0.8.0

- Python
Published by ravi-mosaicml almost 4 years ago

composer - v0.7.1

🚀 Composer v0.7.1

Composer v0.7.1 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.7.1
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.7.1
```

Bug Fixes

  • Upgraded wandb>=0.12.17, to fix incompatibility with protobuf >= 4 (https://github.com/wandb/client/pull/3709)

Changelog

https://github.com/mosaicml/composer/compare/v0.7.0...v0.7.1

- Python
Published by ravi-mosaicml about 4 years ago

composer - v0.7.0

🚀 Composer v0.7.0

Composer v0.7.0 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.7.0
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.7.0
```

New Features

  1. 🏎️ FFCV Integration

    Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

    ```python
    import ffcv
    from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
    from torchvision.datasets import ImageFolder

    from composer import Trainer
    from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches

    # Convert the dataset to FFCV format
    # This step needs to be done only once per dataset
    dataset = ImageFolder(...)
    ffcv_dataset_path = "my_ffcv_dataset.ffcv"
    write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)

    # In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
    ffcv_monkey_patches()

    # Construct the train dataloader
    train_dl = ffcv.Loader(
        ffcv_dataset_path,
        ...
    )

    # Construct the trainer
    trainer = Trainer(
        train_dataloader=train_dl,
    )

    # Train using FFCV!
    trainer.fit()
    ```

    See our notebook on training with FFCV for a full example.

  2. ✅ Autoresume from Checkpoints

    When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, then it'll start from the beginning.

    This feature does not require a different entrypoint to distinguish between starting a new training run or automatically resuming from an existing one, making it easy to use Composer on preemptible spot cloud instances. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!

    ```python
    from composer import Trainer

    # When using autoresume, it is required to specify the
    # run_name, so Composer will know which training run to
    # resume
    run_name = "my_autoresume_training_run"

    trainer = Trainer(
        ...,
        run_name=run_name,
        # specify where to save checkpoints
        save_folder="./my_autoresume_training_run",
        autoresume=True,
    )

    # Train! Composer will handle loading an existing
    # checkpoint or starting a new training run
    trainer.fit()
    ```

    See the Trainer API Reference for more information.

  3. ♻️ Reuse the Trainer

    Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object.

    For example:

    ```python
    from torch.utils.data import DataLoader

    from composer import Trainer

    train_dl_1 = DataLoader(...)
    trainer = Trainer(
        model=model,
        max_duration='5ep',
        train_dataloader=train_dl_1,
    )

    # Train once!
    trainer.fit()

    # Train again with a new dataloader for another 5 epochs
    train_dl_2 = DataLoader(...)
    trainer.fit(
        train_dataloader=train_dl_2,
        duration='5ep',
    )
    ```

    See the Trainer API Reference for more information.

  4. ⚖️ Eval or Predict Only? No Problem

    You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.

    ```python
    import torchmetrics
    from torch.utils.data import DataLoader

    from composer import Trainer

    # Construct the trainer
    trainer = Trainer(model=model)

    # Evaluate!
    eval_dl = DataLoader(...)
    trainer.eval(
        dataloader=eval_dl,
        metrics=torchmetrics.Accuracy(),
    )

    # Examine evaluation metrics
    print("Eval metrics", trainer.state.metrics['eval'])

    # Or, predict!
    predict_dl = DataLoader(...)
    trainer.predict(dataloader=predict_dl)
    ```

    See the Trainer API Reference for more information.

  5. 🛑 Early Stopper and Threshold Stopper Callbacks

    The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

    ```python
    from composer import Trainer
    from composer.callbacks.early_stopper import EarlyStopper
    from torchmetrics.classification.accuracy import Accuracy

    # Construct the callback
    early_stopper = EarlyStopper(
        monitor="Accuracy",
        dataloader_label="eval",
        patience=2,
    )

    # Construct the trainer
    trainer = Trainer(
        ...,
        callbacks=early_stopper,
        max_duration="100ep",
    )

    # Train!
    # Training will end early if the accuracy does not improve
    # over two epochs
    trainer.fit()
    ```

  6. 🪵 Load Checkpoints from Loggers

    It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.

    ```python
    from composer import Trainer
    from composer.loggers import WandBLogger

    # Configure the W&B Logger
    wandb_logger = WandBLogger(
        # set to True to capture artifacts, like checkpoints
        log_artifacts=True,
        init_params={
            'project': 'my-wandb-project-name',
        },
    )

    # Then, to train and save checkpoints to W&B:
    trainer = Trainer(
        ...,
        loggers=wandb_logger,
        save_folder="/tmp/checkpoints",
        save_interval="1ep",
        save_artifact_name="epoch{epoch}.pt",
    )

    # Finally, to load checkpoints from W&B
    trainer = Trainer(
        ...,
        load_object_store=wandb_logger,
        load_path="epoch1.pt:latest",
    )
    ```

  7. ⌛ Wall Clock, Evaluation, and Prediction Time Tracking

    The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:

    ```python
    from composer import Callback, Trainer

    class MyCallback(Callback):
        def batch_end(self, state, logger):
            print(f"Total wct: {state.timestamp.total_wct}")
            print(f"Epoch wct: {state.timestamp.epoch_wct}")
            print(f"Batch wct: {state.timestamp.batch_wct}")

    # Construct the trainer with this callback
    trainer = Trainer(
        ...,
        callbacks=MyCallback(),
    )

    # Train!
    trainer.fit()
    ```

    In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.

  8. Training DeepLabv3+ on the ADE20k Dataset

    DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.

    We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.

    We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:

    | Model                  | mIoU           | Time-to-Train |
    | ---------------------- | -------------- | ------------- |
    | Unoptimized DeepLabv3+ | 44.17 +/- 0.14 | 6.39 hr       |
    | Optimized DeepLabv3+   | 45.78 +/- 0.26 | 4.67 hr       |

    Check out our documentation for more info!
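
The EarlyStopper example in item 5 above stops training when the monitored accuracy fails to improve for patience=2 evaluations. A minimal sketch of that patience rule follows, assuming a higher-is-better metric; early_stop_index is a hypothetical function and the accuracy values are made up, so the real callback's exact rules (for example, what counts as an improvement) may differ.

```python
from typing import Iterable, Optional


def early_stop_index(metric_history: Iterable[float], patience: int) -> Optional[int]:
    """Return the 0-based evaluation index at which training would stop, or None.

    Hypothetical sketch of patience-based early stopping for a metric where
    higher is better (e.g. accuracy); not Composer's EarlyStopper code.
    """
    best = float("-inf")
    evals_without_improvement = 0
    for i, value in enumerate(metric_history):
        if value > best:
            best = value
            evals_without_improvement = 0  # improvement resets the counter
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return i
    return None


# Accuracy peaks at the second eval, then fails to improve twice in a row.
print(early_stop_index([0.50, 0.62, 0.61, 0.60, 0.70], patience=2))
```

With these made-up values, training stops at the fourth evaluation (index 3) and never reaches the later 0.70.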

API Changes

  1. 🍪 Additional Batch Type Support

    Composer v0.7.0 removed the BatchDict and BatchPair types, and now supports any batch type. We're updating our algorithms to support batches of custom formats.

  2. 🏎️ Simplified Profiling Arguments

    To simplify the Trainer constructor, the profiling arguments were replaced with a single profiler argument, which takes an instance of the Profiler.

    ```python
    from composer.trainer import Trainer
    from composer.profiler import Profiler, JSONTraceHandler, cyclic_schedule

    # composer_trace_dir and torch_trace_dir are user-chosen output folders
    trainer = Trainer(
        ...,
        profiler=Profiler(
            trace_handlers=JSONTraceHandler(
                folder=composer_trace_dir,
                overwrite=True,
            ),
            schedule=cyclic_schedule(
                wait=0,
                warmup=1,
                active=4,
                repeat=1,
            ),
            torch_prof_folder=torch_trace_dir,
            torch_prof_overwrite=True,
            ...,
        ),
    )
    ```

    See the profiling guide for additional information.
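
To see what the schedule arguments in that example do, the sketch below maps each batch index to a profiler action under a cyclic schedule. It is a simplified, hypothetical re-implementation for illustration only: the `Action` names and the `skip_first` handling are assumptions, so consult the `cyclic_schedule` API reference for the exact semantics.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    WARMUP = "warmup"
    ACTIVE = "active"
    ACTIVE_AND_SAVE = "active_and_save"  # last active batch of a cycle: flush the trace


def cyclic_action(batch_idx: int, skip_first: int = 0, wait: int = 0,
                  warmup: int = 1, active: int = 4, repeat: int = 1) -> Action:
    """Hypothetical sketch of a cyclic profiling schedule (repeat=0 means forever)."""
    if batch_idx < skip_first:
        return Action.SKIP
    position = batch_idx - skip_first
    cycle_len = wait + warmup + active
    cycle, offset = divmod(position, cycle_len)
    if repeat != 0 and cycle >= repeat:
        return Action.SKIP  # all requested cycles are done
    if offset < wait:
        return Action.SKIP
    if offset < wait + warmup:
        return Action.WARMUP
    if offset == cycle_len - 1:
        return Action.ACTIVE_AND_SAVE
    return Action.ACTIVE


# wait=0, warmup=1, active=4, repeat=1, as in the Trainer example above
actions = [cyclic_action(i).value for i in range(7)]
print(actions)
```

Because each cycle ends with its own save action, every profiling cycle yields a separate trace rather than one trace merged at the end of training.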

  3. 🚪 Event.FIT_END and Engine.close()

    With support for reusing the trainer for multiple calls to Trainer.fit, callbacks and loggers are no longer closed at the end of a training run.

    Instead, Event.FIT_END was added, which callbacks can use for anything that should happen at the end of each invocation of Trainer.fit. See the Event Guide for additional information.

    Finally, whenever the trainer is garbage collected or Trainer.close is called, Callback.close and Callback.post_close are invoked, ensuring that they will be called only once per trainer.
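
The resulting lifecycle can be illustrated with a small stand-alone model. `MiniTrainer` and `RecordingCallback` below are hypothetical stand-ins, not Composer classes: every `fit()` call fires `fit_end`, while `close` and `post_close` fire only once, no matter how many times the trainer is closed.

```python
class RecordingCallback:
    """Hypothetical callback that records which hooks were invoked."""

    def __init__(self) -> None:
        self.calls = []

    def fit_end(self) -> None:
        self.calls.append("fit_end")

    def close(self) -> None:
        self.calls.append("close")

    def post_close(self) -> None:
        self.calls.append("post_close")


class MiniTrainer:
    """Hypothetical trainer illustrating FIT_END vs. close semantics."""

    def __init__(self, callbacks) -> None:
        self.callbacks = list(callbacks)
        self._closed = False

    def fit(self) -> None:
        # ... the training loop would run here ...
        for cb in self.callbacks:
            cb.fit_end()  # runs at the end of every fit() call

    def close(self) -> None:
        if self._closed:
            return  # guard so close/post_close run only once per trainer
        for cb in self.callbacks:
            cb.close()
        for cb in self.callbacks:
            cb.post_close()
        self._closed = True

    def __del__(self) -> None:
        # Mirrors closing the callbacks when the trainer is garbage collected.
        self.close()


cb = RecordingCallback()
trainer = MiniTrainer([cb])
trainer.fit()
trainer.fit()
trainer.close()
trainer.close()  # second call is a no-op
print(cb.calls)
```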

  4. State.timestamp replaces State.timer

    Removed State.timer and replaced it with State.timestamp, which is now a static Timestamp object. The training loop replaces State.timestamp with a new object on each batch. See the Time Guide for additional information.
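
Because the timestamp is now replaced rather than mutated, a reference to an old timestamp keeps its values. The sketch below illustrates that pattern with a frozen dataclass; the `Timestamp` fields shown are a simplified, hypothetical subset, not Composer's actual class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical, simplified immutable timestamp."""
    epoch: int = 0
    batch: int = 0
    sample: int = 0


def next_batch(ts: Timestamp, batch_size: int) -> Timestamp:
    # Build a new object instead of mutating the old one.
    return replace(ts, batch=ts.batch + 1, sample=ts.sample + batch_size)


start = Timestamp()
current = start
for _ in range(3):
    current = next_batch(current, batch_size=32)

print(start, current)
```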

  5. 💿 Data Configuration

    Two new properties, State.dataloader and State.dataloader_label, were added to the state. These properties track the currently active dataloader (e.g. the training dataloader when training; the evaluation dataloader when evaluating).

    In addition, State.subset_num_batches was renamed to State.dataloader_len to reflect the actual dataloader length that will be used for training and evaluation.

    A helper method State.set_dataloader was added to ensure the dataloader properties are updated correctly.

  6. ⚖️ Removed the Deprecated Scale Schedule Algorithm

    The scale schedule algorithm class, deprecated in v0.4.0, has been removed. Instead, use the scale_schedule_ratio argument when constructing the trainer.

    ```python
    from composer import Trainer
    from composer.optim.scheduler import MultiStepScheduler

    trainer = Trainer(
        ...,
        max_duration="20ep",
        schedulers=MultiStepScheduler(milestones=["10ep", "16ep"]),
        scale_schedule_ratio=0.5,
    )
    ```

    See the Scale Schedule Method Card for additional info.
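
Concretely, a scale schedule ratio shrinks both the training duration and the scheduler milestones by the same factor, so the example above trains for 10 epochs with milestones at 5 and 8 epochs. The helper below is a hypothetical illustration of that arithmetic for epoch-based durations only; it is not part of Composer's API.

```python
def scale_epochs(duration: str, ratio: float) -> str:
    """Hypothetical helper: scale an 'Nep' duration string by `ratio`."""
    if not duration.endswith("ep"):
        raise ValueError(f"expected an epoch duration like '20ep', got {duration!r}")
    epochs = int(duration[:-2]) * ratio
    if epochs != int(epochs):
        # This sketch rejects fractional results instead of silently truncating them.
        raise ValueError(f"scaled duration {epochs} is not a whole number of epochs")
    return f"{int(epochs)}ep"


ratio = 0.5
max_duration = scale_epochs("20ep", ratio)
milestones = [scale_epochs(m, ratio) for m in ["10ep", "16ep"]]
print(max_duration, milestones)
```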

Bug Fixes

  • Fixed a bug where Event.FIT_END was not being called in the training loop (#1054)
  • Fixed a bug where evaluation would not run at the end of training unless it aligned with the eval_interval (#1045)
  • Fixed a bug where models trained with SWA could not be used with checkpoints (#1015)
  • Fixed a bug where the Speed Monitor included validation time in the training throughput measurements, resulting in slower reported throughput measurements (#1053)
  • Fixed a bug to make the ComposerClassifier compatible with TorchScript (#1036)
  • Fixed a bug where fractional Time Objects were being truncated instead of raising an exception (#1038)
  • Changed the defaults for Selective Backprop to not scale inputs, so the algorithm can work with non-vision workloads (#896)

New Contributors

  • @ofirpress made their first contribution in https://github.com/mosaicml/composer/pull/955
  • @QiyaoWei made their first contribution in https://github.com/mosaicml/composer/pull/866
  • @pavithranrao made their first contribution in https://github.com/mosaicml/composer/pull/879

Changelog

https://github.com/mosaicml/composer/compare/v0.6.1...v0.7.0

- Python
Published by ravi-mosaicml about 4 years ago

composer - v0.6.1

🚀 Composer v0.6.1

Composer v0.6.1 is released!

Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.

Install via pip:

```bash
pip install --upgrade mosaicml==0.6.1
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.6.1
```

What's New?

  1. 📎 Adaptive Gradient Clipping (AGC)

    Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of the gradient norms to the corresponding weight norms. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
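
The clipping rule can be sketched in a few lines. This is a simplified, per-tensor version operating on flat Python lists; the `clipping_threshold` and `eps` defaults are assumptions chosen for illustration, and Composer's implementation may apply the rule at a finer granularity (for example, per output unit) rather than per tensor.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def adaptive_clip(grad, weight, clipping_threshold=0.01, eps=1e-3):
    """Rescale `grad` so that ||grad|| / max(||weight||, eps) <= clipping_threshold.

    Simplified per-tensor sketch; the threshold and eps defaults are assumptions.
    """
    w_norm = max(l2_norm(weight), eps)  # eps guards against zero-initialized weights
    g_norm = l2_norm(grad)
    max_norm = clipping_threshold * w_norm
    if g_norm > max_norm:
        scale = max_norm / g_norm
        return [g * scale for g in grad]
    return list(grad)


weight = [3.0, 4.0]  # ||w|| = 5
grad = [0.3, 0.4]    # ||g|| = 0.5, so the ratio 0.1 exceeds 0.01 and the gradient is clipped
clipped = adaptive_clip(grad, weight)
print(clipped, l2_norm(clipped))
```

Unlike clipping to a fixed global norm, the allowed gradient size here grows with the weight norm, which is what makes the threshold "adaptive".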

  2. 🚚 Exponential Moving Average (EMA)

    Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes improved generalization.
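
The update behind EMA is a single line per parameter. The sketch below applies it to plain floats; the `smoothing` value is an illustrative assumption, since Composer's EMA algorithm configures the decay through its own hyperparameters (see its method card).

```python
def ema_update(ema_params, params, smoothing=0.9):
    """Return smoothing * ema + (1 - smoothing) * current, elementwise."""
    return [smoothing * e + (1.0 - smoothing) * p for e, p in zip(ema_params, params)]


# Simulated parameter trajectory during training (one scalar parameter that oscillates).
trajectory = [[1.0], [0.0], [1.0], [0.0]]
ema = list(trajectory[0])  # initialize the average with the starting weights
for params in trajectory[1:]:
    ema = ema_update(ema, params)

print(ema)
```

The raw parameter swings between 0 and 1, while the averaged copy stays between roughly 0.8 and 0.9, which is why evaluating with the averaged weights tends to give smoother metrics.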

  3. 🪵 Logger is available in the ComposerModel

    The Logger is bound to the ComposerModel via the self.logger attribute. It is available during training on all methods (other than __init__).

    For example, to log the norm of a hidden activation:

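    ```python
    import torch.nn.functional as F
    from composer.models import ComposerModel

    class Net(ComposerModel):
        # conv1, conv2, conv2_drop, fc1, and fc2 are defined in __init__ (omitted here)

        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))
            x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
            if self.logger:
                # Log the L2 norm of the hidden activation for this batch
                self.logger.data_batch({
                    "hidden_activation_norm": x.norm(2).item(),
                })
            x = x.view(-1, 320)
            x = F.relu(self.fc1(x))
            x = F.dropout(x, training=self.training)
            x = self.fc2(x)
            return F.log_softmax(x, dim=1)
    ```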

  4. 🐛 Environment Collection Script

    Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.

    To collect your environment information:

    ```bash
    $ pip install mosaicml  # if composer is not already installed
    $ composer_collect_env
    ```

    Then, include the output in your GitHub Issue.

What's Improved?

  1. 📜 TorchScriptable Algorithms

    BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!

  2. 🏛️ ColOut on Segmentation

    ColOut now supports segmentation-style models.

What's Fixed?

  1. 🚑️ Loggers capture the Traceback

    We fixed a bug so the Loggers, such as the Weights & Biases Logger and the File Logger, will capture the traceback of any exception that crashes the training process.

  2. 🏋️ Weights & Biases Logger Config

    We fixed a bug where the Weights & Biases Logger was not properly recording the configuration.

Full Changelog

https://github.com/mosaicml/composer/compare/v0.6.0...v0.6.1

- Python
Published by ravi-mosaicml about 4 years ago

composer - v0.6.0

🚀 Composer v0.6.0

Composer v0.6.0 is released! Install via pip:

```bash
pip install --upgrade mosaicml==0.6.0
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.6.0
```

Major Changes

  1. 🗃️ Automatic Gradient Accumulation

    Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and hardware combination!

    To use automatic gradient accumulation, set grad_accum='auto'. For example:

    ```python
    trainer = Trainer(
        ...,
        grad_accum='auto',
    )
    ```
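
The idea behind this feature is to retry a batch with more (and therefore smaller) microbatches whenever it runs out of memory. The sketch below simulates that loop; `FakeOOMError`, `train_batch`, and the doubling policy are hypothetical stand-ins for illustration, not Composer's exact implementation.

```python
class FakeOOMError(RuntimeError):
    """Stand-in for a CUDA out-of-memory error."""


def train_batch(batch_size: int, grad_accum: int, max_microbatch: int) -> None:
    # Pretend the device can only fit microbatches of at most `max_microbatch` samples.
    microbatch = -(-batch_size // grad_accum)  # ceiling division
    if microbatch > max_microbatch:
        raise FakeOOMError(f"microbatch of {microbatch} samples does not fit")


def fit_with_auto_grad_accum(batch_size: int, max_microbatch: int) -> int:
    grad_accum = 1
    while True:
        try:
            train_batch(batch_size, grad_accum, max_microbatch)
            return grad_accum  # this setting fits; keep using it
        except FakeOOMError:
            if grad_accum >= batch_size:
                raise  # cannot split the batch any further
            grad_accum *= 2  # halve the microbatch size and retry


print(fit_with_auto_grad_accum(batch_size=256, max_microbatch=48))
```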

  2. 💾 Artifact Logging

    Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.

    Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.

  3. 📊 Metric Values on the State

    Composer v0.6.0 binds the computed metric values to the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.

  4. ⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms

    Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.

Minor Improvements

  1. 🏃‍♀️ Training Run Names

    We introduced a run_name parameter in the Trainer to help organize training runs.

    ```python
    trainer = Trainer(
        ...,
        run_name='awesome-training-run',
    )
    ```

    We'll automatically pick one if the run name is not specified.

  2. 💈 Automatic Progress Bars

    The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.

    To disable the progress bar, set progress_bar=False. For example:

    ```python
    trainer = Trainer(
        ...,
        progress_bar=False,
    )
    ```

  3. 🪵 Logged Data in the Console

    To print Logger calls to the console, set the log_to_console and the console_log_level arguments.

    ```python
    trainer = Trainer(
        ...,
        log_to_console=True,
        console_log_level="epoch",
    )
    ```

    By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.
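
One way to think about `console_log_level` is as a verbosity threshold: with `"epoch"`, fit-level and epoch-level data are printed, but per-batch data is not. The enum ordering and the `should_print` helper below are assumptions for illustration, not Composer's code.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Hypothetical ordering from least to most frequent logging."""
    FIT = 1
    EPOCH = 2
    BATCH = 3


def should_print(message_level: LogLevel, console_log_level: str) -> bool:
    # Print anything that is logged no more frequently than the configured level.
    return message_level <= LogLevel[console_log_level.upper()]


printed = [lvl.name for lvl in LogLevel if should_print(lvl, "epoch")]
print(printed)
```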

  4. 📃 Capturing stdout and stderr in Log Files

    The FileLogger captures stdout and stderr by default now. Tracebacks will now be captured amongst other logging statements.

  5. ⬆️ PyTorch 1.11 Support

    We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!

  6. ✅ Checkpointing

    We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.

    In addition, we changed the checkpointing argument names for the trainer.

    * The new parameters `save_artifact_name` and `save_latest_artifact_name` allow checkpoints to be saved directly to artifact stores.
    * The new parameter `save_num_checkpoints_to_keep` helps preserve local disk storage by automatically removing old checkpoints.
    * `load_path` replaces `load_path_format`.
    * `save_name` replaces `save_path_format`.
    * `save_latest_filename` replaces `save_latest_format`.
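
If you read Composer checkpoints manually, the practical difference is the `module.` prefix that `DistributedDataParallel` adds to every parameter name of the wrapped model. The helper below is a hypothetical sketch (plain dicts stand in for real state dicts) of converting an old-style, DDP-wrapped state dict to the new, unwrapped key format.

```python
def strip_ddp_prefix(state_dict: dict, prefix: str = "module.") -> dict:
    """Return a copy of `state_dict` with the DDP wrapper prefix removed from keys."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


old_style = {"module.conv1.weight": [0.1], "module.fc.bias": [0.0]}
print(strip_ddp_prefix(old_style))
```

Keys without the prefix pass through unchanged, so the helper is safe to run on checkpoints in either format. PyTorch also provides `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present`, which performs the same renaming in place on a real state dict.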
  7. 🏎️ Profiling

    We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.
    
    As part of this refactor, the profiler arguments have changed:
    
    * `prof_trace_handlers` replaces `prof_event_handlers`.
    * `prof_schedule` replaces `prof_skip_first`, `prof_wait`, `prof_warmup`, `prof_active`, and `prof_repeat`. See the [cyclic schedule](https://docs.mosaicml.com/en/v0.6.0/api_reference/composer.profiler.profiler_schedule.html#composer.profiler.profiler_schedule.cyclic_schedule) function.
    * `torch_prof_folder` replaces `torch_profiler_trace_dir`.
    * The new arguments `torch_prof_filename`, `torch_prof_artifact_name`, `torch_prof_overwrite`, and `torch_prof_num_traces_to_keep` allow for customization on how [PyTorch Profiler](https://pytorch.org/docs/stable/profiler.html) traces are saved.
    
  8. 🏗️ TorchVision Model Architectures

    We switched our vision models to use the TorchVision model architecture implementations where possible.

Bug Fixes

  • Fixed a bug with MixUp and gradient accumulation
  • Fixed numerous issues with the Composer launch script for distributed training. Composer v0.6.0 includes environment variable support, better defaults and warnings, and proper handling of crashed processes.

Changelog

  • Update MigratingfromPTL.ipynb by @moinnadeem in https://github.com/mosaicml/composer/pull/730
  • CodeQL Analysis by @Averylamp in https://github.com/mosaicml/composer/pull/723
  • Installing pyright via npm by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/735
  • Polish intro docs by @dblalock in https://github.com/mosaicml/composer/pull/721
  • Numerics docs page by @bandish-shah in https://github.com/mosaicml/composer/pull/725
  • Testing Niklas GH Docs Star w/ Dark Mode by @moinnadeem in https://github.com/mosaicml/composer/pull/742
  • [Artifact Logging PR1] Logger Refactoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/698
  • Update README.md by @moinnadeem in https://github.com/mosaicml/composer/pull/731
  • Updated the Method Cards by @hanlint in https://github.com/mosaicml/composer/pull/647
  • Using existing clone in conda meta.yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/751
  • [Artifact Logging PR2] Logger Destination Cleanup by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/699
  • Shorten to minimal code snippets by @hanlint in https://github.com/mosaicml/composer/pull/752
  • Sample-wise Stochastic Depth Method Card by @Landanjs in https://github.com/mosaicml/composer/pull/749
  • Update algorithm yamls by @coryMosaicML in https://github.com/mosaicml/composer/pull/747
  • [Artifact Logging PR3] Add the run_name as a property of the Logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/700
  • [Artifact Logging PR4] Added logfileartifact base method by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/701
  • Fix README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/753
  • Less CodeQL by @Averylamp in https://github.com/mosaicml/composer/pull/762
  • Increase the timeout for test trainer equivalence by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/766
  • Port squeze excite method card to new format by @dblalock in https://github.com/mosaicml/composer/pull/764
  • Small fixes by @hanlint in https://github.com/mosaicml/composer/pull/765
  • Adding defaults to blurpool by @moinnadeem in https://github.com/mosaicml/composer/pull/756
  • Added maximum versions to dependencies by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/768
  • Update sequence length warmup documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/770
  • Additional README fixes by @hanlint in https://github.com/mosaicml/composer/pull/769
  • Fix setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/761
  • Increased the timeout for test_trainer.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/775
  • Remove plural types and aliases for native pytorch types by @Landanjs in https://github.com/mosaicml/composer/pull/677
  • [Artifact Logging PR5] Added the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/706
  • [Artifact Logging PR6] Rename the TQDMLogger as the ProgressBarLogger; remove terminal logging from the file logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/708
  • [Artifact Logging PR7] Add stdout and stderr capture to the FileLogger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/710
  • Update README.md by @vahidfazelrezai in https://github.com/mosaicml/composer/pull/781
  • URGENT: Fixing an incorrect number by @jfrankle in https://github.com/mosaicml/composer/pull/785
  • Add eval dataloader to the README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/779
  • Readme code fix by @nqn in https://github.com/mosaicml/composer/pull/787
  • Set the random seed before each test. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/786
  • Docker file for vision applications with ffcv and deeplabv3 dependencies by @dskhudia in https://github.com/mosaicml/composer/pull/724
  • Update README.md by @murthyn in https://github.com/mosaicml/composer/pull/789
  • Chmod 644 all files by @Averylamp in https://github.com/mosaicml/composer/pull/760
  • Add Algorithm Warning for NoEffectWarning by @hanlint in https://github.com/mosaicml/composer/pull/720
  • Update dense label conversion and soft cross entropy to handle segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/763
  • added model card details comparing cifar to imagenet resnets by @growlix in https://github.com/mosaicml/composer/pull/792
  • Added codeowners file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/797
  • ffcv integration for cifar10 dataset by @dskhudia in https://github.com/mosaicml/composer/pull/672
  • Add trainer link to README by @hanlint in https://github.com/mosaicml/composer/pull/804
  • ffcv integration for imagenet by @dskhudia in https://github.com/mosaicml/composer/pull/802
  • [XS] Consolidating NLP Import Message by @moinnadeem in https://github.com/mosaicml/composer/pull/795
  • Removed duplicate logger registry by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/808
  • Update docs on random seed by @hanlint in https://github.com/mosaicml/composer/pull/794
  • Remove the LoggerData and LoggerDataDict types by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/810
  • Rename composer/datasets/webdataset.py => composer/datasets/webdataset_utils.py by @dskhudia in https://github.com/mosaicml/composer/pull/813
  • More method card updates by @jfrankle in https://github.com/mosaicml/composer/pull/777
  • [Part 1] Adding Synthetic NLP Tokenizers, Models, Datasets w/o Integration by @moinnadeem in https://github.com/mosaicml/composer/pull/650
  • Update README by @moinnadeem in https://github.com/mosaicml/composer/pull/822
  • Updating setup.py with missing dependancies by @dlmgary in https://github.com/mosaicml/composer/pull/818
  • Fix submodule type errors when doing import composer by @dblalock in https://github.com/mosaicml/composer/pull/823
  • Update composer_model.rst by @moinnadeem in https://github.com/mosaicml/composer/pull/824
  • models cleanup - part 3: one model family per directory (cifar resnets) by @A-Jacobson in https://github.com/mosaicml/composer/pull/791
  • Support for webdatasets with ffcv by @dskhudia in https://github.com/mosaicml/composer/pull/815
  • Remove config from the logger base classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/811
  • models cleanup - part 2: metrics and loss by @A-Jacobson in https://github.com/mosaicml/composer/pull/790
  • Adding docstring for missing conditional imports by @moinnadeem in https://github.com/mosaicml/composer/pull/836
  • Filepath formatting helper utilities by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/827
  • Serialize model state without module. prefix when using DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/829
  • models cleanup - part 1: composermodel tasks by @A-Jacobson in https://github.com/mosaicml/composer/pull/788
  • Remove Batch Types - Part 1: recursive to_device function by @A-Jacobson in https://github.com/mosaicml/composer/pull/727
  • Profiler Refactor for Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/828
  • [Artifact Logging PR8]: Switch to artifact logging and remove the run directory. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/712
  • conditional imports use MissingConditionalImportError #814 by @IanWorley in https://github.com/mosaicml/composer/pull/835
  • Vision Tests + Jenkins Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/806
  • Fix the entrypoint and launch script by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/840
  • Remove a broken link to an old callback hparams tutorial. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/850
  • Remove no longer needed xfails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/848
  • Ade20k streaming dataset yaml by @Landanjs in https://github.com/mosaicml/composer/pull/843
  • [Part 2] Integrating synthetic tokenizers, datasets, and models into our unit tests by @moinnadeem in https://github.com/mosaicml/composer/pull/652
  • 'Second' typo by @nqn in https://github.com/mosaicml/composer/pull/852
  • [FFCV] webdataset from local + download only once by @dskhudia in https://github.com/mosaicml/composer/pull/849
  • Lowered Test Timeouts by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/851
  • Proofreading for docs "Getting Started" section by @mcneela in https://github.com/mosaicml/composer/pull/859
  • Dynamic Shrinking Microbatches by @mvpatel2000 in https://github.com/mosaicml/composer/pull/485
  • Proofreading for speedup methods section by @mcneela in https://github.com/mosaicml/composer/pull/861
  • LICENSE: copyright and cleanup by @kobindra in https://github.com/mosaicml/composer/pull/862
  • CLI Launcher supports environment variables and tells fewer lies by @jbloxham in https://github.com/mosaicml/composer/pull/860
  • Update MixUp to allow use of index labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/825
  • Bert validation refactor by @anisehsani in https://github.com/mosaicml/composer/pull/478
  • Make wandb tags optional by @siriuslee in https://github.com/mosaicml/composer/pull/865
  • Fix validation in CLI launcher by @jbloxham in https://github.com/mosaicml/composer/pull/870
  • Fixing version number by @ajaysaini725 in https://github.com/mosaicml/composer/pull/871
  • PyTorch 1.11 Docker Image by @bandish-shah in https://github.com/mosaicml/composer/pull/868
  • Add missing ffcv dependency in pytorch_vision docker image by @dskhudia in https://github.com/mosaicml/composer/pull/867
  • Fixed webdatasest import bug by @ajaysaini725 in https://github.com/mosaicml/composer/pull/874
  • Proofread five sections of Trainer module docs by @mcneela in https://github.com/mosaicml/composer/pull/872
  • Switch mixup events to avoid grad accum issues by @coryMosaicML in https://github.com/mosaicml/composer/pull/875
  • Proofreading docs through "Callbacks" section by @mcneela in https://github.com/mosaicml/composer/pull/878
  • Initialize distributed before dataloaders are created by @dskhudia in https://github.com/mosaicml/composer/pull/869
  • Proofreading the remainder of the trainer section of docs by @mcneela in https://github.com/mosaicml/composer/pull/881
  • Add test for grad_accum > 2 to the asset tests by @hanlint in https://github.com/mosaicml/composer/pull/876
  • Remove Batch Types - Part 2: unify split batch by @A-Jacobson in https://github.com/mosaicml/composer/pull/833
  • Proofreading Methods section of docs through AugMix by @mcneela in https://github.com/mosaicml/composer/pull/883
  • Add ssh by @Averylamp in https://github.com/mosaicml/composer/pull/885
  • rename LICENSE_HEADER to fix GH license detection by @kobindra in https://github.com/mosaicml/composer/pull/863
  • Torch 1.11 pytorch_vision Docker image by @bandish-shah in https://github.com/mosaicml/composer/pull/886
  • Add full traceback to grad accum errors by @mvpatel2000 in https://github.com/mosaicml/composer/pull/892
  • Modify ResNet9 benchmark to enable channelslast and progressiveresizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/889
  • Proofreading Methods section of docs through Cutout by @mcneela in https://github.com/mosaicml/composer/pull/890
  • Proofread Methods section of docs through MixUp by @mcneela in https://github.com/mosaicml/composer/pull/895
  • Fixes for ffcv integration by @dskhudia in https://github.com/mosaicml/composer/pull/844
  • Print the stdout/stderr of the crashing process by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/893
  • Change NLP yamls to use evaluators by @anisehsani in https://github.com/mosaicml/composer/pull/891
  • Fix loss logging with DeepSpeed by @abhi-mosaic in https://github.com/mosaicml/composer/pull/897
  • Add Computed Metrics to State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/842
  • Proofread Methods section of docs through Squeeze-Excite by @mcneela in https://github.com/mosaicml/composer/pull/899
  • test whether resuming from a checkpoint changes algorithm effect by @growlix in https://github.com/mosaicml/composer/pull/816
  • Object store symlinks for graceful resumption by @mvpatel2000 in https://github.com/mosaicml/composer/pull/887
  • Console log level by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/900
  • Remove asdict in unet by @Landanjs in https://github.com/mosaicml/composer/pull/901
  • Cherry Pick #906 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/912
  • Release/v0.6.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/933

New Contributors

  • @vahidfazelrezai made their first contribution in https://github.com/mosaicml/composer/pull/781
  • @murthyn made their first contribution in https://github.com/mosaicml/composer/pull/789
  • @dlmgary made their first contribution in https://github.com/mosaicml/composer/pull/818
  • @IanWorley made their first contribution in https://github.com/mosaicml/composer/pull/835

Full Changelog: https://github.com/mosaicml/composer/compare/v0.5.0...v0.6.0

- Python
Published by ravi-mosaicml about 4 years ago

composer - Release version v0.5.0

We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:

  • Revamped checkpointing API based on community feedback
  • New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
  • Additional improvements to our documentation
  • Support for bfloat16
  • Streaming dataset support
  • Unified functional API for our algorithms

Highlights

Checkpointing API

Model checkpointing is now implemented as a Callback, so users can easily write and add their own checkpointing callbacks. The callback is automatically appended if a save_folder is provided to the Trainer.

```python
trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep",
)
```

Alternatively, CheckpointSaver can be added directly as a callback:

```python
trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])
```

Subclass CheckpointSaver to add your own logic for saving the best model or saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.
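
The `name_format` and `save_latest_format` strings in the CheckpointSaver example are templates filled in at save time. The snippet below shows how such placeholders expand with Python's `str.format`; the epoch, batch, and rank values are made up, and the exact set of placeholders Composer supports is documented with CheckpointSaver.

```python
name_format = "ep{epoch}-ba{batch}/rank_{rank}"
save_latest_format = "latest/rank_{rank}"

# Hypothetical training progress at the moment a checkpoint is written.
progress = {"epoch": 3, "batch": 1200, "rank": 0}

checkpoint_name = name_format.format(**progress)
latest_name = save_latest_format.format(**progress)  # unused keys are ignored
print(checkpoint_name, latest_name)
```

Including `{rank}` in both templates gives each process its own file, so ranks never overwrite each other's checkpoints.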

bfloat16

We've added experimental support for bfloat16, which can be provided via the precision argument to the Trainer:

```python
trainer = Trainer(
    ...,
    precision="bfloat16",
)
```

Streaming datasets

We've added support for fast streaming datasets. For NLP-based datasets such as C4, we use the HuggingFace datasets backend and add dataset-specific shuffling, tokenization, and grouping on the fly. To support data-parallel training, we added specific sharding logic for efficiency. See C4Datasets for more details.

Vision streaming datasets are supported via a patched version of the webdataset package, which adds support for sharding data across workers for fast augmentations. See composer.datasets.webdataset for more details.
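
Sharding data across workers means that each dataloader worker on each rank reads a disjoint slice of the samples. The function below is a hypothetical illustration of one common strided scheme, not Composer's actual sharding logic.

```python
def shard_indices(num_samples: int, rank: int, world_size: int,
                  worker_id: int, num_workers: int) -> list:
    """Return the sample indices owned by one (rank, worker) pair."""
    global_worker = rank * num_workers + worker_id  # unique id across all processes
    total_workers = world_size * num_workers
    return list(range(global_worker, num_samples, total_workers))


# 2 ranks x 2 workers over 10 samples: every sample is read exactly once.
shards = [
    shard_indices(10, rank, 2, worker, 2)
    for rank in range(2)
    for worker in range(2)
]
print(shards)
```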

Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks

Configurations for GPT-3-like models ranging from 125m to 760m parameters are now released, and use DeepSpeed ZeRO Stage 0 for memory-efficient training:

  • GPT3-125m
  • GPT3-350m
  • GPT3-760m

We've also added the Single Shot MultiBox Detector (SSD) model (Liu et al., 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.

Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al., 2021, based on the vit-pytorch package.

See below for the full details:

What's Changed

  • Export Transforms in composer.algorithms by @ajaysaini725 in https://github.com/mosaicml/composer/pull/603
  • Make batchnorm default for UNet by @dskhudia in https://github.com/mosaicml/composer/pull/535
  • Fix noopmodel algorithm by @dskhudia in https://github.com/mosaicml/composer/pull/614
  • Pin pre-1.0 packages by @bandish-shah in https://github.com/mosaicml/composer/pull/595
  • Updated dark mode composer logo, and graph by @nqn in https://github.com/mosaicml/composer/pull/617
  • Jenkins + Docker Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/621
  • update README links by @hanlint in https://github.com/mosaicml/composer/pull/628
  • Remove all old timing calls by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/594
  • Remove state shorthand by @mvpatel2000 in https://github.com/mosaicml/composer/pull/629
  • add bfloat16 support by @nikhilsardana in https://github.com/mosaicml/composer/pull/433
  • v0.4.0 Hotfix: Docker documentation updates by @bandish-shah in https://github.com/mosaicml/composer/pull/631
  • Fix wrong icons in the method cards by @hanlint in https://github.com/mosaicml/composer/pull/636
  • fix autocast for pytorch < 1.10 by @nikhilsardana in https://github.com/mosaicml/composer/pull/639
  • Add tutorial notebooks to the README by @moinnadeem in https://github.com/mosaicml/composer/pull/630
  • Converted Stateless Schedulers to Classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/632
  • Jenkinsfile Fixes Part 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/627
  • Add C4 Streaming dataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/489
  • CONTRIBUTING.md additions by @kobindra in https://github.com/mosaicml/composer/pull/648
  • Hide showing object as a base class; fix skipping documentation of forward; fixed docutils dependency. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/643
  • Matthew/functional docstrings update by @growlix in https://github.com/mosaicml/composer/pull/622
  • docstrings improvements for core modules by @dskhudia in https://github.com/mosaicml/composer/pull/598
  • ssd-resnet34 on COCO map 0.23 by @florescl in https://github.com/mosaicml/composer/pull/646
  • Fix broken "best practices" link by @growlix in https://github.com/mosaicml/composer/pull/649
  • Update progressive resizing to work for semantic segmentation by @coryMosaicML in https://github.com/mosaicml/composer/pull/604
  • Let C4 Dataset overwrite num_workers if set incorrectly by @abhi-mosaic in https://github.com/mosaicml/composer/pull/655
  • Lazy imports for pycocotools by @abhi-mosaic in https://github.com/mosaicml/composer/pull/656
  • W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by @growlix in https://github.com/mosaicml/composer/pull/633
  • Update GPT3-yamls for default 8xA100-40GB by @abhi-mosaic in https://github.com/mosaicml/composer/pull/663
  • Set WandB default to log rank zero only by @abhi-mosaic in https://github.com/mosaicml/composer/pull/461
  • Update schedulers guide by @hanlint in https://github.com/mosaicml/composer/pull/661
  • [XS] Fix a TQDM deserialization bug by @jbloxham in https://github.com/mosaicml/composer/pull/665
  • Add defaults to the docstrings for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/662
  • Fix ZeRO config by @jbloxham in https://github.com/mosaicml/composer/pull/667
  • [XS] fix formatting for colout by @hanlint in https://github.com/mosaicml/composer/pull/666
  • Composer.core docstring touch-up by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/657
  • Add Uniform bounding box sampling option for CutOut and CutMix by @coryMosaicML in https://github.com/mosaicml/composer/pull/634
  • Update README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/678
  • Fix bug in trainer test by @hanlint in https://github.com/mosaicml/composer/pull/651
  • InMemoryLogger has get_timeseries() method by @growlix in https://github.com/mosaicml/composer/pull/644
  • Batchwise resolution for SWA by @growlix in https://github.com/mosaicml/composer/pull/654
  • Fixed the conda build script so it runs on jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/676
  • Yahp version update to 0.1.0 by @Averylamp in https://github.com/mosaicml/composer/pull/674
  • Streaming vision datasets by @knighton in https://github.com/mosaicml/composer/pull/284
  • Fix DeepSpeed checkpointing by @jbloxham in https://github.com/mosaicml/composer/pull/686
  • Vit by @A-Jacobson in https://github.com/mosaicml/composer/pull/243
  • [S] cleanup tldr; standardize __all__ by @hanlint in https://github.com/mosaicml/composer/pull/688
  • Unify algorithms part 2: mixup, cutmix, label smoothing by @dblalock in https://github.com/mosaicml/composer/pull/658
  • composer.optim docstrings by @jbloxham in https://github.com/mosaicml/composer/pull/653
  • Fix DatasetHparams, WebDatasetHparams docstring by @growlix in https://github.com/mosaicml/composer/pull/697
  • Models docstrings by @A-Jacobson in https://github.com/mosaicml/composer/pull/469
  • docstrings improvements for composer.datasets by @dskhudia in https://github.com/mosaicml/composer/pull/694
  • Updated contributing.md and the style guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/670
  • Ability to retry ADE20k crop transform by @Landanjs in https://github.com/mosaicml/composer/pull/702
  • Add mmsegmentation DeepLabv3(+) by @Landanjs in https://github.com/mosaicml/composer/pull/684
  • Unify functional API part 3 by @dblalock in https://github.com/mosaicml/composer/pull/715
  • Update example notebooks by @coryMosaicML in https://github.com/mosaicml/composer/pull/707
  • [Checkpointing - PR1] Store the rank_zero_seed on state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/680
  • [Checkpointing - PR2] Added in new Checkpointing Events by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/690
  • [Checkpointing - PR3] Clean up RNG and State serialization by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/692
  • [Checkpointing - PR4] Refactored the CheckpointLoader into a load_checkpoint function by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/693
  • Update {blurpool,factorize,ghostbn} method cards by @dblalock in https://github.com/mosaicml/composer/pull/711
  • [Checkpointing - PR 5] Move the CheckpointSaver to a callback. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/687
  • Update datasets docstrings by @growlix in https://github.com/mosaicml/composer/pull/709
  • add notebooks and functional api by @hanlint in https://github.com/mosaicml/composer/pull/714
  • Migrating from PTL notebook by @florescl in https://github.com/mosaicml/composer/pull/436
  • Docs 0.4.1: Profiler section and tutorials by @bandish-shah in https://github.com/mosaicml/composer/pull/696
  • Improve datasets docstrings by @knighton in https://github.com/mosaicml/composer/pull/695
  • Update C4Dataset to repeat, handle max_samples safely by @abhi-mosaic in https://github.com/mosaicml/composer/pull/722
  • Fix docs build by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/773
  • v0.5 Release by @hanlint in https://github.com/mosaicml/composer/pull/732

New Contributors

  • @nikhilsardana made their first contribution in https://github.com/mosaicml/composer/pull/433
  • @knighton made their first contribution in https://github.com/mosaicml/composer/pull/284

Full Changelog: https://github.com/mosaicml/composer/compare/v0.4.0...v0.5.0

- Python
Published by hanlint over 4 years ago

composer - Release Version 0.4.0

What's Changed

  • Release/0.3.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/102
  • Create dataloader on trainer __init__() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/92
  • label smoothing will not work without alpha set by @A-Jacobson in https://github.com/mosaicml/composer/pull/100
  • Warmup and cosine annealing warm restarts combine sequentially by @jacobfulano in https://github.com/mosaicml/composer/pull/99
  • Moved device.prepare() to init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/111
  • run_event for callbacks, removed deferred logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/85
  • Remove composer.trainer.ddp; replace with composer.utils.ddp by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/105
  • Running callbacks before algorithms for the INIT event in the engine by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/113
  • Replaced atexit with cleanup methods by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/112
  • Deepspeed Integration by @jbloxham in https://github.com/mosaicml/composer/pull/109
  • Fix loss reporting by @jbloxham in https://github.com/mosaicml/composer/pull/130
  • Run Directory Uploader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/101
  • Dataloader Upgrades by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/114
  • Synthetic Datasets and Subset Sampling by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/110
  • Remove argparse from setup.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/131
  • Fixed pickling of torch.memory_format objects by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/132
  • Fixed issue #135; rename total_batch_size to train_batch_size by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/137
  • Implement MosaicMLLoggerBackend by @ajaysaini725 in https://github.com/mosaicml/composer/pull/81
  • Add a linear learning rate decay by @moinnadeem in https://github.com/mosaicml/composer/pull/142
  • Apply channels last on init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/147
  • Update Trainer checkpointing documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/150
  • Address crashes with DDP + Checkpointing by @moinnadeem in https://github.com/mosaicml/composer/pull/151
  • Sudo in the Docker image by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/152
  • Remove curriculum learning by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/164
  • Remove broken symlinks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/163
  • Removed dataclass from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/153
  • Guard artifact uploading in wandb with ddp barriers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/162
  • add CODE_OF_CONDUCT.md by @kobindra in https://github.com/mosaicml/composer/pull/160
  • [XS] Fix wandb logger by @jbloxham in https://github.com/mosaicml/composer/pull/172
  • Print help on run_mosaic_trainer.py, cleaned up verbosity. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/170
  • DeepSpeed ZeRO config options by @jbloxham in https://github.com/mosaicml/composer/pull/166
  • DDP Seeding Across Processes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/173
  • Fixed the run directory uploader test by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/177
  • Fix broken gpu tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/181
  • Conditionally skip tests when installed with mosaicml[dev] by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/185
  • A yapf update broke some formatting; re-running the linter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/188
  • Timer PR parts 1 and 2 from #146 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/174
  • Fixed pyright issues by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/198
  • Additional Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/191
  • Propagate processes that were sigkilled by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/184
  • Add the ability to load a checkpoint without restoring state by @moinnadeem in https://github.com/mosaicml/composer/pull/169
  • Add ResNet-9 for CIFAR-10 by @dblalock in https://github.com/mosaicml/composer/pull/193
  • Added helper methods for torch.distributed.broadcast by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/189
  • Checkpointing & DeepSpeed by @jbloxham in https://github.com/mosaicml/composer/pull/199
  • Distinguish between dist and DDP by @jbloxham in https://github.com/mosaicml/composer/pull/201
  • DeepSpeed precision fixes for CV by @jbloxham in https://github.com/mosaicml/composer/pull/197
  • Fix deterministic mode (and use it for tests); simplify checkpointing tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/203
  • Load checkpoints from cloud storage by @ravirahman in https://github.com/mosaicml/composer/pull/200
  • Updated the DataSpec for the timing abstraction (#146) parts 3 and 4 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/178
  • Add larger GPT models by @jbloxham in https://github.com/mosaicml/composer/pull/213
  • Add BERT Base to Composer by @moinnadeem in https://github.com/mosaicml/composer/pull/195
  • Integrate the timer into the training loop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/210
  • Dockerfile enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/182
  • Adding checkpointing at the end of training by @moinnadeem in https://github.com/mosaicml/composer/pull/219
  • Adding conditional branching on data_collator by @moinnadeem in https://github.com/mosaicml/composer/pull/220
  • Fix apt sources bug by @Averylamp in https://github.com/mosaicml/composer/pull/231
  • Remove old timing calls from layer freezing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/216
  • Require pip install -e to be pip install --user -e when running as root by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/232
  • DeepLabv3 + ADE20k benchmark by @Landanjs in https://github.com/mosaicml/composer/pull/107
  • Remove old timing calls from selective backprop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/221
  • Clean up the tests to make them work on Jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/233
  • Make the run directory rank-local; fix checkpoints saving and restoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/215
  • Cleaned Up State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/223
  • Fix the speed monitor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/238
  • Fixed loggers and callbacks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/240
  • Fix ade20k padding fill calculation by @Landanjs in https://github.com/mosaicml/composer/pull/250
  • Adding fix for NLP learning rates by @moinnadeem in https://github.com/mosaicml/composer/pull/235
  • Training Loop Profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/97
  • WIP: Composer Jenkinsfile by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/82
  • Fix broken tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/257
  • Fix bug with AFTER_DATALOADER event; remove microbatches from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/258
  • Remove the DDP DataLoader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/245
  • Fix Jenkins to work on PRs from Forks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/267
  • add ability to specify custom run name, with rank auto-appended by @dblalock in https://github.com/mosaicml/composer/pull/264
  • Remove secrets from the yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/261
  • Checkpoint logging and doc fixes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/270
  • Remove custom W&B config changes by @siriuslee in https://github.com/mosaicml/composer/pull/236
  • Dramatically increase default dist_timeout by @jbloxham in https://github.com/mosaicml/composer/pull/272
  • Add factorization by @dblalock in https://github.com/mosaicml/composer/pull/53
  • Allow str and dict in Trainer init signature by @hanlint in https://github.com/mosaicml/composer/pull/277
  • Add kwargs back to the closure by @jbloxham in https://github.com/mosaicml/composer/pull/292
  • Default to num_classes=10 for CIFAR10_ResNet56 by @hanlint in https://github.com/mosaicml/composer/pull/293
  • Use tqdm.auto for notebooks by @hanlint in https://github.com/mosaicml/composer/pull/298
  • Added ResNet20 by @growlix in https://github.com/mosaicml/composer/pull/289
  • Optimizer Surgery by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/249
  • Don't init dist when world_size is 1 by @jbloxham in https://github.com/mosaicml/composer/pull/311
  • Scheduler defaults to step-wise instead of epoch-wise by @hanlint in https://github.com/mosaicml/composer/pull/312
  • Added the version to composer.__init__ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/315
  • Rename checkpoint API by @hanlint in https://github.com/mosaicml/composer/pull/281
  • Update setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/321
  • Timm support by @A-Jacobson in https://github.com/mosaicml/composer/pull/262
  • [XS] use correct package name in error messages by @jbloxham in https://github.com/mosaicml/composer/pull/331
  • Multiple Evaluator Datasets by @anisehsani in https://github.com/mosaicml/composer/pull/120
  • Fixed all uses of textwrap.dedent by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/332
  • Remove explicit YAHP constructs from algorithms by @jbloxham in https://github.com/mosaicml/composer/pull/317
  • Configure DeepSpeed with an ordinary DeepSpeed config dict by @jbloxham in https://github.com/mosaicml/composer/pull/322
  • Run Event.BATCH_END and Event.EPOCH_END after the timer is increm… by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/310
  • Guard dist.barrier in the checkpointer with try/finally by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/334
  • Replace composer ResNet with torchvision ResNet by @Landanjs in https://github.com/mosaicml/composer/pull/314
  • Fail fast if any step fails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/333
  • Replace most instances of "Mosaic" with "Composer" by @jbloxham in https://github.com/mosaicml/composer/pull/335
  • Ensure that the training dataloader does not have an active iterator. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/337
  • Fully flatten checkpoint params by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/325
  • Added Pylint and docformatter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/339
  • Add compression flag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/336
  • Fix cutmix and mixup reliance on num_classes model attribute by @Landanjs in https://github.com/mosaicml/composer/pull/348
  • Copy extra_init_params to get rid of recursive config dicts by @siriuslee in https://github.com/mosaicml/composer/pull/316
  • Composer Style Guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/319
  • Get rid of create_from_hparams by @jbloxham in https://github.com/mosaicml/composer/pull/351
  • Added In Memory Logger, Timestamp Object by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/352
  • Fix Checkpoints by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/359
  • Add channels last standalone function by @dblalock in https://github.com/mosaicml/composer/pull/356
  • Quick style guide typo fix by @ajaysaini725 in https://github.com/mosaicml/composer/pull/360
  • Removed template_default fields in hparams by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/369
  • removed byo_trainer by @anisehsani in https://github.com/mosaicml/composer/pull/374
  • Fix sample SD inference multiplication by @Landanjs in https://github.com/mosaicml/composer/pull/376
  • Support import composer.functional as cf by @dblalock in https://github.com/mosaicml/composer/pull/368
  • Fix composer.functional page no longer showing functions by @dblalock in https://github.com/mosaicml/composer/pull/379
  • Testing trainer.fit on each algorithm, callback, logger, and profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/371
  • Functional API renaming part 1 by @dblalock in https://github.com/mosaicml/composer/pull/380
  • Updated add_dataset_transform() to have flexible insertion point by @growlix in https://github.com/mosaicml/composer/pull/320
  • Rename Event.TRAINING_START to Event.FIT; remove Event.TRAINING_END by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/263
  • Remove requirement for validation and metrics by @hanlint in https://github.com/mosaicml/composer/pull/378
  • Docs Refactor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/386
  • Documentation Outline by @ajaysaini725 in https://github.com/mosaicml/composer/pull/302
  • Fix tests without DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/389
  • Use Makefile instead of scripts; enable easier testing by @hanlint in https://github.com/mosaicml/composer/pull/387
  • Address Doc Fixes for Surgery and StochasticDepth by @ajaysaini725 in https://github.com/mosaicml/composer/pull/413
  • Cleanup conftest.py by @hanlint in https://github.com/mosaicml/composer/pull/390
  • Move world_size guard to trainer by @hanlint in https://github.com/mosaicml/composer/pull/392
  • Add defaults to functional API / share defaults across interfaces by @dblalock in https://github.com/mosaicml/composer/pull/377
  • Un-deprecate steps_per_epoch by @jbloxham in https://github.com/mosaicml/composer/pull/418
  • Remove the walkthrough section of the docs; replace with module-level docstrings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/417
  • Rename Loggers by @hanlint in https://github.com/mosaicml/composer/pull/427
  • Alternative docs theme: furo by @nqn in https://github.com/mosaicml/composer/pull/341
  • Clarify DWD defaults by @abhi-mosaic in https://github.com/mosaicml/composer/pull/410
  • Added :ignore-module-all: to docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/431
  • Configured doctest by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/432
  • Functional API renaming part 2 by @dblalock in https://github.com/mosaicml/composer/pull/426
  • Pytest Refactor Part 1 by @hanlint in https://github.com/mosaicml/composer/pull/391
  • Deprecate scale scheduler algorithm and move to trainer by @jbloxham in https://github.com/mosaicml/composer/pull/438
  • Removed dead code from the public library; refactored some imports. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/437
  • Trainer test refactor (pytest refactor phase 2) by @hanlint in https://github.com/mosaicml/composer/pull/393
  • Skip saving of direct serialization fields by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/445
  • Hide gen_interpolation_lambda in mixup like in cutmix and augmix by @dblalock in https://github.com/mosaicml/composer/pull/449
  • Move all AlgorithmHparams classes to shared file by @dblalock in https://github.com/mosaicml/composer/pull/452
  • Trainer Docs + Param ordering + Alibi Export by @ajaysaini725 in https://github.com/mosaicml/composer/pull/419
  • Up and Running with Composer and Speedup Algorithms Demo Notebook by @growlix in https://github.com/mosaicml/composer/pull/340
  • Add NLP tutorial notebook by @Landanjs in https://github.com/mosaicml/composer/pull/370
  • add kaggle notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/381
  • Refactor Profiler __init__() by @bandish-shah in https://github.com/mosaicml/composer/pull/422
  • Random doc fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/456
  • support integer arguments to Trainer by @hanlint in https://github.com/mosaicml/composer/pull/458
  • Make algorithm functions either public or prefixed with "_" by @dblalock in https://github.com/mosaicml/composer/pull/460
  • bug in train metrics by @A-Jacobson in https://github.com/mosaicml/composer/pull/466
  • Fixes empty log lines if no algorithms are run by @siriuslee in https://github.com/mosaicml/composer/pull/462
  • Add default hparam values for cutout by @dblalock in https://github.com/mosaicml/composer/pull/459
  • Docstrings for composer.utils by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/439
  • notebook tests by @hanlint in https://github.com/mosaicml/composer/pull/468
  • resize_targets set to False by default by @siriuslee in https://github.com/mosaicml/composer/pull/475
  • Remove dist warnings by @hanlint in https://github.com/mosaicml/composer/pull/474
  • Add missing defaults for one function by @dblalock in https://github.com/mosaicml/composer/pull/476
  • Store metadata in JSON files for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/471
  • Davis/algos intrafile organization by @dblalock in https://github.com/mosaicml/composer/pull/465
  • Get functional API running enough for notebook by @dblalock in https://github.com/mosaicml/composer/pull/479
  • Remove colons from run directory timestamps by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/486
  • Add custom methods notebook by @coryMosaicML in https://github.com/mosaicml/composer/pull/330
  • Move the clean notebooks script to the scripts folder by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/487
  • Checkpoint Usability Initial Changes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/455
  • Removing HF XFail on model registry by @moinnadeem in https://github.com/mosaicml/composer/pull/490
  • Clean up Imports and Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/482
  • Ravi/docs cleanup 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/488
  • Matthew/docstrings update by @growlix in https://github.com/mosaicml/composer/pull/457
  • No autodoc of forward by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/494
  • Update __init__.py by @growlix in https://github.com/mosaicml/composer/pull/493
  • allow from composer import ComposerModel by @hanlint in https://github.com/mosaicml/composer/pull/496
  • Methods landing page by @nqn in https://github.com/mosaicml/composer/pull/454
  • Small docs change to include timing reference by @anisehsani in https://github.com/mosaicml/composer/pull/500
  • docstring for callbacks by @dskhudia in https://github.com/mosaicml/composer/pull/470
  • Docs cleanup #3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/502
  • Adding network fixes for the Run Directory Uploader by @moinnadeem in https://github.com/mosaicml/composer/pull/505
  • Adding network retries for downloading GLUE by @moinnadeem in https://github.com/mosaicml/composer/pull/506
  • Matthew/loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/499
  • Fix Sphinx Warnings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/520
  • Anaconda configuration by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/507
  • Update docstrings for Colout, CutOut, CutMix, Layer Freezing, Mixup, Label Smoothing, Progressive Resizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/483
  • Stateless schedulers by @jbloxham in https://github.com/mosaicml/composer/pull/463
  • Rename selective_backprop to select_using_loss by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/532
  • Update new README by @hanlint in https://github.com/mosaicml/composer/pull/540
  • Fix dark mode by @nqn in https://github.com/mosaicml/composer/pull/573
  • Fix the run directory uploader when use_procs=True and not using the … by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/547
  • Console font too bright by @nqn in https://github.com/mosaicml/composer/pull/574
  • Fix pilimagecollate by @Landanjs in https://github.com/mosaicml/composer/pull/514
  • ADE20k DeepLabv3 optimized benchmark yaml by @Landanjs in https://github.com/mosaicml/composer/pull/579
  • separate hparams in module docstrings by @hanlint in https://github.com/mosaicml/composer/pull/558
  • Fix DataloaderHparam docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/534
  • per #224, update function to use Timer and Time by @jzf2101 in https://github.com/mosaicml/composer/pull/583
  • Clean up Transformer models __init__ function by @moinnadeem in https://github.com/mosaicml/composer/pull/587
  • Docstrings for composer.trainer by @ajaysaini725 in https://github.com/mosaicml/composer/pull/522
  • Additional updates to the loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/544
  • Profiler docstrings by @bandish-shah in https://github.com/mosaicml/composer/pull/473
  • Updated Model Cards by @ajaysaini725 in https://github.com/mosaicml/composer/pull/375
  • Unify augmentation API part 1 by @dblalock in https://github.com/mosaicml/composer/pull/524
  • Docstrings improvements for core.algorithm, core.callback, etc. by @dskhudia in https://github.com/mosaicml/composer/pull/516
  • Skip ResNet50 + DeepSpeed tests that are timing out by @hanlint in https://github.com/mosaicml/composer/pull/601
  • Make the default split_batch method a no-op if grad_accum is 1. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/592
  • Add functional/standalone API tutorial notebook by @dblalock in https://github.com/mosaicml/composer/pull/326
  • Merge v0.4 fixes by @hanlint in https://github.com/mosaicml/composer/pull/606
  • updated docstring examples by @growlix in https://github.com/mosaicml/composer/pull/600
  • [v0.4rc] Documentation Guides by @hanlint in https://github.com/mosaicml/composer/pull/531
  • Method cards by @jfrankle in https://github.com/mosaicml/composer/pull/589
  • Improved docstring for surgery algorithms by @dblalock in https://github.com/mosaicml/composer/pull/602
  • Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/611
  • Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/612
  • Updated 'Up and Running with Composer' by @growlix in https://github.com/mosaicml/composer/pull/619
  • Release v0.4.0 by @hanlint in https://github.com/mosaicml/composer/pull/609

New Contributors

  • @A-Jacobson made their first contribution in https://github.com/mosaicml/composer/pull/100
  • @jacobfulano made their first contribution in https://github.com/mosaicml/composer/pull/99
  • @kobindra made their first contribution in https://github.com/mosaicml/composer/pull/160
  • @ravirahman made their first contribution in https://github.com/mosaicml/composer/pull/200
  • @Landanjs made their first contribution in https://github.com/mosaicml/composer/pull/107
  • @siriuslee made their first contribution in https://github.com/mosaicml/composer/pull/236
  • @mvpatel2000 made their first contribution in https://github.com/mosaicml/composer/pull/336
  • @abhi-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/410
  • @jzf2101 made their first contribution in https://github.com/mosaicml/composer/pull/583
  • @jfrankle made their first contribution in https://github.com/mosaicml/composer/pull/589

Full Changelog: https://github.com/mosaicml/composer/compare/v0.3.1...v0.4.0

- Python
Published by hanlint over 4 years ago

composer - Release Version 0.3.1

Hotfix

Hotfix for installation of the composer package

- Python
Published by Averylamp over 4 years ago

composer - Release Version 0.3.0

Release PR

Major Changes

  • Python 3.7 Compatibility
  • Adds CutMix Method
  • New Pre-Fork DDP entrypoint

Minor Changes

  • Lazy-Loading of dependencies
  • General Docs updates for readability and correctness
  • DDP Port auto-selection by default (no more conflicting ports upon reuse of trainer)
  • Small bug fixes for YAHP inheritance

Notes

  • Google Colab may have issues installing composer with !pip install mosaicml
    • Known workaround: Install through git with !pip install git+https://github.com/mosaicml/composer@main

- Python
Published by Averylamp over 4 years ago