Recent Releases of allamo

allamo - v6.0.0

  • Major refactor
  • Add support for FSDP2 and TP
  • Add support for various activation functions and introduce LRA function
  • Add support for FlexAttention, xFormers, FlashAttention3. Improve custom mask and sliding window handling

Full Changelog: https://github.com/chrisociepa/allamo/compare/v5.0.0...v6.0.0

- Python
Published by chrisociepa 12 months ago

allamo - v5.0.0

  • Added a hook for external program invocation after saving regular checkpoints
  • Implemented support for SFT dataset packing with correct RoPE encoding and without cross-contamination
  • Added support for a new data format: ALM
  • Introduced support for DPO and DPO-Positive training methods
  • Added optional sample buffering in the dataloader
  • Added new utility scripts for data preparation and tokenizer replacement
  • Fixed bugs in main training scripts and utility scripts
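
As a rough illustration of what packing "with correct RoPE encoding and without cross-contamination" usually involves, the sketch below restarts position ids at the start of every packed sample and builds a causal mask that blocks attention across sample boundaries. The function name `pack_samples` and the list-based mask are assumptions made for this example, not allamo's actual implementation.

```python
def pack_samples(sample_lengths):
    """Build per-token position ids and an attention mask for packed samples.

    sample_lengths: lengths of the samples concatenated into one sequence.
    Returns (position_ids, mask), where mask[i][j] is True when token i
    may attend to token j.
    """
    position_ids = []
    segment_ids = []
    for segment, length in enumerate(sample_lengths):
        # Positions restart at 0 for every sample, so RoPE sees each sample
        # as if it had been fed to the model on its own.
        position_ids.extend(range(length))
        segment_ids.extend([segment] * length)

    total = len(segment_ids)
    # Causal attention restricted to the token's own sample: no token can
    # attend to a different sample packed into the same sequence.
    mask = [
        [j <= i and segment_ids[i] == segment_ids[j] for j in range(total)]
        for i in range(total)
    ]
    return position_ids, mask


position_ids, mask = pack_samples([3, 2])
print(position_ids)  # [0, 1, 2, 0, 1]
print(mask[3])       # [False, False, False, True, False]
```

In a real training loop the same idea would typically be expressed with tensors and passed to the attention kernel as a block-diagonal causal mask.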

Full Changelog: https://github.com/chrisociepa/allamo/compare/v4.1.0...v5.0.0

- Python
Published by chrisociepa over 1 year ago

allamo - v4.1.0

  • Enhanced checkpoint management
  • Resolved issues with saving checkpoint configurations in JSON format during HF model imports
  • Expanded HF model export capabilities to encompass extra configuration parameters
  • Improved DataLoader's memory efficiency by releasing memory more effectively between dataset loads
  • Discontinued support for the legacy (Simple) DataLoader (breaking change)

Full Changelog: https://github.com/chrisociepa/allamo/compare/v4.0.0...v4.1.0

- Python
Published by chrisociepa almost 2 years ago

allamo - v4.0.0

  • Added support for weighted token-level loss
  • Added support for adaptive learning rate
  • Added post-epoch hook for external program invocation
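
A weighted token-level loss can be sketched as a weighted mean of per-token negative log-likelihoods. The function name and the choice to normalize by the sum of weights are assumptions for illustration; allamo may normalize differently.

```python
import math


def weighted_token_loss(token_log_probs, token_weights):
    """Weighted mean of per-token negative log-likelihoods.

    token_log_probs: log-probability the model assigned to each target token.
    token_weights: non-negative weight per token (0 excludes the token).
    """
    if len(token_log_probs) != len(token_weights):
        raise ValueError("log-probs and weights must have the same length")
    total_weight = sum(token_weights)
    if total_weight == 0:
        # Nothing contributes to the loss (assumed convention: return 0).
        return 0.0
    weighted_nll = sum(-lp * w for lp, w in zip(token_log_probs, token_weights))
    return weighted_nll / total_weight


log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
# First token is excluded (weight 0); the second counts twice as much as the third.
print(weighted_token_loss(log_probs, [0.0, 2.0, 1.0]))
```

Setting a weight to 0 reproduces the usual practice of masking prompt tokens, while fractional weights let some tokens contribute less than others.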

Full Changelog: https://github.com/chrisociepa/allamo/compare/v3.1.0...v4.0.0

- Python
Published by chrisociepa almost 2 years ago

allamo - v3.1.0

  • Added option to configure FSDP sharding strategy
  • Added optional logging of MD5 checksum for model checkpoint at the end of each epoch
  • Added option to ignore and overwrite the last checkpoint backup
  • Bug fixes in the export script
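
For reference, an MD5 checksum of a checkpoint file can be computed with the standard library as in the minimal sketch below. The helper name `md5_of_file` and the chunk size are assumptions; this is not taken from allamo's code.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use flat for multi-gigabyte files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a real checkpoint file.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "model_ckpt.pt")
    with open(ckpt_path, "wb") as f:
        f.write(b"hello")
    print(md5_of_file(ckpt_path))
```

Logging the digest at the end of each epoch makes it possible to verify later that a copied or uploaded checkpoint is byte-identical to the one that was written.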

Full Changelog: https://github.com/chrisociepa/allamo/compare/v3.0.0...v3.1.0

- Python
Published by chrisociepa about 2 years ago

allamo - v3.0.0

  • Change checkpoint configuration storage to JSON file format (breaking change: use the convert_config_checkpoint_to_json.py script to convert your checkpoints)
  • Add support for complex sample formats in DataLoader
  • Move util scripts to the scripts directory

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.2.2...v3.0.0

- Python
Published by chrisociepa about 2 years ago

allamo - v2.2.2

  • Renamed the existing file to a 'prev' version when creating a new checkpoint
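
The backup behaviour described above can be sketched as follows. The `_prev` suffix and the function name `save_with_backup` are hypothetical; the release note does not specify the exact naming scheme.

```python
import os
import tempfile


def save_with_backup(path, data):
    """Write data to path, keeping the previous file as a 'prev' copy."""
    if os.path.exists(path):
        root, ext = os.path.splitext(path)
        # Hypothetical naming scheme: ckpt.pt -> ckpt_prev.pt
        os.replace(path, f"{root}_prev{ext}")
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "ckpt.pt")
    save_with_backup(ckpt, b"first")
    save_with_backup(ckpt, b"second")
    print(sorted(os.listdir(tmp)))  # ['ckpt.pt', 'ckpt_prev.pt']
```

Renaming before writing means an interrupted save still leaves the last complete checkpoint on disk under the 'prev' name.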

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.2.1...v2.2.2

- Python
Published by chrisociepa about 2 years ago

allamo - v2.2.1

  • Improvements in scripts for importing Hugging Face model weights
  • Enhancements in the script for exporting weights to the Hugging Face format, including added support for Mistral models and setting the output model data type
  • Refactoring of RMSNorm implementation within the model
  • Truncation of overly long samples in the DataLoader
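
For readers unfamiliar with RMSNorm, the operation itself is small enough to show in plain Python. This is a generic sketch of the standard formula, with an assumed default `eps`; it does not reflect how allamo's refactored module is written.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Scale x by the reciprocal of its root mean square, then by weight."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


normalized = rms_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0)
# With unit weights and eps=0, the output has a mean square of 1.
print(sum(v * v for v in normalized) / len(normalized))
```

Unlike LayerNorm, RMSNorm does not subtract the mean and has no bias term, which is why it needs only a single learned weight vector.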

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.2.0...v2.2.1

- Python
Published by chrisociepa about 2 years ago

allamo - v2.2.0

  • Scripts to import Llama and Mistral weights from Hugging Face
  • Prevent enabling installed FlashAttention2 by default
  • Allow configuration of the model's intermediate_size
  • Reset training upon checkpoint loading if the loaded iter_num exceeds max_iters
  • Adjust the allocated buffer size for rotary embeddings to match the block_size

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.1.0...v2.2.0

- Python
Published by chrisociepa about 2 years ago

allamo - v2.1.0

  • AllamoDataLoader now supports instructions and padding
  • Improved logging and added new metrics
  • Enhanced recovery in FSDP from the last checkpoint

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.0.0...v2.1.0

- Python
Published by chrisociepa about 2 years ago

allamo - v2.0.0

  • FSDP training
  • New DataLoader
  • MFU and MTU metrics
  • A script to assist with depth upscaling of models
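
As background on the MFU (model FLOPs utilization) metric, one common way to estimate it is to compare achieved training FLOPs against the hardware peak. The sketch below uses the widely cited approximation of about 6 FLOPs per parameter per token; the function name and the example numbers are assumptions, and allamo's own calculation may include additional terms.

```python
def model_flops_utilization(n_params, tokens_per_second, peak_flops_per_second):
    """Estimate MFU with the common ~6 FLOPs per parameter per token rule.

    The 6 * n_params approximation covers the forward and backward passes of
    a dense transformer and ignores the attention term.
    """
    achieved = 6 * n_params * tokens_per_second
    return achieved / peak_flops_per_second


# Illustrative numbers only: 1B parameters, 20k tokens/s, 312 TFLOP/s peak.
print(model_flops_utilization(1e9, 20_000, 312e12))
```

A value close to 1.0 would mean the training run uses nearly all of the accelerator's theoretical throughput; real runs land well below that.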

- Python
Published by chrisociepa about 2 years ago

allamo - v1.0.0

First official version of ALLaMo

- Python
Published by chrisociepa about 2 years ago