Recent Releases of allamo

allamo - v6.0.0

Major refactor
Add support for FSDP2 and TP
Add support for various activation functions and introduce LRA function
Add support for FlexAttention, xFormers, FlashAttention3. Improve custom mask and sliding window handling
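
As a rough illustration of what sliding window handling means for a causal model, the sketch below builds a boolean attention mask in plain Python. The function name and the list-of-lists representation are assumptions for illustration; this is not allamo's implementation, which operates on tensors through the attention backends listed above.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when query position i may attend to key position j.

    A position sees itself and at most `window - 1` earlier positions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_causal_mask(seq_len=4, window=2)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```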

Full Changelog: https://github.com/chrisociepa/allamo/compare/v5.0.0...v6.0.0

- Python
Published by chrisociepa about 1 year ago

allamo - v5.0.0

Added a hook for external program invocation after saving regular checkpoints
Implemented support for SFT dataset packing with correct RoPE encoding and without cross-contamination
Added support for a new data format: ALM
Introduced support for DPO and DPO-Positive training methods
Added optional sample buffering in the dataloader
Added new utility scripts for data preparation and tokenizer replacement
Fixed bugs in main training scripts and utility scripts
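
The packing item above can be pictured with a small sketch: several samples are concatenated into one sequence, position indices restart at zero for each sample (so rotary encodings match the unpacked case), and a per-token segment id lets the attention mask block tokens from attending across sample boundaries. All names here (pack_samples, can_attend, segment_ids) are hypothetical and chosen for illustration; the actual allamo code may differ.

```python
def pack_samples(samples: list[list[int]]) -> dict[str, list[int]]:
    """Concatenate tokenized samples and track per-token positions and segments."""
    input_ids: list[int] = []
    position_ids: list[int] = []
    segment_ids: list[int] = []
    for segment, tokens in enumerate(samples):
        input_ids.extend(tokens)
        # Positions restart at 0 so each sample is encoded as if it stood alone.
        position_ids.extend(range(len(tokens)))
        segment_ids.extend([segment] * len(tokens))
    return {
        "input_ids": input_ids,
        "position_ids": position_ids,
        "segment_ids": segment_ids,
    }


def can_attend(segment_ids: list[int], query: int, key: int) -> bool:
    """Causal attention restricted to tokens of the same packed sample."""
    return key <= query and segment_ids[query] == segment_ids[key]


packed = pack_samples([[5, 6, 7], [8, 9]])
print(packed["position_ids"])
print(packed["segment_ids"])
```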

Full Changelog: https://github.com/chrisociepa/allamo/compare/v4.1.0...v5.0.0

- Python
Published by chrisociepa almost 2 years ago

allamo - v4.1.0

Enhanced checkpoint management
Resolved issues with saving checkpoint configurations in JSON format during HF model imports
Expanded HF model export capabilities to encompass extra configuration parameters
Improved DataLoader's memory efficiency by releasing memory more effectively between dataset loads
Discontinued support for the legacy (Simple) DataLoader (breaking change)

Full Changelog: https://github.com/chrisociepa/allamo/compare/v4.0.0...v4.1.0

- Python
Published by chrisociepa about 2 years ago

allamo - v4.0.0

Added support for weighted token-level loss
Added support for adaptive learning rate
Added post-epoch hook for external program invocation
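
One common way to realize a weighted token-level loss is to scale each token's loss by a weight and normalize by the total weight. The sketch below assumes that normalization and uses plain Python lists; the function name is hypothetical and the release notes do not state which reduction allamo applies.

```python
def weighted_token_loss(token_losses: list[float], weights: list[float]) -> float:
    """Weighted mean of per-token losses (assumed reduction: divide by sum of weights)."""
    if len(token_losses) != len(weights):
        raise ValueError("token_losses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # No token contributes, e.g. a fully masked sample.
        return 0.0
    weighted_sum = sum(loss * w for loss, w in zip(token_losses, weights))
    return weighted_sum / total_weight


# A weight of 0 removes a token from the loss, e.g. prompt tokens during SFT.
print(weighted_token_loss([2.0, 4.0], [1.0, 0.0]))
```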

Full Changelog: https://github.com/chrisociepa/allamo/compare/v3.1.0...v4.0.0

- Python
Published by chrisociepa about 2 years ago

allamo - v3.1.0

Added option to configure FSDP sharding strategy
Added optional logging of MD5 checksum for model checkpoint at the end of each epoch
Added option to ignore and overwrite the last checkpoint backup
Bug fixes in the export script
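
For readers who want to verify a checkpoint against a logged checksum, a minimal sketch using Python's standard hashlib is shown below. The function name and chunk size are assumptions; reading in chunks avoids loading a multi-gigabyte checkpoint into memory at once.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be compared with the value written to the training log.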

Full Changelog: https://github.com/chrisociepa/allamo/compare/v3.0.0...v3.1.0

- Python
Published by chrisociepa over 2 years ago

allamo - v3.0.0

Change checkpoint configuration storage to JSON file format (breaking change: use the convert_config_checkpoint_to_json.py script to convert your checkpoints)
Add support for complex sample formats in DataLoader
Move util scripts to the scripts directory

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.2.2...v3.0.0

- Python
Published by chrisociepa over 2 years ago

allamo - v2.2.2

The existing checkpoint file is now renamed to a 'prev' version when a new checkpoint is created
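
The backup behaviour described above can be sketched as follows. The "_prev" file-name suffix and the function name are assumptions made for illustration; the release note only says that the existing file becomes a 'prev' version.

```python
import os
from pathlib import Path


def save_with_backup(path: Path, data: bytes) -> None:
    """Write a new checkpoint, keeping the previous one under a '_prev' name."""
    if path.exists():
        # Hypothetical naming scheme: ckpt.pt -> ckpt_prev.pt
        backup = path.with_name(f"{path.stem}_prev{path.suffix}")
        # os.replace overwrites an older backup if one is already present.
        os.replace(path, backup)
    path.write_bytes(data)
```

Keeping one previous copy means that a crash while writing the new file still leaves a usable checkpoint on disk.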

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.2.1...v2.2.2

- Python
Published by chrisociepa over 2 years ago

allamo - v2.2.1

Improvements in scripts for importing Hugging Face model weights
Enhancements in the script for exporting weights to the Hugging Face format, including added support for Mistral models and setting the output model data type
Refactoring of RMSNorm implementation within the model
Truncation of overly long samples in the DataLoader

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.2.0...v2.2.1

- Python
Published by chrisociepa over 2 years ago

allamo - v2.2.0

Added scripts to import Llama and Mistral weights from Hugging Face
Prevent enabling installed FlashAttention2 by default
Allow configuration of the model's intermediate_size
Reset training upon checkpoint loading if the loaded iter_num exceeds max_iters
Adjust the allocated buffer size for rotary embeddings to match the block_size
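
To show why the rotary buffer size is tied to block_size, the sketch below precomputes one row of rotation angles per position, so the table has exactly block_size rows. It follows the standard RoPE frequency formula with base 10000; the function name and the plain-list representation are assumptions, not allamo's code.

```python
import math


def rotary_angles(block_size: int, head_dim: int, base: float = 10000.0) -> list[list[float]]:
    """Return a block_size x (head_dim // 2) table of rotation angles."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    # One inverse frequency per pair of head dimensions.
    inv_freq = [base ** (-2 * k / head_dim) for k in range(head_dim // 2)]
    return [[pos * f for f in inv_freq] for pos in range(block_size)]


table = rotary_angles(block_size=8, head_dim=4)
# cos/sin of these angles are what a model would cache and apply to queries and keys.
cos_table = [[math.cos(a) for a in row] for row in table]
print(len(table), len(table[0]))
```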

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.1.0...v2.2.0

- Python
Published by chrisociepa over 2 years ago

allamo - v2.1.0

AllamoDataLoader now supports instructions and padding
Improved logging and added new metrics
Enhanced recovery in FSDP from the last checkpoint

Full Changelog: https://github.com/chrisociepa/allamo/compare/v2.0.0...v2.1.0

- Python
Published by chrisociepa over 2 years ago

allamo - v2.0.0

FSDP training
New DataLoader
MFU and MTU metrics
A script to assist with depth upscaling of models
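
MFU (model FLOPs utilization) is commonly estimated as achieved training FLOPs per second divided by the hardware's peak FLOPs per second. The sketch below uses the widely cited approximation of 6 FLOPs per parameter per token for a forward and backward pass and ignores the attention term; both the approximation and the function name are assumptions, and allamo's own metric may be computed differently.

```python
def estimate_mfu(n_params: float, tokens_per_second: float, peak_flops: float) -> float:
    """Approximate MFU as (6 * parameters * tokens/s) / peak FLOPs/s."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    achieved_flops = 6.0 * n_params * tokens_per_second
    return achieved_flops / peak_flops


# Illustrative numbers only: a 1B-parameter model at 1000 tokens/s
# on hardware with a hypothetical peak of 12 TFLOP/s.
print(estimate_mfu(n_params=1e9, tokens_per_second=1000.0, peak_flops=12e12))
```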

- Python
Published by chrisociepa over 2 years ago

allamo - v1.0.0

First official version of ALLaMo

- Python
Published by chrisociepa over 2 years ago