llama-factory-mfsgd
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: none found
- ○ Academic publication links: none found
- ○ Academic email domains: none found
- ○ Institutional organization owner: none found
- ○ JOSS paper metadata: none found
- ○ Scientific vocabulary similarity: low similarity (15.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: pmahdavi
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 50.5 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MoFaSGD: Low-rank Momentum Factorization for Memory Efficient Training
This repository contains the official implementation of the paper "Low-rank Momentum Factorization for Memory Efficient Training" (TMLR, 2025). We introduce MoFaSGD, a memory-efficient optimizer that maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration.
Our work demonstrates that MoFaSGD achieves a competitive trade-off between memory reduction and performance compared to state-of-the-art low-rank optimization methods like LoRA and GaLore, as well as full-parameter fine-tuning with AdamW.
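To make the idea above concrete, here is a minimal, illustrative sketch of one MoFaSGD-style step in NumPy. This is not the repository's actual optimizer: the function name `mofasgd_step` and its signature are hypothetical, and it recomputes a full SVD each step for clarity, whereas the paper maintains the low-rank factorization with an efficient incremental update.

```python
import numpy as np

def mofasgd_step(U, S, Vt, grad, param, lr=1e-2, beta=0.9, rank=2):
    """One illustrative MoFaSGD-style step (hypothetical sketch).

    Momentum is stored only as a rank-`rank` SVD factorization
    U @ diag(S) @ Vt instead of a full dense matrix.
    """
    # Blend the new gradient into the reconstructed momentum (standard EMA momentum).
    momentum = beta * (U * S) @ Vt + (1.0 - beta) * grad
    # Re-factorize and truncate back to the target rank; this is where the
    # optimization subspace is adaptively updated at each iteration.
    U2, S2, Vt2 = np.linalg.svd(momentum, full_matrices=False)
    U2, S2, Vt2 = U2[:, :rank], S2[:rank], Vt2[:rank, :]
    # Apply a full-parameter update using the low-rank momentum approximation.
    new_param = param - lr * (U2 * S2) @ Vt2
    return U2, S2, Vt2, new_param
```

Note that only the factors `U`, `S`, `Vt` persist between steps, which is where the memory saving over a dense first-order momentum buffer comes from.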
Acknowledgement
This work is built upon the excellent LLaMA-Factory repository. We thank the authors for their wonderful work and for making their code publicly available.
Features
- Memory-Efficient Training: Fine-tune large language models with significantly less memory, comparable to LoRA.
- Dynamic Subspace Updates: Adaptively updates the optimization subspace at each iteration for better performance.
- Full-Parameter Updates: Enables full-parameter updates while operating in a lower-dimensional space.
Installation
Clone the repository and initialize the submodules:
```bash
git clone --recursive https://github.com/pmahdavi/llama-factory-mfsgd.git
cd llama-factory-mfsgd
```

If you have already cloned the repository without the `--recursive` flag, you can initialize the submodules by running:

```bash
git submodule update --init --recursive
```

Create and activate a conda environment:

```bash
conda create --name llama-factory-env python=3.10
conda activate llama-factory-env
```

Install the required dependencies:

```bash
pip install -e ".[torch,metrics]"
```

This will install all the necessary packages, including PyTorch and other core libraries. For more details on optional dependencies, please refer to the original LLaMA-Factory repository.

Install the custom GaLore implementation:

```bash
pip install -e galore-torch/
```
Usage
The `run.py` script is the main entry point for running experiments.
To run an experiment:
```bash
python run.py <path_to_config_file> [options]
```
For example, to fine-tune a model with MoFaSGD on your local machine:
```bash
python run.py configs/mfsgd/llama3.1_8b_sft_mfsgd_lr.yaml --ngpus 1
```
Available Arguments
- `config_file`: Path to the YAML config file.
- `--ngpus`: Number of GPUs for the job (default: 2).
- `--base_output_dir`: Base output directory for runs.
Experiment Setup
This repository contains the code and configurations for the LLaMA-3.1 8B instruction-tuning experiments on the Tulu3 dataset, as described in the MoFaSGD paper.
Example Configurations
Example configurations for the experiments can be found in the configs/mfsgd/ directory. These files provide a starting point for running experiments with different optimizers:
- MoFaSGD: `configs/mfsgd/llama3.1_8b_sft_mfsgd_lr.yaml`
- GaLore: `configs/mfsgd/llama3.1_8b_sft_galore.yaml`
- LoRA: `configs/mfsgd/llama3.1_8b_sft_lora.yaml`
- AdamW: `configs/mfsgd/llama3.1_8b_sft_adamw_bf16.yaml`
You can modify these files or create new ones to run your own experiments.
Memory Profiling
To facilitate the memory analysis presented in the paper, we have integrated a memory profiling tool based on the PyTorch memory profiler. To use it, add the following parameters to your configuration YAML file:
```yaml
profile_memory_from_start: true
profile_memory_stop_step: 4
profile_memory_max_entries: 10000000
```
- `profile_memory_from_start`: Set to `true` to begin profiling from the start of training.
- `profile_memory_stop_step`: The profiler will dump a snapshot of the memory usage after this many optimizer steps and then stop.
- `profile_memory_max_entries`: The maximum number of memory allocation/deallocation events to record.
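The dumped snapshot can be explored interactively with PyTorch's memory_viz viewer. For a quick command-line sanity check, a small reader such as the following can total the reserved bytes; this is a sketch assuming the pickled-dict snapshot format written by `torch.cuda.memory._dump_snapshot` in recent PyTorch releases, and `summarize_snapshot` / `load_and_summarize` are hypothetical helper names, not part of this repository.

```python
import pickle

def summarize_snapshot(snapshot):
    """Sum reserved bytes across all allocator segments in a snapshot dict.

    Assumes the snapshot carries a top-level "segments" list whose entries
    each record a "total_size" in bytes.
    """
    return sum(seg["total_size"] for seg in snapshot["segments"])

def load_and_summarize(path):
    # Snapshots are plain pickle files, so no GPU is needed to read them.
    with open(path, "rb") as f:
        return summarize_snapshot(pickle.load(f))
```

Reading the file offline like this is convenient when the training node has no browser access; the same pickle can later be dropped into the memory_viz tool for the full allocation timeline.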
Example command:
```bash
python run.py <your_profiling_config.yaml> --ngpus 1
```
Citation
If you find our work useful, please cite our paper:
```bibtex
@article{mahdavinia2025mofasgd,
  title={Low-rank Momentum Factorization for Memory Efficient Training},
  author={Mahdavinia, Pouria and Mahdavi, Mehrdad},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=W3D3TVo9a3}
}
```
Owner
- Name: Pouria Mahdavinia
- Login: pmahdavi
- Kind: user
- Location: State College, PA
- Company: Penn State
- Repositories: 1
- Profile: https://github.com/pmahdavi
Computer Science Ph.D. student
Citation (CITATION.cff)
cff-version: 1.2.0
date-released: 2024-03
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Zheng"
    given-names: "Yaowei"
  - family-names: "Zhang"
    given-names: "Richong"
  - family-names: "Zhang"
    given-names: "Junhao"
  - family-names: "Ye"
    given-names: "Yanhan"
  - family-names: "Luo"
    given-names: "Zheyan"
  - family-names: "Feng"
    given-names: "Zhangchi"
  - family-names: "Ma"
    given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
preferred-citation:
  type: conference-paper
  conference:
    name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  authors:
    - family-names: "Zheng"
      given-names: "Yaowei"
    - family-names: "Zhang"
      given-names: "Richong"
    - family-names: "Zhang"
      given-names: "Junhao"
    - family-names: "Ye"
      given-names: "Yanhan"
    - family-names: "Luo"
      given-names: "Zheyan"
    - family-names: "Feng"
      given-names: "Zhangchi"
    - family-names: "Ma"
      given-names: "Yongqiang"
  title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
  url: "https://arxiv.org/abs/2403.13372"
  year: 2024
  publisher: "Association for Computational Linguistics"
  address: "Bangkok, Thailand"
GitHub Events
Total
- Push event: 3
- Public event: 1
Last Year
- Push event: 3
- Public event: 1
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/setup-buildx-action v3 composite
- jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- ${BASE_IMAGE} latest build
- ${BASE_IMAGE} latest build
- ${BASE_IMAGE} latest build
- accelerate >=1.3.0,<=1.7.0
- av *
- datasets >=2.16.0,<=3.6.0
- einops *
- fastapi *
- fire *
- gradio >=4.38.0,<=5.31.0
- hf-transfer *
- librosa *
- matplotlib >=3.7.0
- modelscope >=1.14.0
- numpy <2.0.0
- omegaconf *
- packaging *
- pandas >=2.0.0
- peft >=0.14.0,<=0.15.2
- protobuf *
- pydantic <=2.10.6
- pyyaml *
- scipy *
- sentencepiece *
- sse-starlette *
- tiktoken *
- tokenizers >=0.19.0,<=0.21.1
- transformers >=4.49.0,<=4.51.3
- transformers >=4.49.0,<=4.52.4
- trl >=0.8.6,<=0.9.6
- tyro <0.9.0
- uvicorn *