llama-factory-mfsgd
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: none found
- ○ Academic publication links: none found
- ○ Academic email domains: none found
- ○ Institutional organization owner: none found
- ○ JOSS paper metadata: none found
- ○ Scientific vocabulary similarity: low similarity (15.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: pmahdavi
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 50.5 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MoFaSGD: Low-rank Momentum Factorization for Memory Efficient Training
This repository contains the official implementation of the paper "Low-rank Momentum Factorization for Memory Efficient Training" (TMLR, 2025). We introduce MoFaSGD, a memory-efficient optimizer that maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration.
Our work demonstrates that MoFaSGD achieves a competitive trade-off between memory reduction and performance compared to state-of-the-art low-rank optimization methods like LoRA and GaLore, as well as full-parameter fine-tuning with AdamW.
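To make the idea above concrete, here is a minimal, illustrative sketch of one MoFaSGD-style step in NumPy. This is not the repository's actual optimizer: the function name `mofasgd_step` and its signature are hypothetical, and it recomputes a full SVD each step for clarity, whereas the paper maintains the low-rank factorization with an efficient incremental update.

```python
import numpy as np

def mofasgd_step(U, S, Vt, grad, param, lr=1e-2, beta=0.9, rank=2):
    """One illustrative MoFaSGD-style step (hypothetical sketch).

    Momentum is stored only as a rank-`rank` SVD factorization
    U @ diag(S) @ Vt instead of a full dense matrix.
    """
    # Blend the new gradient into the reconstructed momentum (standard EMA momentum).
    momentum = beta * (U * S) @ Vt + (1.0 - beta) * grad
    # Re-factorize and truncate back to the target rank; this is where the
    # optimization subspace is adaptively updated at each iteration.
    U2, S2, Vt2 = np.linalg.svd(momentum, full_matrices=False)
    U2, S2, Vt2 = U2[:, :rank], S2[:rank], Vt2[:rank, :]
    # Apply a full-parameter update using the low-rank momentum approximation.
    new_param = param - lr * (U2 * S2) @ Vt2
    return U2, S2, Vt2, new_param
```

Note that only the factors `U`, `S`, `Vt` persist between steps, which is where the memory saving over a dense first-order momentum buffer comes from.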
Acknowledgement
This work is built upon the excellent LLaMA-Factory repository. We thank the authors for their wonderful work and for making their code publicly available.
Features
- Memory-Efficient Training: Fine-tune large language models with significantly less memory, comparable to LoRA.
- Dynamic Subspace Updates: Adaptively updates the optimization subspace at each iteration for better performance.
- Full-Parameter Updates: Enables full-parameter updates while operating in a lower-dimensional space.
Installation
Clone the repository and initialize the submodules:
```bash
git clone --recursive https://github.com/pmahdavi/llama-factory-mfsgd.git
cd llama-factory-mfsgd
```

If you have already cloned the repository without the `--recursive` flag, you can initialize the submodules by running:

```bash
git submodule update --init --recursive
```

Create and activate a conda environment:

```bash
conda create --name llama-factory-env python=3.10
conda activate llama-factory-env
```

Install the required dependencies:

```bash
pip install -e ".[torch,metrics]"
```

This will install all the necessary packages, including PyTorch and other core libraries. For more details on optional dependencies, please refer to the original LLaMA-Factory repository.

Install the custom GaLore implementation:

```bash
pip install -e galore-torch/
```
Usage
The `run.py` script is the main entry point for running experiments.
To run an experiment:
```bash
python run.py <path_to_config_file> [options]
```
For example, to fine-tune a model with MoFaSGD on your local machine:
```bash
python run.py configs/mfsgd/llama3.1_8b_sft_mfsgd_lr.yaml --ngpus 1
```
Available Arguments
- `config_file`: Path to the YAML config file.
- `--ngpus`: Number of GPUs for the job (default: 2).
- `--base_output_dir`: Base output directory for runs.
Experiment Setup
This repository contains the code and configurations for the LLaMA-3.1 8B instruction-tuning experiments on the Tulu3 dataset, as described in the MoFaSGD paper.
Example Configurations
Example configurations for the experiments can be found in the configs/mfsgd/ directory. These files provide a starting point for running experiments with different optimizers:
- MoFaSGD: `configs/mfsgd/llama3.1_8b_sft_mfsgd_lr.yaml`
- GaLore: `configs/mfsgd/llama3.1_8b_sft_galore.yaml`
- LoRA: `configs/mfsgd/llama3.1_8b_sft_lora.yaml`
- AdamW: `configs/mfsgd/llama3.1_8b_sft_adamw_bf16.yaml`
You can modify these files or create new ones to run your own experiments.
Memory Profiling
To facilitate the memory analysis presented in the paper, we have integrated a memory profiling tool based on the PyTorch memory profiler. To use it, add the following parameters to your configuration YAML file:
```yaml
profile_memory_from_start: true
profile_memory_stop_step: 4
profile_memory_max_entries: 10000000
```
- `profile_memory_from_start`: Set to `true` to begin profiling from the start of training.
- `profile_memory_stop_step`: The profiler will dump a snapshot of the memory usage after this many optimizer steps and then stop.
- `profile_memory_max_entries`: The maximum number of memory allocation/deallocation events to record.
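The dumped snapshot can be explored interactively with PyTorch's memory_viz viewer. For a quick command-line sanity check, a small reader such as the following can total the reserved bytes; this is a sketch assuming the pickled-dict snapshot format written by `torch.cuda.memory._dump_snapshot` in recent PyTorch releases, and `summarize_snapshot` / `load_and_summarize` are hypothetical helper names, not part of this repository.

```python
import pickle

def summarize_snapshot(snapshot):
    """Sum reserved bytes across all allocator segments in a snapshot dict.

    Assumes the snapshot carries a top-level "segments" list whose entries
    each record a "total_size" in bytes.
    """
    return sum(seg["total_size"] for seg in snapshot["segments"])

def load_and_summarize(path):
    # Snapshots are plain pickle files, so no GPU is needed to read them.
    with open(path, "rb") as f:
        return summarize_snapshot(pickle.load(f))
```

Reading the file offline like this is convenient when the training node has no browser access; the same pickle can later be dropped into the memory_viz tool for the full allocation timeline.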
Example command:
```bash
python run.py <your_profiling_config.yaml> --ngpus 1
```
Citation
If you find our work useful, please cite our paper:
```bibtex
@article{mahdavinia2025mofasgd,
  title={Low-rank Momentum Factorization for Memory Efficient Training},
  author={Mahdavinia, Pouria and Mahdavi, Mehrdad},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=W3D3TVo9a3}
}
```
Owner
- Name: Pouria Mahdavinia
- Login: pmahdavi
- Kind: user
- Location: State College, PA
- Company: Penn State
- Repositories: 1
- Profile: https://github.com/pmahdavi
Computer Science Ph.D. student
Citation (CITATION.cff)
cff-version: 1.2.0
date-released: 2024-03
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Zheng"
    given-names: "Yaowei"
  - family-names: "Zhang"
    given-names: "Richong"
  - family-names: "Zhang"
    given-names: "Junhao"
  - family-names: "Ye"
    given-names: "Yanhan"
  - family-names: "Luo"
    given-names: "Zheyan"
  - family-names: "Feng"
    given-names: "Zhangchi"
  - family-names: "Ma"
    given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
preferred-citation:
  type: conference-paper
  conference:
    name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  authors:
    - family-names: "Zheng"
      given-names: "Yaowei"
    - family-names: "Zhang"
      given-names: "Richong"
    - family-names: "Zhang"
      given-names: "Junhao"
    - family-names: "Ye"
      given-names: "Yanhan"
    - family-names: "Luo"
      given-names: "Zheyan"
    - family-names: "Feng"
      given-names: "Zhangchi"
    - family-names: "Ma"
      given-names: "Yongqiang"
  title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
  url: "https://arxiv.org/abs/2403.13372"
  year: 2024
  publisher: "Association for Computational Linguistics"
  address: "Bangkok, Thailand"
GitHub Events
Total
- Push event: 3
- Public event: 1
Last Year
- Push event: 3
- Public event: 1
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/setup-buildx-action v3 composite
- jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- ${BASE_IMAGE} latest build
- ${BASE_IMAGE} latest build
- ${BASE_IMAGE} latest build
- accelerate >=1.3.0,<=1.7.0
- av *
- datasets >=2.16.0,<=3.6.0
- einops *
- fastapi *
- fire *
- gradio >=4.38.0,<=5.31.0
- hf-transfer *
- librosa *
- matplotlib >=3.7.0
- modelscope >=1.14.0
- numpy <2.0.0
- omegaconf *
- packaging *
- pandas >=2.0.0
- peft >=0.14.0,<=0.15.2
- protobuf *
- pydantic <=2.10.6
- pyyaml *
- scipy *
- sentencepiece *
- sse-starlette *
- tiktoken *
- tokenizers >=0.19.0,<=0.21.1
- transformers >=4.49.0,<=4.51.3
- transformers >=4.49.0,<=4.52.4
- trl >=0.8.6,<=0.9.6
- tyro <0.9.0
- uvicorn *