Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: pmahdavi
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 50.5 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 7 months ago
Metadata Files
Readme · Contributing · License · Code of conduct · Citation · Security

README.md

MoFaSGD: Low-rank Momentum Factorization for Memory Efficient Training

This repository contains the official implementation of the paper "Low-rank Momentum Factorization for Memory Efficient Training" (TMLR, 2025). We introduce MoFaSGD, a memory-efficient optimizer that maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration.
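The core idea above can be illustrated with a toy NumPy sketch: keep the momentum as truncated SVD factors, re-expand them to blend in each new gradient, and re-truncate. This is a conceptual illustration only, not the paper's actual MoFaSGD update rule; the shapes, rank, and toy loss are assumptions.

```python
import numpy as np

def truncated_svd(m, rank):
    # Keep only the top-`rank` singular triplets of m.
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))   # toy weight matrix
rank, beta, lr = 4, 0.9, 1e-2

# Low-rank state (u, s, vt) stands in for the full momentum matrix.
u, s, vt = truncated_svd(np.zeros_like(w), rank)

for _ in range(10):
    grad = 2 * w                    # gradient of a toy loss ||w||^2
    # Re-expand, blend in the new gradient, and re-truncate:
    momentum = beta * (u * s) @ vt + (1 - beta) * grad
    u, s, vt = truncated_svd(momentum, rank)
    w -= lr * (u * s) @ vt          # full-parameter update from low-rank state
```

Only the factors `(u, s, vt)` need to be stored between steps, which is the source of the memory savings.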

Our work demonstrates that MoFaSGD achieves a competitive trade-off between memory reduction and performance compared to state-of-the-art low-rank optimization methods like LoRA and GaLore, as well as full-parameter fine-tuning with AdamW.

Acknowledgement

This work is built upon the excellent LLaMA-Factory repository. We thank the authors for their wonderful work and for making their code publicly available.

Features

  • Memory-Efficient Training: Fine-tune large language models with significantly less memory, comparable to LoRA.
  • Dynamic Subspace Updates: Adaptively updates the optimization subspace at each iteration for better performance.
  • Full-Parameter Updates: Enables full-parameter updates while operating in a lower-dimensional space.
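To see where the memory savings come from, a back-of-the-envelope comparison for a single weight matrix (illustrative numbers only, assuming fp32 optimizer state):

```python
# Rough optimizer-state memory for one 4096x4096 weight matrix.
m, n, r = 4096, 4096, 16
bytes_per = 4                            # fp32

adamw = 2 * m * n * bytes_per            # first + second moment, full rank
lowrank = r * (m + n + 1) * bytes_per    # U (m x r), s (r), V^T (r x n)

print(f"AdamW state:   {adamw / 2**20:.1f} MiB")
print(f"Rank-{r} state: {lowrank / 2**20:.1f} MiB")
```

The low-rank state scales as r(m + n) rather than mn, so for small ranks it is a few hundred times smaller than full-rank AdamW state.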

Installation

  1. Clone the repository and initialize the submodules:

     ```bash
     git clone --recursive https://github.com/pmahdavi/llama-factory-mfsgd.git
     cd llama-factory-mfsgd
     ```

     If you have already cloned the repository without the `--recursive` flag, you can initialize the submodules by running:

     ```bash
     git submodule update --init --recursive
     ```

  2. Create and activate a conda environment:

     ```bash
     conda create --name llama-factory-env python=3.10
     conda activate llama-factory-env
     ```

  3. Install the required dependencies:

     ```bash
     pip install -e ".[torch,metrics]"
     ```

     This installs all the necessary packages, including PyTorch and other core libraries. For details on optional dependencies, refer to the original LLaMA-Factory repository.

  4. Install the custom GaLore implementation:

     ```bash
     pip install -e galore-torch/
     ```

Usage

The run.py script is the main entry point for running experiments.

To run an experiment:

```bash
python run.py <path_to_config_file> [options]
```

For example, to fine-tune a model with MoFaSGD on your local machine:

```bash
python run.py configs/mfsgd/llama3.1_8b_sft_mfsgd_lr.yaml --ngpus 1
```

Available Arguments

  • config_file: Path to the YAML config file.
  • --ngpus: Number of GPUs for the job (default: 2).
  • --base_output_dir: Base output directory for runs.
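The interface above can be sketched with `argparse`. This is a hypothetical reconstruction of the launcher's command-line surface based only on the arguments listed here; the real `run.py` internals are not shown in this README.

```python
import argparse

# Minimal sketch of a run.py-style launcher interface (illustrative only).
parser = argparse.ArgumentParser(description="Launch a training run")
parser.add_argument("config_file", help="Path to the YAML config file")
parser.add_argument("--ngpus", type=int, default=2,
                    help="Number of GPUs for the job")
parser.add_argument("--base_output_dir", default=None,
                    help="Base output directory for runs")

# Parse an example invocation instead of sys.argv:
args = parser.parse_args(["configs/mfsgd/llama3.1_8b_sft_mfsgd_lr.yaml",
                          "--ngpus", "1"])
print(args.config_file, args.ngpus)
```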

Experiment Setup

This repository contains the code and configurations for the LLaMA-3.1 8B instruction-tuning experiments on the Tulu3 dataset, as described in the MoFaSGD paper.

Example Configurations

Example configurations for the experiments can be found in the configs/mfsgd/ directory. These files provide a starting point for running experiments with different optimizers:

  • MoFaSGD: configs/mfsgd/llama3.1_8b_sft_mfsgd_lr.yaml
  • GaLore: configs/mfsgd/llama3.1_8b_sft_galore.yaml
  • LoRA: configs/mfsgd/llama3.1_8b_sft_lora.yaml
  • AdamW: configs/mfsgd/llama3.1_8b_sft_adamw_bf16.yaml

You can modify these files or create new ones to run your own experiments.

Memory Profiling

To facilitate the memory analysis presented in the paper, we have integrated a memory profiling tool based on the PyTorch memory profiler. To use it, add the following parameters to your configuration YAML file:

```yaml
profile_memory_from_start: true
profile_memory_stop_step: 4
profile_memory_max_entries: 10000000
```

  • profile_memory_from_start: Set to true to begin profiling from the start of the training.
  • profile_memory_stop_step: The profiler will dump a snapshot of the memory usage after this many optimizer steps and then stop.
  • profile_memory_max_entries: The maximum number of memory allocation/deallocation events to record.
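As a hedged sketch of how such flags could drive PyTorch's CUDA memory snapshot API (`torch.cuda.memory._record_memory_history` / `_dump_snapshot`); how this repository wires the flags internally is not shown here, and the snippet is guarded so it is a no-op without torch and CUDA:

```python
config = {
    "profile_memory_from_start": True,
    "profile_memory_stop_step": 4,
    "profile_memory_max_entries": 10_000_000,
}

try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    has_cuda = False

if has_cuda and config["profile_memory_from_start"]:
    # Start recording allocation/deallocation events from step 0.
    torch.cuda.memory._record_memory_history(
        max_entries=config["profile_memory_max_entries"])

for step in range(config["profile_memory_stop_step"] + 1):
    pass  # ... one optimizer step per iteration ...
    if step == config["profile_memory_stop_step"] and has_cuda:
        # Dump a snapshot (viewable at pytorch.org/memory_viz) and stop.
        torch.cuda.memory._dump_snapshot("memory_snapshot.pickle")
        torch.cuda.memory._record_memory_history(enabled=None)
```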

Example command:

```bash
python run.py <your_profiling_config.yaml> --ngpus 1
```

Citation

If you find our work useful, please cite our paper:

```bibtex
@article{mahdavinia2025mofasgd,
  title={Low-rank Momentum Factorization for Memory Efficient Training},
  author={Mahdavinia, Pouria and Mahdavi, Mehrdad},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=W3D3TVo9a3}
}
```

Owner

  • Name: Pouria Mahdavinia
  • Login: pmahdavi
  • Kind: user
  • Location: State College, PA
  • Company: Penn State

Computer Science Ph.D. student

Citation (CITATION.cff)

cff-version: 1.2.0
date-released: 2024-03
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zheng"
  given-names: "Yaowei"
- family-names: "Zhang"
  given-names: "Richong"
- family-names: "Zhang"
  given-names: "Junhao"
- family-names: "Ye"
  given-names: "Yanhan"
- family-names: "Luo"
  given-names: "Zheyan"
- family-names: "Feng"
  given-names: "Zhangchi"
- family-names: "Ma"
  given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
preferred-citation:
  type: conference-paper
  conference:
    name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  authors:
    - family-names: "Zheng"
      given-names: "Yaowei"
    - family-names: "Zhang"
      given-names: "Richong"
    - family-names: "Zhang"
      given-names: "Junhao"
    - family-names: "Ye"
      given-names: "Yanhan"
    - family-names: "Luo"
      given-names: "Zheyan"
    - family-names: "Feng"
      given-names: "Zhangchi"
    - family-names: "Ma"
      given-names: "Yongqiang"
  title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
  url: "https://arxiv.org/abs/2403.13372"
  year: 2024
  publisher: "Association for Computational Linguistics"
  address: "Bangkok, Thailand"

GitHub Events

Total
  • Push event: 3
  • Public event: 1
Last Year
  • Push event: 3
  • Public event: 1

Dependencies

.github/workflows/docker.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • docker/build-push-action v6 composite
  • docker/login-action v3 composite
  • docker/setup-buildx-action v3 composite
  • jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
.github/workflows/label_issue.yml actions
.github/workflows/publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/tests.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
docker/docker-cuda/Dockerfile docker
  • ${BASE_IMAGE} latest build
docker/docker-cuda/docker-compose.yml docker
docker/docker-npu/Dockerfile docker
  • ${BASE_IMAGE} latest build
docker/docker-npu/docker-compose.yml docker
docker/docker-rocm/Dockerfile docker
  • ${BASE_IMAGE} latest build
docker/docker-rocm/docker-compose.yml docker
pyproject.toml pypi
requirements.txt pypi
  • accelerate >=1.3.0,<=1.7.0
  • av *
  • datasets >=2.16.0,<=3.6.0
  • einops *
  • fastapi *
  • fire *
  • gradio >=4.38.0,<=5.31.0
  • hf-transfer *
  • librosa *
  • matplotlib >=3.7.0
  • modelscope >=1.14.0
  • numpy <2.0.0
  • omegaconf *
  • packaging *
  • pandas >=2.0.0
  • peft >=0.14.0,<=0.15.2
  • protobuf *
  • pydantic <=2.10.6
  • pyyaml *
  • scipy *
  • sentencepiece *
  • sse-starlette *
  • tiktoken *
  • tokenizers >=0.19.0,<=0.21.1
  • transformers >=4.49.0,<=4.51.3
  • transformers >=4.49.0,<=4.52.4
  • trl >=0.8.6,<=0.9.6
  • tyro <0.9.0
  • uvicorn *
setup.py pypi