curlora

The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation.

https://github.com/mnoorfawi/curlora

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 13 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords

ai catastrophic-forgetting continual-learning fine-tuning generative-ai llms matrix-decompositions
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 43
  • Watchers: 3
  • Forks: 2
  • Open Issues: 0
  • Releases: 4
Topics
ai catastrophic-forgetting continual-learning fine-tuning generative-ai llms matrix-decompositions
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Citation

README.md

CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation

Muhammad Fawi

Code DOI: 10.5281/zenodo.12729738

Research preprint DOI: 10.5281/zenodo.12730055 (also available as an arXiv preprint)

Overview

This repo contains the code for the CURLoRA research paper. CURLoRA is a novel approach to fine-tuning large language models (LLMs) that leverages CUR matrix decomposition in the context of Low-Rank Adaptation (LoRA). The method addresses two critical challenges in LLM fine-tuning: mitigating catastrophic forgetting during continual learning and reducing the number of trainable parameters. We propose a unique modification to the CUR decomposition process that enables a more efficient and stable way to adapt LLMs to new tasks without compromising existing knowledge.

Experiments on multiple datasets demonstrate that CURLoRA outperforms standard LoRA in mitigating catastrophic forgetting: it maintains model stability and performance across tasks while significantly reducing the number of trainable parameters. Our results show that CURLoRA achieves superior accuracy and perplexity scores compared to LoRA, particularly in scenarios with limited data.
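The core idea can be sketched in a few lines: sample a column matrix C and a row matrix R from the pretrained weight W (the paper samples with inverted probabilities, favoring low-norm columns/rows), and train only a small zero-initialized matrix U between them. This is a minimal illustration based on the paper's description; the sampling details below are assumptions, not the repo's exact implementation:

```python
import torch

def curlora_init(W: torch.Tensor, rank: int):
    # Sketch of the modified CUR decomposition: columns/rows are sampled
    # with inverted probabilities (low-norm columns/rows are favored), and
    # U starts at zero so fine-tuning begins exactly at the pretrained weights.
    total = W.pow(2).sum()
    col_p = W.pow(2).sum(dim=0) / total            # standard column probabilities
    row_p = W.pow(2).sum(dim=1) / total
    inv_col = 1.0 / (col_p + 1e-12)
    inv_col = inv_col / inv_col.sum()              # inverted: low-norm columns favored
    inv_row = 1.0 / (row_p + 1e-12)
    inv_row = inv_row / inv_row.sum()
    cols = torch.multinomial(inv_col, rank, replacement=True)
    rows = torch.multinomial(inv_row, rank, replacement=True)
    C = W[:, cols]                                 # frozen (out_dim, rank)
    R = W[rows, :]                                 # frozen (rank, in_dim)
    U = torch.zeros(rank, rank)                    # the only trainable matrix
    return C, U, R

W = torch.randn(64, 32)
C, U, R = curlora_init(W, rank=8)
delta = C @ U @ R                                  # zero at init, so W + delta == W
```

Because U is zero at initialization, the adapted layer starts out identical to the pretrained one, and only rank × rank parameters are ever updated.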

Contents

  • CURLoRA.pdf: The research paper detailing the CURLoRA approach.
  • code/: Directory containing the implementation of CURLoRA and the experiments.
    • code/curlora.py: CURLoRA classes.
    • code/utils.py: Helper functions.
    • code/lora.py: LoRA classes.
    • code/curlora_experiment.ipynb: CURLoRA experiment with Mistral 7B (Fine-tuning on MRPC, SST-2 and Sentiment140).
    • code/curlora_experiment-gpt.ipynb: CURLoRA experiment with GPT2-Large (Fine-tuning on MRPC, SST-2 and Sentiment140).
    • code/squad_gpt-curlora.ipynb: Fine-Tuning GPT2-Large for Q&A with CURLoRA and SFTTrainer on SQuAD dataset.
Same notebooks are available for LoRA.

Quick Start

First, install the requirements:

```bash
pip3 install -r code/requirements.txt
```

All CURLoRA helper functions and classes are defined in code/curlora.py and code/utils.py.

Load the model and apply CURLoRA:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from utils import *

model_name = "gpt2-large"
model = AutoModelForCausalLM.from_pretrained(model_name)
model.to("cuda")  # this moves all existing layers to CUDA

# turn off grad for all layers
for param in model.parameters():
    param.requires_grad = False

# replace the original Q, K, V layers with CURLoRA (GPT2-Large specific);
# refer to utils.py for a more general way
for name, module in model.named_modules():
    if isinstance(module, type(model.transformer.h[0].attn)):
        # rank = 24, alpha = 1
        module.c_attn = LinearWithCURLoRA(module.c_attn, 24, 1)

# count the CURLoRA parameters that will be trained
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters after: {total_params:,}")

# make sure the CURLoRA layers are on CUDA as well
model.to("cuda")
```

Now you have the model with the CURLoRA layers applied to the attention layers (query, key and value), which you can use for fine-tuning or inference as usual.

You may need to know how the layer is named so that you can replace it correctly. For instance, Q, K, V in Mistral can be found via:

```python
for name, module in model.named_children():
    if any(l in name for l in ["q_proj", "v_proj", "k_proj"]):
        setattr(model, name, LinearWithCURLoRA(module, rank, alpha))
```
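The actual LinearWithCURLoRA class lives in code/curlora.py. To show what such a wrapper does conceptually, here is a simplified, self-contained sketch (class name, random sampling, and forward details are illustrative assumptions, not the repo's implementation):

```python
import torch
import torch.nn as nn

class LinearWithCURLoRASketch(nn.Module):
    # Illustrative only: freezes the wrapped linear layer and adds a
    # C @ U @ R update in which only U is trainable (zero-initialized),
    # so the wrapped layer's behavior is unchanged at initialization.
    def __init__(self, base: nn.Linear, rank: int, alpha: float):
        super().__init__()
        self.base, self.alpha = base, alpha
        for p in self.base.parameters():
            p.requires_grad_(False)                # freeze the pretrained weights
        W = base.weight.detach()                   # (out_features, in_features)
        cols = torch.randperm(W.shape[1])[:rank]   # placeholder sampling; the paper
        rows = torch.randperm(W.shape[0])[:rank]   # uses inverted-probability sampling
        self.register_buffer("C", W[:, cols].clone())   # frozen (out, rank)
        self.register_buffer("R", W[rows, :].clone())   # frozen (rank, in)
        self.U = nn.Parameter(torch.zeros(rank, rank))  # the only trainable part

    def forward(self, x):
        delta = self.C @ self.U @ self.R           # (out, in), zero at init
        return self.base(x) + self.alpha * (x @ delta.T)

layer = LinearWithCURLoRASketch(nn.Linear(16, 16), rank=4, alpha=1.0)
x = torch.randn(2, 16)
out = layer(x)   # equals the frozen base layer's output while U == 0
```

Note that GPT-2's c_attn is a transformers Conv1D rather than nn.Linear, which is one reason the repo's wrapper is more general than this sketch.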

Please note:

1. Some variables and values are hardcoded in code/utils.py or code/curlora.py, such as the layers to apply to, rank, alpha and device.
2. Work on supporting quantization (QCURLoRA) is ongoing (contributions are welcome); so far the whole model is loaded unquantized.
3. The code/ directory contains notebooks that run the research paper experiments.
4. You may need a slightly higher learning rate than with LoRA to get better accuracy. A higher learning rate won't cause overfitting, due to the "implicit regularization" explained in the paper.
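On point 4, fine-tuning proceeds as in any frozen-backbone setup: only the adapter parameters receive gradients, so the optimizer is built over that small subset. A toy stand-in (not the repo's training code; dimensions, data, and the learning rate are illustrative):

```python
import torch
import torch.nn as nn

# Toy stand-in mirroring the CURLoRA setup: a frozen "backbone" plus one
# zero-initialized adapter matrix U, which is the only thing the optimizer sees.
backbone = nn.Linear(8, 8)
for p in backbone.parameters():
    p.requires_grad_(False)
U = nn.Parameter(torch.zeros(8, 8))

# point 4: a slightly higher learning rate than a typical LoRA run is fine here
optimizer = torch.optim.AdamW([U], lr=3e-4)

x, y = torch.randn(4, 8), torch.randn(4, 8)
for _ in range(3):
    pred = backbone(x) + x @ U.T                 # adapter update on top of frozen output
    loss = nn.functional.mse_loss(pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

After a few steps U moves away from zero while the frozen backbone weights stay untouched, which is the stability property the paper relies on.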

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find the CURLoRA research or code helpful, please consider citing them.

Code Citation

  1. BibTeX

```bibtex
@software{Fawi_CURLoRA_Leveraging_CUR_2024,
  author    = {Fawi, Muhammad},
  title     = {{CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation}},
  month     = jul,
  year      = 2024,
  publisher = {Zenodo},
  version   = {v4.0.0},
  doi       = {10.5281/zenodo.12729738},
  url       = {https://zenodo.org/doi/10.5281/zenodo.12729738}
}
```

  2. APA

Fawi, M. (2024). CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation (v4.0.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.12729738

Research Paper Citation

  1. BibTeX

```bibtex
@misc{fawi_2024_12730055,
  author    = {Fawi, Muhammad},
  title     = {{CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation}},
  month     = jul,
  year      = 2024,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.12730055},
  url       = {https://doi.org/10.5281/zenodo.12730055}
}
```

  2. APA

Fawi, M. (2024). CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation. Zenodo. https://doi.org/10.5281/zenodo.12730055

Contributions and ideas will be much appreciated.

Owner

  • Name: Muhammad Fawi
  • Login: MNoorFawi
  • Kind: user
  • Location: Dubai, UAE
  • Company: spiderSilk

Writing code is my hobby, writing fast code is my passion.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you're implementing CURLoRA, please cite it as below."
authors:
- family-names: "Fawi"
  given-names: "Muhammad"
  orcid: "https://orcid.org/0009-0007-7210-0528"
title: "CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation"
version: v4.0.0
doi: 10.5281/zenodo.12729738
date-released: 2024-07-12
url: "https://github.com/mnoorfawi/curlora"

GitHub Events

Total
  • Issues event: 4
  • Watch event: 11
  • Issue comment event: 6
  • Fork event: 2
Last Year
  • Issues event: 4
  • Watch event: 11
  • Issue comment event: 6
  • Fork event: 2

Dependencies

code/requirements.txt pypi
  • accelerate *
  • datasets *
  • evaluate *
  • huggingface_hub *
  • numpy *
  • torch *
  • tqdm *
  • transformers *