longmem
Official implementation of our NeurIPS 2023 paper "Augmenting Language Models with Long-Term Memory".
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.9%) to scientific vocabulary
Keywords
Repository
Official implementation of our NeurIPS 2023 paper "Augmenting Language Models with Long-Term Memory".
Basic Info
- Host: GitHub
- Owner: Victorwz
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2306.07174
- Size: 15.4 MB
Statistics
- Stars: 789
- Watchers: 24
- Forks: 70
- Open Issues: 12
- Releases: 0
Topics
Metadata Files
README.md
LongMem
Official implementation of our paper "Augmenting Language Models with Long-Term Memory".
Please cite our paper if you find this repository interesting or helpful:
```bibtex
@article{LongMem,
  title={Augmenting Language Models with Long-Term Memory},
  author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
  journal={arXiv preprint arXiv:2306.07174},
  year={2023}
}
```
Environment Setup
- torch: please follow the official PyTorch installation guide. We recommend `torch>=1.8.0`. Select the GPU build of torch that matches your CUDA driver version.
- faiss-gpu: for NVIDIA V100 GPUs, simply install via `pip install faiss-gpu`. For NVIDIA A100 and A6000 GPUs, run `conda install faiss-gpu cudatoolkit=11.0 -c pytorch`. The A100 GPU is not officially supported by faiss-gpu and can sometimes lead to errors; refer to the corresponding faiss GitHub issue for help.
- fairseq: `pip install --editable ./fairseq`. This installs the revised fairseq and its dependency packages. We strongly recommend Python 3.8 for stability.
- other packages: `pip install -r requirements.txt`
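Taken together, the steps above can be sketched as one setup script. This is an illustration for a CUDA 11 / A100 machine, not a tested installer; the environment name is ours, and you should adjust the torch and cudatoolkit versions to your driver.

```shell
# Consolidated setup sketch (assumes conda and CUDA 11; adjust to your system).
conda create -n longmem python=3.8 -y
conda activate longmem
# GPU build of torch matching the CUDA driver (see pytorch.org for the exact command)
pip install "torch>=1.8.0"
# faiss-gpu: pip wheel for V100; conda build for A100/A6000
conda install faiss-gpu cudatoolkit=11.0 -c pytorch -y
# editable install of the revised fairseq bundled in this repo
pip install --editable ./fairseq
pip install -r requirements.txt
```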
Project Structure
- Pre-trained LLM class (L24, E1024, ALiBi positional embedding): `fairseq/fairseq/models/newgpt.py`
- Transformer decoder with SideNetwork (L12, E1024): `fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py`
- Transformer language model with SideNetwork class: `fairseq/fairseq/models/transformer_lm_sidenet.py`
- Memory bank and retrieval: `fairseq/fairseq/modules/dynamic_memory_with_chunk.py`
- Joint attention for memory fusion: `fairseq/fairseq/modules/joint_multihead_attention_sum.py`
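The memory bank module caches key/value pairs from past contexts and retrieves the most relevant chunks for each query. A minimal NumPy sketch of that chunk-level retrieval idea (illustrative only: the `MemoryBank` name is ours, and the real implementation in `dynamic_memory_with_chunk.py` uses faiss and per-head attention caches):

```python
import numpy as np

class MemoryBank:
    """Toy chunk-level key/value memory, illustrating the retrieval idea."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.keys = []    # one mean-pooled key vector per cached chunk
        self.values = []  # the raw token-level vectors of each chunk

    def add(self, token_keys):
        # Split incoming token keys into fixed-size chunks and cache a
        # mean-pooled key per chunk (retrieval happens at chunk level).
        for start in range(0, len(token_keys), self.chunk_size):
            chunk = token_keys[start:start + self.chunk_size]
            self.keys.append(chunk.mean(axis=0))
            self.values.append(chunk)

    def retrieve(self, query, topk=2):
        # Brute-force inner-product search; the real code uses a faiss index.
        scores = np.stack(self.keys) @ query
        best = np.argsort(-scores)[:topk]
        return [self.values[i] for i in best]

rng = np.random.default_rng(0)
bank = MemoryBank(chunk_size=4)
bank.add(rng.normal(size=(16, 8)))          # 16 past token keys -> 4 chunks
retrieved = bank.retrieve(rng.normal(size=8), topk=2)
print(len(retrieved), retrieved[0].shape)   # 2 chunks of shape (4, 8)
```

The retrieved token-level vectors are then fused into the current context via the joint attention module.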
Memory-Augmented Adaptation Training
Data collection and Preprocessing
Please download the Pile from its official release. Each sub-dataset in the Pile is organized as a set of jsonlines splits. You can refer to preprocess/filter_shard_tnlg.py for how we sample the training set and binarize it following the standard fairseq preprocessing process.
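After sampling, the text splits can be binarized with fairseq's standard preprocessing CLI. A hedged sketch of that step (the paths and dictionary file below are illustrative placeholders; check the training scripts for the exact flags this repo uses):

```shell
# Binarize one sampled split with fairseq's standard preprocessing tool.
# Paths and the dictionary file are placeholders for illustration.
fairseq-preprocess \
    --only-source \
    --srcdict dict.txt \
    --trainpref data/pile_sampled.train.tokens \
    --validpref data/pile_sampled.valid.tokens \
    --destdir data-bin/pile_sampled \
    --workers 16
```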
Memory-Augmented Adaptation Training:
```
bash train_scripts/train_longmem.sh
```
Evaluation
Please first download the checkpoints for the pre-trained GPT2-medium model and the LongMem model to checkpoints/.
Memory-Augmented In-Context Learning
```
# Evaluate the GPT-2 baseline
python eval_scripts/eval_longmem_icl.py --path /path/to/gpt2_pretrained_model

# Evaluate the LongMem model
python eval_scripts/eval_longmem_icl.py --path /path/to/longmem_model --pretrained-model-path /path/to/gpt2_pretrained_model
```
Credits
LongMem is developed based on fairseq. Thanks to the team from EleutherAI, who constructed the Pile, one of the largest high-quality text corpora.
Owner
- Name: Weizhi Wang
- Login: Victorwz
- Kind: user
- Company: University of California, Santa Barbara
- Repositories: 12
- Profile: https://github.com/Victorwz
CS Ph.D. Student @UCSB
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Wang"
given-names: "Weizhi"
- family-names: "Dong"
given-names: "Li"
- family-names: "Cheng"
given-names: "Hao"
- family-names: "Liu"
given-names: "Xiaodong"
- family-names: "Yan"
given-names: "Xifeng"
- family-names: "Gao"
given-names: "Jianfeng"
- family-names: "Wei"
given-names: "Furu"
title: "Augmenting Language Models with Long-Term Memory"
doi: "10.48550/arXiv.2306.07174"
repository-code: "https://github.com/Victorwz/LongMem"
license: "Apache-2.0"
date-released: 2023
abstract: "Official implementation of our paper 'Augmenting Language Models with Long-Term Memory'."
GitHub Events
Total
- Watch event: 44
- Fork event: 5
Last Year
- Watch event: 44
- Fork event: 5