rome

Locating and editing factual associations in GPT (NeurIPS 2022)

https://github.com/kmeng01/rome

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.7%) to scientific vocabulary

Keywords

gpt interpretability pytorch transformers
Last synced: 6 months ago

Repository

Locating and editing factual associations in GPT (NeurIPS 2022)

Basic Info
  • Host: GitHub
  • Owner: kmeng01
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://rome.baulab.info
  • Size: 22.1 MB
Statistics
  • Stars: 620
  • Watchers: 7
  • Forks: 138
  • Open Issues: 24
  • Releases: 0
Topics
gpt interpretability pytorch transformers
Created about 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme · License · Citation

README.md

Rank-One Model Editing (ROME)

This repository provides an implementation of Rank-One Model Editing (ROME) on auto-regressive transformers (GPU-only). We currently support OpenAI's GPT-2 XL (1.5B) and EleutherAI's GPT-J (6B). The release of a 20B GPT-like model from EleutherAI is expected soon; we hope to support it ASAP.
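
For orientation, both supported models are available under their public HuggingFace Hub names. The snippet below only shows how to load them with the standard transformers API; it is not a claim about how the notebooks load them internally.

```python
# Public Hub identifiers for the two supported models; GPU-only per the note above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-xl"  # or "EleutherAI/gpt-j-6B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).cuda()
```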

Feel free to open an issue if you find any problems; we are actively developing this repository and will monitor tickets closely.

Colab ROME Demo

causal tracing GIF

Table of Contents

  1. Installation
  2. Causal Tracing
  3. Rank-One Model Editing (ROME)
  4. CounterFact
  5. Evaluation
  6. How to Cite

Installation

We recommend conda for managing Python, CUDA, and PyTorch-related dependencies, and pip for everything else. To get started, simply install conda and run:

```bash
./scripts/setup_conda.sh
```

Causal Tracing

notebooks/causal_trace.ipynb demonstrates Causal Tracing, which can be modified to apply tracing to the processing of any statement.

causal tracing GIF
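
To convey the idea behind the notebook, here is an illustrative, heavily simplified sketch of causal tracing using plain HuggingFace forward hooks. It is not the repository's implementation: the small gpt2 model, the fixed noise scale, and the chosen layer and token position are assumptions made for brevity; see notebooks/causal_trace.ipynb for the real code.

```python
# Illustrative sketch only (not the repo's API): corrupt the subject-token
# embeddings with noise, then restore one clean hidden state at a chosen
# (layer, token) position and check how much of the correct answer returns.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The Space Needle is located in the city of"
subject = "The Space Needle"
inputs = tok(prompt, return_tensors="pt")
subject_positions = list(range(len(tok(subject)["input_ids"])))  # subject starts the prompt
answer_id = tok(" Seattle")["input_ids"][0]

def p_answer(logits):
    return torch.softmax(logits[0, -1], dim=-1)[answer_id].item()

# 1) Clean run: cache every layer's hidden states.
with torch.no_grad():
    clean = model(**inputs, output_hidden_states=True)
print("clean   p(Seattle):", p_answer(clean.logits))

# 2) Corrupt the subject embeddings with Gaussian noise (scale is arbitrary here).
def corrupt_hook(module, inp, out):
    out = out.clone()
    out[0, subject_positions] += 0.3 * torch.randn_like(out[0, subject_positions])
    return out

# 3) While corrupted, restore the clean hidden state of one (layer, token) position.
layer, token = 8, subject_positions[-1]        # arbitrary choice for illustration
def restore_hook(module, inp, out):
    hidden = out[0]                            # GPT-2 blocks return a tuple
    hidden[0, token] = clean.hidden_states[layer + 1][0, token]
    return out

h1 = model.transformer.wte.register_forward_hook(corrupt_hook)
h2 = model.transformer.h[layer].register_forward_hook(restore_hook)
with torch.no_grad():
    patched = model(**inputs)
h1.remove(); h2.remove()
print("patched p(Seattle):", p_answer(patched.logits))
```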

Rank-One Model Editing (ROME)

notebooks/rome.ipynb demonstrates ROME. The API is simple: specify a requested rewrite of the following form:

```python
request = {
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {"str": "football"},
}
```

Several similar examples are included in the notebook.
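
As a rough sketch of how such a request is applied outside the notebook, the snippet below assumes that rome/rome_main.py exposes an apply_rome_to_model helper and that ROMEHyperParams has a from_json loader; the exact import paths and signature are assumptions, and notebooks/rome.ipynb shows the canonical usage.

```python
# Hedged sketch: the import path, the from_json loader, and the exact
# signature of apply_rome_to_model are assumptions; follow notebooks/rome.ipynb
# for the authoritative calls.
from transformers import AutoModelForCausalLM, AutoTokenizer
from rome import ROMEHyperParams, apply_rome_to_model  # assumed exports

model = AutoModelForCausalLM.from_pretrained("gpt2-xl").cuda()
tok = AutoTokenizer.from_pretrained("gpt2-xl")

requests = [{
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {"str": "football"},
}]

hparams = ROMEHyperParams.from_json("hparams/ROME/gpt2-xl.json")  # assumed loader
model_edited, orig_weights = apply_rome_to_model(model, tok, requests, hparams)
```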

CounterFact

Details coming soon!

Evaluation

See baselines/ for a description of the available baselines.

Running the Full Evaluation Suite

experiments/evaluate.py can be used to evaluate any method in baselines/. To get started (e.g. using ROME on GPT-2 XL), run:

```bash
python3 -m experiments.evaluate \
    --alg_name=ROME \
    --model_name=gpt2-xl \
    --hparams_fname=gpt2-xl.json
```

Results from each run are stored at results/<method_name>/run_<run_id> in a specific format:

```bash
results/
|__ ROME/
    |__ run_<run_id>/
        |__ params.json
        |__ case_0.json
        |__ case_1.json
        |__ ...
        |__ case_10000.json
```
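
If you want to poke at a finished run by hand, the following minimal sketch assumes only the directory layout above; the per-case JSON schema is not documented here, so the script just loads the files, and the run id run_000 is a hypothetical example. Use experiments/summarize.py for the official metrics.

```python
# Ad-hoc inspection of a run directory, assuming the layout shown above.
import json
from pathlib import Path

run_dir = Path("results/ROME/run_000")  # hypothetical run id
params = json.loads((run_dir / "params.json").read_text())
cases = sorted(run_dir.glob("case_*.json"))

print("hyperparameters:", params)
print(len(cases), "case files found")
first = json.loads(cases[0].read_text())
print("keys in", cases[0].name, "->", list(first))
```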

To summarize the results, you can use experiments/summarize.py:

```bash
python3 -m experiments.summarize --dir_name=ROME --runs=run_<run_id>
```

Running python3 -m experiments.evaluate -h or python3 -m experiments.summarize -h provides details about command-line flags.

Integrating New Editing Methods

Say you have a new method X and want to benchmark it on CounterFact. To integrate X with our runner:

  • Subclass HyperParams into XHyperParams and specify all hyperparameter fields. See ROMEHyperParams for an example implementation.
  • Create a hyperparameters file at hparams/X/gpt2-xl.json and specify some default values. See hparams/ROME/gpt2-xl.json for an example.
  • Define a function apply_X_to_model which accepts several parameters and returns (i) the rewritten model and (ii) the original weight values for parameters that were edited (in the dictionary format {weight_name: original_weight_value}). See rome/rome_main.py for an example.
  • Add X to ALG_DICT in experiments/evaluate.py by inserting the line "X": (XHyperParams, apply_X_to_model), as sketched below.
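
A hypothetical skeleton of those pieces follows; the import path of HyperParams, the placeholder hyperparameter fields, and the exact signature of apply_X_to_model are assumptions that should be mirrored from the ROME implementation in rome/rome_main.py.

```python
# Hypothetical skeleton only; copy the real base-class location and the real
# apply_*_to_model signature from the ROME implementation.
from dataclasses import dataclass
from util.hparams import HyperParams  # assumed location of the base class

@dataclass
class XHyperParams(HyperParams):
    layers: list      # placeholder hyperparameter fields
    lr: float

def apply_X_to_model(model, tok, requests, hparams, **kwargs):
    """Return (edited model, {weight_name: original_weight_value})."""
    orig_weights = {}
    # record each tensor you are about to modify, then edit it in place
    return model, orig_weights

# In experiments/evaluate.py:
# ALG_DICT["X"] = (XHyperParams, apply_X_to_model)
```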

Finally, run the main scripts:

```bash
python3 -m experiments.evaluate \
    --alg_name=X \
    --model_name=gpt2-xl \
    --hparams_fname=gpt2-xl.json

python3 -m experiments.summarize --dir_name=X --runs=run_<run_id>
```

Note on Cross-Platform Compatibility

We currently only support methods that edit autoregressive HuggingFace models using the PyTorch backend. We are working on a set of general-purpose methods (usable on e.g. TensorFlow and without HuggingFace) that will be released soon.

How to Cite

```bibtex
@article{meng2022locating,
  title={Locating and Editing Factual Associations in {GPT}},
  author={Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  year={2022}
}
```

Owner

  • Name: Kevin Meng
  • Login: kmeng01
  • Kind: user
  • Location: Boston
  • Company: @mit, @csail

@MIT. interested in language models, compbio, and robotics.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
preferred-citation:
  type: article
  authors:
  - family-names: "Meng"
    given-names: "Kevin"
  - family-names: "Bau"
    given-names: "David"
  - family-names: "Andonian"
    given-names: "Alex"
  - family-names: "Belinkov"
    given-names: "Yonatan"
  journal: "arXiv preprint arXiv:2202.05262"
  title: "Locating and Editing Factual Associations in GPT"
  year: 2022

GitHub Events

Total
  • Issues event: 1
  • Watch event: 90
  • Issue comment event: 5
  • Fork event: 32
Last Year
  • Issues event: 1
  • Watch event: 90
  • Issue comment event: 5
  • Fork event: 32

Dependencies

baselines/kn/knowledge_neurons/requirements.txt pypi
  • einops *
  • numpy *
  • seaborn *
  • torch *
  • transformers *
baselines/kn/knowledge_neurons/setup.py pypi
  • transformers *
baselines/mend/requirements.txt pypi
  • allennlp *
  • click ==7.1.2
  • datasets *
  • hydra-core *
  • jsonlines *
  • numpy *
  • spacy *
  • torch *
  • wandb *