belief-localization

This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models."

https://github.com/google/belief-localization

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary
Last synced: 7 months ago

Repository

This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models."

Basic Info
  • Host: GitHub
  • Owner: google
  • License: apache-2.0
  • Default Branch: main
  • Homepage:
  • Size: 521 KB
Statistics
  • Stars: 61
  • Watchers: 2
  • Forks: 7
  • Open Issues: 3
  • Releases: 0
Created over 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme · Contributing · License · Citation

README.md

Does Localization Inform Editing?

This repository includes code for the paper Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models. It is built on top of code from the MEMIT repository.

Table of Contents

  1. Installation
  2. Causal Tracing
  3. Model Editing
  4. Data Analysis

Installation

For needed packages, first create and activate a virtual environment:

python3 -m venv env
source env/bin/activate

Then, install an appropriate version of torch for your system. Next, install the remaining requirements:

cd third_party
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"
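After installation, a quick sanity check can confirm that the key packages resolve from the active environment. This snippet is not part of the repository; the package list is an assumption based on the steps above, not an exhaustive requirements check.

```python
# Optional post-install sanity check: confirm key packages are importable
# from the active virtual environment without actually importing them.
import importlib.util

status = {pkg: importlib.util.find_spec(pkg) is not None
          for pkg in ("torch", "nltk", "transformers")}
for pkg, ok in status.items():
    print(f"{pkg}: {'ok' if ok else 'MISSING'}")
```

`find_spec` returns None for a missing package rather than raising, so the check reports all packages even if some are absent.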

Causal Tracing

We gather causal tracing results from the first 2000 points in the CounterFact dataset, filtering to 652 correctly completed prompts when using GPT-J. The window_sizes argument controls which tracing window sizes to use. To reproduce all GPT-J results in the paper, run tracing experiments with window sizes 10, 5, 3, and 1. This can be done with the following steps.

First, set the global variables in experiments/tracing.py (i.e. CODE_DIR, BASE_DIR, and MODEL_DIR) to desired values. Then, run:

python -m experiments.tracing \
  -n 2000 \
  --ds_name counterfact \
  --model_name EleutherAI/gpt-j-6B \
  --run 1 \
  --window_sizes "10 5 3 1"

To get results for ZSRE, run:

python -m experiments.tracing \
  -n 2000 \
  --ds_name zsre \
  --model_name EleutherAI/gpt-j-6B \
  --run 1 \
  --window_sizes "5"

python -m experiments.tracing \
  -n 2000 \
  --ds_name zsre \
  --model_name gpt2-xl \
  --run 1 \
  --window_sizes "5" \
  --gpu 1

Model Editing Evaluation

We check the relationship between causal tracing localization and editing performance using several editing methods applied to five different variants of the basic model editing problem. The editing methods are:

  • Constrained finetuning with Adam at one layer
  • Constrained finetuning with Adam at five adjacent layers
  • ROME (which edits one layer)
  • MEMIT (which edits five layers)

The editing problems include the original model editing problem specified by the CounterFact dataset (changing the prediction for a given input), as well as a few variants mentioned below.

To run the default Error Injection editing problem using ROME with GPT-J, first set the global variables in experiments/evaluate.py (i.e. CODE_DIR, BASE_DIR, and MODEL_DIR) to desired values. Then, run:

python3 -m experiments.evaluate \
  -n 2000 \
  --alg_name ROME \
  --window_sizes "1" \
  --ds_name cf \
  --model_name EleutherAI/gpt-j-6B \
  --run 1 \
  --edit_layer -2 \
  --correctness_filter 1 \
  --norm_constraint 1e-4 \
  --kl_factor 1 \
  --fact_token subject_last

To run an experiment with ZSRE, use:

python3 -m experiments.evaluate \
  -n 2000 \
  --alg_name ROME \
  --window_sizes "1" \
  --ds_name zsre \
  --model_name EleutherAI/gpt-j-6B \
  --run 1 \
  --edit_layer 5 \
  --correctness_filter 0 \
  --norm_constraint 1e-4 \
  --kl_factor 1 \
  --fact_token subject_last

Add the following flags for each variation of the experiments:

  • Error Injection: no flag
  • Tracing Reversal: --tracing_reversal
  • Fact Erasure: --fact_erasure
  • Fact Amplification: --fact_amplification
  • Fact Forcing: --fact_forcing

For example, to run with constrained finetuning across 5 layers in order to do Fact Erasure, run:

python3 -m experiments.evaluate \
  -n 2000 \
  --alg_name FT \
  --window_sizes "5" \
  --ds_name cf \
  --model_name EleutherAI/gpt-j-6B \
  --run 1 \
  --edit_layer -2 \
  --correctness_filter 1 \
  --norm_constraint 1e-4 \
  --kl_factor .0625 \
  --fact_erasure
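To sweep all five editing-problem variants, the per-variant flags can be appended to one base command. The helper below is hypothetical (not part of the repo); the flag names come from the list above, and the other hyperparameters mirror the ROME Error Injection example for GPT-J.

```python
# Hypothetical sketch: build the evaluate command for each of the five
# editing-problem variants listed in this README.
import shlex

BASE = ("python3 -m experiments.evaluate -n 2000 --alg_name ROME "
        '--window_sizes "1" --ds_name cf --model_name EleutherAI/gpt-j-6B '
        "--run 1 --edit_layer -2 --correctness_filter 1 "
        "--norm_constraint 1e-4 --kl_factor 1 --fact_token subject_last")

VARIANT_FLAGS = {
    "Error Injection": "",  # default problem: no extra flag
    "Tracing Reversal": "--tracing_reversal",
    "Fact Erasure": "--fact_erasure",
    "Fact Amplification": "--fact_amplification",
    "Fact Forcing": "--fact_forcing",
}

commands = {name: shlex.split(f"{BASE} {flag}".strip())
            for name, flag in VARIANT_FLAGS.items()}
for name, cmd in commands.items():
    print(name, "->", " ".join(cmd))
    # e.g. subprocess.run(cmd) here to actually launch each experiment
```

Printing the commands first (rather than launching them) makes it easy to spot-check hyperparameters before committing GPU time.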

Data Analysis

Data analysis for this work is done in R via the data_analysis.ipynb file. All plots and regression analyses in the paper can be reproduced via this file.

Disclaimer

This is not an officially supported Google product.

Owner

  • Name: Google
  • Login: google
  • Kind: organization
  • Email: opensource@google.com
  • Location: United States of America

Google ❤️ Open Source

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
preferred-citation:
  type: article
  authors:
  - family-names: "Hase"
    given-names: "Peter"
  - family-names: "Bansal"
    given-names: "Mohit"
  - family-names: "Kim"
    given-names: "Been"
  - family-names: "Ghandeharioun"
    given-names: "Asma"
  journal: "arXiv preprint"
  title: Locate Then Edit? Surprising Differences in Where Knowledge Is Stored vs. Can Be Manipulated in Language Models
  year: 2023

GitHub Events

Total
  • Watch event: 6
  • Fork event: 1
Last Year
  • Watch event: 6
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 405
  • Total Committers: 5
  • Avg Commits per committer: 81.0
  • Development Distribution Score (DDS): 0.064
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • peterhase (p****e@g****m): 379 commits
  • Asma (a****n@g****m): 16 commits
  • Peter Hase (p****e@g****m): 8 commits
  • Ikko Eltociear Ashimine (e****r@g****m): 1 commit
  • Michael (m****g@g****m): 1 commit
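The commit statistics above are mutually consistent under a common definition of the Development Distribution Score, DDS = 1 - (top committer's commits / total commits). That formula is an assumption (it is not documented on this page); the quick check below uses the per-committer counts listed above.

```python
# Consistency check of the committer stats above, assuming
# DDS = 1 - (top committer's commits / total commits).
commits = {"peterhase": 379, "Asma": 16, "Peter Hase": 8,
           "Ikko Eltociear Ashimine": 1, "Michael": 1}
total = sum(commits.values())
avg = total / len(commits)
dds = 1 - max(commits.values()) / total
print(total)          # 405, matching "Total Commits"
print(avg)            # 81.0, matching "Avg Commits per committer"
print(round(dds, 3))  # 0.064, matching the reported DDS
```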
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 5
  • Total pull requests: 2
  • Average time to close issues: 12 days
  • Average time to close pull requests: 17 days
  • Total issue authors: 5
  • Total pull request authors: 2
  • Average comments per issue: 0.4
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 1
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • WLYangICT (1)
  • neo-9981 (1)
  • google-allstar-prod[bot] (1)
  • michaelwilliamtang (1)
  • josejhlee (1)
Pull Request Authors
  • eltociear (1)
  • michaelwilliamtang (1)
Top Labels
Issue Labels
allstar (1)
Pull Request Labels

Dependencies

third_party/baselines/kn/knowledge_neurons/requirements.txt pypi
  • einops *
  • numpy *
  • seaborn *
  • torch ==1.13.1
  • transformers *
third_party/baselines/kn/knowledge_neurons/setup.py pypi
  • transformers *
third_party/baselines/mend/requirements.txt pypi
  • allennlp *
  • click ==7.1.2
  • datasets *
  • hydra-core *
  • jsonlines *
  • numpy *
  • spacy *
  • torch *
  • wandb *
third_party/requirements.txt pypi
  • accelerate ==0.10.0
  • allennlp ==2.9.2
  • click ==8.0.3
  • datasets ==1.18.3
  • einops ==0.4.0
  • higher ==0.2.1
  • hydra-core ==1.1.1
  • matplotlib *
  • python-dotenv ==0.19.2
  • tensorboard *
  • tokenizers ==0.11.2
  • transformers ==4.21.0
  • unidecode *