https://github.com/chanind/causal-tracer

Causal tracing for language models

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
✓ .zenodo.json file (found .zenodo.json file)
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 13.2%, to scientific vocabulary)

Last synced: 10 months ago · JSON representation

Repository

Causal tracing for language models

Basic Info

Host: GitHub
Owner: chanind
License: mit
Language: Python
Default Branch: main
Size: 170 KB

Statistics

Stars: 13
Watchers: 2
Forks: 0
Open Issues: 1
Releases: 0

Created almost 3 years ago · Last pushed about 2 years ago

Metadata Files

Readme Changelog License

Causal Tracer

Causal trace plots for transformer language models.

Demo:

[Image: rome_knows_fact]

About

This library generates causal trace plots for transformer language models like Llama and GPT2, and should work with any decoder-only model on Huggingface. This library is based on causal tracing code from ROME, and broadly packages and improves on their excellent work. Thank you to these authors! There are some notable differences between the original ROME causal tracing code and this library, such as support for batch processing, automatic noise calculation, more processing options, and a slightly different API.

Causal tracing is a technique for finding which activations at which layers are causally important for the model to generate a given output. It works by scrambling the subject tokens, then replacing activations in the scrambled computation graph one at a time with their original values and observing whether each replacement moves the model closer to its original answer.

For instance, if we prompt a language model with "Rome is located in the country of", it will output "Italy". If we want to understand how the model generated that answer, we can scramble the tokens for "Rome" by adding Gaussian noise so the model now sees gibberish instead, like "@#(* is located in the country of". Of course, after this scrambling, there's no way for the model to output "Italy" since the subject is just noise. However, we can take this corrupted computation graph and start replacing activations in it with the original uncorrupted activations, and see if the model starts outputting "Italy" again. If it does, we know that activation is important to the computation!
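The corrupt-then-restore loop described above can be sketched with a toy model. This is a minimal illustration in plain Python, not the library's implementation: the two-position scalar "model", the names `run_toy_model` and `toy_causal_trace`, and all constants are hypothetical, and the score here is the raw output value rather than the probability of the original answer.

```python
import math
import random

NUM_LAYERS = 4
POSITIONS = ("subject", "last")


def run_toy_model(subject_emb, last_emb, patch=None):
    """Run a toy two-position model and return (output, per-layer states).

    patch is an optional (layer, position, value) triple that overwrites one
    activation right after that layer has been computed.
    """
    state = {"subject": subject_emb, "last": last_emb}
    states = []
    for layer in range(NUM_LAYERS):
        state = {
            # the subject position only sees itself
            "subject": math.tanh(state["subject"] + 0.5),
            # the last position also reads the subject (a stand-in for attention)
            "last": math.tanh(state["last"] + 0.8 * state["subject"]),
        }
        if patch is not None and patch[0] == layer:
            state[patch[1]] = patch[2]
        states.append(dict(state))
    return state["last"], states


def toy_causal_trace(subject_emb, last_emb, noise_std=3.0, samples=10, seed=0):
    """Corrupt the subject with Gaussian noise, then restore one activation at a time."""
    rng = random.Random(seed)
    noises = [rng.gauss(0.0, noise_std) for _ in range(samples)]
    clean_out, clean_states = run_toy_model(subject_emb, last_emb)

    def mean_output(patch=None):
        # average the output over all noise samples, optionally with one patch
        outs = [run_toy_model(subject_emb + n, last_emb, patch)[0] for n in noises]
        return sum(outs) / len(outs)

    corrupted_out = mean_output()
    restored = {
        pos: [
            mean_output((layer, pos, clean_states[layer][pos]))
            for layer in range(NUM_LAYERS)
        ]
        for pos in POSITIONS
    }
    return clean_out, corrupted_out, restored


clean, corrupted, restored = toy_causal_trace(subject_emb=1.0, last_emb=0.1)
for pos in POSITIONS:
    print(pos, [round(value, 3) for value in restored[pos]])
```

In this toy, restoring the last position at the final layer recovers the clean output exactly, while restoring the subject position at the final layer changes nothing, because no later layer reads it. A real causal trace shows the same kind of pattern across many tokens and layers.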

For more info on causal tracing, check out the original ROME paper, Locating and Editing Factual Associations in GPT.

Installation

pip install causal-tracer

Basic usage

If you're generating causal traces for a Llama-based model or GPT2, you don't need any further configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from causal_tracer import CausalTracer, plot_hidden_flow_heatmap

model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tracer = CausalTracer(model, tokenizer)

# perform causal tracing across hidden layers (residual stream) of the model
hidden_layer_flow = tracer.calculate_hidden_flow(
    "The Space Needle is located in the city of",
    subject="The Space Needle",
)

# plot the result
plot_hidden_flow_heatmap(hidden_layer_flow)
```

You can also generate causal traces of MLP layers or attention layers in the transformer.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from causal_tracer import CausalTracer, plot_hidden_flow_heatmap

model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tracer = CausalTracer(model, tokenizer)

# perform causal tracing across MLP layers of the model
mlp_layer_flow = tracer.calculate_hidden_flow(
    "The Space Needle is located in the city of",
    subject="The Space Needle",
    kind="mlp",
    window=10,
)
plot_hidden_flow_heatmap(mlp_layer_flow)

# perform causal tracing across attention layers of the model
attn_layer_flow = tracer.calculate_hidden_flow(
    "The Space Needle is located in the city of",
    subject="The Space Needle",
    kind="attention",
    window=10,
)
plot_hidden_flow_heatmap(attn_layer_flow)
```

When generating MLP or attention causal traces, you should typically set a window size. In the ROME paper, this is set to 10, which means the MLP or attention traces are replaced as a group and their impact is averaged to make it easier to see the impact of smaller changes.
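To make the idea of a window concrete, the sketch below computes which layers would be restored together for a given center layer, following the ROME-style convention of a window centered on that layer and clipped at the ends of the model. The helper `window_layers` is hypothetical, and the exact boundaries this library uses may differ.

```python
def window_layers(center, window, num_layers):
    """Return the layer indices restored together around `center`.

    Assumes a ROME-style window: roughly window/2 layers on each side of
    the center, clipped to the valid range [0, num_layers).
    """
    start = max(0, center - window // 2)
    end = min(num_layers, center + (window + 1) // 2)
    return list(range(start, end))


# gpt2-medium has 24 transformer layers
print(window_layers(0, 10, 24))
print(window_layers(12, 10, 24))
print(window_layers(23, 10, 24))
```

Near the first and last layers the window is truncated, so fewer than 10 layers are restored there.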

Batching and sampling

By default, causal traces are calculated by scrambling the subject tokens with 10 different noise samples, and run in batches of size 32. You can improve the quality of the causal trace by increasing the number of samples. If you run out of RAM during processing, try decreasing the batch size.

```python
hidden_layer_flow = tracer.calculate_hidden_flow(
    "The Space Needle is located in the city of",
    subject="The Space Needle",
    samples=50,
    batch_size=8,
)
```

Limiting patching for performance

Running causal tracing can be slow as it requires a lot of passes through the model to generate a trace. You can get a speed-up by only calculating causal traces of certain layers, or only performing patching on subject tokens themselves. The results won't be complete if you do this, but depending on the use-case, that might be fine.

```python
hidden_layer_flow = tracer.calculate_hidden_flow(
    "The Space Needle is located in the city of",
    subject="The Space Needle",
    start_layer=10,
    end_layer=15,
    patch_subject_tokens_only=True,
)
```
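A rough way to see why limiting layers and tokens helps is to count patched runs. The sketch below assumes one patched forward pass per (token, layer, noise sample), grouped into batches. That cost model and the helper names are assumptions for illustration, not a description of how the library schedules its work, and the token counts are hypothetical.

```python
import math


def estimate_patched_runs(num_tokens, num_layers, samples=10):
    """Assumed cost model: one patched run per (token, layer, noise sample)."""
    return num_tokens * num_layers * samples


def estimate_batches(num_tokens, num_layers, samples=10, batch_size=32):
    """Number of model batches needed under the assumed cost model."""
    runs = estimate_patched_runs(num_tokens, num_layers, samples)
    return math.ceil(runs / batch_size)


# hypothetical 10-token prompt traced across all 24 layers of gpt2-medium
full = estimate_batches(num_tokens=10, num_layers=24)
# same prompt, but only 5 layers and 3 subject tokens are patched
limited = estimate_batches(num_tokens=3, num_layers=5)
print(full, limited)
```

Under these assumptions the cost scales with the product of tokens, layers, and samples, so trimming any one of them gives a proportional speed-up.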

Custom layer configs

If you're using a model that isn't automatically detected by the library, you'll need to add a LayerConfig to tell CausalTracer where to find the embeddings, attention, MLP, and hidden layers within the model. You can do this by creating a LayerConfig object and passing it in when creating a CausalTracer object.

```python
from causal_tracer import CausalTracer, LayerConfig

custom_layer_config = LayerConfig(
    hidden_layers_matcher="h.{num}",
    attention_layers_matcher="h.{num}.attn",
    mlp_layers_matcher="h.{num}.mlp",
    embedding_layer="wte",
)
tracer = CausalTracer(model, tokenizer, layer_config=custom_layer_config)
```

Note that hidden_layers_matcher, attention_layers_matcher, and mlp_layers_matcher are template strings containing {num} in the middle. During processing, {num} is replaced with the layer number. These strings correspond to the named modules of the transformer. You can find all the named modules of a PyTorch model by running model.named_modules().
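Before building a LayerConfig, it can help to check that a template actually lines up with the module names in your model. The sketch below expands a template with `str.format` and counts how many consecutive layers it matches. The helpers and the hard-coded `module_names` list are hypothetical; in practice you could collect the names with `[name for name, _ in model.named_modules()]`. The exact-match comparison is an assumption, and the library's own matching may be more lenient.

```python
def expand_matcher(template, num_layers):
    """Fill {num} in the template for each layer index."""
    return [template.format(num=i) for i in range(num_layers)]


def count_matching_layers(template, module_names):
    """Count consecutive layers, starting at 0, whose expanded name exists."""
    names = set(module_names)
    count = 0
    while template.format(num=count) in names:
        count += 1
    return count


# hypothetical module names for a tiny two-layer GPT-2-style model
module_names = [
    "wte", "h", "h.0", "h.0.attn", "h.0.mlp",
    "h.1", "h.1.attn", "h.1.mlp", "ln_f",
]

print(expand_matcher("h.{num}.mlp", 2))
print(count_matching_layers("h.{num}.mlp", module_names))
print(count_matching_layers("layers.{num}", module_names))
```

A count of zero suggests the template does not correspond to any module name and needs adjusting.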

Using hidden flow results directly

If you want to use the results of the tracer.calculate_hidden_flow() method in downstream tasks instead of just making a plot, the returned HiddenFlow object contains a number of fields which can be further analyzed. The full HiddenFlow dataclass types are below:

```python
class HiddenFlow:
    scores: torch.Tensor
    low_score: float
    high_score: float
    input_ids: torch.Tensor
    input_tokens: list[str]
    subject_range: tuple[int, int]
    answer: str
    kind: LayerKind  # one of "hidden", "attention", or "mlp"
    layer_outputs: OrderedDict[str, torch.Tensor]
```

Of particular interest, the scores attribute contains the full matrix of causal tracing scores. The layer_outputs attribute contains the uncorrupted layer activations for each layer of the type being analyzed.
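As one example of downstream use, the sketch below ranks (token, layer) cells by score. It works on plain nested lists so it runs without a model. The helper `top_cells` and the sample numbers are hypothetical, and it assumes the score matrix is indexed as [token][layer], which you should verify against your own results. With a real result, you could pass `flow.scores.tolist()` and `flow.input_tokens`.

```python
def top_cells(scores, tokens, k=3):
    """Return the k highest-scoring (score, token, layer) cells.

    Assumes scores[token_index][layer_index]; check this against the
    shape of your own HiddenFlow.scores before relying on it.
    """
    cells = [
        (score, tokens[token_index], layer_index)
        for token_index, row in enumerate(scores)
        for layer_index, score in enumerate(row)
    ]
    return sorted(cells, key=lambda cell: cell[0], reverse=True)[:k]


# made-up scores for a 3-token prompt and a 4-layer model
example_tokens = ["The", " Needle", " of"]
example_scores = [
    [0.01, 0.02, 0.01, 0.01],
    [0.10, 0.65, 0.30, 0.05],
    [0.02, 0.05, 0.40, 0.90],
]

for score, token, layer in top_cells(example_scores, example_tokens, k=2):
    print(f"{token!r} at layer {layer}: {score:.2f}")
```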

Contributing

Contributions are welcome! If you submit code, please make sure to add or update test coverage along with your change. This repo uses Black for code formatting, MyPy for type checking, and Flake8 for linting.

Owner

Name: David Chanin
Login: chanind
Kind: user
Location: London, UK
Company: UCL

Website: https://chanind.github.io
Repositories: 97
Profile: https://github.com/chanind

GitHub Events

Total

Watch event: 4

Last Year

Watch event: 4

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 1.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

haolun-wu (1)

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 54 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

pypi.org: causal-tracer

Homepage: https://github.com/chanind/causal-tracer
Documentation: https://causal-tracer.readthedocs.io/
License: mit
Latest release: 1.1.0
published about 2 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 54 Last month

Rankings

Dependent packages count: 7.5%

Average: 38.6%

Dependent repos count: 69.6%

Maintainers (1)

chanind

Last synced: 11 months ago

Dependencies

.github/workflows/ci.yaml actions

actions/checkout v3 composite
actions/setup-python v3 composite
snok/install-poetry v1 composite

poetry.lock pypi

black 23.7.0 develop
click 8.1.6 develop
cmake 3.27.2 develop
colored 1.4.4 develop
exceptiongroup 1.1.3 develop
flake8 6.1.0 develop
iniconfig 2.0.0 develop
jinja2 3.1.2 develop
lit 16.0.6 develop
markupsafe 2.1.3 develop
mccabe 0.7.0 develop
mpmath 1.3.0 develop
mypy 1.5.1 develop
mypy-extensions 1.0.0 develop
networkx 3.1 develop
nvidia-cublas-cu11 11.10.3.66 develop
nvidia-cuda-cupti-cu11 11.7.101 develop
nvidia-cuda-nvrtc-cu11 11.7.99 develop
nvidia-cuda-runtime-cu11 11.7.99 develop
nvidia-cudnn-cu11 8.5.0.96 develop
nvidia-cufft-cu11 10.9.0.58 develop
nvidia-curand-cu11 10.2.10.91 develop
nvidia-cusolver-cu11 11.4.0.1 develop
nvidia-cusparse-cu11 11.7.4.91 develop
nvidia-nccl-cu11 2.14.3 develop
nvidia-nvtx-cu11 11.7.91 develop
pathspec 0.11.2 develop
platformdirs 3.10.0 develop
pluggy 1.2.0 develop
pycodestyle 2.11.0 develop
pyflakes 3.1.0 develop
pytest 7.4.0 develop
sentencepiece 0.1.99 develop
setuptools 68.1.0 develop
sympy 1.12 develop
syrupy 4.1.0 develop
tomli 2.0.1 develop
torch 2.0.0 develop
triton 2.0.0 develop
wheel 0.41.1 develop
certifi 2023.7.22
charset-normalizer 3.2.0
colorama 0.4.6
contourpy 1.1.0
cycler 0.11.0
filelock 3.12.2
fonttools 4.42.0
fsspec 2023.6.0
huggingface-hub 0.16.4
idna 3.4
importlib-resources 6.0.1
kiwisolver 1.4.4
matplotlib 3.7.2
numpy 1.25.2
packaging 23.1
pillow 10.0.0
pyparsing 3.0.9
python-dateutil 2.8.2
pyyaml 6.0.1
regex 2023.8.8
requests 2.31.0
safetensors 0.3.2
six 1.16.0
tokenizers 0.13.3
tqdm 4.66.1
transformers 4.31.0
typing-extensions 4.7.1
urllib3 2.0.4
zipp 3.16.2

pyproject.toml pypi

matplotlib ^3.7.1
python ^3.9
tqdm ^4.66.1
transformers ^4.28.1
