Science Score: 41.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.9%) to scientific vocabulary
Repository
Multi-Layer Sparse Autoencoders (ICLR 2025)
Basic Info
- Host: GitHub
- Owner: tim-lawson
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2409.04185
- Size: 642 KB
Statistics
- Stars: 17
- Watchers: 2
- Forks: 0
- Open Issues: 6
- Releases: 0
Metadata Files
README.md
Multi-Layer Sparse Autoencoders (MLSAE)
[!NOTE] This repository accompanies the preprint Residual Stream Analysis with Multi-Layer SAEs (https://arxiv.org/abs/2409.04185). See References for related work.
Pretrained MLSAEs
We define two types of model: plain PyTorch MLSAE modules, which are relatively small; and PyTorch Lightning MLSAETransformer modules, which include the underlying transformer. HuggingFace collections for both are here:
We assume that pretrained MLSAEs have repo_ids with this naming convention:
tim-lawson/mlsae-pythia-70m-deduped-x{expansion_factor}-k{k}
tim-lawson/mlsae-pythia-70m-deduped-x{expansion_factor}-k{k}-tfm
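The naming convention above can be captured in a small helper. This is a minimal sketch: the function name `mlsae_repo_id` and its parameters are hypothetical and not part of the repository's code. Only the string pattern comes from the convention stated above, and we assume that the `-tfm` suffix marks the MLSAETransformer variant.

```python
def mlsae_repo_id(expansion_factor: int, k: int, transformer: bool = False) -> str:
    """Build a repo_id that follows the naming convention above.

    Hypothetical helper for illustration; not part of the mlsae package.
    """
    repo_id = f"tim-lawson/mlsae-pythia-70m-deduped-x{expansion_factor}-k{k}"
    # Assumption: "-tfm" denotes the variant that includes the transformer.
    return f"{repo_id}-tfm" if transformer else repo_id


print(mlsae_repo_id(64, 32))
print(mlsae_repo_id(64, 32, transformer=True))
```

With `expansion_factor=64` and `k=32`, the second call yields the repo_id used in the analysis commands in this README.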
The Weights & Biases project for the paper is here.
Installation
Install Python dependencies with Poetry:
bash
poetry env use 3.12
poetry install
Install Python dependencies with pip:
bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Install Node.js dependencies:
bash
cd app
npm install
Training
Train a single MLSAE:
bash
python train.py --help
python train.py --model_name EleutherAI/pythia-70m-deduped --expansion_factor 64 -k 32
Analysis
Test a single pretrained MLSAE:
[!WARNING] We assume that the test split of monology/pile-uncopyrighted is already downloaded and stored in data/test.jsonl.zst.
bash
python test.py --help
python test.py --model_name EleutherAI/pythia-70m-deduped --expansion_factor 64 -k 32
Compute the distributions of latent activations over layers for a single pretrained MLSAE (HuggingFace datasets):
bash
python -m mlsae.analysis.dists --help
python -m mlsae.analysis.dists --repo_id tim-lawson/mlsae-pythia-70m-deduped-x64-k32-tfm --max_tokens 100_000_000
Compute the maximally activating examples for each combination of latent and layer for a single pretrained MLSAE (HuggingFace datasets):
bash
python -m mlsae.analysis.examples --help
python -m mlsae.analysis.examples --repo_id tim-lawson/mlsae-pythia-70m-deduped-x64-k32-tfm --max_tokens 1_000_000
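To illustrate what "maximally activating examples for each combination of latent and layer" means, here is a minimal standard-library sketch. It is not the implementation in `mlsae.analysis.examples`: the function name, the record layout `(latent, layer, activation, text)`, and the toy data are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def max_activating_examples(records, n=2):
    """Keep the n highest-activation examples per (latent, layer) pair.

    records: iterable of (latent, layer, activation, text) tuples
    (a hypothetical layout, not the repository's data format).
    """
    by_key = defaultdict(list)
    for latent, layer, activation, text in records:
        by_key[(latent, layer)].append((activation, text))
    # nlargest sorts by activation first, because it is the first tuple item.
    return {key: heapq.nlargest(n, items) for key, items in by_key.items()}


# Toy records, made up for illustration.
records = [
    (0, 0, 0.1, "the"),
    (0, 0, 0.9, "cat"),
    (0, 0, 0.5, "sat"),
    (0, 1, 0.3, "mat"),
]
print(max_activating_examples(records))
```

This version holds every record in memory; for a large token budget, a bounded heap per key would be the more practical choice.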
Interactive visualizations
Run the interactive web application for a single pretrained MLSAE:
```bash
python -m mlsae.api --help
python -m mlsae.api --repo_id tim-lawson/mlsae-pythia-70m-deduped-x64-k32-tfm

cd app
npm run dev
```
Navigate to http://localhost:3000, enter a prompt, and click 'Submit'.
Alternatively, navigate to http://localhost:3000/prompt/foobar.
Figures
Compute the mean cosine similarities between residual stream activation vectors at adjacent layers of a single pretrained transformer:
bash
python figures/resid_cos_sim.py --help
python figures/resid_cos_sim.py --model_name EleutherAI/pythia-70m-deduped
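The quantity computed here can be sketched in a few lines of plain Python. This is not the code in `figures/resid_cos_sim.py`: the function names, the nested-list layout `resid[layer][token]`, and the toy activations are assumptions for illustration, and a real implementation would operate on tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_adjacent_cos_sim(resid):
    """resid[layer][token] is one residual stream activation vector.

    Returns one mean cosine similarity per pair of adjacent layers,
    averaged over tokens.
    """
    means = []
    for lower, upper in zip(resid, resid[1:]):
        sims = [cosine_similarity(u, v) for u, v in zip(lower, upper)]
        means.append(sum(sims) / len(sims))
    return means


# Toy data: 3 layers, 2 tokens, 2 dimensions (made up for illustration).
resid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[2.0, 0.0], [3.0, 0.0]],
]
print(mean_adjacent_cos_sim(resid))
```

In the toy data, the second token's vector rotates by 90 degrees between layers 0 and 1, so that pair of layers averages 0.5, while layers 1 and 2 differ only in scale and average 1.0.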
Save heatmaps of the distributions of latent activations over layers for multiple pretrained MLSAEs:
bash
python figures/dists_heatmaps.py --help
python figures/dists_heatmaps.py --expansion_factor 32 64 128 -k 16 32 64
Save a CSV of the mean standard deviations of the distributions of latent activations over layers for multiple pretrained MLSAEs:
bash
python figures/dists_layer_std.py --help
python figures/dists_layer_std.py --expansion_factor 32 64 128 -k 16 32 64
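One plausible reading of "mean standard deviation of the distributions of latent activations over layers" is sketched below: treat each latent's activation mass per layer as a distribution over the layer index, take its standard deviation, and average over latents. This definition, the function names, and the toy data are assumptions; the script's exact computation may differ.

```python
import math


def layer_std(weights):
    """Standard deviation of the layer index under one latent's distribution.

    weights[i] is the (unnormalized) activation mass of the latent at layer i.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(i * p for i, p in enumerate(probs))
    var = sum(p * (i - mean) ** 2 for i, p in enumerate(probs))
    return math.sqrt(var)


def mean_layer_std(dists):
    """Average the per-latent standard deviations over all latents."""
    return sum(layer_std(w) for w in dists) / len(dists)


# Toy data: 2 latents, 4 layers (made up for illustration).
dists = [
    [1.0, 0.0, 0.0, 0.0],  # active at a single layer
    [1.0, 0.0, 1.0, 0.0],  # split evenly between layers 0 and 2
]
print(mean_layer_std(dists))
```

Under this reading, a latent that is active at a single layer contributes zero, and latents that are active across many layers contribute larger values.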
Save heatmaps of the maximum latent activations for a given prompt and multiple pretrained MLSAEs:
bash
python figures/prompt_heatmaps.py --help
python figures/prompt_heatmaps.py --expansion_factor 32 64 128 -k 16 32 64
Save a CSV of the Mean Max Cosine Similarity (MMCS) for multiple pretrained MLSAEs:
bash
python figures/mmcs.py --help
python figures/mmcs.py --expansion_factor 32 64 128 -k 16 32 64
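As a reminder of the metric, MMCS is commonly defined as follows: for each vector in one dictionary, take the maximum cosine similarity with any vector in a second dictionary, then average over the first dictionary. The sketch below follows that definition in plain Python. The function names and toy vectors are assumptions, and which pairs of dictionaries `figures/mmcs.py` compares is not stated here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mmcs(dict_a, dict_b):
    """Mean over dict_a of the max cosine similarity with any vector in dict_b."""
    best = [max(cosine_similarity(a, b) for b in dict_b) for a in dict_a]
    return sum(best) / len(best)


# Toy dictionaries of 2-dimensional vectors (made up for illustration).
dict_a = [[1.0, 0.0], [0.0, 1.0]]
dict_b = [[1.0, 0.0], [1.0, 1.0]]
print(mmcs(dict_a, dict_b))
```

Note that the metric is not symmetric: `mmcs(dict_a, dict_b)` and `mmcs(dict_b, dict_a)` generally differ.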
References
Code
- https://github.com/openai/sparse_autoencoder
- https://github.com/EleutherAI/sae
- https://github.com/ai-safety-foundation/sparse_autoencoder
- https://github.com/callummcdougall/sae_vis
Papers
- Gao et al. [2024] https://cdn.openai.com/papers/sparse-autoencoders.pdf
- Bricken et al. [2023] https://transformer-circuits.pub/2023/monosemantic-features/index.html
Owner
- Name: Tim Lawson
- Login: tim-lawson
- Kind: user
- Location: Bristol, UK
- Company: University of Bristol
- Website: timlawson.dev
- Twitter: tslwn
- Repositories: 1
- Profile: https://github.com/tim-lawson
AI PhD student at the University of Bristol. Previously Physics at Cambridge and software at Graphcore. Language, cognition, etc.
Citation (citation.bib)
@misc{lawson_residual_2024,
title = {Residual {{Stream Analysis}} with {{Multi-Layer SAEs}}},
author = {Lawson, Tim and Farnik, Lucy and Houghton, Conor and Aitchison, Laurence},
year = {2024},
month = oct,
number = {arXiv:2409.04185},
eprint = {2409.04185},
primaryclass = {cs},
publisher = {arXiv},
doi = {10.48550/arXiv.2409.04185},
urldate = {2024-10-08},
archiveprefix = {arXiv}
}
GitHub Events
Total
- Issues event: 6
- Watch event: 14
- Delete event: 1
- Push event: 20
- Pull request event: 4
- Create event: 1
Last Year
- Issues event: 6
- Watch event: 14
- Delete event: 1
- Push event: 20
- Pull request event: 4
- Create event: 1
Dependencies
- 448 dependencies
- @types/node ^22.5.1 development
- eslint-import-resolver-typescript ^3.6.3 development
- eslint-plugin-import ^2.29.1 development
- knip ^5.29.1 development
- prettier ^3.3.3 development
- typescript ^5.5.4 development
- @hookform/resolvers ^3.9.0
- @radix-ui/react-label ^2.1.0
- @radix-ui/react-slot ^1.1.0
- @radix-ui/react-tabs ^1.1.0
- @types/react ^18.3.3
- @types/react-dom ^18.3.0
- autoprefixer ^10.4.19
- class-variance-authority ^0.7.0
- clsx ^2.1.1
- d3-scale ^4.0.2
- eslint ^8.57.0
- eslint-config-next ^14.2.3
- eslint-config-prettier ^9.1.0
- next ^14.2.3
- postcss ^8.4.38
- react ^18.3.1
- react-dom ^18.3.1
- react-hook-form ^7.52.1
- recharts ^2.12.7
- swr ^2.2.5
- tailwind-merge ^2.3.0
- tailwindcss ^3.4.4
- usehooks-ts ^3.1.0
- zod ^3.23.8
- 122 dependencies
- ruff ^0.5.2 develop
- datasets ^2.20.0
- einops ^0.8.0
- fastapi ^0.111.1
- huggingface-hub ^0.23.5
- jaxtyping ^0.2.33
- lightning ^2.3.3
- loguru ^0.7.2
- matplotlib ^3.9.1
- orjson ^3.10.6
- pydantic ^2.8.2
- pytest ^8.3.2
- python ^3.12
- simple-parsing ^0.1.5
- torch ^2.4.0
- transformers ^4.42.4
- triton ^3.0.0
- uvicorn ^0.30.3
- wandb ^0.17.4
- zstandard ^0.23.0
- datasets >=2.19.0
- einops *
- fastapi *
- huggingface-hub *
- jaxtyping *
- lightning *
- loguru *
- matplotlib *
- orjson *
- pydantic *
- pytest *
- simple-parsing *
- torch *
- transformers *
- triton *
- uvicorn *
- wandb *
- zstandard *