https://github.com/chanind/sparse-but-wrong-paper
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: chanind
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2508.16560
- Size: 458 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Sparse but Wrong
This repo contains the code for the paper Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders.
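For readers new to the term, L0 here refers to the number of SAE latents that are active (nonzero) for an input, usually reported as an average over inputs. The snippet below is a minimal pure-Python sketch of that quantity; the function name `mean_l0` and the list-of-lists input format are illustrative assumptions, not code from this repo.

```python
def mean_l0(latent_acts):
    """Return the average number of nonzero latents per sample.

    latent_acts: list of samples, each a list of latent activations.
    (Hypothetical helper for illustration; not part of this repo.)
    """
    if not latent_acts:
        raise ValueError("latent_acts must contain at least one sample")
    # Count the active (nonzero) latents in each sample.
    active_counts = [sum(1 for a in row if a != 0) for row in latent_acts]
    return sum(active_counts) / len(active_counts)


# Two samples with 1 and 2 active latents respectively.
example_acts = [[0.0, 1.2, 0.0], [0.5, 0.3, 0.0]]
print(mean_l0(example_acts))
```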
Setup
This project uses uv for package management. You can install dependencies with:
```bash
uv sync
```
Or, you can just use pip and run:
```bash
pip install -e .
```
Running the experiments
The toy model experiments are all in the notebooks directory, with supporting files in the toy_model directory. We also provide a demo notebook showing how to reproduce our Gemma-2-2b experiments in notebooks/train_and_eval_llm_sae.ipynb.
We extend the SAELens BatchTopKSAE class in enhanced_batch_topk_sae.py to keep the decoder normalized and support our experiments where we change the L0 of the SAE during training.
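The three ideas mentioned above (batch top-k selection, a unit-norm decoder, and an L0 that changes during training) can be sketched roughly as follows. This is a dependency-free illustration under stated assumptions, not the code in enhanced_batch_topk_sae.py and not the SAELens API: the function names, the list-of-lists data layout, and the linear schedule are all hypothetical.

```python
import math


def normalize_decoder(decoder):
    """Rescale every decoder row (one row per latent) to unit L2 norm."""
    normalized = []
    for row in decoder:
        norm = math.sqrt(sum(w * w for w in row))
        # Leave all-zero rows untouched to avoid dividing by zero.
        normalized.append([w / norm for w in row] if norm > 0 else list(row))
    return normalized


def batch_topk(pre_acts, k):
    """Keep the int(k * batch_size) largest positive entries across the batch.

    Unlike per-sample top-k, individual samples may end up with more or
    fewer than k active latents; only the batch average is capped at k.
    """
    batch_size = len(pre_acts)
    budget = int(k * batch_size)
    # Collect (value, sample index, latent index) for every positive entry.
    candidates = [
        (value, i, j)
        for i, row in enumerate(pre_acts)
        for j, value in enumerate(row)
        if value > 0
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    acts = [[0.0] * len(row) for row in pre_acts]
    for value, i, j in candidates[:budget]:
        acts[i][j] = value
    return acts


def scheduled_k(step, total_steps, k_start, k_end):
    """Linearly interpolate the target L0 from k_start to k_end.

    Assumed schedule for illustration only; total_steps must be positive.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return k_start + frac * (k_end - k_start)


# Usage: a batch of 2 samples with 4 latents each and a target L0 of 2.
pre_acts = [[0.9, 0.1, 0.0, 0.5], [0.2, 0.8, 0.7, 0.3]]
print(batch_topk(pre_acts, k=2))
print(scheduled_k(step=50, total_steps=100, k_start=10, k_end=2))
```

Selecting over the flattened batch rather than per sample is what lets the number of active latents vary between inputs while the average stays at k. Renormalizing the decoder rows keeps latent activation magnitudes comparable, so the ranking is not dominated by latents with large decoder norms.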
Development
We use pytest for testing. You can run the tests with:
```bash
uv run pytest
```
We also use ruff for linting and pyright for type checking. You can run the linting and type checking with:
```bash
uv run ruff check .
uv run pyright
```
Citation
@article{chanin2025sparse,
title={Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders},
author={David Chanin and Adrià Garriga-Alonso},
year={2025},
journal={arXiv preprint arXiv:2508.16560}
}
Owner
- Name: David Chanin
- Login: chanind
- Kind: user
- Location: London, UK
- Company: UCL
- Website: https://chanind.github.io
- Repositories: 97
- Profile: https://github.com/chanind
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this research, please cite it as below."
authors:
- family-names: "Chanin"
given-names: "David"
- family-names: "Garriga-Alonso"
given-names: "Adrià"
title: "Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders"
type: article
identifiers:
- type: arxiv
value: "2508.16560"
description: arXiv preprint
url: "https://arxiv.org/abs/2508.16560"
keywords:
- "sparse autoencoders"
- "SAEs"
- "interpretability"
- "NLP"
preferred-citation:
type: article
authors:
- family-names: "Chanin"
given-names: "David"
- family-names: "Garriga-Alonso"
given-names: "Adrià"
title: "Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders"
year: 2025
url: "https://arxiv.org/abs/2508.16560"
identifiers:
- type: arxiv
value: "2508.16560"
archive:
- name: "arXiv"
location: "https://arxiv.org/"
collection-type: "proceedings"
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- astral-sh/setup-uv v4 composite
- matplotlib >=3.10.5
- nbformat >=5.10.4
- plotly >=6.3.0
- sae-lens >=6.6.3
- sae-probes >=0.1.3
- seaborn >=0.13.2
- tueplots >=0.2.1
- 163 dependencies