https://github.com/chanind/sparse-but-wrong-paper

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.9%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: chanind
License: mit
Language: Python
Default Branch: main
Homepage: https://arxiv.org/abs/2508.16560
Size: 458 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 10 months ago · Last pushed 9 months ago

Metadata Files

Readme License Citation

Sparse but Wrong

This repo contains the code for the paper "Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders" (arXiv:2508.16560).

Setup

This project uses uv for package management. You can install dependencies with:

    uv sync

Or, you can just use pip and run:

    pip install -e .

Running the experiments

The toy model experiments are all in the notebooks directory, with supporting files in the toy_model directory. We also provide a demo notebook showing how to reproduce our Gemma-2-2b experiments in notebooks/train_and_eval_llm_sae.ipynb.

We extend the SAELens BatchTopKSAE class in enhanced_batch_topk_sae.py to keep the decoder normalized and support our experiments where we change the L0 of the SAE during training.
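As a rough illustration of the three ideas named above (batch top-k sparsity, a unit-norm decoder, and an L0 that changes during training), here is a minimal, dependency-free sketch. It is not the code in enhanced_batch_topk_sae.py and does not use the SAELens API: the function names `batch_topk`, `normalize_decoder`, and `k_at_step`, the ReLU-before-top-k choice, and the linear schedule are all assumptions made for illustration.

```python
import math


def batch_topk(pre_acts, k):
    """Keep the k * batch_size largest positive entries across the whole batch.

    Individual rows may end up with more or fewer than k active latents;
    only the batch average L0 is capped at k.
    """
    n_keep = k * len(pre_acts)
    # Flatten to (value, row, col); dropping non-positive values acts as a ReLU.
    entries = [
        (value, row, col)
        for row, acts in enumerate(pre_acts)
        for col, value in enumerate(acts)
        if value > 0
    ]
    entries.sort(key=lambda e: e[0], reverse=True)
    out = [[0.0] * len(acts) for acts in pre_acts]
    for value, row, col in entries[:n_keep]:
        out[row][col] = value
    return out


def normalize_decoder(decoder):
    """Rescale each latent's decoder direction (one row) to unit L2 norm."""
    normalized = []
    for row in decoder:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized


def k_at_step(step, total_steps, k_start, k_end):
    """Hypothetical schedule: move k linearly from k_start to k_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return round(k_start + frac * (k_end - k_start))
```

In a training loop along these lines, one would call `k_at_step` to pick the current k, apply `batch_topk` to the encoder pre-activations, and re-apply `normalize_decoder` after each optimizer step. Whether the repository orders these steps this way is not stated in this README.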

Development

We use pytest for testing. You can run the tests with:

    uv run pytest

We also use ruff for linting and pyright for type checking. You can run the linting and type checking with:

    uv run ruff check .
    uv run pyright

Citation

@article{chanin2025sparse,
  title={Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders},
  author={David Chanin and Adrià Garriga-Alonso},
  year={2025},
  journal={arXiv preprint arXiv:2508.16560}
}

Owner

Name: David Chanin
Login: chanind
Kind: user
Location: London, UK
Company: UCL

Website: https://chanind.github.io
Repositories: 97
Profile: https://github.com/chanind

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this research, please cite it as below."
authors:
  - family-names: "Chanin"
    given-names: "David"
  - family-names: "Garriga-Alonso"
    given-names: "Adrià"
title: "Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders"
type: article
identifiers:
  - type: arxiv
    value: "2508.16560"
    description: arXiv preprint
url: "https://arxiv.org/abs/2508.16560"
keywords:
  - "sparse autoencoders"
  - "SAEs"
  - "interpretability"
  - "NLP"
preferred-citation:
  type: article
  authors:
    - family-names: "Chanin"
      given-names: "David"
    - family-names: "Garriga-Alonso"
      given-names: "Adrià"
  title: "Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders"
  year: 2025
  url: "https://arxiv.org/abs/2508.16560"
  identifiers:
    - type: arxiv
      value: "2508.16560"
  archive:
    - name: "arXiv"
      location: "https://arxiv.org/"
  collection-type: "proceedings"

GitHub Events

Total

Push event: 2

Last Year

Push event: 2

Dependencies

.github/workflows/ci.yaml actions

actions/checkout v4 composite
actions/setup-python v5 composite
astral-sh/setup-uv v4 composite

pyproject.toml pypi

matplotlib >=3.10.5
nbformat >=5.10.4
plotly >=6.3.0
sae-lens >=6.6.3
sae-probes >=0.1.3
seaborn >=0.13.2
tueplots >=0.2.1

uv.lock pypi

163 dependencies
