https://github.com/chanind/sparse-but-wrong-paper

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Sparse but Wrong

This repo contains the code for the paper Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders.

Setup

This project uses uv for package management. You can install dependencies with:

    uv sync

Alternatively, you can install with pip:

    pip install -e .

Running the experiments

The toy model experiments are all in the notebooks directory, with supporting files in the toy_model directory. We also provide a demo notebook showing how to reproduce our Gemma-2-2b experiments in notebooks/train_and_eval_llm_sae.ipynb.

We extend the SAELens BatchTopKSAE class in enhanced_batch_topk_sae.py to keep the decoder normalized and support our experiments where we change the L0 of the SAE during training.
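
To make the ideas in the paragraph above concrete, here is a minimal pure-Python sketch of the three pieces it mentions: batch top-k selection, decoder normalization, and a changing L0 during training. This is an illustration only and is not the code in enhanced_batch_topk_sae.py, which builds on PyTorch and SAELens. The function names (batch_topk, normalize_decoder, k_at_step) are hypothetical, "normalized" is assumed to mean unit L2 norm per latent, and the linear schedule is an assumption rather than the schedule used in the paper.

```python
import math


def batch_topk(acts, k):
    """Keep the k * batch_size largest activations across the whole batch.

    Selection is over the flattened batch, so individual rows may keep
    more or fewer than k latents; only the batch average L0 is at most k.
    """
    n_keep = k * len(acts)
    flat = [(v, i, j) for i, row in enumerate(acts) for j, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    out = [[0.0] * len(row) for row in acts]
    for v, i, j in flat[:n_keep]:
        if v > 0:  # non-positive activations never count as firing
            out[i][j] = v
    return out


def normalize_decoder(w_dec):
    """Rescale each latent's decoder vector (one row) to unit L2 norm.

    Assumption: "keeping the decoder normalized" means unit-norm rows.
    """
    normed = []
    for row in w_dec:
        norm = math.sqrt(sum(x * x for x in row))
        normed.append([x / norm for x in row] if norm > 0 else list(row))
    return normed


def k_at_step(step, total_steps, k_start, k_end):
    """Hypothetical linear schedule for the target L0 (k) over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return round(k_start + frac * (k_end - k_start))
```

For example, batch_topk([[5.0, 4.0, 3.0], [1.0, 0.0, 0.0]], k=1) keeps both 5.0 and 4.0 from the first row and nothing from the second, which is the behavior that distinguishes batch top-k from per-sample top-k.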

Development

We use pytest for testing. You can run the tests with:

    uv run pytest

We also use ruff for linting and pyright for type checking. You can run the linting and type checking with:

    uv run ruff check .
    uv run pyright

Citation

@article{chanin2025sparse,
  title={Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders},
  author={David Chanin and Adrià Garriga-Alonso},
  year={2025},
  journal={arXiv preprint arXiv:2508.16560}
}

Owner

  • Name: David Chanin
  • Login: chanind
  • Kind: user
  • Location: London, UK
  • Company: UCL

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this research, please cite it as below."
authors:
  - family-names: "Chanin"
    given-names: "David"
  - family-names: "Garriga-Alonso"
    given-names: "Adrià"
title: "Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders"
type: article
identifiers:
  - type: arxiv
    value: "2508.16560"
    description: arXiv preprint
url: "https://arxiv.org/abs/2508.16560"
keywords:
  - "sparse autoencoders"
  - "SAEs"
  - "interpretability"
  - "NLP"
preferred-citation:
  type: article
  authors:
    - family-names: "Chanin"
      given-names: "David"
    - family-names: "Garriga-Alonso"
      given-names: "Adrià"
  title: "Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders"
  year: 2025
  url: "https://arxiv.org/abs/2508.16560"
  identifiers:
    - type: arxiv
      value: "2508.16560"
  archive:
    - name: "arXiv"
      location: "https://arxiv.org/"
  collection-type: "proceedings"

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • astral-sh/setup-uv v4 composite
pyproject.toml pypi
  • matplotlib >=3.10.5
  • nbformat >=5.10.4
  • plotly >=6.3.0
  • sae-lens >=6.6.3
  • sae-probes >=0.1.3
  • seaborn >=0.13.2
  • tueplots >=0.2.1
uv.lock pypi
  • 163 dependencies