sparo-clip

Separate-head attention read-out

https://github.com/ankitkv/sparo-clip

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.5%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Separate-head attention read-out

Basic Info
  • Host: GitHub
  • Owner: ankitkv
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 8.47 MB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License Citation

README.md

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

Published at ECCV 2024

[Paper, arXiv]

Abstract: Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.

This repository demonstrates the incorporation of SPARO onto the CLIP encoders.

Demo scripts for training models on a single GPU are provided in scripts/.

This code is based on OpenCLIP v2.20.0.

Citation

@inproceedings{vani2024sparo, title={{SPARO}: Selective Attention for Robust and Compositional Transformer Encodings for Vision}, author={Ankit Vani and Bac Nguyen and Samuel Lavoie and Ranjay Krishna and Aaron Courville}, booktitle = {European Conference on Computer Vision}, year={2024}, }

Owner

  • Name: Ankit Vani
  • Login: ankitkv
  • Kind: user
  • Location: Canada
  • Company: Mila (@mila-iqia)

PhD candidate in machine learning

Citation (CITATION.cff)

cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Ilharco
    given-names: Gabriel
  - family-names: Wortsman
    given-names: Mitchell
  - family-names: Wightman
    given-names: Ross
  - family-names: Gordon
    given-names: Cade   
  - family-names: Carlini
    given-names: Nicholas
  - family-names: Taori
    given-names: Rohan
  - family-names: Dave
    given-names: Achal
  - family-names: Shankar
    given-names: Vaishaal
  - family-names: Namkoong
    given-names: Hongseok
  - family-names: Miller
    given-names: John
  - family-names: Hajishirzi
    given-names: Hannaneh
  - family-names: Farhadi
    given-names: Ali
  - family-names: Schmidt
    given-names: Ludwig
title: OpenCLIP
version: v0.1
doi: 10.5281/zenodo.5143773
date-released: 2021-07-28

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
Last Year
  • Watch event: 1
  • Push event: 2