INTR

This is an official implementation for [ICLR'24] INTR: Interpretable Transformer for Fine-grained Image Classification.

https://github.com/imageomics/intr

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization imageomics has institutional domain (imageomics.osu.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

computer-vision explainable-ai fine-grained-classification imageomics interpretation transformer
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Imageomics
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 2.24 MB
Statistics
  • Stars: 49
  • Watchers: 13
  • Forks: 4
  • Open Issues: 0
  • Releases: 0
Topics
computer-vision explainable-ai fine-grained-classification imageomics interpretation transformer
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme · License · Citation · Codeowners

README.md

INTR: A Simple Interpretable Transformer for Fine-grained Image Classification and Analysis (ICLR 2024)

This repo is the official implementation of INTR: A Simple Interpretable Transformer for Fine-grained Image Classification and Analysis. It currently includes code and models for the interpretation of fine-grained data. We will provide a link to the upcoming ICLR 2024 proceedings for this paper when it becomes available online.

INTR is a novel usage of Transformers to make image classification interpretable. In INTR, we investigate a proactive approach to classification, asking each class to look for itself in an image. We learn class-specific queries (one for each class) as input to the decoder, allowing them to look for their presence in an image via cross-attention. We show that INTR intrinsically encourages each class to attend distinctly; the cross-attention weights thus provide a meaningful interpretation of the model's prediction. Interestingly, via multi-head cross-attention, INTR could learn to localize different attributes of a class, making it particularly suitable for fine-grained classification and analysis.
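As a rough sketch of how class-specific queries can be realized (illustrative names and dimensions; not the repository's actual code), a learned embedding supplies one query per class, and the cross-attention weights are kept for interpretation:

```python
import torch
import torch.nn as nn

class ClassQueryDecoderSketch(nn.Module):
    """Illustrative sketch of class-specific queries (not the official INTR
    code): one learned query per class cross-attends over the encoder's
    feature map, and the attention weights are kept for interpretation."""

    def __init__(self, num_classes: int, hidden_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.class_queries = nn.Embedding(num_classes, hidden_dim)  # one query per class
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, features: torch.Tensor):
        # features: (batch, H*W, hidden_dim) flattened encoder feature map.
        batch = features.size(0)
        queries = self.class_queries.weight.unsqueeze(0).expand(batch, -1, -1)
        # Each class query "looks for itself" in the image via cross-attention.
        # attn: (batch, num_classes, H*W), averaged over heads by default; pass
        # average_attn_weights=False (PyTorch >= 1.11) to keep per-head maps.
        out, attn = self.cross_attn(queries, features, features)
        return out, attn
```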

[Figure]

In the INTR model, each query in the decoder is responsible for the prediction of one class, so each query searches the feature map for its class-specific features. First, we visualize the feature map, i.e., the value matrix of the Transformer architecture, to see the important parts of the object in the image. Then, to show where the model pays attention within the value matrix, we plot a heatmap of the model's attention. To avoid external interference in the classification, we use a shared weight vector for classification; the attention weights therefore explain the model's prediction.

[Figure]
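The two ingredients described above can be sketched as follows (illustrative only; the repository's actual pipeline lives in tools/visualization): upsampling one class query's attention into an image-sized heatmap, and scoring all classes with a single shared weight vector.

```python
import torch
import torch.nn.functional as F

def attention_heatmap(attn, class_idx, feat_h, feat_w, img_h, img_w):
    """Upsample one class query's attention over the feature map into an
    image-sized heatmap. attn: (batch, num_classes, feat_h * feat_w)."""
    heat = attn[:, class_idx].reshape(-1, 1, feat_h, feat_w)
    return F.interpolate(heat, size=(img_h, img_w), mode="bilinear",
                         align_corners=False).squeeze(1)

def shared_vector_logits(query_outputs, weight_vector):
    """Score every class with the same weight vector, so differences in logits
    come only from what each class query attended to.
    query_outputs: (batch, num_classes, hidden_dim); weight_vector: (hidden_dim,)."""
    return query_outputs @ weight_vector  # (batch, num_classes)
```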

Fine-tuned models and results

Classification performance and fine-tuned models for INTR with a DETR-R50 backbone on different datasets.

| Dataset   | acc@1 | acc@5 | Model               |
|-----------|-------|-------|---------------------|
| CUB       | 71.8  | 89.3  | checkpoint download |
| Bird      | 97.4  | 99.2  | checkpoint download |
| Butterfly | 95.0  | 98.3  | checkpoint download |

Installation Instructions

Create a Python environment (optional):

```sh
conda create -n intr python=3.8 -y
conda activate intr
```

Clone the repository:

```sh
git clone https://github.com/dipanjyoti/INTR.git
cd INTR
```

Install Python dependencies:

```sh
pip install -r requirements.txt
```

Data Preparation

Follow the format below for the data.

```
datasets
├── dataset_name
│   ├── train
│   │   ├── class1
│   │   │   ├── img1.jpeg
│   │   │   ├── img2.jpeg
│   │   │   └── ...
│   │   ├── class2
│   │   │   ├── img3.jpeg
│   │   │   └── ...
│   │   └── ...
│   └── val
│       ├── class1
│       │   ├── img4.jpeg
│       │   ├── img5.jpeg
│       │   └── ...
│       ├── class2
│       │   ├── img6.jpeg
│       │   └── ...
│       └── ...
```
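This layout matches torchvision's ImageFolder convention, so a quick sanity check of a prepared dataset could look like the following (paths and transform are illustrative placeholders, not part of the repository):

```python
from torchvision import datasets, transforms

# Minimal transform just to confirm the directory layout loads correctly;
# the repo's own training/eval pipeline defines the actual preprocessing.
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

train_set = datasets.ImageFolder("datasets/dataset_name/train", transform=transform)
val_set = datasets.ImageFolder("datasets/dataset_name/val", transform=transform)

print(f"{len(train_set.classes)} classes, "
      f"{len(train_set)} train / {len(val_set)} val images")
```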

INTR Evaluation

To evaluate the performance of INTR on the CUB dataset in a multi-GPU setting (e.g., 4 GPUs), execute the command below. INTR checkpoints are available under Fine-tuned models and results.

```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port 12345 --use_env main.py --eval --resume <path/to/intr_checkpoint_cub_detr_r50.pth> --dataset_path <path/to/datasets> --dataset_name <dataset_name>
```
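Note that `torch.distributed.launch` has been deprecated since PyTorch 1.10 in favor of `torchrun`. If your installation warns about this, the equivalent invocation should be the following (`torchrun` sets the environment variables itself, so `--use_env` is dropped):

```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=12345 main.py --eval --resume <path/to/intr_checkpoint_cub_detr_r50.pth> --dataset_path <path/to/datasets> --dataset_name <dataset_name>
```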

INTR Interpretation

To generate visual representations of INTR's interpretations, execute the command below. It presents the interpretation for a specific class, given its index `<class_number>`. By default, it displays interpretations from all attention heads. To focus only on interpretations associated with the top queries (labeled topq), set the parameter `simquery_heads` to 1. Use a batch size of 1 for visualization.

```sh
python -m tools.visualization --eval --resume <path/to/intr_checkpoint_cub_detr_r50.pth> --dataset_path <path/to/datasets> --dataset_name <dataset_name> --class_index <class_number>
```

Inference-time single-image prediction and visualization: we also provide a Jupyter notebook, demo.ipynb, designed for single-image prediction and visualization at inference time. Please note that the demo focuses on the CUB dataset.
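For orientation, the overall single-image flow in such a demo looks roughly like the sketch below. It uses a stand-in torchvision classifier and a placeholder image path so that it runs on its own; the actual notebook builds the INTR model and loads its CUB checkpoint instead.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import resnet50, ResNet50_Weights

# Stand-in model so the preprocessing/prediction flow is self-contained;
# demo.ipynb uses INTR with intr_checkpoint_cub_detr_r50.pth instead.
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("bird.jpg").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)  # batch size 1, as the demo recommends

with torch.no_grad():
    logits = model(batch)
print(f"Predicted class index: {logits.argmax(dim=1).item()}")
```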

INTR Training

To prepare INTR for training, use the pretrained DETR-R50 model. To train on a particular dataset, set `--num_queries` to the number of classes in that dataset. Within the INTR architecture, each query in the decoder is assigned the task of capturing class-specific features, and every query is adapted through the learning process; consequently, the total number of model parameters grows in proportion to the number of classes in the dataset. To train INTR on a multi-GPU system (e.g., 4 GPUs), execute the command below.

```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port 12345 --use_env main.py --finetune <path/to/detr-r50-e632da11.pth> --dataset_path <path/to/datasets> --dataset_name <dataset_name> --num_queries <num_of_classes>
```
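Since each class contributes one learned query vector, the added parameter cost can be estimated directly. A quick illustration, assuming DETR-R50's decoder width of 256 (200 is CUB's class count; 1000 is a hypothetical larger dataset):

```python
import torch.nn as nn

hidden_dim = 256  # DETR-R50 decoder width
for num_classes in (200, 1000):  # CUB has 200 classes; 1000 is hypothetical
    queries = nn.Embedding(num_classes, hidden_dim)  # one query per class
    n_params = sum(p.numel() for p in queries.parameters())
    print(f"--num_queries {num_classes}: {n_params:,} query parameters")
```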

Acknowledgment

Our model is inspired by the DEtection TRansformer (DETR) method.

We thank the authors of DETR for doing such great work.

BibTeX

If you find our work helpful for your research, please consider citing it with the BibTeX entry below.

```bibtex
@inproceedings{paul2024simple,
  title={A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis},
  author={Paul, Dipanjyoti and Chowdhury, Arpita and Xiong, Xinqi and Chang, Feng-Ju and Carlyn, David and Stevens, Samuel and Provost, Kaiya and Karpatne, Anuj and Carstens, Bryan and Rubenstein, Daniel and Stewart, Charles and Berger-Wolf, Tanya and Su, Yu and Chao, Wei-Lun},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```

Owner

  • Name: Imageomics Institute
  • Login: Imageomics
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  A Simple Interpretable Transformer for Fine-Grained Image
  Classification and Analysis
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Dipanjyoti
    family-names: Paul
    email: paul.1164@osu.edu
    affiliation: The Ohio State University
    orcid: 'https://orcid.org/0000-0001-9079-7524'
  - given-names: Arpita
    family-names: Chowdhury
  - given-names: Xinqi
    family-names: Xiong
  - given-names: Feng-Ju
    family-names: Chang
  - given-names: David
    family-names: Carlyn
  - given-names: Samuel
    family-names: Stevens
  - given-names: Kaiya
    family-names: Provost
  - given-names: Anuj
    family-names: Karpatne
  - given-names: Bryan
    family-names: Carstens
  - given-names: Daniel
    family-names: Rubenstein
  - given-names: Charles
    family-names: Stewart
  - given-names: Tanya
    family-names: Berger-Wolf
  - given-names: Yu
    family-names: Su
  - given-names: Wei-Lun
    family-names: Chao
identifiers:
  - type: doi
    value: 10.48550/arXiv.2311.04157
repository-code: 'https://github.com/Imageomics/INTR'
keywords:
  - explainable-ai
  - interpretation
  - imageomics
  - fine-grained-classification
  - transformer
  - computer-vision
license: Apache-2.0
commit: INTR
version: 1.0.0
date-released: '2023-09-27'

GitHub Events

Total
  • Watch event: 15
Last Year
  • Watch event: 15

Dependencies

requirements.txt (pypi)
  • cython *
  • matplotlib *
  • onnx *
  • onnxruntime *
  • opencv-python *
  • pycocotools *
  • scipy *
  • seaborn *
  • submitit *
  • timm ==0.9.0
  • torch >=1.5.0
  • torchvision >=0.6.0
  • transformers *