zoomisallyouneed

Official code and data for NeurIPS 2023 paper "ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification"

https://github.com/taesiri/zoomisallyouneed

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary

Keywords

image-recognition imagenet imagenet-hard neurips object-detection ood out-of-distribution

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: taesiri
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage: https://taesiri.github.io/ZoomIsAllYouNeed/
Size: 110 MB

Statistics

Stars: 38
Watchers: 4
Forks: 2
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed over 2 years ago

ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification

by [Mohammad Reza Taesiri](https://taesiri.ai/), [Giang Nguyen](https://giangnguyen2412.github.io/), [Sarra Habchi](https://habchisarra.github.io/), [Cor-Paul Bezemer](https://asgaard.ece.ualberta.ca/), and [Anh Nguyen](https://anhnguyen.me/).

[![Website](http://img.shields.io/badge/Website-4b44ce.svg)](https://taesiri.github.io/ZoomIsAllYouNeed/) [![Supplementary Material](http://img.shields.io/badge/Supplementary%20Material-4b44ce.svg)](https://drive.google.com/drive/folders/1bTj5GUGpGp4qssZWVuYCYbUzWy14ASJ6?usp=sharing) [![arXiv](https://img.shields.io/badge/arXiv-2304.05538-b31b1b.svg)](https://arxiv.org/abs/2304.05538) [![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-red)](https://huggingface.co/datasets/taesiri/imagenet-hard) [![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/taesiri/imagenet-hard-4k)

Abstract

Image classifiers are information-discarding machines, by design. Yet, how these models discard information remains mysterious. We hypothesize that one way for image classifiers to reach high accuracy is to first zoom to the most discriminative region in the image and then extract features from there to predict image labels, discarding the rest of the image. Studying six popular networks ranging from AlexNet to CLIP, we find that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images. Furthermore, we uncover positional biases in various datasets, especially a strong center bias in two popular datasets: ImageNet-A and ObjectNet. Finally, leveraging our insights into the potential of zooming, we propose a test-time augmentation (TTA) technique that improves classification accuracy by forcing models to explicitly perform zoom-in operations before making predictions. Our method is more interpretable, accurate, and faster than MEMO, a state-of-the-art (SOTA) TTA method. We introduce ImageNet-Hard, a new benchmark that challenges SOTA classifiers including large vision-language models even when optimal zooming is allowed.
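The zoom-then-classify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid of anchor points, the zoom scales, the averaging of class probabilities, and the names `zoom_boxes`, `zoom_tta_predict`, and `classify` are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def zoom_boxes(width: int, height: int,
               scales: Sequence[float] = (1.0, 1.5, 2.0),
               grid: int = 3) -> List[Box]:
    """Enumerate crop boxes for each zoom scale (assumed >= 1) on a grid of anchors."""
    boxes: List[Box] = []
    for scale in scales:
        # A zoom factor of `scale` keeps a window 1/scale the size of the image.
        crop_w, crop_h = round(width / scale), round(height / scale)
        for row in range(grid):
            for col in range(grid):
                # Slide the window so the anchors span the whole image.
                frac_x = col / (grid - 1) if grid > 1 else 0.5
                frac_y = row / (grid - 1) if grid > 1 else 0.5
                left = round((width - crop_w) * frac_x)
                top = round((height - crop_h) * frac_y)
                box = (left, top, left + crop_w, top + crop_h)
                if box not in boxes:  # scale 1.0 yields the same box for every anchor
                    boxes.append(box)
    return boxes


def zoom_tta_predict(classify: Callable[[Box], Sequence[float]],
                     boxes: Sequence[Box]) -> int:
    """Sum class probabilities over all crops and return the arg-max class index."""
    if not boxes:
        raise ValueError("at least one crop box is required")
    totals: List[float] = []
    for box in boxes:
        probs = list(classify(box))
        totals = probs if not totals else [t + p for t, p in zip(totals, probs)]
    return max(range(len(totals)), key=totals.__getitem__)
```

In practice `classify` would crop the image to the box, resize the crop to the model's input resolution, and return the model's softmax output; here it is left as a hypothetical callable so the sketch stays independent of any particular framework.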

https://user-images.githubusercontent.com/588431/231219248-08eab4cc-6c9e-4bae-8003-176149f4987c.mp4

ImageNet-Hard

ImageNet-Hard is a new benchmark that comprises an array of challenging images, curated from several ImageNet validation datasets. It challenges state-of-the-art vision models because zooming in alone often fails to improve their ability to classify these images correctly. Consequently, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving only 2.02% accuracy.

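The ImageNet-Hard dataset is available to access and browse on Hugging Face:

- [ImageNet-Hard](https://huggingface.co/datasets/taesiri/imagenet-hard)
- [ImageNet-Hard-4K](https://huggingface.co/datasets/taesiri/imagenet-hard-4k)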
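A minimal sketch of loading the dataset with the Hugging Face `datasets` library follows. The repository id comes from the dataset link above; the split name `"validation"`, the field name `"label"`, and the helper names `load_imagenet_hard` and `as_label_list` are assumptions, so check the dataset card before relying on them.

```python
from typing import Any, Dict, List


def load_imagenet_hard(repo_id: str = "taesiri/imagenet-hard",
                       split: str = "validation"):
    """Download the dataset from the Hugging Face Hub.

    Requires `pip install datasets`. The split name is an assumption.
    """
    # Imported lazily so the helper below can be used without the dependency.
    from datasets import load_dataset
    return load_dataset(repo_id, split=split)


def as_label_list(example: Dict[str, Any], key: str = "label") -> List[int]:
    """Return the ground-truth class ids of one example as a list.

    Assumes `key` holds either a single int or a sequence of ints.
    """
    value = example[key]
    if isinstance(value, int):
        return [value]
    return [int(v) for v in value]
```

Typical usage would be `ds = load_imagenet_hard()` followed by `as_label_list(ds[0])`; the first call needs network access.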

Dataset Distribution

Performance Report

| Model                | Accuracy (%) |
| -------------------- | ------------ |
| AlexNet              | 7.34         |
| VGG-16               | 12.00        |
| ResNet-18            | 10.86        |
| ResNet-50            | 14.74        |
| ViT-B/32             | 18.52        |
| EfficientNet-B0      | 16.57        |
| EfficientNet-B7      | 23.20        |
| EfficientNet-L2-Ns   | 39.00        |
| CLIP-ViT-L/14@224px  | 1.86         |
| CLIP-ViT-L/14@336px  | 2.02         |
| OpenCLIP-ViT-bigG-14 | 15.93        |
| OpenCLIP-ViT-L-14    | 15.60        |
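As a rough illustration of how a number like those in the Accuracy column could be computed, the sketch below scores top-1 predictions in percent. It assumes that an image may carry several ground-truth labels and that a prediction counts as correct when it matches any of them; that rule and the function name `top1_accuracy` are assumptions, not taken from the evaluation notebooks.

```python
from typing import Iterable, Sequence


def top1_accuracy(predictions: Sequence[int],
                  labels: Sequence[Iterable[int]]) -> float:
    """Return top-1 accuracy in percent.

    predictions[i] is the predicted class id for image i.
    labels[i] is the collection of acceptable class ids for image i.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        return 0.0
    # A prediction is a hit if it matches any acceptable label (assumed rule).
    hits = sum(1 for pred, truth in zip(predictions, labels) if pred in set(truth))
    return 100.0 * hits / len(predictions)
```

For example, `top1_accuracy([1, 2, 3, 4], [[1], [0, 2], [9], [5]])` scores two hits out of four images.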

Evaluation Code

CLIP
OpenCLIP
Other models

Supplementary Material

You can find all the supplementary material on [Google Drive](https://drive.google.com/drive/folders/1bTj5GUGpGp4qssZWVuYCYbUzWy14ASJ6?usp=sharing).

Citation information

If you use this software, please consider citing:

@inproceedings{taesiri2023zoom,
  title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
  author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}

Owner

Name: Mohammad Reza Taesiri
Login: taesiri
Kind: user
Location: Planet Mars

Website: https://taesiri.com
Twitter: taesiri
Repositories: 29
Profile: https://github.com/taesiri

Representation Learning

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Zoom Is What You Need
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Mohammad Reza
    family-names: Taesiri
    email: mtaesiri@gmail.com
    affiliation: University of Alberta
  - given-names: Giang
    email: nguyengiangbkhn@gmail.com
    family-names: Nguyen
    affiliation: Auburn University
  - given-names: Sarra
    family-names: Habchi
    email: sarra.habchi@ubisoft.com
    affiliation: Ubisoft
  - given-names: Cor-Paul
    family-names: Bezemer
    email: bezemer@ualberta.ca
    affiliation: University of Alberta
  - given-names: Anh
    family-names: Nguyen
    email: anh.ng8@gmail.com
    affiliation: Auburn University
repository-code: 'https://github.com/taesiri/ZoomIsAllYouNeed'
url: 'https://taesiri.github.io/ZoomIsAllYouNeed/'
abstract: >-
  Image classifiers are information-discarding machines, by
  design. Yet, how these models discard information remains
  mysterious. We hypothesize that one way for image
  classifiers to reach high accuracy is to first learn to
  zoom to the most discriminative region in the image and
  then extract features from there to predict image labels.
  We study six popular networks ranging from AlexNet to
  CLIP, and we show that proper framing of the input image
  can lead to the correct classification of 98.91% of
  ImageNet images. Furthermore, we explore the potential and
  limits of zoom transforms in image classification and
  uncover positional biases in various datasets, especially
  a strong center bias in two popular datasets: ImageNet-A
  and ObjectNet. Finally, leveraging our insights into the
  potential of zoom, we propose a state-of-the-art test-time
  augmentation (TTA) technique that improves classification
  accuracy by forcing models to explicitly perform zoom-in
  operations before making predictions. Our method is more
  interpretable, accurate, and faster than MEMO, a
  state-of-the-art TTA method. Additionally, we propose
  ImageNet-Hard, a new benchmark where zooming in alone
  often does not help state-of-the-art models better label
  images.
keywords:
  - Zoom
  - Representation Learning
  - ImageNet-Hard
  - Robustness
license: MIT

GitHub Events

Total

Watch event: 3

Last Year

Watch event: 3

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: about 3 hours
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 2.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
