tta_dl-project

University project based on implementing a Test Time Adaptation (TTA) solution for image classifiers. University of Trento (Italy)

https://github.com/lucazzola/tta_dl-project

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Keywords

contrastive-language-image-pretraining deep-learning image-augm prompt-engineering test-time-adaptation
Last synced: 6 months ago

Repository

University project based on implementing a Test Time Adaptation (TTA) solution for image classifiers. University of Trento (Italy)

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
contrastive-language-image-pretraining deep-learning image-augm prompt-engineering test-time-adaptation
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Test Time Adaptation (TTA) project

Test-Time Adaptation (TTA) explores improving a model's performance at test time instead of fine-tuning it in the "traditional" way. This can be a very effective and helpful practice, mainly for two reasons:

1. 💥 Fine-tuning itself is not always straightforward. It depends heavily on the architecture, but it can be challenging.
2. 💸 Large models require non-negligible computational capacity and data to work with (lots of money).

Our objective is to implement a TTA solution to improve an existing image classifier.

Contributors: @LuCazzola @lorenzialessandro


Design

The backbone model of choice is Contrastive Language–Image Pre-training (CLIP), a well-known model by OpenAI trained with the contrastive learning paradigm and capable of zero-shot classification.
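At inference time, CLIP's zero-shot classification reduces to comparing an image embedding against one text embedding per class prompt via cosine similarity. A minimal NumPy sketch of that logic, with random toy vectors standing in for the real CLIP encoder outputs:

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """Pick the class whose text embedding is most similar to the image.

    image_emb: (d,) image feature vector
    text_embs: (num_classes, d), one text feature per class prompt
    Returns (class probabilities, predicted class index).
    """
    # CLIP compares L2-normalised embeddings, so cosine similarity = dot product
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    # softmax over classes (shifted for numerical stability)
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    return probs, int(np.argmax(probs))

# toy embeddings standing in for CLIP encoder outputs
rng = np.random.default_rng(0)
image = rng.normal(size=64)
texts = rng.normal(size=(3, 64))
texts[2] = image + 0.1 * rng.normal(size=64)  # make class 2 match the image
probs, pred = zero_shot_classify(image, texts)
```

With the real model, `image` and `texts` would come from CLIP's image and text encoders; the similarity-plus-softmax step is the same.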


A possible TTA solution for CLIP is Test-Time Prompt Tuning (TPT).
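The core idea of TPT is to tune the learnable prompt on a single test image by minimizing the entropy of the prediction averaged over its most confident augmented views. A sketch of that objective (only the loss computation, not the gradient step over the prompt; function names are ours):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (or batch of them)."""
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def tpt_objective(view_probs, keep_frac=0.1):
    """Marginal-entropy objective in the spirit of TPT (sketch).

    view_probs: (num_views, num_classes) class probabilities, one row per
    augmented view of the test image.
    Keeps only the most confident (lowest-entropy) views, averages their
    predictions, and returns the entropy of that average -- the quantity
    TPT minimises with respect to the learnable prompt.
    """
    ents = entropy(view_probs)
    k = max(1, int(len(view_probs) * keep_frac))
    keep = np.argsort(ents)[:k]          # lowest entropy = most confident
    avg = view_probs[keep].mean(axis=0)
    return entropy(avg)
```

Minimizing this objective pushes the prompt toward a setting where the model makes consistent, confident predictions across augmented views of the same image.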



Our Contribution

For the most part we focused on finding better alternatives to the image augmentation methods proposed in TPT:

Image Augmentations

1. PreAugment
2. AugMix
3. AutoAugment
4. DiffusionAugment

N.B. $\rightarrow$ implementation matters! See the notebook.
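To give a feel for one of these schemes: AugMix mixes several short chains of operations with Dirichlet weights, then blends the mixture back with the original image using a Beta-sampled weight. A toy NumPy sketch (the stand-in operations here are ours; the real AugMix uses AutoAugment-style ops, and the repo's implementation may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# simple stand-in operations on a [0, 1] grayscale image
OPS = [
    lambda x: np.fliplr(x),                 # horizontal flip
    lambda x: np.roll(x, 2, axis=0),        # vertical shift
    lambda x: np.clip(x * 1.2, 0.0, 1.0),   # brightness
]

def augmix(image, width=3, depth=2, alpha=1.0):
    """AugMix-style augmentation (sketch).

    Builds `width` chains of `depth` random operations, combines them with
    Dirichlet weights, then blends with the original via a Beta weight.
    """
    ws = rng.dirichlet([alpha] * width)   # chain mixing weights, sum to 1
    m = rng.beta(alpha, alpha)            # original-vs-mixture blend weight
    mix = np.zeros_like(image)
    for w in ws:
        aug = image.copy()
        for _ in range(depth):
            aug = OPS[rng.integers(len(OPS))](aug)
        mix += w * aug
    return (1 - m) * image + m * mix
```

Because every chain starts from the original image, the mixed result stays close to the data manifold, which is what makes AugMix-style views useful for the confidence-based selection in TPT.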


Testing on ImageNet-A, we scored:

| Augmentation Technique | Avg Accuracy (%) |
| ---------------------- | ---------------- |
| PreAugment | 27.51 |
| AugMix | 28.80 |
| **AutoAugment** | **30.36** |
| DiffusionAugment | _[notebook](notebook.ipynb)_ |

Prompt Augmentation

We introduce our approach for augmenting prompts using an image captioning system.

This method aims to create more context-aware prompts than standard, generic descriptions like "a photo of a {label}". Our hypothesis is that captions specifically tailored to the content of the image will enhance the alignment between the image and the class labels, leading to improved model performance.
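The prompt-building step itself is simple string templating: splice the caption produced by a captioning model into the class template alongside the generic form. A sketch under assumed names (the exact template used in the repo may differ, and `caption` here is a hypothetical string from any captioning model):

```python
def caption_prompts(labels, caption):
    """Build context-aware prompts by splicing an image caption
    (hypothetical `caption` string from a captioning model) into the
    standard CLIP template, keeping the generic prompts as well."""
    generic = [f"a photo of a {label}" for label in labels]
    contextual = [f"a photo of a {label}, {caption}" for label in labels]
    return generic + contextual

prompts = caption_prompts(["dog", "cat"], "sitting on a sunny porch")
# generic prompts first, caption-augmented prompts after
```

Each prompt is then encoded with CLIP's text encoder exactly as in the zero-shot setup; only the prompt strings change.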



Accuracy on CLIP (CLIP-RN50):

| Method | Avg Loss | Avg Accuracy (%) | | --------------------- | ------------- | ---------------- | | Our Method | 3.0781 | 19.41 | | Baseline | - | 21.83 |

Accuracy on CLIP (CLIP-ViT-B/16):

| Method | Avg Loss | Avg Accuracy (%) | | --------------------- | ------------- | ---------------- | | Our Method | 2.5711 | 42.13 | | Baseline | - | 47.87 |

The results are somewhat underwhelming, but there is much room for improvement! Read the notebook for better insight into our methodology.

Owner

  • Login: LuCazzola
  • Kind: user

Citation (CITATION.cff)

@software{TTA_LucaC_AleL,
  author = {Luca Cazzola and Alessandro Lorenzi},
  month = {8},
  title = {{TTA_DL-Project}},
  url = {https://github.com/LuCazzola/TTA_DL-Project},
  version = {1.0},
  year = {2024}
}
