open-hummingbird-eval

This is a repository that implements the Dense NN Retrieval Evaluation used for evaluating the In-Context Learning Capabilities of Vision Encoders.

https://github.com/vpariza/open-hummingbird-eval


Repository


Basic Info

Host: GitHub
Owner: vpariza
License: mit
Language: Python
Default Branch: main
Size: 701 KB

Statistics

Stars: 20
Watchers: 2
Forks: 3
Open Issues: 1
Releases: 0

Created about 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

Description

This repository is a reproduction that implements the Dense NN Retrieval Evaluation method introduced by Balažević et al., "Towards In-context Scene Understanding", NeurIPS 2023.

Briefly, it evaluates how well the spatial features produced by a vision encoder associate with relevant features from a dataset (validation), using a k-NN classifier/retriever that operates across various proportions of training data.

Figure: Hummingbird Evaluation. Image taken from "Towards In-context Scene Understanding", NeurIPS 2023.

This evaluation approach helps understand scenes by comparing new images with ones we already know. We start by showing the model a set of densely labeled image-label examples (left part of the figure). It densely encodes these images, so that we have both the encoded patches (top-left section) and their labels (bottom-left section). Then we give it new images to describe (right part) without labels, which it again densely encodes. Next, it compares the encoded patches of each new image with similar patches in the examples it knows. By looking at what is closest, it infers the likely label for each patch and therefore what the new image might be showing. This approach is flexible because it does not assume anything about the labels.
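The retrieval step described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the repository's implementation: the helper names (`cosine`, `knn_patch_label`), the toy 2-D features, and the exponential similarity weighting are all hypothetical choices made for this sketch.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_patch_label(query, memory_feats, memory_labels, k=3):
    """Label one query patch by a similarity-weighted vote of its k nearest memory patches."""
    sims = [cosine(query, m) for m in memory_feats]
    # Indices of the k most similar memory patches.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = defaultdict(float)
    for i in top:
        # Assumed weighting: closer neighbours contribute more to the vote.
        votes[memory_labels[i]] += math.exp(sims[i])
    return max(votes, key=votes.get)

# Toy memory: encoded patches from labeled images, with one label per patch.
memory_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
memory_labels = ["cat", "cat", "sky", "sky"]

# Encoded patches of a new, unlabeled image.
queries = [[0.8, 0.2], [0.2, 0.8]]
predicted = [knn_patch_label(q, memory_feats, memory_labels) for q in queries]
print(predicted)
```

In the actual evaluation the memory holds far more patches and the neighbour search is delegated to a k-NN library (faiss or scann, as listed in the usage examples).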

Reproduction done by:

* Valentinos Pariza
* Mohammadreza Salehi
* Yuki M. Asano

At the University of Amsterdam (UvA)

Notes

For any questions/issues etc. please open a GitHub issue on this repository.
If you find this repository useful, please consider starring and citing.

Results we got with our implementation on Pascal VOC

For the experiments below we used two dataset augmentation epochs, with an image size of (512, 512) for dino and (504, 504) for dinov2.

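| arch | model | PVOC (mIoU), memory size 1024*10² | PVOC (mIoU), memory size 1024*10³ | PVOC (mIoU), memory size 1024*10⁴ | PVOC (mIoU) from orig. paper, memory size 1024*10⁴ |
| --- | --- | --- | --- | --- | --- |
| ViT-S/16 | dino | 37.2 | 43.1 | 46.6 | - |
| ViT-B/16 | dino | 44.9 | 50.8 | 55.7 | 55.9 |
| ViT-S/14 | dinov2 | 70.2 | 74.9 | 77.0 | - |
| ViT-B/14 | dinov2 | 69.1 | 74.6 | 76.9 | - |
| ViT-L/14 | dinov2 | 64.6 | 71.7 | 74.8 | - |
| ViT-G/14 | dinov2 | 62.3 | 69.9 | 73.6 | - |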
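As a quick sanity check on the numbers in this section, the sketch below spells out the three memory sizes as plain integers and computes how many patches a single image yields at the stated input sizes. The helper `patches_per_image` is a hypothetical name introduced here, and it assumes non-overlapping square patches.

```python
def patches_per_image(input_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    side = input_size // patch_size
    return side * side

dino_patches = patches_per_image(512, 16)    # 32 x 32 grid for ViT-*/16
dinov2_patches = patches_per_image(504, 14)  # 36 x 36 grid for ViT-*/14

# The memory sizes from the table header, written out.
memory_sizes = [1024 * 10**e for e in (2, 3, 4)]

print(dino_patches, dinov2_patches, memory_sizes)
```

Both input sizes divide evenly by their patch size, which is presumably why 504 rather than 512 is used with the /14 models.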

Usage

Example on how to Evaluate dino with the Hummingbird (Dense NN Retrieval) Evaluation on Pascal VOC

```python
import torch
from src.hbird_eval import hbird_evaluation

# Parameters for the model dino
device = 'cuda'
input_size = 224
batch_size = 64
patch_size = 16
embed_dim = 384
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')

# Define the function to extract features from the model.
# Input to the function is the model and the images.
# Output of the function is the features extracted from the model
# and optionally the attention maps.
fn = lambda model, imgs: (model.get_intermediate_layers(imgs)[0][:, 1:], None)

# Evaluate the model using the Full In-Context Learning Hummingbird
# or Dense k-NN Retrieval Evaluation on the Pascal VOC Dataset
hbird_miou = hbird_evaluation(
    model.to(device),
    d_model=embed_dim,         # size of the embedding feature vectors of patches
    patch_size=patch_size,
    batch_size=batch_size,
    input_size=input_size,
    augmentation_epoch=1,      # how many iterations of augmentations to use on top of
                               # the training dataset in order to generate the memory
    device=device,
    return_knn_details=False,  # whether to return additional NNs details
    n_neighbours=30,           # the number of neighbors to fetch per image patch
    nn_method='',              # options: faiss or scann as the k-NN library to be used;
                               # scann uses the CPU, faiss the GPU
    nn_params=None,            # other parameters to be used for the k-NN operator
    ftr_extr_fn=fn,            # function that extracts image patch features with
                               # a vision encoder
    dataset_name='voc',        # the name of the dataset to use,
                               # currently only Pascal VOC is included.
    data_dir='',               # path to the dataset to use for evaluation
    memory_size=None,          # limit on the memory size; None to leave it unbounded
    train_fs_path=None,        # path to the file with the subset of filenames for training
    val_fs_path=None,          # path to the file with the subset of filenames for validation
)

print('Dense NN Ret - miou score:', hbird_miou)
```

Example on how to Evaluate dinov2 with Dense NN Retrieval on Pascal VOC

```python
import torch
from src.hbird_eval import hbird_evaluation

# Parameters for the model dinov2
device = 'cuda'
input_size = 224
batch_size = 256
patch_size = 14
embed_dim = 384
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')

# Define the function to extract features from the model.
# Input to the function is the model and the images.
# Output of the function is the features extracted from the model
# and optionally the attention maps.
fn = lambda model, imgs: (model.forward_features(imgs)['x_norm_patchtokens'], None)

# Evaluate the model using the Full In-Context Learning Hummingbird
# or Dense k-NN Retrieval Evaluation on the Pascal VOC Dataset
hbird_miou = hbird_evaluation(
    model.to(device),
    d_model=embed_dim,         # size of the embedding feature vectors of patches
    patch_size=patch_size,
    batch_size=batch_size,
    input_size=input_size,
    augmentation_epoch=1,      # how many iterations of augmentations to use on top of
                               # the training dataset in order to generate the memory
    device=device,
    return_knn_details=False,  # whether to return additional NNs details
    n_neighbours=30,           # the number of neighbors to fetch per image patch
    nn_method='',              # options: faiss or scann as the k-NN library to be used;
                               # scann uses the CPU, faiss the GPU
    nn_params=None,            # other parameters to be used for the k-NN operator
    ftr_extr_fn=fn,            # function that extracts image patch features with
                               # a vision encoder
    dataset_name='voc',        # the name of the dataset to use,
                               # currently only Pascal VOC is included.
    data_dir='',               # path to the dataset to use for evaluation
    memory_size=None,          # limit on the memory size; None to leave it unbounded
    train_fs_path=None,        # path to the file with the subset of filenames for training
    val_fs_path=None,          # path to the file with the subset of filenames for validation
)

print('Dense NN Ret - miou score:', hbird_miou)
```
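The score printed by these examples is a mean intersection-over-union (mIoU). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition on flat lists of per-pixel class ids. The function `mean_iou` is a hypothetical helper written for this illustration. How the repository accumulates the metric over a dataset, or handles ignored pixels, is not shown in this README and is not modeled here.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over the classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither list
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 3))
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.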

Ready to use script

We also provide a ready-to-use Python script to run evaluations using DINO backbones. For example, to evaluate a ViT-S/16 on the whole Pascal VOC dataset using a memory bank of size 1024*10², you can run the following command:

```sh
python eval.py \
    --seed 42 \
    --batch-size 64 \
    --input-size 512 \
    --patch-size 16 \
    --memory-size 102400 \
    --embeddings-size 384 \
    --data-dir VOCSegmentation \
    --model dino_vits16
```
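To sweep the three memory sizes reported in the results table, one option is to generate the command line for each size from Python. The sketch below only builds and prints the commands, using exactly the flags shown above. The helper `build_eval_command` is a hypothetical name, and nothing beyond those flags is assumed about `eval.py`.

```python
import shlex

def build_eval_command(memory_size, model="dino_vits16"):
    """Build the argv list for one eval.py run with the given memory size."""
    return [
        "python", "eval.py",
        "--seed", "42",
        "--batch-size", "64",
        "--input-size", "512",
        "--patch-size", "16",
        "--memory-size", str(memory_size),
        "--embeddings-size", "384",
        "--data-dir", "VOCSegmentation",
        "--model", model,
    ]

# One command per memory size: 1024*10^2, 1024*10^3, 1024*10^4.
commands = [build_eval_command(1024 * 10**e) for e in (2, 3, 4)]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch a run, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.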

Setup

This section describes what is required to execute the Dense NN Retrieval Evaluation. Installation instructions can be found in the Installation Guide.

If you want to install the library so that it is available everywhere within a Python environment, you can do so with:

```bash
cd open-hummingbird-eval
pip install .  # or, for an editable install: pip install -e .
```

Dataset Setup

We now have 5 available datasets:

* ade20k
* voc
* coco-thing
* coco-stuff
* cityscapes

You can also select a subset of the training dataset by appending a suffix *<fraction> to the dataset name. For example, to evaluate on a random 0.1 fraction of Pascal VOC, you can specify dataset_name="voc*0.1" in the hbird_evaluation method above.
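To make the suffix convention concrete, the following sketch splits such a name into its dataset part and its fraction. `parse_dataset_name` is a hypothetical helper written for this illustration. It is not the parser used inside the repository, and treating a missing suffix as the full dataset is an assumption.

```python
def parse_dataset_name(dataset_name):
    """Split 'voc*0.1' into ('voc', 0.1); a name without a suffix maps to fraction 1.0."""
    name, sep, fraction = dataset_name.partition("*")
    if not sep:
        return name, 1.0
    value = float(fraction)
    if not 0.0 < value <= 1.0:
        raise ValueError(f"fraction must be in (0, 1], got {value}")
    return name, value

print(parse_dataset_name("voc*0.1"))
print(parse_dataset_name("ade20k"))
```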

Please refer to the Dataset Guide to see the full structure each dataset folder should have.

We also provide file sets in the file_sets folder that specify subsets of the original dataset. For example, the folder ./file_sets/voc/1_div_8/ contains 5 different subsets of the Pascal VOC training dataset, each a 1/8 fraction of the original dataset that keeps the same distribution of labels as the original (likewise, 1_div_128 holds 1/128 fractions, and so on).
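The idea of a subset that keeps the label distribution can be illustrated with a simple stratified sampler. This is a deliberately simplified sketch and not how the provided file sets were generated: it assumes exactly one label per image (Pascal VOC images can contain several classes), and `stratified_subset` and the toy filenames are hypothetical.

```python
import random
from collections import defaultdict

def stratified_subset(items, fraction, seed=0):
    """Sample `fraction` of the filenames from each label group.

    items: list of (filename, label) pairs, one label per file (simplification).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for filename, label in items:
        by_label[label].append(filename)
    subset = []
    for label in sorted(by_label):
        group = sorted(by_label[label])
        rng.shuffle(group)
        # Keep at least one file per label so that no class disappears.
        keep = max(1, round(len(group) * fraction))
        subset.extend(group[:keep])
    return subset

# Toy dataset: 16 "dog" images and 48 "person" images.
items = [(f"img_{i:03d}.jpg", "person" if i % 4 else "dog") for i in range(64)]
subset = stratified_subset(items, 1 / 8)
print(len(subset))
```

With a 1/8 fraction the toy subset keeps 2 "dog" and 6 "person" images, the same 1:3 ratio as the full set. Different seeds give different subsets of the same size.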

Examples

Basic examples of how to download any of our dataset versions and evaluate a vision encoder with our implementation of the Hummingbird evaluation can be found in the examples folder.

You can also open them in Google Colab:

Example with using scann library

Example with using faiss-gpu library

Upcoming/Future Features

Stay tuned: we plan to extend our implementation with support for additional features.

| Feature | Description |
| --- | --- |
| NYUv2 | Support for Depth Estimation with this code for NYUv2 |

Contributors

| n | Username |
| ------------- | ------------- |
| 1 | @vpariza |
| 2 | @Smsd75 |
| 3 | @yukimasano |

Citations

If you find this repo helpful, please consider citing these works:

The original paper:

@inproceedings{
    balazevic2023towards,
    title={Towards In-context Scene Understanding},
    author={Ivana Balazevic and David Steiner and Nikhil Parthasarathy and Relja Arandjelovic and Olivier J Henaff},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
    url={https://openreview.net/forum?id=FasIQqsJhe}
}

Our work and repository:

@misc{pariza2024hbird,
    author = {Pariza, Valentinos and Salehi, Mohammadreza and Asano, Yuki},
    month = {4},
    title = {Hummingbird Evaluation for vision encoders},
    url = {https://github.com/vpariza/open-hummingbird-eval},
    year = {2024}
}

Owner

Name: Valentinos
Login: vpariza
Kind: user
Location: Amsterdam, Netherlands
Company: University of Amsterdam (UvA)

Website: https://www.linkedin.com/in/valentinos-pariza/
Repositories: 1
Profile: https://github.com/vpariza

I am a Master's student at the University of Amsterdam (UvA), studying Artificial Intelligence.

Citation (CITATION.cff)

cff-version: 1.2.0
title: Hummingbird Evaluation for vision encoders
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Valentinos
    family-names: Pariza
    email: valentinos.pariza@student.uva.nl
    affiliation: University of Amsterdam
    orcid: 'https://orcid.org/0009-0008-3440-9935'
  - given-names: Mohammadreza
    family-names: Salehi
    email: s.salehidehnavi@uva.nl
    affiliation: University of Amsterdam
    orcid: 'https://orcid.org/0000-0002-9247-9439'
  - given-names: Yuki
    family-names: Asano
    email: y.m.asano@uva.nl
    affiliation: University of Amsterdam
repository-code: 'https://github.com/vpariza/open-hummingbird-eval'
abstract: >-
  This repository implements the Dense NN Retrieval
  Evaluation method introduced by Balažević et al. Towards
  In-context Scene Understanding -
  https://arxiv.org/abs/2306.01667.
keywords:
  - Deep Learning
  - Vision
  - Semantic Segmentation
  - Dense NN Retrieval
  - Hummingbird
license: MIT
version: 1.0.0
date-released: '2024-04-18'

GitHub Events

Total

Issues event: 9
Watch event: 8
Delete event: 3
Issue comment event: 9
Push event: 17
Pull request review event: 1
Pull request event: 5
Fork event: 4
Create event: 4

Last Year

Issues event: 9
Watch event: 8
Delete event: 3
Issue comment event: 9
Push event: 17
Pull request review event: 1
Pull request event: 5
Fork event: 4
Create event: 4

open-hummingbird-eval

Science Score: 54.0%
