enhance-finegrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary
Repository
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
Basic Info
Statistics
- Stars: 47
- Watchers: 1
- Forks: 1
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
:tada: The paper was accepted to CVPR 2024.
TL;DR: We propose two losses over our generated hard negative examples to enhance CLIP's compositional understanding.
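As an illustration only (not the repo's actual generation code), a compositional hard negative can be built by swapping two distinct words in a caption: the negative shares the positive's vocabulary, so a model must attend to word order and composition to separate them. The function name and swap rule below are assumptions for the sketch.

```python
import random

def make_hard_negative(caption: str, seed: int = 0) -> str:
    """Create a compositional hard negative by swapping two distinct words.

    The negative keeps exactly the same bag of words as the caption, so a
    bag-of-words model cannot tell them apart -- only a compositional one can.
    """
    words = caption.split()
    rng = random.Random(seed)
    # Candidate position pairs whose words differ, so the swap changes the text.
    pairs = [(i, j)
             for i in range(len(words))
             for j in range(i + 1, len(words))
             if words[i] != words[j]]
    i, j = rng.choice(pairs)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

caption = "a black dog chasing a white cat"
negative = make_hard_negative(caption)
print(negative)  # same words as the caption, different composition
```

In the paper's setting such negatives serve as the "hard" examples that the intra-modal contrastive and cross-modal ranking losses are computed over; real pipelines typically use linguistically informed swaps (e.g. noun or attribute exchanges) rather than random ones.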

This repo is a fork of the wonderful OpenCLIP; for model and training details, please refer to the original repo.
:ballot_box_with_check: Checkpoints
The checkpoints can be downloaded directly with gdown using the following script:
```bash
pip install --upgrade --no-cache-dir gdown  # must update gdown to avoid bugs, thanks to https://github.com/wkentaro/gdown/issues/146
gdown 1DWPw3CtGh5cHz9bW_-iXRSG7BBUVl13K  # download checkpoint for CE-CLIP
```
You can also download it from 🤗 Hugging Face: https://huggingface.co/le723z/CE_CLIP
Training
1. Generating Training dataset
The training data is generated from COCO 2014, so you can either download COCO yourself and set the coco dataset_path in dataset.py, or simply run the following script to download and generate the dataset:
```bash
cd data/
bash prepare_dataset.sh
```
2. Training
You need to specify training parameters in scripts/runall.sh, such as --gres=gpu:a100:2 and `batchsize`; please refer to that script file for more details. To run the training, use the following script:
```bash
cd scripts/
bash run_multiple_nodes.sh
```
The resulting checkpoint will be saved to Enhance-FineGrained/src/Outputs.
Evaluation
We evaluate our method on four downstream benchmarks: ARO, VALSE, VL-CheckList, and the very recent SugarCrepe. We provide evaluation code for all of them; however, you need to download each dataset from its official GitHub page.
ARO & VALSE

Evaluation code for ARO is included in `Enhance-FineGrained/vision-language-models-are-bows`. To reproduce the results:
1. Set up the environment by running `bash Enhance-FineGrained/vision-language-models-are-bows/scripts/create_environment.sh`.
2. `cd Enhance-FineGrained/vision-language-models-are-bows/scripts`, change the checkpoint path in `reproduce_aro.sh`, then run the script. Note that the dataset will be downloaded automatically.

Evaluation code for VALSE is included in `Enhance-FineGrained/VALSE`. To reproduce results on VALSE, please download the dataset here first. Then replace the dataset path in `Enhance-FineGrained/VALSE/clip_valse_eval.py` and `Enhance-FineGrained/VALSE/xvlm_valse_eval.py`, replace `$checkpoint` in `Enhance-FineGrained/VALSE/scripts`, and run the scripts. Evaluation results will be written to `Enhance-FineGrained/VALSE/output`.
VL-CheckList [Not Suggested]

:exclamation: Note: the original dataset is incomplete; we encourage skipping this benchmark.
Please refer to the official GitHub repo to download the dataset and perform evaluation. Note that downloading the dataset can be quite cumbersome.
We provide a script here.
:star2: SugarCrepe

SugarCrepe is a benchmark for faithful vision-language compositionality evaluation. It fixes several biases in all of the benchmarks above that rendered them hackable, allowing blind models with no access to the image to outperform state-of-the-art vision-language models.
To evaluate on this dataset, simply clone their repo, follow their installation setup, and point --pretrained to our checkpoint:
```bash
python main_eval.py --model ViT-B-32 --pretrained Enhance-FineGrained/clip/epoch_5.pt \
    --output ./output \
    --coco_image_root ./data/coco/images/val2017/ \
    --data_root ./data/
```
Ablations
Our method entails curriculum learning, which is validated by the growth of the adaptive threshold during training.
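The README does not spell out the threshold update itself; as a hedged sketch (the function name, momentum value, and update rule below are illustrative assumptions, not the paper's exact formulation), a margin that tracks the observed positive/negative similarity gap behaves like a curriculum: it stays small early in training and grows as the model learns to separate positives from hard negatives.

```python
def update_threshold(threshold: float, pos_sim: float, neg_sim: float,
                     momentum: float = 0.9) -> float:
    """Move the margin threshold toward the observed positive/negative gap.

    As training separates positives from hard negatives, the gap widens and
    the threshold rises -- an implicit curriculum: the required margin grows
    only as the model becomes able to meet it.
    """
    gap = max(pos_sim - neg_sim, 0.0)
    return momentum * threshold + (1.0 - momentum) * gap

# Simulate a training run where the pos/neg similarity gap widens over time.
threshold = 0.0
history = []
for step in range(50):
    gap = 0.01 * step  # toy schedule: the model improves as training proceeds
    threshold = update_threshold(threshold, pos_sim=0.5 + gap, neg_sim=0.5)
    history.append(threshold)
print(history[0], history[-1])  # the threshold grows monotonically
```

Plotting `history` reproduces the qualitative growth curve referred to above: a rising threshold indicates the ranking loss is demanding progressively larger margins.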

:paperclip: Citation
```bibtex
@article{zhang2023contrasting,
  title={Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding},
  author={Zhang, Le and Awal, Rabiul and Agrawal, Aishwarya},
  journal={arXiv preprint arXiv:2306.08832},
  year={2023}
}
```
:email: Contact
Please let us know if you have further questions or comments; reach out to le.zhang@mila.quebec.
Owner
- Name: Le Zhang
- Login: lezhang7
- Kind: user
- Location: 6666 St-Urbain, #200, Montréal, QC, H2S 3H1
- Company: MILA - Quebec AI Institute
- Website: https://lezhang7.github.io/
- Repositories: 1
- Profile: https://github.com/lezhang7
representation learning, cross vision&language learning
Citation (CITATION.cff)
cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
- family-names: Ilharco
given-names: Gabriel
- family-names: Wortsman
given-names: Mitchell
- family-names: Wightman
given-names: Ross
- family-names: Gordon
given-names: Cade
- family-names: Carlini
given-names: Nicholas
- family-names: Taori
given-names: Rohan
- family-names: Dave
given-names: Achal
- family-names: Shankar
given-names: Vaishaal
- family-names: Namkoong
given-names: Hongseok
- family-names: Miller
given-names: John
- family-names: Hajishirzi
given-names: Hannaneh
- family-names: Farhadi
given-names: Ali
- family-names: Schmidt
given-names: Ludwig
title: OpenCLIP
version: v0.1
doi: 10.5281/zenodo.5143773
date-released: 2021-07-28
GitHub Events
Total
- Issues event: 2
- Watch event: 12
- Issue comment event: 4
- Push event: 1
Last Year
- Issues event: 2
- Watch event: 12
- Issue comment event: 4
- Push event: 1
Dependencies
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/download-artifact v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- actions/github-script v6 composite
- actions-ecosystem/action-regex-match v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- softprops/action-gh-release v1 composite
- pytest ==7.2.0 test
- pytest-split ==0.8.0 test
- timm ==0.6.11 test
- transformers * test
- braceexpand *
- fsspec *
- ftfy *
- huggingface_hub *
- pandas *
- pycocotools *
- regex *
- timm *
- torch >=1.9.0
- torchvision *
- tqdm *
- transformers *
- webdataset >=0.2.5
- ftfy *
- huggingface_hub *
- protobuf ==3.20.
- regex *
- sentencepiece *
- timm *
- torch >=1.9.0
- torchvision *
- tqdm *