clip-retrieval

https://github.com/sbrood/clip-retrieval

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary


Repository

Basic Info

Host: GitHub
Owner: sbrood
License: other
Language: Jupyter Notebook
Default Branch: main
Size: 5.95 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 0

Created over 3 years ago · Last pushed over 3 years ago

Metadata Files

Readme Changelog License Citation

CLIP for image-text retrieval

End of studies internship project

This repository is a fork of the mlfoundations implementation for training and using the CLIP model; please check their README as well as this one!

It aims to provide a way to use and fine-tune the CLIP model.

🪛 Installation

```bash
docker build . -t open-clip
```

```bash
docker run --rm -ti -v ${PWD}:/home/open-clip -v /my_dataset:/home/open-clip/my_dataset open-clip:latest
```

Add `--gpus '"device=0,1,2,3"'` to use GPUs.

Add `-v /dev/shm:/dev/shm` if you are training on multiple GPUs (this enables access to shared memory).

⚠ This project uses Python 3.7.

🍄 Usage

Metrics

Each of these metrics is available from image to text and from text to image:

- Median rank
- Mean rank
- R@X accuracy: the fraction of queries whose ground truth appears among the top-X ranked answers
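As a rough illustration of the metrics listed above, the sketch below computes them from a small similarity matrix in plain Python. It assumes that row i of the matrix holds the scores of query i against every candidate and that the ground truth for query i is candidate i. The function names (`ground_truth_ranks`, `retrieval_metrics`) are hypothetical and are not taken from this repository.

```python
from statistics import mean, median


def ground_truth_ranks(similarity):
    """Return the 1-based rank of the ground truth for each query.

    Assumes similarity[i][j] scores query i against candidate j and that
    the correct candidate for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        # Rank = 1 + number of candidates scored strictly higher than the truth.
        ranks.append(1 + sum(1 for score in row if score > row[i]))
    return ranks


def retrieval_metrics(similarity, tops=(1, 2, 3, 5, 10)):
    """Compute mean rank, median rank and R@X for each X in `tops`."""
    ranks = ground_truth_ranks(similarity)
    metrics = {"mean_rank": mean(ranks), "median_rank": median(ranks)}
    for k in tops:
        metrics[f"R@{k}"] = sum(r <= k for r in ranks) / len(ranks)
    return metrics


# Toy image-to-text scores: 3 images (rows) against 3 captions (columns).
image_to_text = [
    [0.9, 0.1, 0.3],  # ground truth ranked 1st
    [0.8, 0.5, 0.2],  # ground truth ranked 2nd
    [0.1, 0.2, 0.7],  # ground truth ranked 1st
]
# Text-to-image uses the transposed matrix.
text_to_image = [list(col) for col in zip(*image_to_text)]

print(retrieval_metrics(image_to_text, tops=(1, 2)))
print(retrieval_metrics(text_to_image, tops=(1, 2)))
```

Transposing the matrix is what turns image-to-text retrieval into text-to-image retrieval, which is why both directions can share the same metric code.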

Available models

The OpenCLIP project aims to match the metrics reported in OpenAI's paper. You can choose between OpenAI's pre-trained weights and open-clip pre-trained weights, for each available architecture and for different pre-training datasets.

Use the following commands to list the available models and their weights, and to load a model.

```python
import open_clip

open_clip.list_pretrained()
model, train_transform, eval_transform = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_e16')
```

To load other pre-trained image weights, use pretrained_image (Python keyword arguments cannot contain hyphens).

```python
import open_clip

# Upstream open_clip treats pretrained_image as a boolean flag for timm image
# towers; check this fork's signature before relying on a checkpoint path here.
model, train_transform, eval_transform = open_clip.create_model_and_transforms('ViT-B-32', pretrained_image='my_checkpoint_path')
```

👉 Basics

Simple inference

```python
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32-quickgelu', pretrained='laion400m_e32')

image = preprocess(Image.open("CLIP.png")).unsqueeze(0)
text = open_clip.tokenize(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so that the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
```
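To make the last steps of the example above more concrete, here is a dependency-free sketch of what the normalization, the scaled dot product and the softmax compute. The feature vectors are made-up toy values, not real CLIP embeddings, and the helper names are hypothetical; the factor of 100 is the scale used in the snippet above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, like `x / x.norm(dim=-1, keepdim=True)`."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probs(image_feature, text_features, scale=100.0):
    """Cosine similarity between one image and each text, turned into probabilities."""
    img = l2_normalize(image_feature)
    logits = []
    for text_feature in text_features:
        txt = l2_normalize(text_feature)
        # Dot product of unit vectors = cosine similarity.
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    return softmax(logits)


# Toy 3-dimensional "embeddings" (real CLIP features have hundreds of dimensions).
image = [1.0, 0.2, 0.0]
texts = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = label_probs(image, texts)
print([round(p, 3) for p in probs])
```

Because the similarities are multiplied by 100 before the softmax, even modest differences in cosine similarity yield probabilities that are close to one-hot.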

Evaluation

Use the cliptestacc.py script:

1. Compute simple metrics on the whole dataset (may not fit on the GPU!)

```bash
python -m retrieval.main --ground_truth_csv_path GROUND_TRUTH_CSV_PATH \
    --csv_img_key CSV_IMG_KEY \
    --csv_caption_key CSV_CAPTION_KEY \
    --input_dir INPUT_DIR \
    --csv_separator CSV_SEPARATOR \
    [--network NETWORK] \
    [--checkpoint CHECKPOINT] \
    [--workers WORKERS] \
    [--device DEVICE] \
    [--pretrained PRETRAINED] \
    [--log_rate LOG_RATE] \
    [--tops TOPS]
```

where

- ground_truth_csv_path is the CSV file that stores each image filename, its label and its shooting id
- csv_img_key is the name of the column containing the filenames
- csv_caption_key is the name of the column containing the labels
- input_dir is the folder where the images are stored
- csv_separator is the separator character of your CSV file
- network is the name of the network, see Available models for more details
- checkpoint is the filename of the checkpoint
- workers is the number of workers
- device is cpu or cuda, default is cpu
- pretrained is the source of the pretrained model, see Available models for more details
- log_rate is the rate at which the metrics are printed, default is 10
- tops is the list of X values for the R@X accuracies, separated by spaces (e.g. 1 2 4 9), default is 1 2 3 5 10
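The CSV-related options above can be pictured with the following sketch, which reads a ground-truth file with Python's standard `csv` module. It is an illustration under assumptions, not the script's actual loader: the function name `load_ground_truth` and the column names `filepath`, `title` and `shooting` in the sample are hypothetical.

```python
import csv
import io


def load_ground_truth(csv_file, img_key, caption_key, separator=",", shooting_key=None):
    """Read (image filename, caption[, shooting id]) rows from an open CSV file."""
    reader = csv.DictReader(csv_file, delimiter=separator)
    rows = []
    for record in reader:
        row = {"image": record[img_key], "caption": record[caption_key]}
        if shooting_key is not None:
            row["shooting"] = record[shooting_key]
        rows.append(row)
    return rows


# In-memory stand-in for a file passed as --ground_truth_csv_path,
# here with ';' playing the role of --csv_separator.
sample = io.StringIO(
    "filepath;title;shooting\n"
    "img_001.jpg;a red apple;s1\n"
    "img_002.jpg;a green pear;s1\n"
)

rows = load_ground_truth(sample, img_key="filepath", caption_key="title",
                         separator=";", shooting_key="shooting")
print(rows[0])
```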

2. Compute average metrics on shootings: add `--per_shooting --csv_shooting_key CSV_SHOOTING_KEY` to the retrieval command.
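One plausible reading of the per-shooting option is that metrics are first computed within each shooting and then averaged across shootings, so that large shootings do not dominate the result. The sketch below illustrates that reading for R@X; it is an assumption about the behaviour, not code from this repository, and `recall_per_shooting` is a hypothetical name.

```python
from collections import defaultdict


def recall_per_shooting(ranks, shooting_ids, k):
    """Average R@k over shootings.

    ranks[i] is the 1-based rank of the ground truth for query i and
    shooting_ids[i] is the shooting that query belongs to.
    """
    hits = defaultdict(list)
    for rank, shooting in zip(ranks, shooting_ids):
        hits[shooting].append(rank <= k)
    # R@k inside each shooting, then an unweighted mean over shootings.
    per_shooting = {s: sum(h) / len(h) for s, h in hits.items()}
    return sum(per_shooting.values()) / len(per_shooting), per_shooting


ranks = [1, 1, 1, 2, 5, 1]
shootings = ["s1", "s1", "s1", "s1", "s2", "s2"]

average, detail = recall_per_shooting(ranks, shootings, k=1)
print(average, detail)
```

With these toy values the per-shooting average differs from the global R@1 (4 hits out of 6 queries), which is the kind of gap this option would make visible.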
From the training main, using a local checkpoint:

```bash
python -m training.main --val-data="/path/to/validation_data.csv" --model RN101 --pretrained /path/to/checkpoints/epoch_K.pt
```

or using a hosted pretrained checkpoint:

```bash
python -m training.main --imagenet-val /path/to/imagenet/validation --model ViT-B-32-quickgelu --pretrained laion400m_e32
```

👉 Training

All the parameters can be found in training/params.py

Single GPU (example)

```bash
python -m training.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --train-data="path to train data csv" \
    --val-data="path to validation csv" \
    --csv-img-key filepath \
    --csv-caption-key title \
    --imagenet-val=/path/to/imagenet/root/val/ \
    --warmup 10000 \
    --batch-size=128 \
    --lr=1e-3 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --model RN50
```
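For a feel of what `--warmup` and `--lr` control, the sketch below assumes the schedule used by upstream open_clip, namely a linear warmup followed by cosine decay; whether this fork keeps that schedule unchanged is an assumption. The function name `lr_at_step` and the total of 50,000 steps are illustrative only.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp up linearly towards base_lr during warmup.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress)) * base_lr


# Values from the single-GPU example: --lr=1e-3 and --warmup 10000.
# total_steps depends on dataset size, batch size and epochs; 50_000 is made up.
for step in (0, 4_999, 9_999, 30_000, 49_999):
    print(step, f"{lr_at_step(step, 1e-3, 10_000, 50_000):.2e}")
```

Under this assumption, a warmup that is long relative to the total number of steps means a large share of training runs below the nominal learning rate, which is worth keeping in mind when fine-tuning on a small dataset.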

Multi GPUs (example)

```bash
torchrun --nproc_per_node 4 -m training.main \
    --train-data="path to train data csv" \
    --val-data="path to validation csv" \
    --csv-img-key "new filename" \
    --csv-caption-key "food label" \
    --csv-separator ',' \
    --batch-size 128 \
    --precision amp \
    --workers 4 \
    --model ViT-B-32 \
    --epochs=40 \
    --save-frequency 15 \
    --pretrained 'openai' \
    --warmup 100 \
    --lr 5.0e-5 \
    --val-frequency 2
```

🔒 LiT

LiT consists in locking the image tower while leaving the text tower unlocked. open-clip offers parameters to use this technique to fine-tune CLIP. Use the following parameters:

- `--lock-image` to lock the full image tower by disabling gradients
- `--lock-image-unlocked-groups n` to leave the last n image tower layer groups unlocked
- `--lock-image-freeze-bn-stats` to freeze BatchNorm running stats in the image tower for any locked layers
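The effect of these flags can be sketched without any deep-learning framework. The example below models the image tower as an ordered list of layer groups, freezes all of them, then re-enables the last n. The `Param` class and the group names are hypothetical stand-ins; in a real PyTorch model the same idea amounts to setting `requires_grad = False` on the locked parameters.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a trainable tensor; only tracks whether it gets gradients."""
    name: str
    requires_grad: bool = True


def lock_image_tower(groups, unlocked_groups=0):
    """Freeze every group except the last `unlocked_groups` ones."""
    cutoff = len(groups) - unlocked_groups
    for index, group in enumerate(groups):
        for param in group:
            param.requires_grad = index >= cutoff
    return groups


# Hypothetical image tower split into four groups, from input to output.
image_tower = [
    [Param("patch_embed.weight")],
    [Param("block1.weight"), Param("block1.bias")],
    [Param("block2.weight"), Param("block2.bias")],
    [Param("projection.weight")],
]

lock_image_tower(image_tower, unlocked_groups=1)
trainable = [p.name for group in image_tower for p in group if p.requires_grad]
print(trainable)
```

The text tower is not touched by this function, so all of its parameters keep receiving gradients.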

Weights & Biases

- Log in to Weights & Biases with `wandb login`
- Add `--report-to 'wandb'` to the script parameters
- Open your WandB dashboard, you're set!

🌶 Dataset tools

Some scripts are available inside src/data for dataset management.

gather_cc.py is an open-clip tool to download the Conceptual Captions dataset.

🔗 Resources

Articles

- CLIP: article, original code
- LiT, Zero-Shot Transfer with Locked-image text Tuning: article, code

Repositories

- OpenAI CLIP
- OpenCLIP from mlfoundations

Owner

Name: Sarah Brood
Login: sbrood
Kind: user
Location: Paris

Repositories: 1
Profile: https://github.com/sbrood

PhD Student at ENS Paris and LSCE working on deep learning models to retrieve forest properties from remote sensing data. 🌳

Citation (CITATION.cff)

cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Ilharco
    given-names: Gabriel
  - family-names: Wortsman
    given-names: Mitchell
  - family-names: Wightman
    given-names: Ross
  - family-names: Gordon
    given-names: Cade   
  - family-names: Carlini
    given-names: Nicholas
  - family-names: Taori
    given-names: Rohan
  - family-names: Dave
    given-names: Achal
  - family-names: Shankar
    given-names: Vaishaal
  - family-names: Namkoong
    given-names: Hongseok
  - family-names: Miller
    given-names: John
  - family-names: Hajishirzi
    given-names: Hannaneh
  - family-names: Farhadi
    given-names: Ali
  - family-names: Schmidt
    given-names: Ludwig
title: OpenCLIP
version: v0.1
doi: 10.5281/zenodo.5143773
date-released: 2021-07-28

Dependencies

Dockerfile docker

nvidia/cuda 11.3.1-cudnn8-runtime-ubuntu20.04 build

requirements-test.txt pypi

pytest ==7.0.1 test
pytest-xdist ==2.5.0 test

requirements-training.txt pypi

braceexpand *
ftfy *
pandas *
regex *
torch >=1.9.0
torchvision *
tqdm *
webdataset >=0.2.5

requirements.txt pypi

ftfy *
regex *
torch >=1.9.0
torchvision *
tqdm *

setup.py pypi

braceexpand *
ftfy *
pandas *
regex *
setproctitle *
torch *
torchvision *
tqdm *
wandb *
webdataset *
