https://github.com/agroboticsresearch/alpha-clip

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
○ DOI references
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (7.8%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: AgRoboticsResearch
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Size: 176 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

Metadata Files

Readme License

Alpha-CLIP

This repository is the official implementation of Alpha-CLIP.

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun*, Ye Fang*, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

*Equal Contribution

Demo Alpha-CLIP with Stable Diffusion:

Demo Alpha-CLIP with LLaVA:

News

[2024/3/4] CLIP-L/14@336px fine-tuned on GRIT-20M is available; check out the model-zoo!

[2024/2/27] Our paper Alpha-CLIP is accepted by CVPR'24!

[2024/1/2] Zero-shot testing code for ImageNet-S classification and Referring Expression Comprehension is released!

[2023/12/27] Web demo and local demo of Alpha-CLIP with LLaVA are released!

[2023/12/7] Web demo and local demo of Alpha-CLIP with Stable Diffusion are released!

[2023/12/7] The paper and project page are released!

Highlights

3.93% higher zero-shot ImageNet classification accuracy when a foreground alpha-map is provided.
Plug-and-play region focus in any work that uses the CLIP vision encoder.
A strong visual encoder that serves as a versatile tool when a foreground mask is available.

Todo

[ ] Training code for Alpha-CLIP based on Open-CLIP
[x] Evaluation code for Alpha-CLIP
[x] Zero-shot evaluation for ImageNet-S classification and REC tasks
[x] Web demo and local demo of Alpha-CLIP with LLaVA
[x] Web demo and local demo of Alpha-CLIP with Stable Diffusion
[x] Usage example notebook of Alpha-CLIP
[x] Checkpoints of Alpha-CLIP

Usage

Installation

Our model is based on CLIP. Please first prepare the environment for CLIP, then install Alpha-CLIP directly:

```shell
pip install -e .
```

Install loralib:

```shell
pip install loralib
```
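Because the steps above assume a working CLIP environment, a quick import check can save a confusing failure later. The sketch below only tests whether the relevant top-level modules can be found; the package list (`torch`, `torchvision`, `loralib`, `alpha_clip`) is an assumption based on the imports used in the usage examples of this README, and `check_environment` is a hypothetical helper, not part of Alpha-CLIP.

```python
import importlib.util

# Assumed list of modules the usage examples rely on (not an official list).
REQUIRED = ["torch", "torchvision", "loralib", "alpha_clip"]


def check_environment(packages=REQUIRED):
    """Return {module_name: True/False} depending on whether it can be imported."""
    # find_spec returns None for a missing top-level module instead of raising.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

Running it after `pip install -e .` and `pip install loralib` should report every entry as `ok`; a `MISSING` line points at the step to repeat.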

How to use

Download a model from the model-zoo and place it under `checkpoints/`.

```python
import alpha_clip
model, preprocess = alpha_clip.load("ViT-B/16", alpha_vision_ckpt_pth="checkpoints/clip_b16_grit1m_fultune_8xe.pth", device="cpu")
image_features = model.visual(image, alpha)  # image: preprocessed image tensor, alpha: normalized mask tensor
```

`alpha` needs to be normalized via transforms when `binary_mask` holds values in {0, 1}:

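```python
import numpy as np
from torchvision import transforms

mask_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(0.5, 0.26)
])
# cast to uint8 so that ToTensor scales the mask to [0, 1] before Normalize
alpha = mask_transform((binary_mask * 255).astype(np.uint8))
```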
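To make the mask transform concrete, the NumPy sketch below mimics what `ToTensor` followed by `Normalize(0.5, 0.26)` does to a binary mask: foreground pixels become (1 - 0.5) / 0.26 ≈ 1.923 and background pixels ≈ -1.923. It also picks the `Resize` size per backbone, following the README's note that (336, 336) is used for ViT-L/14@336px and (224, 224) otherwise. This is an illustration under those assumptions, not Alpha-CLIP code: `binary_mask_from_image`, `normalize_alpha`, `mask_resize_size`, and the `INPUT_RESOLUTION` table are hypothetical names, the table covers only three backbones, and the actual resize (with its interpolation at mask edges) is skipped.

```python
from typing import Tuple

import numpy as np

# Mean/std of the Normalize step in the README's mask_transform.
ALPHA_MEAN, ALPHA_STD = 0.5, 0.26

# Assumed input resolution per backbone; other names are deliberately not covered.
INPUT_RESOLUTION = {"ViT-B/16": 224, "ViT-L/14": 224, "ViT-L/14@336px": 336}


def mask_resize_size(model_name: str) -> Tuple[int, int]:
    """(H, W) to pass to transforms.Resize for the given backbone name."""
    if model_name not in INPUT_RESOLUTION:
        raise ValueError(f"unknown backbone {model_name!r}")
    side = INPUT_RESOLUTION[model_name]
    return (side, side)


def binary_mask_from_image(mask: np.ndarray) -> np.ndarray:
    """2-D bool mask: True where the (first channel of the) mask image equals 255."""
    if mask.ndim == 2:
        return mask == 255
    if mask.ndim == 3:
        return mask[:, :, 0] == 255
    raise ValueError(f"expected a 2-D or 3-D mask, got shape {mask.shape}")


def normalize_alpha(binary_mask: np.ndarray) -> np.ndarray:
    """Mimic ToTensor + Normalize(0.5, 0.26) on a bool mask (Resize is skipped).

    ToTensor divides a uint8 image by 255 and adds a leading channel axis;
    Normalize then computes (x - mean) / std.
    """
    as_uint8 = (binary_mask * 255).astype(np.uint8)
    scaled = as_uint8.astype(np.float32)[None, :, :] / 255.0  # shape (1, H, W)
    return (scaled - ALPHA_MEAN) / ALPHA_STD


if __name__ == "__main__":
    mask_image = np.zeros((4, 4, 3), dtype=np.uint8)
    mask_image[1:3, 1:3] = 255  # a 2x2 foreground square in an RGB mask image
    alpha = normalize_alpha(binary_mask_from_image(mask_image))
    print(alpha.shape)  # (1, 4, 4)
    print(np.unique(alpha))  # about -1.923 (background) and 1.923 (foreground)
    print(mask_resize_size("ViT-L/14@336px"))  # (336, 336)
```

Keeping the `uint8` cast matters: without it, `ToTensor` would not divide by 255, so the normalized values would be far outside the ±1.923 range shown here.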

Zero-shot Prediction

```python
import torch
import alpha_clip
from PIL import Image
import numpy as np
from torchvision import transforms

# load model and prepare mask transform
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = alpha_clip.load("ViT-L/14", alpha_vision_ckpt_pth="./checkpoints/clip_l14_grit20m_fultune_2xe.pth", device=device)  # change to your own ckpt path
mask_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),  # change to (336,336) when using ViT-L/14@336px
    transforms.Normalize(0.5, 0.26)
])

# prepare image and mask
img_pth = './examples/image.png'
mask_pth = './examples/dress_mask.png'  # image-type mask

image = Image.open(img_pth).convert('RGB')
mask = np.array(Image.open(mask_pth))

# get `binary_mask` array (2-dimensional bool matrix)
if len(mask.shape) == 2:
    binary_mask = (mask == 255)
if len(mask.shape) == 3:
    binary_mask = (mask[:, :, 0] == 255)

alpha = mask_transform((binary_mask * 255).astype(np.uint8))
# .half() matches the fp16 model loaded on CUDA; on CPU use .float() instead
alpha = alpha.half().to(device).unsqueeze(dim=0)

# calculate image and text features
image = preprocess(image).unsqueeze(0).half().to(device)
text = alpha_clip.tokenize(["a goegously dressed woman", "a purple sleeveness dress", "bouquet of pink flowers"]).to(device)

with torch.no_grad():
    image_features = model.visual(image, alpha)
    text_features = model.encode_text(text)

# normalize
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# print the result
similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print("Label probs:", similarity.cpu().numpy())  # prints: [[9.388e-05 9.995e-01 2.415e-04]]
```

Note: Use `.half()` on the input tensors or `.float()` on the model so that their types stay consistent.
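The last lines of the zero-shot example turn feature vectors into label probabilities: both sets of features are L2-normalized, their dot products (cosine similarities) are scaled by 100.0, and a softmax is taken over the captions. The NumPy sketch below reproduces only that arithmetic, which can help when checking the scoring logic without loading a checkpoint. `label_probs` is a hypothetical helper, the random vectors merely stand in for `model.visual(image, alpha)` and `model.encode_text(text)`, and subtracting the row maximum is an added numerical-stability step that leaves the softmax unchanged.

```python
import numpy as np


def label_probs(image_features: np.ndarray, text_features: np.ndarray, scale: float = 100.0) -> np.ndarray:
    """Softmax over scaled cosine similarities.

    image_features: (N, D), text_features: (K, D) -> (N, K), each row sums to 1.
    """
    img = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    txt = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    logits = scale * img @ txt.T
    logits -= logits.max(axis=-1, keepdims=True)  # stability: softmax is shift-invariant
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(1, 8))  # stand-in for image features
    txt = rng.normal(size=(3, 8))  # stand-in for three caption features
    txt[1] = img[0] * 2.0  # second "caption" points in the same direction as the image
    print(label_probs(img, txt).round(3))  # the second column dominates
```

Because of the L2 normalization, rescaling either feature vector does not change the probabilities; only the direction of the features matters.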

More usage examples are available:

Visualization of attention map: notebook
Alpha-CLIP used in BLIP-Diffusion: notebook
Alpha-CLIP used in SD_ImageVar: demo
Alpha-CLIP used in LLaVA-1.5: code demo
Alpha-CLIP evaluation code for Image Recognition: code

Demos

Acknowledgments

CLIP: The codebase we built upon. Thanks for their wonderful work.
LAVIS: The amazing open-sourced multimodality learning codebase, where we test Alpha-CLIP in BLIP-2 and BLIP-Diffusion.
Point-E: Wonderful point-cloud generation model, where we test Alpha-CLIP for 3D generation task.
LLaVA: Wonderful MLLM that uses CLIP as its visual backbone, where we test the effectiveness of Alpha-CLIP.

Citation

If you find our work helpful for your research, please consider giving a star and citation:

```bibtex
@misc{sun2023alphaclip,
      title={Alpha-CLIP: A CLIP Model Focusing on Wherever You Want},
      author={Zeyi Sun and Ye Fang and Tong Wu and Pan Zhang and Yuhang Zang and Shu Kong and Yuanjun Xiong and Dahua Lin and Jiaqi Wang},
      year={2023},
      eprint={2312.03818},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

License

Usage and License Notices: The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreement of CLIP. The dataset is licensed under CC BY-NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used outside of research purposes.

Owner

Name: AgRoboticsResearch
Login: AgRoboticsResearch
Kind: organization

Repositories: 1
Profile: https://github.com/AgRoboticsResearch


Dependencies

demo/with_diffusion/requirements.txt pypi

accelerate *
diffusers *
gradio ==3.37.0
transformers *
wget *
