boxmot
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
clipseq
CLIP sequencing analysis pipeline for QC, pre-mapping, genome mapping, UMI deduplication, and multiple peak-calling options.
uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
x-anylabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
ppdiffusers
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.
marqo-fashionclip
State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. 57% improvement in evaluation metrics over FashionCLIP 2.0.
@stdlib/math-base-special-clamp
Restrict a double-precision floating-point number to a specified range.
@stdlib/math-base-special-clampf
Restrict a single-precision floating-point number to a specified range.
promptdet
PromptDet: Towards Open-vocabulary Detection using Uncurated Images, ECCV2022
https://github.com/bentoml/clip-api-service
CLIP as a service - Embed images and sentences, object recognition, visual reasoning, image classification and reverse image search
https://github.com/autodistill/autodistill-clip
CLIP module for use with Autodistill.
https://github.com/capjamesg/sam-clip
Use Grounding DINO, Segment Anything, and CLIP to label objects in images.
b-cosification
[NeurIPS 2024] Code for the paper: B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable.
geospatial-rag
AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries, ChatGPT-like interface
detect-clip-backdoor-samples
[ICLR2025] Detecting Backdoor Samples in Contrastive Language Image Pretraining
https://github.com/924973292/mambapro
[AAAI2025] MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
spn4cir
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
motherboard-dataset
[Kaggle Dataset] Motherboard production defect dataset for object detection. Currently available for YOLOv5, YOLOv7, YOLOv8 & CLIP. Also available on Kaggle.
https://github.com/ajaymin28/humanactionrecognition
CLIP-based human action recognition: aligning text and images using prompt engineering.
https://github.com/autodistill/autodistill-metaclip
MetaCLIP module for use with Autodistill.
uninfo
The official code for "Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption."
peka-eclip
Download and prepare ENCODE eCLIP raw FASTQ files for processing with the nf-core/clipseq pipeline
https://github.com/autodistill/autodistill-altclip
AltCLIP model for use with Autodistill.
bioclip
This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].
understanding-clip-ood
Official code for the paper: "When and How Does CLIP Enable Domain and Compositional Generalization?" (ICML 2025 Spotlight)
https://github.com/chen-yang-liu/git-rsclip
Git-RSCLIP pre-trained on 10 million remote sensing image-text pairs