bioclip-2

Repository for the BioCLIP 2 model project.

https://github.com/imageomics/bioclip-2

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization imageomics has institutional domain (imageomics.osu.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

biology clip computer-vision imageomics knowledge-guided-machine-learning taxonomy
Last synced: 4 months ago

Repository

Repository for the BioCLIP 2 model project.

Basic Info
Statistics
  • Stars: 30
  • Watchers: 1
  • Forks: 4
  • Open Issues: 1
  • Releases: 0
Topics
biology clip computer-vision imageomics knowledge-guided-machine-learning taxonomy
Created 8 months ago · Last pushed 4 months ago
Metadata Files
Readme Changelog License Citation

README.md

BioCLIP 2

This repository contains the code for BioCLIP 2 training and evaluation (testing and visualizing embeddings). We built it on top of BioCLIP and OpenCLIP. BioCLIP 2 is trained on the TreeOfLife-200M dataset and achieves state-of-the-art performance on both species classification and other biological visual tasks. The BioCLIP 2 website is served from the gh-pages branch of this repository.

Paper | Model | Data | Demo

BioCLIP 2 is a CLIP model trained on a new 200M-image dataset of biological organisms with fine-grained taxonomic labels. BioCLIP 2 outperforms general-domain baselines on a wide range of biology-related tasks, including zero-shot and few-shot classification.
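As a quick illustration, zero-shot inference follows the standard OpenCLIP pattern. Below is a minimal sketch, assuming the released weights are published on the Hugging Face Hub as imageomics/bioclip-2 (verify the exact identifier in the model repo linked above); the image path and candidate labels are placeholders.

    # Minimal zero-shot classification sketch using OpenCLIP.
    # Assumes the weights are available as hf-hub:imageomics/bioclip-2;
    # check the model repo for the exact hub identifier.
    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip-2")
    tokenizer = open_clip.get_tokenizer("hf-hub:imageomics/bioclip-2")
    model.eval()

    # Candidate taxonomic labels (hypothetical examples).
    labels = [
        "a photo of Danaus plexippus",
        "a photo of Vanessa cardui",
        "a photo of Papilio machaon",
    ]

    image = preprocess(Image.open("butterfly.jpg")).unsqueeze(0)
    text = tokenizer(labels)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Cosine similarity -> softmax over candidate labels.
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    for label, p in zip(labels, probs[0].tolist()):
        print(f"{label}: {p:.3f}")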

Table of Contents

  1. Model
  2. Training and Evaluation Commands
  3. Paper, website, and data
  4. Citation

Model

The main differences between the training implementations of BioCLIP 2 and BioCLIP are the model architecture and the introduction of experience replay. BioCLIP 2 employs a ViT-L/14 CLIP architecture pre-trained on LAION-2B data. Alongside the contrastive optimization on biological organism data, we also include part of the LAION-2B data for experience replay. To reduce the influence of the domain gap between hierarchical labels and image captions, we use two separate visual projectors on top of the visual encoder. This part of the code is in transformer.py. We provide the weights of BioCLIP 2 in the BioCLIP 2 model repo.
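To make the design concrete, here is a minimal sketch of the two-projector idea combined with experience replay. All module and variable names are hypothetical, and this is not the actual implementation (which lives in transformer.py): one shared visual trunk feeds two projection heads, one aligned with hierarchical taxonomic labels and one with LAION-style captions, and the loss mixes a biology batch with a LAION replay batch.

    # Hypothetical sketch, not the repo's implementation (see transformer.py).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def clip_loss(img, txt, temperature=0.07):
        """Symmetric InfoNCE loss over a batch of paired embeddings."""
        img = F.normalize(img, dim=-1)
        txt = F.normalize(txt, dim=-1)
        logits = img @ txt.T / temperature
        targets = torch.arange(len(img), device=img.device)
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2

    class DualProjectorCLIP(nn.Module):
        def __init__(self, visual, text, width=1024, embed_dim=768):
            super().__init__()
            self.visual = visual            # shared ViT-L/14 trunk
            self.text = text                # shared text encoder
            self.proj_taxa = nn.Linear(width, embed_dim)     # taxonomic labels
            self.proj_caption = nn.Linear(width, embed_dim)  # LAION captions

        def forward(self, bio_images, bio_labels, laion_images, laion_captions):
            # Biology batch: hierarchical taxonomic strings through proj_taxa.
            bio_feat = self.proj_taxa(self.visual(bio_images))
            loss_bio = clip_loss(bio_feat, self.text(bio_labels))
            # Replay batch: natural-language captions through proj_caption.
            replay_feat = self.proj_caption(self.visual(laion_images))
            loss_replay = clip_loss(replay_feat, self.text(laion_captions))
            return loss_bio + loss_replay

Routing each text domain through its own visual projector keeps the shared trunk from having to reconcile terse taxonomic strings with free-form captions in a single embedding head.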

Commands

Training

The TreeOfLife-200M images can be downloaded from their original sources with distributed-downloader. TreeOfLife-toolbox/docs contains instructions for the full download into the proper format, along with the code to construct the webdataset for training. These repositories are included in the supplementary material. img2dataset can be used to download data from the first three metadata parquet files of LAION-2B-en; we use the first 4,000 downloaded tar files for experience replay. Finally, download the validation set from TreeOfLife-10M (download instructions), as we use it for evaluation during training.
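For the LAION portion, a hedged sketch of the img2dataset call is below; the parquet directory, output folder, worker counts, and image size are placeholders, while the URL/TEXT column names follow the LAION-2B-en metadata release.

    # Hypothetical img2dataset invocation for the LAION-2B-en replay shards.
    from img2dataset import download

    download(
        url_list="laion2b-en-metadata/",   # placeholder: dir with the first three parquet files
        input_format="parquet",
        url_col="URL",
        caption_col="TEXT",
        output_format="webdataset",        # tar shards, as used for experience replay
        output_folder="laion2b-replay/",   # placeholder output path
        processes_count=16,
        thread_count=64,
        image_size=256,
    )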

Clone this repository, then install the requirements: conda env create -f requirements-training.yml

To train the model, run: sbatch slurm/train.sh

Evaluation

Species classification

We evaluated BioCLIP 2 on the same test sets used for BioCLIP, as well as a newly curated camera trap test set.

The metadata used in evaluation is provided in data/annotation, including NABirds, Rare Species, and other benchmarks from Meta Album. All evaluation parameters are described in src/evaluation/README.md. Be sure to update the directories in slurm/eval.sh to reflect the locations of these data and metadata, then run: sbatch slurm/eval.sh
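For orientation, here is a minimal sketch of how zero-shot species classification is typically scored: build a text classifier from the class names, then measure top-1 accuracy over a labeled loader. Function and variable names are hypothetical; the actual settings live in src/evaluation/README.md and slurm/eval.sh.

    # Hypothetical zero-shot accuracy loop; not the repo's evaluation code.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def zero_shot_accuracy(model, tokenizer, loader, class_names, device="cuda"):
        # One text embedding per class, e.g. "a photo of <taxonomic name>".
        text = tokenizer([f"a photo of {name}" for name in class_names]).to(device)
        classifier = F.normalize(model.encode_text(text), dim=-1)

        correct = total = 0
        for images, targets in loader:
            feats = F.normalize(model.encode_image(images.to(device)), dim=-1)
            preds = (feats @ classifier.T).argmax(dim=-1)
            correct += (preds.cpu() == targets).sum().item()
            total += targets.numel()
        return correct / total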

Other biological visual tasks

We also evaluated on biological tasks that go beyond species classification with the following datasets:

  • NeWT
  • FishNet
  • AwA2
  • Herbarium19
  • PlantDoc

Be sure to update the directories in slurm/eval_other.sh to reflect the locations of these data, then run: sbatch slurm/eval_other.sh

Paper, Website, and Data

We have a preprint on arXiv and a project website.

Our data is published on Hugging Face: TreeOfLife-200M and IDLE-OO Camera Traps. Step-by-step download instructions for TreeOfLife-200M are available in TreeOfLife-toolbox.

Citation

Please cite our papers and the associated repositories if you use our code or results.

@article{gu2025bioclip,
  title = {{B}io{CLIP} 2: Emergent Properties from Scaling Hierarchical Contrastive Learning},
  author = {Jianyang Gu and Samuel Stevens and Elizabeth G Campolongo and Matthew J Thompson and Net Zhang and Jiaman Wu and Andrei Kopanev and Zheda Mai and Alexander E. White and James Balhoff and Wasila M Dahdul and Daniel Rubenstein and Hilmar Lapp and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  year = {2025},
  eprint = {2505.23883},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2505.23883},
}

Our code (this repository):

@software{bioclip2code,
  author = {Jianyang Gu and Samuel Stevens and Elizabeth G. Campolongo and Matthew J. Thompson and Net Zhang and Jiaman Wu and Zheda Mai},
  doi = {10.5281/zenodo.15644363},
  title = {{B}io{CLIP} 2},
  version = {1.0.1},
  month = {sep},
  year = {2025}
}

Also consider citing OpenCLIP and BioCLIP:

@software{ilharco_gabriel_2021_5143773,
  author = {Ilharco, Gabriel and Wortsman, Mitchell and Wightman, Ross and Gordon, Cade and Carlini, Nicholas and Taori, Rohan and Dave, Achal and Shankar, Vaishaal and Namkoong, Hongseok and Miller, John and Hajishirzi, Hannaneh and Farhadi, Ali and Schmidt, Ludwig},
  title = {OpenCLIP},
  year = {2021},
  doi = {10.5281/zenodo.5143773},
}

Original BioCLIP Paper:

@inproceedings{stevens2024bioclip,
  title = {{B}io{CLIP}: A Vision Foundation Model for the Tree of Life},
  author = {Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024},
  pages = {19412-19424}
}

Original Code:

@software{bioclip2023code,
  author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn},
  doi = {10.5281/zenodo.10895871},
  title = {BioCLIP},
  version = {v1.0.0},
  year = {2024}
}

License

BioCLIP 2 is released under the MIT License. Some elements of the code are copyright by others (see LICENSE); detailed provenance information is provided in HISTORY.md.

Owner

  • Name: Imageomics Institute
  • Login: Imageomics
  • Kind: organization

Citation (CITATION.cff)

---
abstract: "Foundation models trained at scale exhibit remarkable emergent behaviors, 
  learning new capabilities beyond their initial training objectives. We find such emergent 
  behaviors in biological vision models via large-scale contrastive vision-language training. 
  To achieve this, we first curate TreeOfLife-200M, comprising 214 million images of living 
  organisms, the largest and most diverse biological organism image dataset to date. We then 
  train BioCLIP 2 on TreeOfLife-200M to distinguish different species. Despite the narrow 
  training objective, BioCLIP 2 yields extraordinary accuracy when applied to various biological 
  visual tasks such as habitat classification and trait prediction. We identify emergent 
  properties in the learned embedding space of BioCLIP 2. At the inter-species level, the 
  embedding distribution of different species aligns closely with functional and ecological 
  meanings (e.g. beak sizes and habitats). At the intra-species level, instead of being diminished, 
  the intra-species variations (e.g. life stages and sexes) are preserved and better separated 
  in subspaces orthogonal to inter-species distinctions. We provide formal proof and analyses 
  to explain why hierarchical supervision and contrastive objectives encourage these emergent 
  properties. Crucially, our results reveal that these properties become increasingly significant 
  with larger-scale training data, leading to a biologically meaningful embedding space."
authors:
  - family-names: Gu
    given-names: Jianyang
  - family-names: Stevens
    given-names: Samuel
  - family-names: Campolongo
    given-names: "Elizabeth G."
  - family-names: Thompson
    given-names: "Matthew J."
  - family-names: Zhang
    given-names: Net
  - family-names: Wu
    given-names: Jiaman
  - family-names: Mai
    given-names: Zheda
cff-version: 1.2.0
date-released: "2025-09-03"
identifiers:
  - doi: "10.5281/zenodo.15644363"
  - description: "The GitHub release URL of tag v1.0.1."
    type: url
    value: "https://github.com/Imageomics/bioclip-2/releases/tag/v1.0.1"
  - description: "The GitHub URL of the commit tagged with v1.0.1."
    type: url
    value: "https://github.com/Imageomics/bioclip-2/tree/92bf5e1f74e40df91a02c4dd6cad63b3396c94cf"
keywords:
  - clip
  - biology
  - CV
  - imageomics
  - animals
  - species
  - images
  - taxonomy
  - "rare species"
  - "endangered species"
  - "evolutionary biology"
  - multimodal
  - "knowledge-guided"
license: MIT
message: "If you use this software, please cite both the article and the software itself."
repository-code: "https://github.com/Imageomics/bioclip-2"
title: "BioCLIP 2"
version: 1.0.1
type: software
references:
  - authors:
      - family-names: Stevens
        given-names: Samuel
      - family-names: Wu
        given-names: Jiaman
      - family-names: Thompson
        given-names: "Matthew J."
      - family-names: Campolongo
        given-names: "Elizabeth G."
      - family-names: Song
        given-names: "Chan Hee"
      - family-names: Carlyn
        given-names: "David Edward"
    date-released: "2024-09-19"
    doi: "10.5281/zenodo.10895870"
    license: MIT
    repository-code: "https://github.com/Imageomics/bioclip"
    title: BioCLIP
    version: 1.0.2
    type: software
  - authors:
      - family-names: Stevens
        given-names: Samuel
      - family-names: Wu
        given-names: Jiaman
      - family-names: Thompson
        given-names: "Matthew J."
      - family-names: Campolongo
        given-names: "Elizabeth G."
      - family-names: Song
        given-names: "Chan Hee"
      - family-names: Carlyn
        given-names: "David Edward"
      - family-names: Dong
        given-names: Li
      - family-names: Dahdul
        given-names: "Wasila M"
      - family-names: Stewart
        given-names: Charles
      - family-names: "Berger-Wolf"
        given-names: Tanya
      - family-names: Chao
        given-names: "Wei-Lun"
      - family-names: Su
        given-names: Yu
    title: "BioCLIP: A Vision Foundation Model for the Tree of Life"
    year: 2024
    pages: "19412-19424"
    collection-title: "Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"
    type: conference-paper
  - authors:
      - family-names: Ilharco
        given-names: Gabriel
      - family-names: Wortsman
        given-names: Mitchell
      - family-names: Wightman
        given-names: Ross
      - family-names: Gordon
        given-names: Cade
      - family-names: Carlini
        given-names: Nicholas
      - family-names: Taori
        given-names: Rohan
      - family-names: Dave
        given-names: Achal
      - family-names: Shankar
        given-names: Vaishaal
      - family-names: Namkoong
        given-names: Hongseok
      - family-names: Miller
        given-names: John
      - family-names: Hajishirzi
        given-names: Hannaneh
      - family-names: Farhadi
        given-names: Ali
      - family-names: Schmidt
        given-names: Ludwig
    title: OpenCLIP
    version: v0.1
    type: software
    doi: "10.5281/zenodo.5143773"
    date-released: "2021-07-28"

GitHub Events

Total
  • Watch event: 20
  • Delete event: 7
  • Issue comment event: 9
  • Push event: 6
  • Public event: 1
  • Pull request review event: 4
  • Pull request event: 8
  • Fork event: 3
  • Create event: 4
Last Year
  • Watch event: 20
  • Delete event: 7
  • Issue comment event: 9
  • Push event: 6
  • Public event: 1
  • Pull request review event: 4
  • Pull request event: 8
  • Fork event: 3
  • Create event: 4

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 0
  • Total pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: about 9 hours
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.17
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: about 9 hours
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.17
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
Pull Request Authors
  • egrace479 (4)
  • vimar-gu (1)
  • dependabot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
documentation (4) dependencies (1) python (1)

Dependencies

pyproject.toml pypi
requirements-training.txt pypi
  • Jinja2 ==3.1.3
  • Markdown ==3.6
  • MarkupSafe ==2.1.5
  • PyYAML ==6.0.1
  • Werkzeug ==3.0.1
  • absl-py ==2.1.0
  • braceexpand ==0.1.7
  • cachetools ==5.3.3
  • certifi ==2024.2.2
  • charset-normalizer ==3.3.2
  • filelock ==3.13.1
  • fsspec ==2024.3.1
  • ftfy ==6.2.0
  • google-auth ==2.29.0
  • google-auth-oauthlib ==1.0.0
  • grpcio ==1.62.1
  • huggingface-hub ==0.21.4
  • idna ==3.6
  • mpmath ==1.3.0
  • networkx ==3.2.1
  • numpy ==1.26.4
  • nvidia-cublas-cu12 ==12.1.3.1
  • nvidia-cuda-cupti-cu12 ==12.1.105
  • nvidia-cuda-nvrtc-cu12 ==12.1.105
  • nvidia-cuda-runtime-cu12 ==12.1.105
  • nvidia-cudnn-cu12 ==8.9.2.26
  • nvidia-cufft-cu12 ==11.0.2.54
  • nvidia-curand-cu12 ==10.3.2.106
  • nvidia-cusolver-cu12 ==11.4.5.107
  • nvidia-cusparse-cu12 ==12.1.0.106
  • nvidia-nccl-cu12 ==2.19.3
  • nvidia-nvjitlink-cu12 ==12.4.99
  • nvidia-nvtx-cu12 ==12.1.105
  • oauthlib ==3.2.2
  • packaging ==24.0
  • pandas ==2.2.1
  • pillow ==10.2.0
  • protobuf ==5.26.0
  • pyasn1 ==0.5.1
  • pyasn1-modules ==0.3.0
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • regex ==2023.12.25
  • requests ==2.31.0
  • requests-oauthlib ==2.0.0
  • rsa ==4.9
  • safetensors ==0.4.2
  • six ==1.16.0
  • sympy ==1.12
  • tensorboard ==2.14.0
  • tensorboard-data-server ==0.7.2
  • timm ==0.9.16
  • tokenizers ==0.15.2
  • torch ==2.2.1
  • torchvision ==0.17.1
  • tqdm ==4.66.2
  • transformers ==4.39.1
  • triton ==2.2.0
  • typing_extensions ==4.10.0
  • tzdata ==2024.1
  • urllib3 ==2.2.1
  • wcwidth ==0.2.13
  • webdataset ==0.2.86
requirements.txt pypi
  • Jinja2 ==3.1.3
  • MarkupSafe ==2.1.5
  • PyYAML ==6.0.1
  • certifi ==2024.2.2
  • charset-normalizer ==3.3.2
  • cmake ==3.28.4
  • filelock ==3.13.1
  • fsspec ==2024.2.0
  • ftfy ==6.1.1
  • huggingface-hub ==0.21.3
  • idna ==3.6
  • lit ==18.1.2
  • mpmath ==1.3.0
  • networkx ==3.2.1
  • numpy ==1.26.4
  • nvidia-cublas-cu11 ==11.10.3.66
  • nvidia-cuda-cupti-cu11 ==11.7.101
  • nvidia-cuda-nvrtc-cu11 ==11.7.99
  • nvidia-cuda-runtime-cu11 ==11.7.99
  • nvidia-cudnn-cu11 ==8.5.0.96
  • nvidia-cufft-cu11 ==10.9.0.58
  • nvidia-curand-cu11 ==10.2.10.91
  • nvidia-cusolver-cu11 ==11.4.0.1
  • nvidia-cusparse-cu11 ==11.7.4.91
  • nvidia-nccl-cu11 ==2.14.3
  • nvidia-nvtx-cu11 ==11.7.91
  • packaging ==23.2
  • pandas ==2.0.2
  • pillow ==10.2.0
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • regex ==2023.12.25
  • requests ==2.31.0
  • safetensors ==0.4.2
  • scipy ==1.10.1
  • six ==1.16.0
  • sympy ==1.12
  • tokenizers ==0.15.2
  • torch ==2.0.1
  • torchvision ==0.15.2
  • tqdm ==4.66.2
  • transformers ==4.38.2
  • triton ==2.0.0
  • typing_extensions ==4.10.0
  • tzdata ==2024.1
  • urllib3 ==2.2.1
  • wcwidth ==0.2.13