biobench
Computer vision benchmark for evolutionary biology-related tasks.
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (16.2%) to scientific vocabulary
Repository
Computer vision benchmark for evolutionary biology-related tasks.
Basic Info
- Host: GitHub
- Owner: samuelstevens
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: http://samuelstevens.me/biobench/
- Size: 4.17 MB
Statistics
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Biology Benchmark (biobench)
This library is an easy-to-read benchmark for biology-related computer vision tasks.
It aims to make it easy to:
- Evaluate new models.
- Add new tasks.
- Understand meaningful (or not) differences in model performance.
Check out the docs for an introduction.
Getting Started
I use uv for Python, which makes it easy to manage Python versions, dependencies, virtual environments, etc.
To install uv, run curl -LsSf https://astral.sh/uv/install.sh | sh.
Then download at least one of the datasets. NeWT is easy to download:

```sh
uv run biobench/newt/download.py --dir ./newt
```
Download it wherever you want on your own filesystem.
Why?
For computational biologists: biobench gives you an overview of how different models perform on different tasks. If you have a concrete task that you need to solve, you can easily write a script that matches other, existing tasks and then evaluate many different models on your task. If you have an idea of a task, you can find the most similar existing task(s) on the leaderboard and compare model performance.
For computer vision researchers: biobench is a realistic set of benchmarks that more accurately reflect how your model will be used by downstream users. If you aim to train a new foundation vision model, be aware that downstream users will likely not fine-tune it, and will instead use the image embeddings to do all sorts of weird things. Your foundation model should output representations that are universally useful; biobench lets you measure to what degree this is true.
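One of those "weird things" downstream users do is similarity search directly on frozen embeddings, with no fine-tuning at all. Below is a minimal sketch of that pattern; it is not part of biobench's API, and the embeddings here are random stand-ins for cached backbone outputs.

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, bank: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `bank` most cosine-similar to `query`."""
    # Normalize so dot products become cosine similarities.
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 768))              # 100 cached image embeddings
query = bank[42] + 0.01 * rng.normal(size=768)  # a slightly perturbed copy of image 42
print(nearest_neighbors(query, bank, k=3))      # index 42 ranks first
```

If a backbone's representations are "universally useful" in the sense above, simple recipes like this already work well on top of them.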
Concrete Goals
Easy, fast, reproducible, understandable evaluation of PyTorch computer vision models across a suite of realistic biology-related vision tasks.
- Easy: one launch script, with all options documented in the code and in auto-generated web documentation.
- Fast: each evaluation takes at most one hour of A100 or A6000 time. With $n$ evaluations that is $n$ A100-hours, but the work is embarrassingly parallel, and the launch script supports easy parallel running and reporting.
- Reproducible: the results include instructions to regenerate them from scratch, assuming access to the biobench Git repo and that web dependencies have not changed.[^web-deps]
- Understandable: results are in a machine-readable format, accompanied by a simple human-readable notebook. Common analyses (e.g., mean score across all tasks) are included in the notebook and take under one second to run.
[^web-deps]: Web dependencies include things like datasets remaining available from their original source, Huggingface datasets being re-downloadable, model checkpoints not changing, etc.
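To make "machine-readable results plus sub-second common analyses" concrete, here is a sketch of computing mean score per model over a list of records. The record schema is hypothetical (biobench's actual results format may differ); only the shape of the analysis is the point.

```python
import statistics

# Hypothetical results format: one record per (model, task) pair.
# The scores below are made up for illustration.
records = [
    {"model": "ViT-B/16", "task": "newt", "score": 0.81},
    {"model": "ViT-B/16", "task": "kabr", "score": 0.62},
    {"model": "ViT-L/14", "task": "newt", "score": 0.86},
    {"model": "ViT-L/14", "task": "kabr", "score": 0.70},
]

by_model: dict[str, list[float]] = {}
for r in records:
    by_model.setdefault(r["model"], []).append(r["score"])

for model, scores in sorted(by_model.items()):
    print(f"{model}: mean score {statistics.mean(scores):.3f} over {len(scores)} tasks")
```

The same grouping-and-aggregating pattern scales to any number of models and tasks once results are stored in a flat, machine-readable table.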
We at Imageomics use this library for testing BioCLIP and other internal models during development. Because of this, there are two main classes of tasks:
- Downstream applications. These are tasks like KABR or Beluga whale re-ID. These tasks represent real problems that computer vision systems fail to solve today.
- Benchmarks. These are artificial tasks like NeWT, created to help us understand how useful a model might be in the real world for similar tasks.
Road Map
- Add contributing guide.
- Add example images for each task to the docs.
- Add 5-shot RareSpecies with simpleshot (like in BioCLIP paper). This is blocked because the Huggingface dataset doesn't work (see this issue).
- Add FishVista for localized trait prediction. This is another non-classification task, and we are specifically interested in traits. But it will take more work because we have to match bounding boxes and patch-level features which is challenging after resizes.
Additional Tasks
- Counting insects on sticky insect traps
- Predicting plant stem angle
Contributing New Tasks
We welcome new tasks. Here are a few guidelines for doing that.
Choose a task that offers new signal. We want tasks that:
- Use a sensor or modality we do not cover (thermal, sonar, hyperspectral, LiDAR, microscopy, drone video, and so on),
- Introduce a different prediction type (counts, traits, time series, segmentation, ordinal labels),
- Or target an under-represented group or environment (marine life, airborne organisms, underground roots, cell imagery).
Stay within our constraints:
- Evaluation must run on frozen image embeddings with a lightweight probe (logistic/linear, small MLP, or similar). See the `biobench.registry.VisionBackbone` class for the API that models conform to.
- A ViT-L/14 checkpoint should finish your task in under two hours on a single A6000 or A100 GPU.
- Data must be publicly downloadable and licensed for academic use; we redistribute predictions.
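The frozen-embeddings-plus-lightweight-probe constraint can be sketched as follows. This is not biobench code: the feature matrix here is synthetic, standing in for cached `VisionBackbone` outputs, and scikit-learn (already a project dependency) supplies the probe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricate stand-in "embeddings": 200 examples, 64 dims, with labels that
# depend (noisily) on the first dimension so a linear probe can succeed.
rng = np.random.default_rng(0)
n, d = 200, 64
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)

# The backbone stays frozen; only this small probe is trained.
probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
acc = probe.score(X[150:], y[150:])
print(f"held-out accuracy: {acc:.2f}")
```

Because only the probe is trained, swapping in a different backbone means recomputing embeddings once and rerunning a cheap fit, which is what keeps each evaluation within the GPU-hour budget above.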
Match the style:
- `download.py` fetches the dataset and verifies checksums.
- `__init__.py` runs the benchmark and defines the bootstrapped evaluation metric.
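A bootstrapped evaluation metric of the kind a task's `__init__.py` defines can be sketched like this; the function name and the percentile-bootstrap recipe are illustrative, not biobench's actual implementation.

```python
import numpy as np

def bootstrap_ci(correct: np.ndarray, n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile-bootstrap confidence interval for mean accuracy,
    given per-example 0/1 correctness scores."""
    rng = np.random.default_rng(seed)
    n = len(correct)
    # Resample examples with replacement and recompute the metric each time.
    idx = rng.integers(0, n, size=(n_resamples, n))
    means = correct[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

correct = np.array([1] * 80 + [0] * 20)  # 80% accuracy on 100 examples
print(bootstrap_ci(correct))             # interval straddling 0.8
```

Reporting an interval rather than a point estimate is what lets the leaderboard distinguish meaningful differences in model performance from resampling noise.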
If the task is simply another RGB species classification challenge, it probably fits better in iNat. Counting fish in noisy sonar frames or predicting tree-ring widths from microscopy slides—those are the kinds of additions we welcome.
Owner
- Name: Sam
- Login: samuelstevens
- Kind: user
- Website: https://samuelstevens.me
- Repositories: 12
- Profile: https://github.com/samuelstevens
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: BioBench
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Samuel
family-names: Stevens
email: samuel.robert.stevens@gmail.com
- given-names: Jianyang
family-names: Gu
repository-code: 'https://github.com/samuelstevens/biobench/'
url: 'https://samuelstevens.me/biobench/'
abstract: ' Computer vision benchmarks for evolutionary biology-related tasks. '
keywords:
- benchmarking
- computer vision
- evolutionary biology
license: MIT
GitHub Events
Total
- Watch event: 4
- Delete event: 3
- Push event: 104
- Pull request event: 7
- Fork event: 1
- Create event: 5
Last Year
- Watch event: 4
- Delete event: 3
- Push event: 104
- Pull request event: 7
- Fork event: 1
- Create event: 5
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- samuelstevens (4)
- vimar-gu (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- aim >=3.24.0
- beartype >=0.18.5
- cloudpickle >=3.0.0
- datasets >=2.21.0
- gdown >=5.2.0
- jaxtyping >=0.2.33
- kaggle >=1.6.17
- marimo >=0.8.7
- matplotlib >=3.9.2
- open-clip-torch >=2.26.1
- pdoc3 >=0.11.1
- polars >=1.6.0
- pycocotools >=2.0.8
- requests >=2.32.3
- scikit-learn >=1.5.1
- submitit >=1.5.2
- timm >=1.0.9
- torch >=2.4.0
- torchmetrics >=1.4.1
- torchvision >=0.19.1
- tqdm >=4.66.5
- tyro >=0.8.10
- wilds >=2.0.0