biobench
Computer vision benchmark for evolutionary biology-related tasks.
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (16.2%) to scientific vocabulary
Repository
Computer vision benchmark for evolutionary biology-related tasks.
Basic Info
- Host: GitHub
- Owner: samuelstevens
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: http://samuelstevens.me/biobench/
- Size: 4.17 MB
Statistics
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Biology Benchmark (biobench)
This library is an easy-to-read benchmark for biology-related computer vision tasks.
It aims to make it easy to:
- Evaluate new models.
- Add new tasks.
- Understand meaningful (or not) differences in model performance.
Check out the docs for an introduction.
Getting Started
I use uv for Python, which makes it easy to manage Python versions, dependencies, virtual environments, etc.
To install uv, run curl -LsSf https://astral.sh/uv/install.sh | sh.
Then download at least one of the datasets. NeWT is easy to download:

```sh
uv run biobench/newt/download.py --dir ./newt
```
Download it wherever you want on your own filesystem.
Why?
For computational biologists: biobench gives you an overview of how different models perform on different tasks. If you have a concrete task that you need to solve, you can easily write a script that matches other, existing tasks and then evaluate many different models on your task. If you have an idea of a task, you can find the most similar existing task(s) on the leaderboard and compare model performance.
For computer vision researchers: biobench is a realistic set of benchmarks that more accurately reflect how your model will be used by downstream users. If you aim to train a new foundation vision model, be aware that downstream users will likely not fine-tune it, and will instead use the image embeddings to do all sorts of weird things. Your foundation model should output representations that are universally useful; biobench lets you measure to what degree this is true.
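One of those "weird things" downstream users do is similarity search directly on frozen embeddings, with no fine-tuning at all. Below is a minimal sketch of that pattern; it is not part of biobench's API, and the embeddings here are random stand-ins for cached backbone outputs.

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, bank: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `bank` most cosine-similar to `query`."""
    # Normalize so dot products become cosine similarities.
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 768))              # 100 cached image embeddings
query = bank[42] + 0.01 * rng.normal(size=768)  # a slightly perturbed copy of image 42
print(nearest_neighbors(query, bank, k=3))      # index 42 ranks first
```

If a backbone's representations are "universally useful" in the sense above, simple recipes like this already work well on top of them.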
Concrete Goals
Easy, fast, reproducible, understandable evaluation of PyTorch computer vision models across a suite of realistic biology-related vision tasks.
- Easy: one launch script, with all options documented in the code and in auto-generated web documentation.
- Fast: each evaluation takes at most one hour of A100 or A6000 time. With $n$ evaluations that is $n$ A100-hours, but the work is embarrassingly parallel, and the launch script supports easy parallel running and reporting.
- Reproducible: the results include instructions to regenerate them from scratch, assuming access to the biobench Git repo and that web dependencies have not changed.[^web-deps]
- Understandable: results are in a machine-readable format, accompanied by a simple human-readable notebook. Common analyses (e.g., mean score across all tasks) are included in the notebook and take under one second to run.
[^web-deps]: Web dependencies include things like datasets remaining available from their original source, Huggingface datasets being re-downloadable, model checkpoints not changing, etc.
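To make "machine-readable results plus sub-second common analyses" concrete, here is a sketch of computing mean score per model over a list of records. The record schema is hypothetical (biobench's actual results format may differ); only the shape of the analysis is the point.

```python
import statistics

# Hypothetical results format: one record per (model, task) pair.
# The scores below are made up for illustration.
records = [
    {"model": "ViT-B/16", "task": "newt", "score": 0.81},
    {"model": "ViT-B/16", "task": "kabr", "score": 0.62},
    {"model": "ViT-L/14", "task": "newt", "score": 0.86},
    {"model": "ViT-L/14", "task": "kabr", "score": 0.70},
]

by_model: dict[str, list[float]] = {}
for r in records:
    by_model.setdefault(r["model"], []).append(r["score"])

for model, scores in sorted(by_model.items()):
    print(f"{model}: mean score {statistics.mean(scores):.3f} over {len(scores)} tasks")
```

The same grouping-and-aggregating pattern scales to any number of models and tasks once results are stored in a flat, machine-readable table.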
We at Imageomics use this library for testing BioCLIP and other internal models during development. Because of this, there are two main classes of tasks:
- Downstream applications. These are tasks like KABR or Beluga whale re-ID. These tasks represent real problems that computer vision systems fail to solve today.
- Benchmarks. These are artificial tasks like NeWT, created to help us understand how useful a model might be in the real world for similar tasks.
Road Map
- Add contributing guide.
- Add example images for each task to the docs.
- Add 5-shot RareSpecies with simpleshot (like in BioCLIP paper). This is blocked because the Huggingface dataset doesn't work (see this issue).
- Add FishVista for localized trait prediction. This is another non-classification task, and we are specifically interested in traits. But it will take more work because we have to match bounding boxes and patch-level features which is challenging after resizes.
Additional Tasks
- Counting insects on sticky insect traps
- Predicting plant stem angle
Contributing New Tasks
We welcome new tasks. Here are a few guidelines for doing that.
Choose a task that offers new signal. We want tasks that:
- Use a sensor or modality we do not cover (thermal, sonar, hyperspectral, LiDAR, microscopy, drone video, and so on),
- Introduce a different prediction type (counts, traits, time series, segmentation, ordinal labels),
- Or target an under-represented group or environment (marine life, airborne organisms, underground roots, cell imagery).
Stay within our constraints:
- Evaluation must run on frozen image embeddings with a lightweight probe (logistic/linear, small MLP, or similar). See the `biobench.registry.VisionBackbone` class for the API that models conform to.
- A ViT-L/14 checkpoint should finish your task in under two hours on a single A6000 or A100 GPU.
- Data must be publicly downloadable and licensed for academic use; we redistribute predictions.
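The frozen-embeddings-plus-lightweight-probe constraint can be sketched as follows. This is not biobench code: the feature matrix here is synthetic, standing in for cached `VisionBackbone` outputs, and scikit-learn (already a project dependency) supplies the probe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricate stand-in "embeddings": 200 examples, 64 dims, with labels that
# depend (noisily) on the first dimension so a linear probe can succeed.
rng = np.random.default_rng(0)
n, d = 200, 64
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)

# The backbone stays frozen; only this small probe is trained.
probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
acc = probe.score(X[150:], y[150:])
print(f"held-out accuracy: {acc:.2f}")
```

Because only the probe is trained, swapping in a different backbone means recomputing embeddings once and rerunning a cheap fit, which is what keeps each evaluation within the GPU-hour budget above.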
Match the style:
- `download.py` fetches the dataset and verifies checksums.
- `__init__.py` runs the benchmark and defines the bootstrapped evaluation metric.
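A bootstrapped evaluation metric of the kind a task's `__init__.py` defines can be sketched like this; the function name and the percentile-bootstrap recipe are illustrative, not biobench's actual implementation.

```python
import numpy as np

def bootstrap_ci(correct: np.ndarray, n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile-bootstrap confidence interval for mean accuracy,
    given per-example 0/1 correctness scores."""
    rng = np.random.default_rng(seed)
    n = len(correct)
    # Resample examples with replacement and recompute the metric each time.
    idx = rng.integers(0, n, size=(n_resamples, n))
    means = correct[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

correct = np.array([1] * 80 + [0] * 20)  # 80% accuracy on 100 examples
print(bootstrap_ci(correct))             # interval straddling 0.8
```

Reporting an interval rather than a point estimate is what lets the leaderboard distinguish meaningful differences in model performance from resampling noise.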
If the task is simply another RGB species classification challenge, it probably fits better in iNat. Counting fish in noisy sonar frames or predicting tree-ring widths from microscopy slides—those are the kinds of additions we welcome.
Owner
- Name: Sam
- Login: samuelstevens
- Kind: user
- Website: https://samuelstevens.me
- Repositories: 12
- Profile: https://github.com/samuelstevens
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: BioBench
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Samuel
family-names: Stevens
email: samuel.robert.stevens@gmail.com
- given-names: Jianyang
family-names: Gu
repository-code: 'https://github.com/samuelstevens/biobench/'
url: 'https://samuelstevens.me/biobench/'
abstract: ' Computer vision benchmarks for evolutionary biology-related tasks. '
keywords:
- benchmarking
- computer vision
- evolutionary biology
license: MIT
GitHub Events
Total
- Watch event: 4
- Delete event: 3
- Push event: 104
- Pull request event: 7
- Fork event: 1
- Create event: 5
Last Year
- Watch event: 4
- Delete event: 3
- Push event: 104
- Pull request event: 7
- Fork event: 1
- Create event: 5
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- samuelstevens (4)
- vimar-gu (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- aim >=3.24.0
- beartype >=0.18.5
- cloudpickle >=3.0.0
- datasets >=2.21.0
- gdown >=5.2.0
- jaxtyping >=0.2.33
- kaggle >=1.6.17
- marimo >=0.8.7
- matplotlib >=3.9.2
- open-clip-torch >=2.26.1
- pdoc3 >=0.11.1
- polars >=1.6.0
- pycocotools >=2.0.8
- requests >=2.32.3
- scikit-learn >=1.5.1
- submitit >=1.5.2
- timm >=1.0.9
- torch >=2.4.0
- torchmetrics >=1.4.1
- torchvision >=0.19.1
- tqdm >=4.66.5
- tyro >=0.8.10
- wilds >=2.0.0