Zoobot
Zoobot: Adaptable Deep Learning Models for Galaxy Morphology - Published in JOSS (2023)
Science Score: 98.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 10 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to arxiv.org, joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Classifies galaxy morphology with Bayesian CNN
Basic Info
- Host: GitHub
- Owner: mwalmsley
- License: gpl-3.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 329 MB
Statistics
- Stars: 108
- Watchers: 2
- Forks: 30
- Open Issues: 4
- Releases: 12
Metadata Files
README.md
Zoobot
:tada: Zoobot 2.0 is now available. Bigger and better models with streamlined finetuning. Blog, paper :tada:
Zoobot classifies galaxy morphology with deep learning. <!-- At Galaxy Zoo, we use Zoobot to help our volunteers classify the galaxies in all our recent catalogues: GZ DECaLS, GZ DESI, GZ Rings and GZ Cosmic Dawn. -->
Zoobot is trained using millions of answers by Galaxy Zoo volunteers. This code will let you retrain Zoobot to accurately solve your own prediction task.
- Install
- Quickstart
- Worked Examples
- Pretrained Weights
- Datasets
- Documentation (for understanding/reference)
- Mailing List (for updates)
Installation
You can retrain Zoobot in the cloud with a free GPU using this Google Colab notebook. To install locally, keep reading.
Download the code using git:
git clone git@github.com:mwalmsley/zoobot.git
And then install Zoobot (and PyTorch, if not already installed):
pip install -e "zoobot[pytorch]"
This installs the downloaded Zoobot code using pip editable mode so you can easily change the code locally. Zoobot is also available directly from pip (pip install zoobot[option]). Only use this if you are sure you won't be making changes to Zoobot itself. For Google Colab, use pip install zoobot[pytorch_colab]
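After installing, it can be useful to confirm which Zoobot version pip actually sees in the active environment. The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Zoobot.

```python
from importlib import metadata


def installed_version(package="zoobot"):
    """Return the installed version string, or None if the package is not found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("zoobot is not installed in this environment")
    else:
        print(f"zoobot version: {version}")
```

With an editable install, the reported version comes from the cloned repository's package metadata.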
To use a GPU, you must already have CUDA installed, matching the versions given in the optional CUDA install section below, where I share my install steps. GPUs are optional: Zoobot will retrain fine on CPU, just slower.
Quickstart
The Colab notebook is the quickest way to get started. Alternatively, the minimal example below illustrates how Zoobot works.
Let's say you want to find ringed galaxies and you have a small labelled dataset of 500 ringed or not-ringed galaxies. You can retrain Zoobot to find rings like so:
```python
import pandas as pd
from galaxy_datasets.pytorch.galaxy_datamodule import CatalogDataModule
from zoobot.pytorch.training import finetune

# csv with 'ring' column (0 or 1) and 'file_loc' column (path to image)
labelled_df = pd.read_csv('/your/path/some_labelled_galaxies.csv')

datamodule = CatalogDataModule(
    label_cols=['ring'],
    catalog=labelled_df,
    batch_size=32
    # will automatically apply default augmentations
)

# load trained Zoobot model
# (checkpoint_loc is not defined here: set it to your pretrained checkpoint)
model = finetune.FinetuneableZoobotClassifier(checkpoint_loc, num_classes=2)

# retrain to find rings
# (save_dir is not defined here: set it to the folder for training outputs)
trainer = finetune.get_trainer(save_dir)
trainer.fit(model, datamodule)
```
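The labelled catalog used above needs a `file_loc` column (path to image) and a `ring` column (0 or 1). As a rough sketch of how such a CSV could be assembled from your own labels, using only the standard library: the helper name `write_ring_catalog` and the example paths are hypothetical and not part of Zoobot.

```python
import csv


def write_ring_catalog(labels, csv_path):
    """Write a CSV with 'file_loc' and 'ring' columns.

    `labels` maps an image path to 1 (ringed) or 0 (not ringed).
    Returns the number of rows written.
    """
    rows = []
    for file_loc, ring in labels.items():
        if ring not in (0, 1):
            raise ValueError(f"ring label for {file_loc} must be 0 or 1, got {ring!r}")
        rows.append({"file_loc": str(file_loc), "ring": int(ring)})

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["file_loc", "ring"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `write_ring_catalog({'/your/path/galaxy_a.jpg': 1, '/your/path/galaxy_b.jpg': 0}, '/your/path/some_labelled_galaxies.csv')` would produce a two-row catalog that `pd.read_csv` can load.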
Then you can predict whether new galaxies have rings:
```python
import pandas as pd
from zoobot.pytorch.predictions import predict_on_catalog

# csv with 'file_loc' column (path to image). Zoobot will predict the labels.
unlabelled_df = pd.read_csv('/your/path/some_unlabelled_galaxies.csv')

# `model` is the finetuned model from the previous step
predict_on_catalog.predict(
    unlabelled_df,
    model,
    label_cols=['ring'],  # only used for
    save_loc='/your/path/finetuned_predictions.csv'
)
```
Zoobot includes many guides and working examples - see the Getting Started section below.
Getting Started
I suggest starting with the Colab notebook or the worked examples below, which you can copy and adapt.
For context and explanation, see the documentation.
Pretrained models are listed here and available on HuggingFace
Worked Examples
- pytorch/examples/finetuning/finetune_binary_classification.py
- pytorch/examples/finetuning/finetune_counts_full_tree.py
- pytorch/examples/representations/get_representations.py
- pytorch/examples/train_model_on_catalog.py (only necessary to train from scratch)
There is more explanation and an API reference on the docs.
(Optional) Install PyTorch with CUDA
If you're not using a GPU, skip this step
I highly recommend using conda (or mamba, same thing but faster) to do this. Conda will handle both creating a new virtual environment (conda create) and installing CUDA (cudatoolkit, cudnn)
CUDA 12.8 for PyTorch 2.7.0:
conda create --name zoobot39_torch python==3.9
conda activate zoobot39_torch
conda install nvidia/label/cuda-12.8.1::cuda
conda install nvidia/label/cuda-12.8.1::cuda-toolkit
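Once the environment is set up, a quick way to check whether PyTorch can see the GPU is sketched below. It assumes nothing about Zoobot itself and falls back gracefully if PyTorch is missing; `describe_torch_device` is a hypothetical helper name.

```python
def describe_torch_device():
    """Summarise whether PyTorch is importable and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version PyTorch was built against
        name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} with CUDA {torch.version.cuda}: {name}"
    return f"PyTorch {torch.__version__} without a visible GPU (CPU only)"


print(describe_torch_device())
```

If this reports CPU only on a machine with a GPU, the installed CUDA and PyTorch versions most likely do not match.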
Recent release features (v2.0.0)
- New in 2.0.1: Add greyscale encoders. Use `hf_hub:mwalmsley/zoobot-encoder-greyscale-convnext_nano` or similar.
- New pretrained architectures: ConvNeXT, EfficientNetV2, MaxViT, and more. Each in several sizes.
- Reworked finetuning procedure. All these architectures are finetuneable through a common method.
- Reworked finetuning options. Batch norm finetuning removed. Cosine schedule option added.
- Reworked finetuning saving/loading. Auto-downloads encoder from HuggingFace.
- Now supports regression finetuning (as well as multi-class and binary). See `pytorch/examples/finetuning`.
- Updated `timm` to 0.9.10, allowing latest model architectures. Previously downloaded checkpoints may not load correctly!
- (internal until published) GZ Evo v2 now includes Cosmic Dawn (HSC H2O). Significant performance improvement on HSC finetuning. Also now includes GZ UKIDSS (dragged from our archives).
- Updated `pytorch` to `2.1.0`.
- Added support for webdatasets (only recommended for large-scale distributed training)
- Improved per-question logging when training from scratch
- Added option to compile encoder for max speed (not recommended for finetuning, only for pretraining).
- Deprecates TensorFlow. The CS research community focuses on PyTorch and new frameworks like JAX.
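To illustrate the cosine schedule option mentioned in the list above, here is a generic sketch of cosine learning-rate decay. This is not Zoobot's own scheduler: the function name, parameter names and default values are all assumptions chosen for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `step` under cosine decay from base_lr down to min_lr."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1]
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(step, cosine_lr(step, total_steps=100))
```

The rate starts at `base_lr`, falls slowly at first, fastest around the midpoint, and flattens out as it approaches `min_lr`.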
Contributions are very welcome and will be credited in any future work. Please get in touch! See CODE_OF_CONDUCT.md for more.
Benchmarks and Replication - Training from Scratch
The benchmarks folder contains slurm and Python scripts to train Zoobot 1.0 from scratch.
Training Zoobot using the GZ DECaLS dataset option will create models very similar to those used for the GZ DECaLS catalogue and shared with the early versions of this repo. The GZ DESI Zoobot model is trained on additional data (GZD-1, GZD-2), as is the GZ Evo Zoobot model (GZD-1/2/5, Hubble, Candels, GZ2).
Pretraining is becoming increasingly complex and is now partially refactored out to a separate repository. We are gradually migrating this zoobot repository to focus on finetuning.
Citing
If you use this software, or otherwise wish to cite Zoobot as a software package, please use the JOSS paper:
@article{Walmsley2023,
  doi = {10.21105/joss.05312},
  url = {https://doi.org/10.21105/joss.05312},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {85},
  pages = {5312},
  author = {Mike Walmsley and Campbell Allen and Ben Aussel and Micah Bowles and Kasia Gregorowicz and Inigo Val Slijepcevic and Chris J. Lintott and Anna M. M. Scaife and Maja Jabłońska and Kosio Karchev and Denise Lanzieri and Devina Mohan and David O’Ryan and Bharath Saiguhan and Crisel Suárez and Nicolás Guerra-Varas and Renuka Velu},
  title = {Zoobot: Adaptable Deep Learning Models for Galaxy Morphology},
  journal = {Journal of Open Source Software}
}
You might be interested in reading papers using Zoobot:
- Galaxy Zoo DECaLS: Detailed visual morphology measurements from volunteers and deep learning for 314,000 galaxies (2022)
- A Comparison of Deep Learning Architectures for Optical Galaxy Morphology Classification (2022)
- Practical Galaxy Morphology Tools from Deep Supervised Representation Learning (2022)
- Towards Foundation Models for Galaxy Morphology (2022)
- Harnessing the Hubble Space Telescope Archives: A Catalogue of 21,926 Interacting Galaxies (2023)
- Galaxy Zoo DESI: Detailed morphology measurements for 8.7M galaxies in the DESI Legacy Imaging Surveys (2023)
- Galaxy mergers in Subaru HSC-SSP: A deep representation learning approach for identification, and the role of environment on merger incidence (2023)
- Rare Galaxy Classes Identified In Foundation Model Representations (2023)
- Astronomaly at Scale: Searching for Anomalies Amongst 4 Million Galaxies (2024)
- Transfer learning for galaxy feature detection: Finding Giant Star-forming Clumps in low redshift galaxies using Faster R-CNN (2024)
- Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning (2024)
- Scaling Laws for Galaxy Images (2024, preprint)
- Euclid Q1: First visual morphology catalogue (2025, preprint)
- Euclid Q1, A first look at the fraction of bars in massive galaxies at z < 1 (2025, preprint)
- Euclid Q1: The Strong Lensing Discovery Engine A -- System overview and lens catalogue (2025, preprint)
- Euclid Q1. The Strong Lensing Discovery Engine C: Finding lenses with machine learning (2025, preprint)
- Euclid Q1. The Strong Lensing Discovery Engine D -- Double-source-plane lens candidates (2025, preprint)
- Euclid Q1. The Strong Lensing Discovery Engine E -- Ensemble classification of strong gravitational lenses: lessons for Data Release 1 (2025, preprint)
- Euclid: Q1: A census of dwarf galaxies across a range of distances and environments (2025, preprint)
- Euclid Q1: Exploring galaxy morphology across cosmic time through Sersic fits (2025, preprint)
- Galaxy Zoo Evo: 107M volunteer labels for 823k galaxy images (2025, submitted)
Many other works use Zoobot indirectly via the Galaxy Zoo DECaLS and Galaxy Zoo DESI morphology catalogs, for example:
- Galaxy zoo: stronger bars facilitate quenching in star-forming galaxies (2022)
- The Effect of Environment on Galaxy Spiral Arms, Bars, Concentration, and Quenching (2022)
- Galaxy Zoo: kinematics of strongly and weakly barred galaxies (2023)
- Dependence of galactic bars on the tidal density field in the SDSS (2023)
- Galaxy Zoo DESI: large-scale bars as a secular mechanism for triggering AGN (2024)
- Galaxy zoo: stronger bars facilitate quenching in star-forming galaxies (2024, submitted)
- Uncovering Tidal Treasures: Automated Classification of Faint Tidal Features in DECaLS Data (2024, submitted)
Zoobot is deployed on the Euclid pipeline to produce the OU-MER morphology catalog. This is available as part of each Euclid data release (currently internal only, public release of Q1 data anticipated in Q2 2025).
Owner
- Name: Mike Walmsley
- Login: mwalmsley
- Kind: user
- Location: Toronto
- Company: University of Toronto, @zooniverse
- Website: www.walmsley.dev
- Twitter: mike_walmsley_
- Repositories: 53
- Profile: https://github.com/mwalmsley
JOSS Publication
Zoobot: Adaptable Deep Learning Models for Galaxy Morphology
Authors
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
Zooniverse.org, University of Oxford, Oxford, UK
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK, The Alan Turing Institute, London, UK
Theoretical and Scientific Data Science Group, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste Italy
Université Paris Cité, Université Paris-Saclay, CEA, CNRS, AIM, Gif-sur-Yvette, France
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
Vanderbilt University, Nashville, USA, Center for Astrophysics | Harvard & Smithsonian, Cambridge, USA
Dipartimento di Fisica, Università di Roma "Tor Vergata", Roma, Italy, Department of Astronomy, Faculty of Mathematics, University of Belgrade, Belgrade, Serbia
Ruprecht Karl University of Heidelberg, Germany
Tags
astronomy deep learning galaxy morphology statistics citizen science
Citation (CITATION.cff)
cff-version: 1.2.0
title: >-
  Zoobot: Adaptable Deep Learning Models for Galaxy Morphology
message: >-
  "Please cite the following work when using this software: https://joss.theoj.org/papers/10.21105/joss.05312#"
type: software
authors:
  - family-names: Walmsley
    given-names: Mike
    orcid: https://orcid.org/0000-0002-6408-4181
  - family-names: Allen
    given-names: Campbell
  - family-names: Aussel
    given-names: Ben
    orcid: https://orcid.org/0000-0003-2592-6806
  - family-names: Bowles
    given-names: Micah
    orcid: https://orcid.org/0000-0001-5838-8405
  - family-names: Gregorowicz
    given-names: Kasia
    orcid: https://orcid.org/0009-0003-0023-6240
  - family-names: Slijepcevic
    given-names: Inigo Val
    orcid: https://orcid.org/0000-0002-7056-9599
  - family-names: Lintott
    given-names: Chris J.
    orcid: https://orcid.org/0000-0001-5578-359X
  - family-names: Scaife
    given-names: Anna M. M.
    orcid: https://orcid.org/0000-0002-5364-2301
  - family-names: Jabłońska
    given-names: Maja
    orcid: https://orcid.org/0000-0001-6962-4979
  - family-names: Karchev
    given-names: Kosio
    orcid: https://orcid.org/0000-0001-9344-736X
  - family-names: Lanzieri
    given-names: Denise
    orcid: https://orcid.org/0000-0003-2787-1634
  - family-names: Mohan
    given-names: Devina
    orcid: https://orcid.org/0000-0002-8566-7968
  - family-names: O’Ryan
    given-names: David
    orcid: https://orcid.org/0000-0003-1217-4617
  - family-names: Saiguhan
    given-names: Bharath
    orcid: https://orcid.org/0000-0001-7580-364X
  - family-names: Suárez
    given-names: Crisel
    orcid: https://orcid.org/0000-0001-5243-7659
  - family-names: Guerra-Varas
    given-names: Nicolás
    orcid: https://orcid.org/0000-0002-9718-6352
  - family-names: Velu
    given-names: Renuka
identifiers:
  - type: doi
    value: 10.21105/joss.05312
  - type: url
    value: 'https://joss.theoj.org/papers/10.21105/joss.05312#'
  - type: ascl-id
    value: "2203.027"
repository-code: 'https://github.com/mwalmsley/zoobot'
abstract: >-
  Zoobot classifies galaxy morphology with Bayesian CNN.
  Deep learning models were trained on volunteer
  classifications; these models were able to both learn from
  uncertain volunteer responses and predict full posteriors
  (rather than point estimates) for what volunteers would
  have said. The code reproduces and improves Galaxy Zoo
  DECaLS automated classifications, and can be finetuned for
  new tasks.
keywords:
  - galaxies
  - deep learning
  - morphology
  - astronomy
license: GPL-3.0
preferred-citation:
  type: article
  authors:
    - family-names: Walmsley
      given-names: Mike
      orcid: https://orcid.org/0000-0002-6408-4181
    - family-names: Allen
      given-names: Campbell
    - family-names: Aussel
      given-names: Ben
      orcid: https://orcid.org/0000-0003-2592-6806
    - family-names: Bowles
      given-names: Micah
      orcid: https://orcid.org/0000-0001-5838-8405
    - family-names: Gregorowicz
      given-names: Kasia
      orcid: https://orcid.org/0009-0003-0023-6240
    - family-names: Slijepcevic
      given-names: Inigo Val
      orcid: https://orcid.org/0000-0002-7056-9599
    - family-names: Lintott
      given-names: Chris J.
      orcid: https://orcid.org/0000-0001-5578-359X
    - family-names: Scaife
      given-names: Anna M. M.
      orcid: https://orcid.org/0000-0002-5364-2301
    - family-names: Jabłońska
      given-names: Maja
      orcid: https://orcid.org/0000-0001-6962-4979
    - family-names: Karchev
      given-names: Kosio
      orcid: https://orcid.org/0000-0001-9344-736X
    - family-names: Lanzieri
      given-names: Denise
      orcid: https://orcid.org/0000-0003-2787-1634
    - family-names: Mohan
      given-names: Devina
      orcid: https://orcid.org/0000-0002-8566-7968
    - family-names: O’Ryan
      given-names: David
      orcid: https://orcid.org/0000-0003-1217-4617
    - family-names: Saiguhan
      given-names: Bharath
      orcid: https://orcid.org/0000-0001-7580-364X
    - family-names: Suárez
      given-names: Crisel
      orcid: https://orcid.org/0000-0001-5243-7659
    - family-names: Guerra-Varas
      given-names: Nicolás
      orcid: https://orcid.org/0000-0002-9718-6352
    - family-names: Velu
      given-names: Renuka
  doi: "10.21105/joss.05312"
  url: https://doi.org/10.21105/joss.05312
  title: "Zoobot: Adaptable Deep Learning Models for Galaxy Morphology"
  journal: "Journal of Open Source Software"
  year: 2023
  month: 5
  volume: 8
  number: 85
  pages: 5312
GitHub Events
Total
- Create event: 2
- Release event: 1
- Issues event: 2
- Watch event: 24
- Member event: 2
- Issue comment event: 2
- Push event: 72
- Pull request review event: 1
- Pull request event: 7
- Fork event: 6
Last Year
- Create event: 2
- Release event: 1
- Issues event: 2
- Watch event: 24
- Member event: 2
- Issue comment event: 2
- Push event: 72
- Pull request review event: 1
- Pull request event: 7
- Fork event: 6
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Mike Walmsley | w****1@g****m | 1,420 |
| Mike Walmsley | w****1@g****m | 46 |
| Campbell Allen | c****n@g****m | 45 |
| Inigo Val Slijepcevic | i****l@g****m | 6 |
| mb010 | 5****0 | 5 |
| Maja Jabłońska | m****a@g****m | 4 |
| Mike Walmsley | w****l@J****l | 2 |
| Logan | 9****e | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 31
- Total pull requests: 95
- Average time to close issues: 4 months
- Average time to close pull requests: 11 days
- Total issue authors: 11
- Total pull request authors: 9
- Average comments per issue: 1.26
- Average comments per pull request: 0.28
- Merged pull requests: 84
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: 17 days
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mwalmsley (13)
- katgre (3)
- criselsuarez (3)
- camallen (3)
- mkurzner (2)
- ClarkGuilty (1)
- SauravMaheshkar (1)
- igorkolesnikov13 (1)
- BaranovMykola (1)
- NicoGalvarino (1)
- crhea93 (1)
Pull Request Authors
- mwalmsley (73)
- camallen (15)
- mb010 (4)
- inigoval (2)
- Logan-Locke (2)
- SauravMaheshkar (1)
- BSGalvan (1)
- katgre (1)
- maja-jablonska (1)
Packages
- Total packages: 1
- Total downloads: 161 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 11
- Total maintainers: 1
pypi.org: zoobot
Galaxy morphology classifiers
- Homepage: https://github.com/mwalmsley/zoobot
- Documentation: https://zoobot.readthedocs.io/
- License: GNU General Public License (GPL)
- Latest release: 2.9.0 (published 5 months ago)
Dependencies
- zoobot tensorflow
- zoobot cuda
- Sphinx *
- astropy *
- boto3 *
- furo *
- keras_applications *
- matplotlib *
- numpy *
- pandas *
- pillow *
- pyarrow *
- python-dateutil ==2.8.1
- scikit-image *
- scikit-learn *
- scipy *
- seaborn *
- sphinxcontrib-napoleon *
- statsmodels *
- tensorflow >=2.3
- tensorflow_probability >=0.11
- tqdm *
- wandb *
- for *
- matplotlib *
- numpy *
- pandas *
- pillow *
- pyarrow *
- scikit-image *
- scikit-learn *
- scipy *
- statsmodels *
- tqdm *
- wandb *
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- nvidia/cuda 11.3.1-base-ubuntu20.04 build