Zoobot

Zoobot: Adaptable Deep Learning Models for Galaxy Morphology - Published in JOSS (2023)

https://github.com/mwalmsley/zoobot

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Mathematics Computer Science - 88% confidence
Economics Social Sciences - 63% confidence
Last synced: 4 months ago

Repository

Classifies galaxy morphology with Bayesian CNN

Basic Info
  • Host: GitHub
  • Owner: mwalmsley
  • License: gpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 329 MB
Statistics
  • Stars: 108
  • Watchers: 2
  • Forks: 30
  • Open Issues: 4
  • Releases: 12
Created almost 5 years ago · Last pushed 4 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

Zoobot



Zoobot 2.0 is now available: bigger and better models with streamlined finetuning. See the blog post and paper.


Zoobot classifies galaxy morphology with deep learning.

Zoobot is trained using millions of answers by Galaxy Zoo volunteers. This code will let you retrain Zoobot to accurately solve your own prediction task.

Installation

You can retrain Zoobot in the cloud with a free GPU using this Google Colab notebook. To install locally, keep reading.

Download the code using git:

git clone git@github.com:mwalmsley/zoobot.git

And then install Zoobot (and PyTorch, if not already installed):

pip install -e "zoobot[pytorch]"

This installs the downloaded Zoobot code using pip editable mode so you can easily change the code locally. Zoobot is also available directly from pip (pip install zoobot[option]). Only use this if you are sure you won't be making changes to Zoobot itself. For Google Colab, use pip install zoobot[pytorch_colab]

To use a GPU, you must already have CUDA installed and matching the versions above. I share my install steps here. GPUs are optional - Zoobot will retrain fine on CPU, just slower.
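After installing, one quick way to confirm what is present in the active environment is to query the package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Zoobot.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "zoobot" and "torch" are the distribution names pip installs
for name in ("zoobot", "torch"):
    print(name, installed_version(name) or "not installed")
```

If `zoobot` is reported as not installed, the pip command was most likely run in a different environment from the one you are checking.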

Quickstart

The Colab notebook is the quickest way to get started. Alternatively, the minimal example below illustrates how Zoobot works.

Let's say you want to find ringed galaxies and you have a small labelled dataset of 500 ringed or not-ringed galaxies. You can retrain Zoobot to find rings like so:

```python
import pandas as pd
from galaxy_datasets.pytorch.galaxy_datamodule import CatalogDataModule
from zoobot.pytorch.training import finetune

# csv with 'ring' column (0 or 1) and 'file_loc' column (path to image)
labelled_df = pd.read_csv('/your/path/some_labelled_galaxies.csv')

datamodule = CatalogDataModule(
    label_cols=['ring'],
    catalog=labelled_df,
    batch_size=32
    # will automatically apply default augmentations
)

# load trained Zoobot model
model = finetune.FinetuneableZoobotClassifier(checkpoint_loc, num_classes=2)

# retrain to find rings
trainer = finetune.get_trainer(save_dir)
trainer.fit(model, datamodule)
```

Then you can predict whether new galaxies have rings:

```python
from zoobot.pytorch.predictions import predict_on_catalog

# csv with 'file_loc' column (path to image). Zoobot will predict the labels.
unlabelled_df = pd.read_csv('/your/path/some_unlabelled_galaxies.csv')

predict_on_catalog.predict(
    unlabelled_df,
    model,
    label_cols=['ring'],  # only used for
    save_loc='/your/path/finetuned_predictions.csv'
)
```
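Both snippets above read a CSV catalog: the labelled one needs a 'file_loc' column plus a 'ring' column containing 0 or 1, while the unlabelled one needs only 'file_loc'. A pre-flight check along the following lines may catch malformed rows before training starts. This is a sketch using only the standard library; `check_catalog` is a hypothetical helper rather than a Zoobot function, and the in-memory table with made-up paths stands in for your own CSV.

```python
import csv
import io
import os


def check_catalog(rows, label_col=None, check_files=False):
    """Return a list of human-readable problems found in catalog rows.

    rows: iterable of dicts, e.g. from csv.DictReader.
    label_col: name of a binary (0/1) label column, or None if unlabelled.
    check_files: if True, also confirm each 'file_loc' exists on disk.
    """
    problems = []
    for i, row in enumerate(rows):
        file_loc = (row.get("file_loc") or "").strip()
        if not file_loc:
            problems.append(f"row {i}: missing file_loc")
        elif check_files and not os.path.isfile(file_loc):
            problems.append(f"row {i}: no file at {file_loc}")
        if label_col is not None:
            label = (row.get(label_col) or "").strip()
            if label not in {"0", "1"}:
                problems.append(f"row {i}: {label_col} must be 0 or 1")
    return problems


# Tiny in-memory stand-in for a labelled catalog (paths are made up)
example_csv = io.StringIO(
    "file_loc,ring\n"
    "/your/path/galaxy_a.jpg,1\n"
    "/your/path/galaxy_b.jpg,0\n"
    ",1\n"
    "/your/path/galaxy_c.jpg,maybe\n"
)
problems = check_catalog(csv.DictReader(example_csv), label_col="ring")
print(problems)
```

For an unlabelled catalog, call the helper with `label_col=None`; pass `check_files=True` once the image paths point at real files.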

Zoobot includes many guides and working examples - see the Getting Started section below.

Getting Started

I suggest starting with the Colab notebook or the worked examples below, which you can copy and adapt.

For context and explanation, see the documentation.

Pretrained models are listed here and available on HuggingFace

Worked Examples

There is more explanation and an API reference on the docs.

(Optional) Install PyTorch with CUDA

If you're not using a GPU, skip this step

I highly recommend using conda (or mamba, same thing but faster) to do this. Conda will handle both creating a new virtual environment (conda create) and installing CUDA (cudatoolkit, cudnn)

CUDA 12.8 for PyTorch 2.7.0:

conda create --name zoobot39_torch python==3.9
conda activate zoobot39_torch
conda install nvidia/label/cuda-12.8.1::cuda
conda install nvidia/label/cuda-12.8.1::cuda-toolkit
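Once the environment is set up, it can be worth confirming that PyTorch actually sees the GPU before starting a long run. The sketch below falls back to CPU when PyTorch is missing or no CUDA device is visible; `pick_device` is a hypothetical helper name, not part of Zoobot.

```python
def pick_device():
    """Return 'cuda' if PyTorch is installed and can see a GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Selected device:", pick_device())
```

If this prints `cpu` on a machine with a GPU, the installed CUDA and PyTorch versions are the first thing to check.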

Recent release features (v2.0.0)

  • New in 2.0.1 Add greyscale encoders. Use hf_hub:mwalmsley/zoobot-encoder-greyscale-convnext_nano or similar.
  • New pretrained architectures: ConvNeXT, EfficientNetV2, MaxViT, and more. Each in several sizes.
  • Reworked finetuning procedure. All these architectures are finetuneable through a common method.
  • Reworked finetuning options. Batch norm finetuning removed. Cosine schedule option added.
  • Reworked finetuning saving/loading. Auto-downloads encoder from HuggingFace.
  • Now supports regression finetuning (as well as multi-class and binary). See pytorch/examples/finetuning
  • Updated timm to 0.9.10, allowing latest model architectures. Previously downloaded checkpoints may not load correctly!
  • (internal until published) GZ Evo v2 now includes Cosmic Dawn (HSC H2O). Significant performance improvement on HSC finetuning. Also now includes GZ UKIDSS (dragged from our archives).
  • Updated pytorch to 2.1.0
  • Added support for webdatasets (only recommended for large-scale distributed training)
  • Improved per-question logging when training from scratch
  • Added option to compile encoder for max speed (not recommended for finetuning, only for pretraining).
  • Deprecates TensorFlow. The CS research community focuses on PyTorch and new frameworks like JAX.
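
The cosine schedule option in the list above refers to decaying the learning rate along a cosine curve. For readers unfamiliar with the idea, the sketch below shows the generic cosine annealing formula in plain Python. It is not Zoobot's implementation; the function name, parameter names and example values are assumptions chosen for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Generic cosine annealing: base_lr at step 0, min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, midpoint and end of a hypothetical 100-step run
for step in (0, 50, 100):
    print(step, cosine_lr(step, total_steps=100, base_lr=1e-4))
```

The rate falls slowly at first, fastest around the midpoint, and flattens out again as it approaches the minimum.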

Contributions are very welcome and will be credited in any future work. Please get in touch! See CODE_OF_CONDUCT.md for more.

Benchmarks and Replication - Training from Scratch

The benchmarks folder contains slurm and Python scripts to train Zoobot 1.0 from scratch.

Training Zoobot using the GZ DECaLS dataset option will create models very similar to those used for the GZ DECaLS catalogue and shared with the early versions of this repo. The GZ DESI Zoobot model is trained on additional data (GZD-1, GZD-2), as is the GZ Evo Zoobot model (GZD-1/2/5, Hubble, Candels, GZ2).

Pretraining is becoming increasingly complex and is now partially refactored out to a separate repository. We are gradually migrating this zoobot repository to focus on finetuning.

Citing

If you use this software, or otherwise wish to cite Zoobot as a software package, please use the JOSS paper:

@article{Walmsley2023,
  doi = {10.21105/joss.05312},
  url = {https://doi.org/10.21105/joss.05312},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {85},
  pages = {5312},
  author = {Mike Walmsley and Campbell Allen and Ben Aussel and Micah Bowles and Kasia Gregorowicz and Inigo Val Slijepcevic and Chris J. Lintott and Anna M. M. Scaife and Maja Jabłońska and Kosio Karchev and Denise Lanzieri and Devina Mohan and David O’Ryan and Bharath Saiguhan and Crisel Suárez and Nicolás Guerra-Varas and Renuka Velu},
  title = {Zoobot: Adaptable Deep Learning Models for Galaxy Morphology},
  journal = {Journal of Open Source Software}
}

You might be interested in reading papers using Zoobot:

Many other works use Zoobot indirectly via the Galaxy Zoo DECaLS and Galaxy Zoo DESI morphology catalogs, for example:

Zoobot is deployed on the Euclid pipeline to produce the OU-MER morphology catalog. This is available as part of each Euclid data release (currently internal only, public release of Q1 data anticipated in Q2 2025).

Owner

  • Name: Mike Walmsley
  • Login: mwalmsley
  • Kind: user
  • Location: Toronto
  • Company: University of Toronto, @zooniverse

JOSS Publication

Zoobot: Adaptable Deep Learning Models for Galaxy Morphology
Published
May 08, 2023
Volume 8, Issue 85, Page 5312
Authors
Mike Walmsley ORCID
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
Campbell Allen
Zooniverse.org, University of Oxford, Oxford, UK
Ben Aussel ORCID
Institut für Planetologie, Westfälische Wilhelms-Universität Münster, Münster, Germany
Micah Bowles ORCID
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
Kasia Gregorowicz ORCID
Astronomical Observatory of the University of Warsaw, Warsaw, Poland
Inigo Val Slijepcevic ORCID
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
Chris J. Lintott ORCID
Oxford Astrophysics, Department of Physics, University of Oxford, Oxford, UK
Anna M. M. Scaife ORCID
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK, The Alan Turing Institute, London, UK
Maja Jabłońska ORCID
Astronomical Observatory of the University of Warsaw, Warsaw, Poland
Kosio Karchev ORCID
Theoretical and Scientific Data Science Group, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
Denise Lanzieri ORCID
Université Paris Cité, Université Paris-Saclay, CEA, CNRS, AIM, Gif-sur-Yvette, France
Devina Mohan ORCID
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
David O’Ryan ORCID
Department of Physics, Lancaster University, Lancaster, UK
Bharath Saiguhan ORCID
Physical Research Laboratory, Navrangpura, Ahmedabad, India
Crisel Suárez ORCID
Vanderbilt University, Nashville, USA, Center for Astrophysics | Harvard & Smithsonian, Cambridge, USA
Nicolás Guerra-Varas ORCID
Dipartimento di Fisica, Università di Roma "Tor Vergata", Roma, Italy, Department of Astronomy, Faculty of Mathematics, University of Belgrade, Belgrade, Serbia
Renuka Velu
Ruprecht Karl University of Heidelberg, Germany
Editor
Paul La Plante ORCID
Tags
astronomy deep learning galaxy morphology statistics citizen science

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  Zoobot: Adaptable Deep Learning Models for Galaxy Morphology
message: >-
  "Please cite the following work when using this software: https://joss.theoj.org/papers/10.21105/joss.05312#"
type: software
authors:
  - family-names: Walmsley
    given-names: Mike
    orcid: https://orcid.org/0000-0002-6408-4181
  - family-names: Allen
    given-names: Campbell
  - family-names: Aussel
    given-names: Ben
    orcid: https://orcid.org/0000-0003-2592-6806
  - family-names: Bowles
    given-names: Micah
    orcid: https://orcid.org/0000-0001-5838-8405
  - family-names: Gregorowicz
    given-names: Kasia
    orcid: https://orcid.org/0009-0003-0023-6240
  - family-names: Slijepcevic
    given-names: Inigo Val
    orcid: https://orcid.org/0000-0002-7056-9599
  - family-names: Lintott
    given-names: Chris J.
    orcid: https://orcid.org/0000-0001-5578-359X
  - family-names: Scaife
    given-names: Anna M. M.
    orcid: https://orcid.org/0000-0002-5364-2301
  - family-names: Jabłońska
    given-names: Maja
    orcid: https://orcid.org/0000-0001-6962-4979
  - family-names: Karchev
    given-names: Kosio
    orcid: https://orcid.org/0000-0001-9344-736X
  - family-names: Lanzieri
    given-names: Denise
    orcid: https://orcid.org/0000-0003-2787-1634
  - family-names: Mohan
    given-names: Devina
    orcid: https://orcid.org/0000-0002-8566-7968
  - family-names: O’Ryan
    given-names: David
    orcid: https://orcid.org/0000-0003-1217-4617
  - family-names: Saiguhan
    given-names: Bharath
    orcid: https://orcid.org/0000-0001-7580-364X
  - family-names: Suárez
    given-names: Crisel
    orcid: https://orcid.org/0000-0001-5243-7659
  - family-names: Guerra-Varas
    given-names: Nicolás
    orcid: https://orcid.org/0000-0002-9718-6352
  - family-names: Velu
    given-names: Renuka
identifiers:
  - type: doi
    value: 10.21105/joss.05312
  - type: url
    value: 'https://joss.theoj.org/papers/10.21105/joss.05312#'
  - type: ascl-id
    value: "2203.027"
repository-code: 'https://github.com/mwalmsley/zoobot'
abstract: >-
  Zoobot classifies galaxy morphology with Bayesian CNN.
  Deep learning models were trained on volunteer
  classifications; these models were able to both learn from
  uncertain volunteer responses and predict full posteriors
  (rather than point estimates) for what volunteers would
  have said. The code reproduces and improves Galaxy Zoo
  DECaLS automated classifications, and can be finetuned for
  new tasks.
keywords:
  - galaxies
  - deep learning
  - morphology
  - astronomy
license: GPL-3.0
preferred-citation:
  type: article
  authors:
  - family-names: Walmsley
    given-names: Mike
    orcid: https://orcid.org/0000-0002-6408-4181
  - family-names: Allen
    given-names: Campbell
  - family-names: Aussel
    given-names: Ben
    orcid: https://orcid.org/0000-0003-2592-6806
  - family-names: Bowles
    given-names: Micah
    orcid: https://orcid.org/0000-0001-5838-8405
  - family-names: Gregorowicz
    given-names: Kasia
    orcid: https://orcid.org/0009-0003-0023-6240
  - family-names: Slijepcevic
    given-names: Inigo Val
    orcid: https://orcid.org/0000-0002-7056-9599
  - family-names: Lintott
    given-names: Chris J.
    orcid: https://orcid.org/0000-0001-5578-359X
  - family-names: Scaife
    given-names: Anna M. M.
    orcid: https://orcid.org/0000-0002-5364-2301
  - family-names: Jabłońska
    given-names: Maja
    orcid: https://orcid.org/0000-0001-6962-4979
  - family-names: Karchev
    given-names: Kosio
    orcid: https://orcid.org/0000-0001-9344-736X
  - family-names: Lanzieri
    given-names: Denise
    orcid: https://orcid.org/0000-0003-2787-1634
  - family-names: Mohan
    given-names: Devina
    orcid: https://orcid.org/0000-0002-8566-7968
  - family-names: O’Ryan
    given-names: David
    orcid: https://orcid.org/0000-0003-1217-4617
  - family-names: Saiguhan
    given-names: Bharath
    orcid: https://orcid.org/0000-0001-7580-364X
  - family-names: Suárez
    given-names: Crisel
    orcid: https://orcid.org/0000-0001-5243-7659
  - family-names: Guerra-Varas
    given-names: Nicolás
    orcid: https://orcid.org/0000-0002-9718-6352
  - family-names: Velu
    given-names: Renuka
  doi: "10.21105/joss.05312"
  url: https://doi.org/10.21105/joss.05312
  title: "Zoobot: Adaptable Deep Learning Models for Galaxy Morphology"
  journal: "Journal of Open Source Software"
  year: 2023
  month: 5
  volume: 8
  number: 85
  pages: 5312

GitHub Events

Total
  • Create event: 2
  • Release event: 1
  • Issues event: 2
  • Watch event: 24
  • Member event: 2
  • Issue comment event: 2
  • Push event: 72
  • Pull request review event: 1
  • Pull request event: 7
  • Fork event: 6
Last Year
  • Create event: 2
  • Release event: 1
  • Issues event: 2
  • Watch event: 24
  • Member event: 2
  • Issue comment event: 2
  • Push event: 72
  • Pull request review event: 1
  • Pull request event: 7
  • Fork event: 6

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 1,529
  • Total Committers: 8
  • Avg Commits per committer: 191.125
  • Development Distribution Score (DDS): 0.071
Past Year
  • Commits: 74
  • Committers: 2
  • Avg Commits per committer: 37.0
  • Development Distribution Score (DDS): 0.014
Top Committers
Name Email Commits
Mike Walmsley w****1@g****m 1,420
Mike Walmsley w****1@g****m 46
Campbell Allen c****n@g****m 45
Inigo Val Slijepcevic i****l@g****m 6
mb010 5****0 5
Maja Jabłońska m****a@g****m 4
Mike Walmsley w****l@J****l 2
Logan 9****e 1
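
The all-time summary figures can be reproduced from the Top Committers table. The sketch below assumes the Development Distribution Score is one minus the top committer's share of all commits, which matches the reported 0.071; the function names are illustrative only.

```python
def average_commits(commit_counts):
    """Mean number of commits per committer."""
    return sum(commit_counts) / len(commit_counts)


def development_distribution_score(commit_counts):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    return 1 - max(commit_counts) / sum(commit_counts)


# Commit counts copied from the Top Committers table, one entry per row
all_time = [1420, 46, 45, 6, 5, 4, 2, 1]

total = sum(all_time)
average = average_commits(all_time)
dds = development_distribution_score(all_time)

print(total)          # 1529
print(average)        # 191.125
print(round(dds, 3))  # 0.071
```

The table lists Mike Walmsley under three separate email addresses, so the committer count of 8 treats those rows as distinct committers.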

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 31
  • Total pull requests: 95
  • Average time to close issues: 4 months
  • Average time to close pull requests: 11 days
  • Total issue authors: 11
  • Total pull request authors: 9
  • Average comments per issue: 1.26
  • Average comments per pull request: 0.28
  • Merged pull requests: 84
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 7
  • Average time to close issues: N/A
  • Average time to close pull requests: 17 days
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mwalmsley (13)
  • katgre (3)
  • criselsuarez (3)
  • camallen (3)
  • mkurzner (2)
  • ClarkGuilty (1)
  • SauravMaheshkar (1)
  • igorkolesnikov13 (1)
  • BaranovMykola (1)
  • NicoGalvarino (1)
  • crhea93 (1)
Pull Request Authors
  • mwalmsley (73)
  • camallen (15)
  • mb010 (4)
  • inigoval (2)
  • Logan-Locke (2)
  • SauravMaheshkar (1)
  • BSGalvan (1)
  • katgre (1)
  • maja-jablonska (1)
Top Labels
Issue Labels
enhancement (6) help wanted (2) bug (1)
Pull Request Labels
enhancement (2) bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 161 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 11
  • Total maintainers: 1
pypi.org: zoobot

Galaxy morphology classifiers

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 161 Last month
Rankings
Stargazers count: 8.5%
Forks count: 8.7%
Dependent packages count: 10.1%
Average: 13.5%
Downloads: 18.8%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 4 months ago

Dependencies

docker-compose-tf.yml docker
  • zoobot tensorflow
docker-compose.yml docker
  • zoobot cuda
docs/requirements.txt pypi
  • Sphinx *
  • astropy *
  • boto3 *
  • furo *
  • keras_applications *
  • matplotlib *
  • numpy *
  • pandas *
  • pillow *
  • pyarrow *
  • python-dateutil ==2.8.1
  • scikit-image *
  • scikit-learn *
  • scipy *
  • seaborn *
  • sphinxcontrib-napoleon *
  • statsmodels *
  • tensorflow >=2.3
  • tensorflow_probability >=0.11
  • tqdm *
  • wandb *
setup.py pypi
  • matplotlib *
  • numpy *
  • pandas *
  • pillow *
  • pyarrow *
  • scikit-image *
  • scikit-learn *
  • scipy *
  • statsmodels *
  • tqdm *
  • wandb *
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/run_CI.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
Dockerfile docker
  • nvidia/cuda 11.3.1-base-ubuntu20.04 build