galaxies-datasets

Galaxies Datasets is a collection of ready-to-use extragalactic astronomy datasets for use with TensorFlow and other Machine Learning frameworks.

https://github.com/lbignone/galaxies_datasets

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: lbignone
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 9.66 MB
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 2
  • Open Issues: 17
  • Releases: 3
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.rst

Galaxies Datasets
=================

|header|

.. |header| image:: header.png
   :alt: Galaxies Datasets

|PyPI| |Status| |Python Version| |License| |Read the Docs| |Tests|
|pre-commit| |Black| |DOI|

.. |PyPI| image:: https://img.shields.io/pypi/v/galaxies_datasets.svg
   :target: https://pypi.org/project/galaxies_datasets/
   :alt: PyPI
.. |Status| image:: https://img.shields.io/pypi/status/galaxies_datasets.svg
   :target: https://pypi.org/project/galaxies_datasets/
   :alt: Status
.. |Python Version| image:: https://img.shields.io/pypi/pyversions/galaxies_datasets
   :target: https://pypi.org/project/galaxies_datasets
   :alt: Python Version

.. |License| image:: https://img.shields.io/pypi/l/galaxies_datasets
   :target: https://opensource.org/licenses/MIT
   :alt: License
.. |Read the Docs| image:: https://img.shields.io/readthedocs/galaxies_datasets/latest.svg?label=Read%20the%20Docs
   :target: https://galaxies_datasets.readthedocs.io/
   :alt: Read the documentation at https://galaxies_datasets.readthedocs.io/
.. |Tests| image:: https://github.com/lbignone/galaxies_datasets/workflows/Tests/badge.svg
   :target: https://github.com/lbignone/galaxies_datasets/actions?workflow=Tests
   :alt: Tests

.. |Codecov| image:: https://codecov.io/gh/lbignone/galaxies_datasets/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/lbignone/galaxies_datasets
   :alt: Codecov
.. |pre-commit| image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white
   :target: https://github.com/pre-commit/pre-commit
   :alt: pre-commit
.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
   :alt: Black
.. |DOI| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.5521450.svg
   :target: https://doi.org/10.5281/zenodo.5521450
   :alt: DOI


*Galaxies Datasets* is a collection of ready-to-use extragalactic astronomy datasets
for use with TensorFlow, Jax, and other Machine Learning frameworks.

It follows the `tensorflow_datasets`_ framework, making it very easy to switch
between different datasets. All datasets are exposed as `tf.data.Datasets`_, enabling
easy-to-use and high-performance input pipelines.


Usage
-----

Loading a dataset can be as easy as:

.. code-block:: python

    from galaxies_datasets import datasets
    import tensorflow_datasets as tfds

    # Construct a tf.data.Dataset
    ds = tfds.load("galaxy_zoo_challenge", split="train")

    # Build your input pipeline
    ds = ds.shuffle(1000).batch(128).prefetch(10).take(5)

In the example above:

.. code-block:: python

    from galaxies_datasets import datasets

registers the collection of galaxy datasets with the `tensorflow_datasets`_ package,
making them available through its API. And that is it! ...Almost.
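
One way to confirm that the import registered the datasets is to compare the expected
names with ``tfds.list_builders()``. This is a minimal sketch: the helper names
``find_missing`` and ``check_registration`` are hypothetical and are not part of
*Galaxies Datasets*.

.. code-block:: python

    def find_missing(expected, available):
        """Return the expected dataset names that are absent from `available`."""
        available = set(available)
        return [name for name in expected if name not in available]


    def check_registration(expected):
        """Import the datasets module and report names tfds does not know about."""
        # Lazy imports: both packages are only needed when the check actually runs.
        import tensorflow_datasets as tfds
        from galaxies_datasets import datasets  # noqa: F401  (import registers the builders)

        return find_missing(expected, tfds.list_builders())

Calling ``check_registration(["galaxy_zoo_challenge", "eagle"])`` should return an empty
list when both names are registered.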

For more details on tensorflow_datasets, see its `documentation`_.
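
Because the datasets are ordinary ``tf.data.Dataset`` objects, they can also be consumed
as NumPy arrays with ``tfds.as_numpy``, which is convenient for Jax or other frameworks.
The sketch below assumes each example is a dictionary of features; the helper names
``summarize_example`` and ``preview`` are hypothetical, and the feature keys depend on
the dataset.

.. code-block:: python

    def summarize_example(example):
        """Map each feature name to its array shape (None if the value has no shape)."""
        return {key: getattr(value, "shape", None) for key, value in example.items()}


    def preview(name="galaxy_zoo_challenge", split="train", n_examples=3):
        """Load a few examples as NumPy values and summarize their shapes."""
        # Lazy imports so the helpers can be defined without TensorFlow installed.
        import tensorflow_datasets as tfds
        from galaxies_datasets import datasets  # noqa: F401  (import registers the builders)

        ds = tfds.load(name, split=split).take(n_examples)
        # tfds.as_numpy yields plain NumPy values instead of tf.Tensor objects.
        return [summarize_example(example) for example in tfds.as_numpy(ds)]

Note that ``preview()`` downloads and prepares the dataset on first use, like any other
call to ``tfds.load``.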

Some datasets require you to download the data manually first. Check each dataset's
documentation for instructions.
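
If the manually downloaded files live outside the default location, tensorflow_datasets
lets you point to them with ``tfds.download.DownloadConfig(manual_dir=...)``. The sketch
below is an assumption about how this could be combined with *Galaxies Datasets*: the
helper names are hypothetical, and the files each dataset expects are described in its
own instructions.

.. code-block:: python

    from pathlib import Path


    def manual_dir_for(data_dir):
        """Return the default folder tfds searches for manually downloaded files."""
        return Path(data_dir).expanduser() / "downloads" / "manual"


    def load_with_manual_data(name, manual_dir, split="train"):
        """Load a dataset, telling tfds where the manually downloaded files are."""
        # Lazy imports so the helpers can be defined without TensorFlow installed.
        import tensorflow_datasets as tfds
        from galaxies_datasets import datasets  # noqa: F401  (import registers the builders)

        config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
        return tfds.load(
            name,
            split=split,
            download_and_prepare_kwargs={"download_config": config},
        )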


Datasets
--------

Currently `available datasets`_ focus on galaxy morphology.

They include observational data from the `Galaxy Zoo project`_:

- galaxy_zoo_challenge
- galaxy_zoo2
- galaxy_zoo_decals

As well as mock galaxy images from the `EAGLE simulation`_:

- eagle
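
The names above are the identifiers passed to tensorflow_datasets. As a rough sketch,
the features of a dataset can be inspected through ``tfds.builder`` without downloading
any data. The names ``DATASETS_BY_SOURCE``, ``all_names`` and ``describe`` are
hypothetical and only group the list above.

.. code-block:: python

    # Dataset names as listed above, grouped by where the data comes from.
    DATASETS_BY_SOURCE = {
        "Galaxy Zoo": ["galaxy_zoo_challenge", "galaxy_zoo2", "galaxy_zoo_decals"],
        "EAGLE": ["eagle"],
    }


    def all_names():
        """Return every dataset name in a single flat list."""
        return [name for names in DATASETS_BY_SOURCE.values() for name in names]


    def describe(name):
        """Return the registered name and feature specification of a dataset."""
        # Lazy imports so the helpers can be defined without TensorFlow installed.
        import tensorflow_datasets as tfds
        from galaxies_datasets import datasets  # noqa: F401  (import registers the builders)

        builder = tfds.builder(name)  # creating a builder does not download data
        return {"name": builder.info.name, "features": builder.info.features}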


Installation
------------

You can install *Galaxies Datasets* via pip_ from PyPI_:

.. code:: console

   $ pip install galaxies-datasets


Scripts
-------

*Galaxies Datasets* provides some scripts to download and prepare data. The scripts
are available through a command-line interface powered by `Typer`_.

For example, to download images and data from the EAGLE simulation, run::

    galaxies_datasets eagle download USER SIMULATION

where USER is your username for the EAGLE public database and SIMULATION is the name
of one of the EAGLE simulations.
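
The same command can be driven from Python, for example inside a data-preparation
script. This is a minimal sketch using only the standard library: the function names are
hypothetical, and the user and simulation values you pass are your own.

.. code-block:: python

    import shutil
    import subprocess


    def eagle_download_command(user, simulation):
        """Build the argument list for `galaxies_datasets eagle download`."""
        return ["galaxies_datasets", "eagle", "download", user, simulation]


    def run_eagle_download(user, simulation):
        """Run the download command, failing early if the CLI is not installed."""
        if shutil.which("galaxies_datasets") is None:
            raise FileNotFoundError(
                "galaxies_datasets not found on PATH; is the package installed?"
            )
        # check=True raises CalledProcessError if the command exits with an error.
        return subprocess.run(eagle_download_command(user, simulation), check=True)

Passing the arguments as a list avoids shell quoting issues with user names or
simulation names.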

For all available commands check the `Command-line Interface`_ reference, or run::

    galaxies_datasets --help

The command-line interface also supports automatic completion on all operating
systems and in the Bash, Zsh, Fish, and PowerShell shells, so you can press TAB to
see the available options or subcommands.

To install automatic completion for Bash, run::

    galaxies_datasets --install-completion bash


Citation
--------

If you use this software, please cite it as below, in addition to any citation
specific to the datasets you use.

.. code:: bibtex

    @software{lucas_bignone_2021_5521451,
        author       = {Lucas Bignone},
        title        = {Galaxies Datasets},
        month        = sep,
        year         = 2021,
        publisher    = {Zenodo},
        version      = {v0.1.1},
        doi          = {10.5281/zenodo.5521450},
        url          = {https://doi.org/10.5281/zenodo.5521450}
    }


Contributing
------------

Contributions are very welcome.
To learn more, see the `Contributor Guide`_.


License
-------

Distributed under the terms of the `MIT license`_,
*Galaxies Datasets* is free and open source software.


Issues
------

If you encounter any problems,
please `file an issue`_ along with a detailed description.


Disclaimer
----------

This is a utility library that downloads and prepares datasets. We do not host
or distribute these datasets, vouch for their quality or fairness, or claim that you
have license to use the dataset. It is your responsibility to determine whether you
have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation,
etc.), or do not want your dataset to be included in this library, please get in
touch through a GitHub issue. Thanks for your contribution to the ML community!


Credits
-------

This project was generated from `@cjolowicz`_'s `Hypermodern Python Cookiecutter`_
template.

Icons made by Freepik from www.flaticon.com


.. _@cjolowicz: https://github.com/cjolowicz
.. _MIT license: https://opensource.org/licenses/MIT
.. _PyPI: https://pypi.org/
.. _Hypermodern Python Cookiecutter: https://github.com/cjolowicz/cookiecutter-hypermodern-python
.. _file an issue: https://github.com/lbignone/galaxies_datasets/issues
.. _pip: https://pip.pypa.io/
.. _tensorflow_datasets: https://www.tensorflow.org/datasets/
.. _tf.data.Datasets: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
.. _documentation: https://www.tensorflow.org/datasets/overview
.. _Galaxy zoo project: https://www.zooniverse.org/projects/zookeeper/galaxy-zoo/
.. _EAGLE simulation: http://icc.dur.ac.uk/Eagle/
.. _Typer: https://typer.tiangolo.com/
.. github-only
.. _available datasets: docs/datasets.md
.. _Contributor Guide: CONTRIBUTING.rst
.. _Command-line Interface: https://galaxies-datasets.readthedocs.io/en/latest/cli.html
.. _Usage: https://galaxies_datasets.readthedocs.io/en/latest/usage.html

Owner

  • Name: Lucas Bignone
  • Login: lbignone
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Bignone
    given-names: Lucas
    orcid: https://orcid.org/0000-0003-4925-7248
title: "Galaxies Datasets"
version: 0.1.1
doi: 10.5281/zenodo.5521450
date-released: 2021-08-11
repository-code: https://github.com/lbignone/galaxies_datasets

GitHub Events

Total
  • Watch event: 2
  • Delete event: 17
  • Issue comment event: 23
  • Push event: 22
  • Pull request event: 41
  • Create event: 17
Last Year
  • Watch event: 2
  • Delete event: 17
  • Issue comment event: 23
  • Push event: 22
  • Pull request event: 41
  • Create event: 17

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 220
  • Total Committers: 3
  • Avg Commits per committer: 73.333
  • Development Distribution Score (DDS): 0.15
Top Committers
Name             Email          Commits
dependabot[bot]  4****]@u****m  187
lbignone         l****e@g****m  22
Lucas Bignone    l****e@i****r  11
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 218
  • Average time to close issues: N/A
  • Average time to close pull requests: 29 days
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.27
  • Merged pull requests: 160
  • Bot issues: 0
  • Bot pull requests: 214
Past Year
  • Issues: 0
  • Pull requests: 29
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 months
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.41
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 26
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (256)
  • lbignone (7)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (256)
  • python (178)
  • github_actions (78)
  • bug (4)
  • ci (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 14 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 3
  • Total maintainers: 1
pypi.org: galaxies-datasets

Galaxies Datasets

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 14 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 21.6%
Forks count: 22.6%
Average: 24.0%
Stargazers count: 31.9%
Downloads: 34.0%
Maintainers (1)
Last synced: 7 months ago

Dependencies

docs/requirements.txt pypi
  • myst-parser ==0.18.0
  • sphinx ==5.1.1
  • sphinx-click ==4.3.0
  • sphinx-rtd-theme ==1.0.0
poetry.lock pypi
  • 132 dependencies
pyproject.toml pypi
  • Pygments ^2.12.0 develop
  • black ^22.6.0 develop
  • coverage ^6.4 develop
  • darglint ^1.8.1 develop
  • flake8 ^4.0.1 develop
  • flake8-bandit ^2.1.2 develop
  • flake8-bugbear ^22.4.25 develop
  • flake8-docstrings ^1.6.0 develop
  • flake8-rst-docstrings ^0.2.5 develop
  • mypy ^0.921 develop
  • myst-parser ^0.17.2 develop
  • pep8-naming ^0.13.0 develop
  • pre-commit ^2.20.0 develop
  • pre-commit-hooks ^4.3.0 develop
  • pytest ^7.1.2 develop
  • reorder-python-imports ^3.1.0 develop
  • safety ^2.1.1 develop
  • sphinx ^4.3.2 develop
  • sphinx-autobuild ^2021.3.14 develop
  • sphinx-click ^4.2.0 develop
  • sphinx-rtd-theme ^1.0.0 develop
  • typeguard ^2.13.3 develop
  • xdoctest ^1.0.1 develop
  • eagleSqlTools ^2.0.0
  • pandas ^1.3.2
  • python >=3.7.1,<4.0.0
  • tensorflow ^2.4.0
  • tensorflow-datasets ^4.4.0
  • typer >=0.4,<0.7
.github/workflows/labeler.yml actions
  • actions/checkout v3.2.0 composite
  • crazy-max/ghaction-github-labeler v4.1.0 composite
.github/workflows/release.yml actions
  • actions/checkout v3.2.0 composite
  • actions/setup-python v4.4.0 composite
  • pypa/gh-action-pypi-publish v1.6.4 composite
  • release-drafter/release-drafter v5.21.1 composite
  • salsify/action-detect-and-tag-new-version v2.0.3 composite
.github/workflows/tests.yml actions
  • actions/cache v3.2.2 composite
  • actions/checkout v3.2.0 composite
  • actions/setup-python v4.4.0 composite
  • actions/upload-artifact v3 composite