radiogalaxydataset
Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: 4 found in README
- ✓ Academic publication links: iop.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: floriangriese
- License: mit
- Language: Python
- Default Branch: main
- Size: 19.6 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 4
Metadata Files
README.md
Radio Galaxy Dataset
This Radio Galaxy Dataset is a collection and combination of several catalogues that use the FIRST radio galaxy survey [1]. The following license applies to the images from the FIRST radio galaxy survey:
"Provenance: The FIRST project team: R.J. Becker, D.H. Helfand, R.L. White M.D. Gregg. S.A. Laurent-Muehleisen. Copyright: 1994, University of California. Permission is granted for publication and reproduction of this material for scholarly, educational, and private non-commercial use. Inquiries for potential commercial uses should be addressed to: Robert Becker, Physics Dept, University of California, Davis, CA 95616."
Further, the following catalogues are included in this dataset:
* MiraBest [2], Source
* Gendre [3-4], Supplementary Data: mnras0404-1719-SD1.pdf, data tables CoNFIG-1 to CoNFIG-4
* Capetti 2017a [5], Table
* Capetti 2017b [6], Table
* Baldi 2018 [7], Table
* Proctor [8], Table, data from Table 1 with label “WAT” and “NAT”
Examples for the class definitions of FRI, FRII, Compact and Bent are shown below,
with the labels
| classes | Label |
| ----------- | ----------- |
| FRI | 0 |
| FRII | 1 |
| Compact| 2 |
| Bent | 3 |
The dataset has the following total number of samples per class.
| classes/split | FRI | FRII | Compact | Bent | Total |
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| total | 495 | 924 | 391 | 348 | 2158 |
We provide two splitting options for the dataset. The first splitting option (galaxy_data_h5.zip) splits the data into train, valid and test sets with the following number of samples per class.
| classes/split | FRI | FRII | Compact | Bent | Total |
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| train | 395 | 824 | 291 | 248 | 1758 |
| valid | 50 | 50 | 50 | 50 | 200 |
| test | 50 | 50 | 50 | 50 | 200 |
| total | 495 | 924 | 391 | 348 | 2158 |
The second splitting option (galaxy_data_crossvalid_0_h5.zip to galaxy_data_crossvalid_4_h5.zip and galaxy_data_crossvalid_test_h5.zip) provides a 5-fold cross-validation dataset with a larger test set.
| classes/split | FRI | FRII | Compact | Bent | Total |
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| 5-fold cross train | 316 | 659 | 232 | 198 | 1405 |
| 5-fold cross valid | 79 | 165 | 59 | 50 | 353 |
| test | 100 | 100 | 100 | 100 | 400 |
| total | 495 | 924 | 391 | 348 | 2158 |
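As a quick sanity check, the per-class counts of both splitting options can be summed and compared against the class totals. The sketch below only re-types the numbers from the tables above; the names `SINGLE_SPLIT`, `CROSS_VALID`, `class_sums` and `split_totals` are illustrative and not part of the repository.

```python
# Counts copied from the tables above; all names here are illustrative only.
CLASSES = ("FRI", "FRII", "Compact", "Bent")
TOTAL = {"FRI": 495, "FRII": 924, "Compact": 391, "Bent": 348}

SINGLE_SPLIT = {
    "train": {"FRI": 395, "FRII": 824, "Compact": 291, "Bent": 248},
    "valid": {"FRI": 50, "FRII": 50, "Compact": 50, "Bent": 50},
    "test": {"FRI": 50, "FRII": 50, "Compact": 50, "Bent": 50},
}

CROSS_VALID = {
    "5-fold cross train": {"FRI": 316, "FRII": 659, "Compact": 232, "Bent": 198},
    "5-fold cross valid": {"FRI": 79, "FRII": 165, "Compact": 59, "Bent": 50},
    "test": {"FRI": 100, "FRII": 100, "Compact": 100, "Bent": 100},
}


def class_sums(option):
    """Sum the counts of every split for each class."""
    return {c: sum(split[c] for split in option.values()) for c in CLASSES}


def split_totals(option):
    """Sum the counts of every class for each split."""
    return {name: sum(counts.values()) for name, counts in option.items()}


for name, option in (("single split", SINGLE_SPLIT), ("cross validation", CROSS_VALID)):
    # Both options should partition the same 2158 samples.
    assert class_sums(option) == TOTAL
    print(name, split_totals(option))
```

In both options the test set is balanced across the four classes, while the training data keeps the class imbalance of the full dataset.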
Installation for usage with pytorch
If you want to use the dataset via the dataset class FIRSTGalaxyData with pytorch, first install the necessary packages with
pip3 install -r requirements.txt
Otherwise, you can
* use the dataset directly with *.png files on disk or
* load the dataset directly from the HDF5 file.
Both options are described further below.
Usage with pytorch
from firstgalaxydata import FIRSTGalaxyData
import torchvision.transforms as transforms
transformRGB = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
data = FIRSTGalaxyData(root="./", selected_split="train", input_data_list=["galaxy_data_h5.h5"],
is_PIL=True, is_RGB=True, transform=transformRGB)
print(data)
This will print out the following output:
Dataset FIRSTGalaxyData
Selected classes: dict_values(['FRI', 'FRII', 'Compact', 'Bent'])
Number of datapoints in total: 1758
Number of datapoint in class FRI: 395
Number of datapoint in class FRII: 824
Number of datapoint in class Compact: 291
Number of datapoint in class Bent: 248
Split: train
Root Location: ./
Transforms (if any): Compose(
ToTensor()
Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
)
Target Transforms (if any): None
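The transform in the example above first applies ToTensor, which for uint8 images scales pixel values to the range [0, 1], and then Normalize, which computes (value - mean) / std per channel. With mean 0.5 and std 0.5 this maps the pixel range to [-1, 1]. The following pure-Python sketch reproduces that arithmetic for single pixel values; the helper names `to_unit_range` and `normalize` are illustrative and not part of torchvision or this repository.

```python
def to_unit_range(pixel):
    """Mimic the scaling ToTensor applies to a uint8 pixel value (0..255)."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize for a single channel value: (value - mean) / std."""
    return (value - mean) / std


for pixel in (0, 128, 255):
    # Darkest pixels end up at -1.0 and brightest pixels at 1.0.
    print(pixel, normalize(to_unit_range(pixel)))
```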
Options
With selected_split the data split is selected. Choose either "train" or "valid" or "test".
With selected_classes only data containing the chosen classes is returned, e.g. ["FRI", "FRII"] returns only FRI and FRII images.
With selected_catalogues the dataset uses only the selected catalogues. All possible catalogues are listed here:
selected_catalogues= ["Gendre", "MiraBest", "Capetti2017a", "Capetti2017b", "Baldi2018", "Proctor_Tab1"]
data = FIRSTGalaxyData(root="./", selected_split="train", input_data_list=["galaxy_data_h5.h5"], selected_catalogues=selected_catalogues, is_PIL=True, is_RGB=True, transform=transformRGB)
Basic usage with files on disk
You will also find the dataset in the 'galaxy_data' folder after unzipping galaxy_data.zip.
It contains the following folder structure with *.png images. The most important information is also part of the file name, separated by underscores:
RA_DEC_Label_Source.png
E.g. 14.084_-9.608_3_MiraBest.png
galaxy_data
├── all
│   ├── Bent
│   │   └── *.png
│   ├── Compact
│   │   └── *.png
│   ├── FRI
│   │   └── *.png
│   └── FRII
│       └── *.png
├── test
│   ├── Bent
│   │   └── *.png
│   ├── Compact
│   │   └── *.png
│   ├── FRI
│   │   └── *.png
│   └── FRII
│       └── *.png
├── train
│   ├── Bent
│   │   └── *.png
│   ├── Compact
│   │   └── *.png
│   ├── FRI
│   │   └── *.png
│   └── FRII
│       └── *.png
└── valid
    ├── Bent
    │   └── *.png
    ├── Compact
    │   └── *.png
    ├── FRI
    │   └── *.png
    └── FRII
        └── *.png
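Because the file names carry the metadata, they can be parsed without opening the images. The sketch below assumes the pattern RA_DEC_Label_Source.png with underscores as separators, as described above; the helper `parse_galaxy_filename`, the `GalaxyFile` tuple and the example path are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import NamedTuple

# Label mapping taken from the class table of this README.
LABEL_NAMES = {0: "FRI", 1: "FRII", 2: "Compact", 3: "Bent"}


class GalaxyFile(NamedTuple):
    ra: float    # right ascension
    dec: float   # declination
    label: int   # integer class label, 0..3
    source: str  # catalogue name


def parse_galaxy_filename(path):
    """Split an assumed 'RA_DEC_Label_Source.png' file name into its fields."""
    stem = Path(path).stem
    # maxsplit=3 keeps catalogue names that contain an underscore intact.
    ra, dec, label, source = stem.split("_", 3)
    return GalaxyFile(float(ra), float(dec), int(label), source)


# Illustrative path; the actual files on disk may differ.
info = parse_galaxy_filename("galaxy_data/train/Bent/14.084_-9.608_3_MiraBest.png")
print(info, LABEL_NAMES[info.label])
```

To walk a whole split, the same helper could be applied to every path returned by `Path("galaxy_data/train").rglob("*.png")`.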
Basic usage with HDF5 file
The dataset can also be accessed via the HDF5 file galaxy_data_h5.h5.
Every data entry consists of a group named data_$(i) with i=1...n where n is the total number of data entries.
Each group consists of the following data:
* Img: two-dimensional uint8 array with shape (300, 300)
* Attributes of Img:
    * RA: right ascension, equatorial coordinate system (J2000): double
    * DEC: declination, equatorial coordinate system (J2000): double
    * Source: string, ["Gendre", "MiraBest", "Capetti2017a", "Capetti2017b", "Baldi2018", "Proctor_Tab1"]
    * Filepath_literature: string, relative path to the *.png file in the folder galaxy_data
    * Label_literature: double scalar, 0: "FRI", 1: "FRII", 2: "Compact", 3: "Bent"
    * Split_literature: string, ["train", "test", "valid"]
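The layout above can be read with h5py, which is listed among the dependencies. The following is a minimal sketch, assuming the attribute names Label_literature and Split_literature are stored on the Img dataset as described; the function names `count_by_split_and_class` and `main` are illustrative and not part of the repository. The counting helper accepts any mapping with the same layout, so it can be tried without the real file.

```python
from collections import Counter

# Label mapping taken from the class table of this README.
LABEL_NAMES = {0: "FRI", 1: "FRII", 2: "Compact", 3: "Bent"}


def count_by_split_and_class(h5_file):
    """Count entries per (split, class) in an open h5py.File-like mapping."""
    counts = Counter()
    for group in h5_file.values():
        attrs = group["Img"].attrs
        split = attrs["Split_literature"]
        if isinstance(split, bytes):  # strings may come back as bytes
            split = split.decode()
        label = int(attrs["Label_literature"])  # stored as a double scalar
        counts[(split, LABEL_NAMES[label])] += 1
    return counts


def main(path="galaxy_data_h5.h5"):
    import h5py  # imported lazily so the helper above works without h5py

    with h5py.File(path, "r") as f:
        for key, n in sorted(count_by_split_and_class(f).items()):
            print(key, n)
```

Calling `main()` in the directory that contains galaxy_data_h5.h5 should print one line per split and class. The image itself would be available as `group["Img"][()]`.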
References
[1] R. H. Becker, R. L. White, D. J. Helfand, The FIRST Survey: Faint Images of the Radio Sky at Twenty Centimeters, The Astrophysical Journal 450 (1995) 559.
[2] H. Miraghaei, P. N. Best, The nuclear properties and extended morphologies of powerful radio galaxies: the roles of host galaxy and environment, Monthly Notices of the Royal Astronomical Society (2017) stx007.
[3] M. A. Gendre, P. N. Best, J. V. Wall, The Combined NVSS-FIRST Galaxies (CoNFIG) sample - II. Comparison of space densities in the Fanaroff-Riley dichotomy, Monthly Notices of the Royal Astronomical Society (2010).
[4] M. A. Gendre, J. V. Wall, The Combined NVSS-FIRST Galaxies (CoNFIG) sample - I. Sample definition, classification and evolution, Monthly Notices of the Royal Astronomical Society (2008).
[5] A. Capetti, F. Massaro, R. D. Baldi, FRICAT: A FIRST catalog of FR I radio galaxies, Astronomy & Astrophysics 598 (2017) A49.
[6] A. Capetti, F. Massaro, R. D. Baldi, FRIICAT: A FIRST catalog of FR II radio galaxies, Astronomy & Astrophysics 601 (2017) A81.
[7] R. D. Baldi, A. Capetti, F. Massaro, FR0CAT: A FIRST catalog of FR 0 radio galaxies, Astronomy & Astrophysics 609 (2018) A1.
[8] D. D. Proctor, Morphological annotations for groups in the FIRST database, The Astrophysical Journal Supplement Series 194 (2011) 31.
Owner
- Name: Florian Griese
- Login: floriangriese
- Kind: user
- Repositories: 2
- Profile: https://github.com/floriangriese
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Griese"
    given-names: "Florian"
    affiliation: "CDCS"
    orcid: "https://orcid.org/0000-0003-3309-9783"
  - family-names: "Kummer"
    given-names: "Janis"
    affiliation: "CDCS"
    orcid: "https://orcid.org/0000-0002-7853-0103"
  - family-names: "Rustige"
    given-names: "Lennart"
    affiliation: "CDCS"
    orcid: "https://orcid.org/0000-0002-0292-2477"
title: "Radio Galaxy Dataset"
version: 0.1.1
doi: 10.5281/zenodo.7120632
url: "https://github.com/floriangriese/RadioGalaxyDataset"
license: MIT
date-released: 2022-10-6
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Florain Griese | f****e@t****e | 21 |
| Florian | f****e@g****e | 16 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- Pillow *
- astropy *
- h5py *
- matplotlib *
- numpy *
- setuptools *
- torch *
- torchvision *