Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.8%) to scientific vocabulary
Repository
Machine Learning code for Pan-STARRS and ATLAS
Basic Info
- Host: GitHub
- Owner: genghisken
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Size: 293 KB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
psat-ml
Automatic classification of Pan-STARRS and ATLAS images. Based on the code originally written by Darryl Wright, Ken W. Smith and Amanda Ibsen. Documentation written by Amanda Ibsen.
In a Nutshell:
This repo contains a pipeline to connect to the ATLAS (or PS1) database, get cutouts of difference images, build a data set, train a classifier to differentiate between real and bogus images, and plot the results.
How does it work?

GetCutOuts
Input options
configFile : .yaml with database credentials
mjds : list of nights
stampSize : size of cutouts
stampLocation : where to store cutouts
camera : '02a' for Haleakala, '01a' for Mauna Loa
downloadthreads : number of threads used for downloading cutouts
stampThreads : number of threads used for processing stamps
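The configFile option points to a YAML file holding the database credentials. A hypothetical example of what such a file might look like (all key names and values here are illustrative assumptions, not the project's actual schema):

```yaml
# Hypothetical credentials file for the ATLAS/PS1 database connection.
database:
  host: db.example.org
  port: 3306
  user: atlas_reader
  password: changeme
  name: atlas
```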
Explanation
-getATLASTrainingSetCutouts.py: It takes as input a config file, a list of dates (in MJD) and a directory to store the output in. It connects to the ATLAS database using the credentials in the config file and gets all exposures for the given time frame. For each exposure it creates a .txt file containing all x,y positions for the objects in the images and a 40x40 pixel cutout image for each object. It also creates a "good.txt" and a "bad.txt" file, containing the x,y positions of the real and bogus objects, respectively.
-getPS1TrainingSetCutouts.py: Same as the above script, but it connects to the PS1 database instead.
BuildMLDataset
Input options
good : file with x,y pixel positions for real objects
bad : file with x,y pixel positions for bogus objects
outputFile : .h5 output file
e : extent (default=10)
E : Extension (default=0)
s : skew, how many bogus objects per real one (default=3)
r : rotation (default=None)
N : normalization function (default='signPreserveNorm')
Explanation
-buildMLDataset.py: It takes as input the good.txt and bad.txt files with all x,y positions for real and bogus objects. From those, it builds an .h5 file containing the features (20x20 pixels of the image) and targets (real or bogus label) to be used later as training set.
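A minimal sketch of what buildMLDataset.py produces, assuming illustrative names throughout: the dataset keys ("X", "y"), the helper names, and the exact form of the signPreserveNorm normalisation are assumptions, not the project's actual implementation.

```python
import numpy as np
import h5py

def sign_preserve_norm(stamp):
    # Assumed form of the 'signPreserveNorm' option: scale each cutout to
    # unit maximum absolute value, preserving the sign of every pixel.
    peak = np.max(np.abs(stamp))
    return stamp / peak if peak > 0 else stamp

def build_dataset(good_stamps, bad_stamps, output_file, skew=3):
    # Keep at most `skew` bogus examples per real one (the -s option).
    bad_stamps = bad_stamps[: len(good_stamps) * skew]
    stamps = np.concatenate([good_stamps, bad_stamps])
    # Flatten each 20x20 cutout into a 400-element feature vector.
    X = np.stack([sign_preserve_norm(s).ravel() for s in stamps])
    # Targets: label 1 = real, label 0 = bogus.
    y = np.concatenate([np.ones(len(good_stamps)), np.zeros(len(bad_stamps))])
    with h5py.File(output_file, "w") as f:
        f.create_dataset("X", data=X)
        f.create_dataset("y", data=y)
```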
KerasTensorflowClassifier
Input options
outputcsv : output csv file
trainingset : .h5 input dataset
classifierfile : .h5 file to store model (classifier)
Explanation
-kerasTensorflowClassifier.py: It takes as input an .h5 file with the training set and a path to store the classifier as an .h5 file. If the model doesn't exist yet, it creates it, trains it, and classifies a test set. It returns a .csv file containing the targets and scores for all images. The classifier used is a CNN with the following architecture:
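The actual model is a Keras CNN whose exact architecture is not reproduced here. As a rough illustration of what such a network computes on one 20x20 cutout, here is a dependency-light NumPy sketch of a single conv filter, ReLU, 2x2 max pooling, and a dense sigmoid score; the layer sizes and function names are assumptions for illustration only.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Naive 'valid' 2D cross-correlation, as a conv layer computes per filter.
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def toy_cnn_score(stamp, kernel, weights, bias):
    # One conv filter -> ReLU -> 2x2 max pooling -> dense sigmoid score.
    fmap = np.maximum(conv2d_valid(stamp, kernel), 0.0)
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    pooled = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
    z = pooled.ravel() @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))  # real/bogus probability in (0, 1)
```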

PlotResults
Input options
inputFiles : csv files to be plotted, with both target and score for each object
outputFile : output .png file with the plots
Explanation
-plotResults.py: It takes as input a csv file with the scores and targets for all images and plots the ROC curve and the Detection error tradeoff graph for the data set.
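The ROC curve plotted by this step can be traced directly from the (target, score) pairs in the csv. A minimal sketch of that computation, assuming targets are encoded 1 = real and 0 = bogus (the function name is illustrative, not from the project):

```python
import numpy as np

def roc_points(targets, scores):
    # Sweep the decision threshold from high to low, accumulating true and
    # false positives; the (fpr, tpr) pairs trace the ROC curve.
    order = np.argsort(scores)[::-1]
    targets = np.asarray(targets)[order]
    tps = np.cumsum(targets == 1)
    fps = np.cumsum(targets == 0)
    tpr = tps / max(tps[-1], 1)   # true positive rate (recall)
    fpr = fps / max(fps[-1], 1)   # false positive rate
    return fpr, tpr
```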
Some results
ROC curve and trade-off plots for ATLAS test data-set

Recall for 'confirmed' and 'good' transients

How to run the pipeline?
When asked to run a task, the pipeline searches for the resources needed to complete it. If any are missing, it first runs the task that produces them, recursing in this way until the requested task can run.
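This recursive behaviour follows Luigi's requires/output/run pattern. A dependency-free sketch of the idea (the class and attribute names here are illustrative, not Luigi's actual API):

```python
import os

class Task:
    # Minimal stand-in for the pattern the pipeline uses: each task declares
    # the tasks it requires and the file it produces.
    requires = []        # Task classes this task depends on
    output = None        # path of the file this task produces

    def run(self):
        raise NotImplementedError

    @classmethod
    def complete(cls):
        return cls.output is not None and os.path.exists(cls.output)

def build(task_cls):
    # Recursively run missing requirements first, then the task itself,
    # mirroring how the pipeline resolves resources before running a task.
    for dep in task_cls.requires:
        if not dep.complete():
            build(dep)
    if not task_cls.complete():
        task_cls().run()
```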
To run a task:
python atlasClassificationPipeline.py Name_of_Task --local-scheduler --name_of_option1 option1 ... --name_of_optionN optionN
Examples:
-To run the PlotResults task
python atlasClassificationPipeline.py PlotResults --local-scheduler --inputfiles [file1.csv,...,filen.csv] --outputFile output.png
For more information on how to run a pipeline, see the Luigi documentation.
To set up:
- create a virtual environment with Python 3.6 and activate it
- pip install -r requirements.txt
Owner
- Login: genghisken
- Kind: user
- Twitter: TheGenghisKen
- Repositories: 15
- Profile: https://github.com/genghisken
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Smith"
    given-names: "Ken W."
    orcid: "https://orcid.org/0000-0001-9535-3199"
  - family-names: "Wright"
    given-names: "Darryl E."
  - family-names: "Ibsen"
    given-names: "Amanda"
title: "psat-ml"
version: 0.1.0
doi: 10.5281/zenodo.10869720
date-released: 2024-03-25
url: "https://github.com/genghisken/psat-ml"
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ken Smith | k****h@q****k | 38 |
| joshgithubbin | j****m@g****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 4 days
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- joshgithubbin (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- Keras ==2.0.4
- Markdown ==2.6.11
- Pillow ==5.2.0
- PyMySQL ==0.9.2
- PyWavelets ==0.5.2
- PyYAML ==3.13
- Theano ==1.0.2
- Werkzeug ==0.14.1
- absl-py ==0.2.2
- asn1crypto ==0.24.0
- astor ==0.7.1
- astropy ==3.0.3
- certifi ==2018.4.16
- cffi ==1.11.5
- chardet ==3.0.4
- cloudpickle ==0.5.3
- cov-core ==1.15.0
- coverage ==4.5.1
- cryptography >=2.3
- cycler ==0.10.0
- dask ==0.18.1
- decorator ==4.3.0
- dill ==0.2.8.2
- docopt ==0.6.2
- docutils ==0.14
- eventlet ==0.23.0
- fundamentals ==1.6.0
- gast ==0.2.0
- gkutils >=0.2.22
- greenlet ==0.4.14
- grpcio ==1.13.0
- h5py ==2.8.0
- idna ==2.7
- kiwisolver ==1.0.1
- lockfile ==0.12.2
- luigi ==2.7.6
- matplotlib ==2.2.2
- multiprocess ==0.70.6.1
- mysqlclient ==1.3.13
- networkx ==2.1
- nose2 ==0.7.4
- numpy ==1.14.5
- pandas ==0.23.3
- panstamps ==0.5.1
- protobuf ==3.6.0
- psutil ==5.4.6
- pycparser ==2.18
- pyparsing ==2.2.0
- pyprof2calltree ==1.4.3
- python-daemon ==2.1.2
- python-dateutil ==2.7.3
- pytz ==2018.5
- requests ==2.19.1
- scikit-image ==0.14.0
- scikit-learn ==0.19.2
- scipy ==1.1.0
- six ==1.11.0
- tensorboard ==1.9.0
- tensorflow ==1.9.0
- termcolor ==1.1.0
- threadpool ==1.3.2
- toolz ==0.9.0
- tornado ==4.5.3
- unicodecsv ==0.14.1
- urllib3 ==1.23