ar-flares
Code related to SDO HMI dataset of active regions and flare activity.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 128 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: DuckDuckPig
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Size: 1.07 MB
Statistics
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
AR-flares
This GitHub repository contains code related to solar flare prediction using SDO HMI active regions. This code is related to the data described in the Dryad repositories at https://doi.org/10.5061/dryad.dv41ns23n (reduced resolution preconfigured dataset), https://doi.org/10.5061/dryad.jq2bvq898 (full resolution preconfigured dataset), and https://doi.org/10.5061/dryad.qjq2bvqmj (extra images removed in the configuration process).
This repository contains code for general manipulation of the SDO HMI dataset and for its use in two machine learning problems for flare prediction: 1) a classical machine learning problem using extracted features of magnetic complexity and a support vector machine (SVM) classifier and 2) a deep learning problem using transfer learning on the VGG network.
Requirements: requirements.yml
- This environment file specifies all packages necessary to run the SVM classification code, the VGG classification code, and the general code described below. Some packages are not needed for some of the code (e.g., tensorflow is not necessary for the SVM classification but is necessary for the VGG classification).
- This environment file does NOT specify the selenium package, which is needed by the JSOC_Driver.py file. Please see the notes below about the fragility of the JSOC_Driver.py code.
SVM Classification
Code for the SVM classifier is included in the classifier_SVM/ folder. This code can operate on .fits files or .png files.

Code:
- `Build_Featureset.py`: Main code to extract 29 magnetic complexity features from HMI magnetograms. This code is implemented using the python multiprocessing package, but can be modified for serial implementation.
  - Edit the lines under `## User Definitions` to specify paths and other parameters.
  - Outputs a "FeatureFile" in `csv` format with the complexity features, labels (regression and classification), and filename.
    - The "FeatureFile" for the preconfigured reduced resolution dataset is available in the Dryad dataset https://doi.org/10.5061/dryad.dv41ns23n (file `Lat60_Lon60_Nans0_C1.0_24hr_png_224_features.csv`) and for the full resolution dataset at https://doi.org/10.5061/dryad.jq2bvq898 (file `Lat60_Lon60_Nans0_C1.0_24hr_features.csv`). It is recommended that you save the "FeatureFile" in the `classifier_SVM/` directory (i.e., the same directory as the SVM code), although subsequent code will allow you to specify the path to those files.
  - Relies on `FeaturesetTools.py`.
  - Requires the "AR Dataset":
    - The "FlareLabels" file (`C1.0_24hr_224_png_Labels.txt` or `C1.0_24hr_Labels.txt`), available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n (reduced resolution `png` files) or https://doi.org/10.5061/dryad.jq2bvq898 (full resolution `fits` files). It is recommended that you save the "FlareLabels" file in the base `AR-flares/` directory, although subsequent code will allow you to specify the path to those files.
    - The corresponding "SDO HMI AR Images", available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n (reduced resolution `png` files) or https://doi.org/10.5061/dryad.jq2bvq898 (full resolution `fits` files). The location of the "SDO HMI AR Images" will be specified by the user in subsequent code. You may save those data in the base `AR-flares/` directory or any other location.
- `FeaturesetTools.py`: Helper functions for feature extraction.
  - Relies on `FunctionsP3.py`.
- `FunctionsP3.py`: Functions to extract magnetic complexity features.
- `AR_Classifier.py`: Main code for the SVM classifier.
  - Edit the lines under `## User Definitions` to specify paths and other parameters.
  - Outputs three `csv` "FeatureFiles" with train, test, and validation data (i.e., the magnetic complexity features, filename, and label); a `txt` "WeightFile" used for equalization of features; a `txt` "Performance" file with classifier statistics; and a `pickle` "Model" file with the trained model.
  - Relies on `FeaturesetTools.py`.
  - Requires the "FeatureFile" output by `Build_Featureset.py` and the "DataSplits" (lists of test and val active regions) available on Dryad (details below).
    - The "FeatureFile" for the preconfigured reduced resolution dataset `Lat60_Lon60_Nans0_C1.0_24hr_png_224_features.csv` is available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n and for the full resolution dataset `Lat60_Lon60_Nans0_C1.0_24hr_features.csv` is available on Dryad at https://doi.org/10.5061/dryad.jq2bvq898. It is recommended that you save the "FeatureFile" in the `classifier_SVM/` directory (i.e., the same directory as the SVM code), although subsequent code will allow you to specify the path to those files.
    - The "DataSplits" files `List_of_AR_in_Test_Data_by_AR.csv`, `List_of_AR_in_Train_data_by_AR.csv`, and `List_of_AR_in_Validation_data_by_AR.csv` are available on Dryad (https://doi.org/10.5061/dryad.dv41ns23n or https://doi.org/10.5061/dryad.jq2bvq898). It is recommended that you save the "DataSplits" files in the base `AR-flares/` directory, although subsequent code will allow you to specify the path to those files.
    - Note--if the "DataSplits" are not available to the code, the code will randomly select 10% of active regions for the test and val sets; this will not result in the same split as the files available on Dryad.
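As an illustration of the overall SVM workflow (feature table in, scaled features, trained classifier out), the sketch below uses scikit-learn on a synthetic stand-in for the "FeatureFile". The column names and data here are assumptions for illustration only; `AR_Classifier.py` is the authoritative implementation.

```python
# Minimal sketch of the SVM classification step, NOT the AR_Classifier.py
# implementation: a feature table shaped like the "FeatureFile" (29 complexity
# features plus a binary flare label) is split, scaled, and fed to an SVM.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in for the FeatureFile: 29 complexity features per magnetogram.
df = pd.DataFrame(rng.normal(size=(n, 29)),
                  columns=[f"feat{i}" for i in range(29)])
df["label"] = (df["feat0"] + 0.5 * df["feat1"] > 0).astype(int)  # flare / no-flare

train, test = df.iloc[: n // 2], df.iloc[n // 2 :]
features = [c for c in df.columns if c != "label"]

# Feature scaling plays the role the "WeightFile" equalization plays in the repo.
scaler = StandardScaler().fit(train[features])
clf = SVC(kernel="rbf").fit(scaler.transform(train[features]), train["label"])
accuracy = clf.score(scaler.transform(test[features]), test["label"])
print(f"test accuracy: {accuracy:.2f}")
```

In the real code the train/test/val split comes from the "DataSplits" files (split by active region, not randomly by image), which is why using those files matters for reproducibility.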
VGG Classification
Code for the transfer learning of VGG is included in the classifier_VGG/ folder. This code can operate on .fits files or .png files.

Code:
- Build_dataframes.py: Code to generate files that can be read in as dataframes for the tensorflow dataloaders.
- Edit the lines under ## User Definitions to specify paths and other parameters.
- Outputs "Dataframes" in csv format with filename and classification label in the format expected for a tensorflow dataloader.
- The "Dataframes" for the preconfigured reduced resolution dataset Test_Data_by_AR_png_224.csv, Train_Data_by_AR_png_224.csv, and Validation_Data_by_AR_png_224.csv are available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n and for the full resolution dataset Test_Data_by_AR.csv, Train_Data_by_AR.csv, and Validation_Data_by_AR.csv are available on Dryad at https://doi.org/10.5061/dryad.jq2bvq898. It is recommended that you save the "Dataframes" in the classifier_VGG/ directory (i.e., the same directory as the VGG code), although subsequent code will allow you to specify the path to those files.
- Requires the "FlareLabels" file from the "AR Dataset" and the "DataSplits" (lists of test and val active regions) available on Dryad.
- The "FlareLabels" file (C1.0_24hr_224_png_Labels.txt or C1.0_24hr_Labels.txt) are available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n (reduced resolution png files) or https://doi.org/10.5061/dryad.jq2bvq898 (full resolution fits files). It is recommended that you save the "FlareLabels" file in the base AR-flares/ directory, although subsequent code will allow you to specify the path to those files.
- The "DataSplits" (lists of test and val active regions) `List_of_AR_in_Test_Data_by_AR.csv`, `List_of_AR_in_Train_data_by_AR.csv`, and `List_of_AR_in_Validation_data_by_AR.csv` are available on Dryad (https://doi.org/10.5061/dryad.dv41ns23n or https://doi.org/10.5061/dryad.jq2bvq898). It is recommended that you save the "DataSplits" files in the base `AR-flares/` directory, although subsequent code will allow you to specify the path to those files.
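To make the "Dataframes" format concrete, the sketch below builds a tiny CSV of the general shape described above (one row per image, with filename and classification label) and reads it back. The column names, label strings, and filenames are illustrative assumptions, not the exact schema `Build_dataframes.py` emits.

```python
# Sketch of the kind of "Dataframes" file Build_dataframes.py produces:
# one row per magnetogram image with its filename and classification label,
# in the shape a tensorflow dataloader (e.g. flow_from_dataframe) expects.
# Column names, labels, and filenames here are illustrative assumptions.
import io
import pandas as pd

rows = [
    ("HMI_example_0001.png", "flare"),     # hypothetical filenames
    ("HMI_example_0002.png", "no_flare"),
]
df = pd.DataFrame(rows, columns=["filename", "class"])

# Round-trip through CSV (in memory here; Build_dataframes.py writes to disk).
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)
print(df2)
```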
- `transfer_learning.ipynb`: jupyter notebook to perform transfer learning with the VGG16 architecture.
  - Comments throughout the notebook indicate paths and other parameters that can be specified.
  - Displays "Performance" within the notebook and outputs `hdf5` "Model" files with the trained model for each epoch (can be configured to output only the final or best model with appropriate options in the tensorflow `model.fit` call).
  - Requires the "Dataframes" files output by `Build_dataframes.py` and the "SDO HMI AR Images" corresponding to the "Dataframes".
    - The "Dataframes" for the preconfigured reduced resolution dataset `Test_Data_by_AR_png_224.csv`, `Train_Data_by_AR_png_224.csv`, and `Validation_Data_by_AR_png_224.csv` are available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n and for the full resolution dataset `Test_Data_by_AR.csv`, `Train_Data_by_AR.csv`, and `Validation_Data_by_AR.csv` are available on Dryad at https://doi.org/10.5061/dryad.jq2bvq898. It is recommended that you save the "Dataframes" in the `classifier_VGG/` directory (i.e., the same directory as the VGG code), although subsequent code will allow you to specify the path to those files.
    - The "SDO HMI AR Images" are available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n (reduced resolution `png` files) or https://doi.org/10.5061/dryad.jq2bvq898 (full resolution `fits` files). The location of the "SDO HMI AR Images" will be specified in subsequent code. You may save those data in the base `AR-flares/` directory or any other location.
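The transfer-learning idea above can be outlined as follows. This is a hedged sketch in the spirit of the notebook, not its contents: it freezes VGG16's convolutional layers and adds a new binary flare/no-flare head. It uses `weights=None` so the sketch runs offline; actual transfer learning would use pretrained weights (`weights="imagenet"`).

```python
# Transfer-learning sketch (an outline, not the transfer_learning.ipynb code):
# reuse VGG16 convolutional features and train a new binary classifier head.
import tensorflow as tf

base = tf.keras.applications.VGG16(
    weights=None,            # transfer learning would use "imagenet" (downloads weights)
    include_top=False,       # drop the 1000-class ImageNet classifier head
    input_shape=(224, 224, 3),
)
base.trainable = False       # freeze convolutional layers; train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # flare vs. no-flare
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(...) would then consume dataloaders built from the "Dataframes" CSVs;
# a ModelCheckpoint callback controls whether every epoch's model is saved
# or only the best/final one, as noted above.
```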
General code
General code for wrangling the dataset for use in classification is included in the general_code/ folder. If you are using a preconfigured dataset, you do not need to use any of this general code. If you wish to download and/or generate a customized dataset, e.g., with different flare size or prediction window parameters, you will need to use some of this code.

Code:
Code to configure a dataset according to latitude, longitude, NaNs, minimum flare size, and/or time-to-flare:
- customize_dataset.py: Code to generate a customized dataset based on flare size, flare prediction window, latitude, longitude, and number of NaNs. This code will copy the magnetogram images that satisfy the given parameters to a user-specified directory and generate a label file mapping those magnetograms to their flaring behavior. Please be sure you have adequate disk space for this copy.
- Edit the lines in the configuration dictionary cfg.
- Outputs an "EventList" file, the "AR Dataset", and the "DataSplits".
- The "EventList" file specifies a list of flares with information about date, time, NOAA AR, and flare size. This .txt file is generated from the SWPC Event Reports (ER) and used by subsequent code to specify labels for magnetograms as flaring or not. The "EventList" for the 2010-2018 timespan is available on Dryad at https://doi.org/10.5061/dryad.qjq2bvqmj. Save the "EventList" file in the same directory as the customize_dataset.py file to use it as is. If a file called eventList.txt does not exist in the same directory as customize_dataset.py, one will be created for you--this assumes the existence of the Events/ directory structure available for download at ftp://ftp.swpc.noaa.gov/pub/indices/events/README.
- The "AR Dataset" consists of the "FlareLabels" file (`C1.0_24hr_224_png_Labels.txt` or `C1.0_24hr_Labels.txt`), available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n (reduced resolution `png` files) or https://doi.org/10.5061/dryad.jq2bvq898 (full resolution `fits` files), and the corresponding "SDO HMI AR Images", available on Dryad at https://doi.org/10.5061/dryad.dv41ns23n (reduced resolution `png` files) or https://doi.org/10.5061/dryad.jq2bvq898 (full resolution `fits` files). It is recommended that you save the "FlareLabels" file in the base `AR-flares/` directory, although subsequent code will allow you to specify the path to those files. The location of the "SDO HMI AR Images" will be specified in subsequent code. You may save those data in the base `AR-flares/` directory or any other location.
- The "DataSplits" (lists of test and val active regions) List_of_AR_in_Test_Data_by_AR.csv, List_of_AR_in_Train_data_by_AR.csv, and List_of_AR_in_Validation_data_by_AR.csv are available on Dryad (https://doi.org/10.5061/dryad.dv41ns23n or https://doi.org/10.5061/dryad.jq2bvq898). It is recommended that you save the "DataSplits" files in the base AR-flares/ directory, although subsequent code will allow you to specify the path to those files.
- Relies on the "Image Set" and the "Solar Region Summaries (SRS)".
- "Image Set": This code assumes that you have downloaded all magnetogram images within the timespan of interest for the dataset. If you are customizing a dataset for the same timespan (2010-2018) as the preconfigured dataset, you will need all images from the preconfigured full-resolution dataset (https://doi.org/10.5061/dryad.jq2bvq898) and the extra images from https://doi.org/10.5061/dryad.qjq2bvqmj.
- "Solar Region Summaries (SRS)": This code will assume the existence of the SRS/ directory structures as available for download at ftp://ftp.swpc.noaa.gov/pub/warehouse/. Note--The file for 01 January 2012 is missing from the SRS archive. In dataset configuration, all HMI images from this day are excluded from the configured dataset due to lack of information about the latitude and longitude.
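The kind of configuration dictionary customize_dataset.py expects can be sketched as below. The key names here are hypothetical stand-ins (the real keys are defined in the `cfg` dictionary inside customize_dataset.py); only the parameter meanings and the resulting naming convention are taken from the dataset files above.

```python
# Hypothetical sketch of a customize_dataset.py-style configuration dictionary.
# Key names are illustrative assumptions; see cfg in customize_dataset.py for
# the actual names. Values mirror the preconfigured dataset's parameters.
cfg = {
    "flare_size": "C1.0",        # minimum flare class counted as a flaring label
    "prediction_window_hr": 24,  # time-to-flare prediction window
    "max_latitude": 60,          # exclude ARs beyond +/-60 degrees latitude
    "max_longitude": 60,         # exclude ARs beyond +/-60 degrees longitude
    "max_nans": 0,               # maximum NaN pixels tolerated per magnetogram
    "out_dir": "customized_dataset/",  # where matching images are copied
}

# These parameters are what the preconfigured dataset's file names encode:
name = (f"Lat{cfg['max_latitude']}_Lon{cfg['max_longitude']}"
        f"_Nans{cfg['max_nans']}_{cfg['flare_size']}"
        f"_{cfg['prediction_window_hr']}hr")
print(name)  # → Lat60_Lon60_Nans0_C1.0_24hr
```

Note how the assembled name matches the prefix of the Dryad files (e.g., `Lat60_Lon60_Nans0_C1.0_24hr_features.csv`), which is a useful sanity check that your parameters correspond to the preconfigured dataset.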
Code to download your own dataset:
- SRS_Parse.py: Code to parse the Space Weather Prediction Center (SWPC) Solar Region Summaries (SRS) to generate information used to download magnetogram images of all active regions appearing on disk within the timespan of the data.
- Edit the lines under ## User Definitions to specify paths and other parameters.
- Outputs a txt file "ARList" containing a list of all active regions present during the dataset timespan along with the starting date of appearance and number of days of existence. This file is used by the JSOC_Driver.py code to download magnetograms. The ARList.txt file for the timespan 2010-2018 is available here in the general_code/ folder. If you are configuring a dataset for a different time range, you will need to run the SRS_Parse.py code over the SRS data.
- This code assumes that you have downloaded the SRS/ directory from ftp://ftp.swpc.noaa.gov/pub/warehouse/.
- JSOC_Driver.py: Code to automate the interaction with the JSOC LookData webpage (http://jsoc.stanford.edu/ajax/lookdata.html) to download magnetograms. NOTE--this code is extremely fragile and will break with browser driver changes and changes to the underlying html code used for the JSOC webpage. This code is provided as is as a reference for those who may wish to modify the code for their purposes. There is no guarantee that the code provided will currently work. This code will assume the presence of the ARList.txt file (see notes for SRS_Parse.py above).
- Edit the lines under # User define variables: to specify paths and other parameters.
- Downloads and stores the "SDO HMI AR Images" associated with the "ARList" file.
- Relies on the selenium package (not included in the requirements file above) and a driver appropriate for the browser (e.g., geckodriver for firefox).
- Requires the "ARList" file output by SRS_Parse.py or available in the general_code/ directory here.
Owner
- Name: DuckDuckPig
- Login: DuckDuckPig
- Kind: organization
- Repositories: 1
- Profile: https://github.com/DuckDuckPig
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: AR-flares
message: "If you use this software, please cite it as below."
authors:
- given-names: Laura E.
family-names: Boucheron
email: lboucher@nmsu.edu
affiliation: New Mexico State University
orcid: 'https://orcid.org/0000-0002-8187-1566'
- given-names: Ty
family-names: Vincent
affiliation: New Mexico State University
orcid: 'https://orcid.org/0009-0001-9682-8667'
- given-names: Jeremy A.
family-names: Grajeda
affiliation: New Mexico State University
orcid: 'https://orcid.org/0009-0008-3189-8200'
- given-names: Ellery
family-names: Wuest
affiliation: New Mexico State University
orcid: 'https://orcid.org/0009-0000-4665-3128'
identifiers:
- type: doi
value: 10.5281/zenodo.7596222
repository-code: 'https://github.com/DuckDuckPig/AR-flares'
preferred-citation:
type: article
authors:
- given-names: Laura E.
family-names: Boucheron
email: lboucher@nmsu.edu
affiliation: New Mexico State University
orcid: 'https://orcid.org/0000-0002-8187-1566'
- given-names: Ty
family-names: Vincent
affiliation: New Mexico State University
orcid: 'https://orcid.org/0009-0001-9682-8667'
- given-names: Jeremy A.
family-names: Grajeda
affiliation: New Mexico State University
orcid: 'https://orcid.org/0009-0008-3189-8200'
- given-names: Ellery
family-names: Wuest
affiliation: New Mexico State University
orcid: 'https://orcid.org/0009-0000-4665-3128'
doi: "10.1038/s41597-023-02628-8"
journal: "Scientific Data"
start: 825 # First page number
title: "Solar active region magnetogram image dataset for studies of space weather"
issue: 10
year: 2023