Solar Data Tools

Solar Data Tools: a Python library for automated analysis of unlabeled PV data - Published in JOSS (2025)

https://github.com/NREL/solar-data-tools

Science Score: 96.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 13 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization slacgismo has institutional domain (gismo.slac.stanford.edu)
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 83% confidence
Mathematics Computer Science - 65% confidence
Last synced: 4 months ago

Repository

Some data analysis tools for working with historical PV solar time-series data sets.

Basic Info
Statistics
  • Stars: 83
  • Watchers: 9
  • Forks: 29
  • Open Issues: 3
  • Releases: 38
Created almost 7 years ago · Last pushed 4 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md



Explore our documentation

Report an issue: https://github.com/slacgismo/solar-data-tools/issues

Repo Status: Active. The project has reached a stable, usable state and is being actively developed.
pyOpenSci: Peer-Reviewed. The project has been peer-reviewed by pyOpenSci.

Solar Data Tools is an open-source Python library for analyzing PV power (and irradiance) time-series data. It was developed to enable analysis of unlabeled PV data, i.e., with no model, no meteorological data, and no performance index required, by taking a statistical signal processing approach in the algorithms used in the package's main data processing pipeline. Solar Data Tools empowers PV system fleet owners or operators to analyze system performance a hundred times faster, even when they only have access to the most basic data stream: the power output of the system.

Solar Data Tools provides methods for data I/O, cleaning, filtering, plotting, and analysis. These methods are largely automated and require little to no input from the user regardless of system type, from utility tracking systems to multi-pitch rooftop systems. Head over to the Getting Started pages in our documentation for a demo! For an in-depth tutorial on Solar Data Tools, we recommend the recent webinar we gave with the DOE's Solar Energy Technologies Office (SETO) and our colleagues at NREL.


You can also check the notebooks folder in this repo for more examples.

This work is supported by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under the Solar Energy Technologies Office Award Number 38529.

Install & Setup

Recommended: Install with pip

In a fresh Python virtual environment, simply run:

$ pip install solar-data-tools

or if you would like to use MOSEK, install the optional dependency as well:

$ pip install "solar-data-tools[mosek]"

Install with conda

[!WARNING] solar-data-tools is now available on conda-forge! You can specify the channel using the -c flag as shown in the examples below. The use of the slacgismo channel is deprecated and packages on that channel will not be up-to-date with the latest releases.

Creating the environment and directly installing the package and its dependencies from the appropriate conda channels:

$ conda create -n pvi-user solar-data-tools -c conda-forge

Starting the environment:

$ conda activate pvi-user

Stopping the environment:

$ conda deactivate

Or alternatively install the package in an already existing environment:

$ conda install solar-data-tools -c conda-forge
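To confirm that either install route worked, the installed distribution can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Solar Data Tools.

```python
from importlib import metadata


def installed_version(dist_name="solar-data-tools"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("solar-data-tools is not installed in this environment")
    else:
        print(f"solar-data-tools {version} is installed")
```

Run this inside the same virtual or conda environment you installed into, since the lookup only sees packages in the active environment.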

Solvers

CLARABEL

By default, the CLARABEL solver is used to solve the signal decomposition problems. CLARABEL (as well as other solvers) is compatible with OSD, the modeling language used to solve signal decomposition problems in Solar Data Tools. Both are open source and are dependencies of Solar Data Tools.

MOSEK

MOSEK is a commercial software package. Since it is more stable and offers faster solve times, we provide continuing support for it (with signal decomposition problem formulations using CVXPY). However, you will still need to obtain a license. If installing with pip, you can install the optional MOSEK dependency by running pip install "solar-data-tools[mosek]". If installing with conda, you will have to install MOSEK manually if you want to use it, because conda, unlike pip, does not support optional dependencies.

More information about MOSEK and how to obtain a license is available on the MOSEK website.

Usage

Users will primarily interact with this software through the DataHandler class. By default, Solar Data Tools uses CLARABEL as the solver for all signal decomposition problems. If you would like to specify another solver (such as MOSEK), just pass the keyword argument solver to DataHandler.run_pipeline with the solver of choice.

```python
from solardatatools import DataHandler
from solardatatools.dataio import get_pvdaq_data

# Download three years of data for PVDAQ system 35 using the demo API key
pv_system_data = get_pvdaq_data(sysid=35, api_key='DEMO_KEY', year=[2011, 2012, 2013])

# Run the main processing pipeline on the DC power column
dh = DataHandler(pv_system_data)
dh.run_pipeline(power_col='dc_power')
```

If everything is working correctly, you should see a run summary like the following:

```
total time: 25.99 seconds

Breakdown

Preprocessing              6.76s
Cleaning                   0.41s
Filtering/Summarizing      18.83s
    Data quality           0.21s
    Clear day detect       0.44s
    Clipping detect        15.51s
    Capacity change detect 2.67s
```
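The timings in this example summary can be sanity-checked with a few lines of arithmetic. The sketch below hard-codes the numbers from the summary and treats the last four entries as sub-steps of Filtering/Summarizing; that grouping is an interpretation, supported by the fact that they sum to 18.83 s. The variable names are illustrative only.

```python
import math

# Timings copied from the example run summary (seconds).
top_level = {
    "Preprocessing": 6.76,
    "Cleaning": 0.41,
    "Filtering/Summarizing": 18.83,
}
# Assumed to be sub-steps of Filtering/Summarizing.
filtering_substeps = {
    "Data quality": 0.21,
    "Clear day detect": 0.44,
    "Clipping detect": 15.51,
    "Capacity change detect": 2.67,
}
reported_total = 25.99

substep_sum = sum(filtering_substeps.values())
stage_sum = sum(top_level.values())

# Sub-steps should add up to the Filtering/Summarizing stage.
assert math.isclose(substep_sum, top_level["Filtering/Summarizing"], abs_tol=0.005)
# Stages should add up to the total, allowing for rounding of each printed value.
assert math.isclose(stage_sum, reported_total, abs_tol=0.02)

# Identify the sub-step that dominates the run time.
slowest = max(filtering_substeps, key=filtering_substeps.get)
share = filtering_substeps[slowest] / reported_total
print(f"{slowest}: {share:.0%} of total run time")
```

In this particular run, clipping detection accounts for roughly 60% of the total time, so it is the first place to look if a run feels slow. Your own timings will differ with data size, hardware, and solver.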

You can also find more in-depth tutorials and guides in our documentation.
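The solver keyword described in this section can be combined with a simple availability check. The following is a minimal sketch, not the library's own logic: `pick_solver` and `run_with_solver` are hypothetical helpers, and the solver names "MOSEK" and "CLARABEL" are assumed to be the strings the pipeline accepts. Finding the `mosek` module only shows that the package is importable; it does not verify that a valid license is present.

```python
from importlib import util


def pick_solver(preferred="MOSEK", fallback="CLARABEL"):
    """Return `preferred` if its Python module is importable, else `fallback`.

    Assumption: the lower-cased solver name matches its import name,
    which holds for "MOSEK" (module `mosek`).
    """
    if util.find_spec(preferred.lower()) is not None:
        return preferred
    return fallback


def run_with_solver(data_frame, power_col="dc_power", solver=None):
    """Run the pipeline on `data_frame`, passing the solver keyword explicitly."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from solardatatools import DataHandler

    dh = DataHandler(data_frame)
    dh.run_pipeline(power_col=power_col, solver=solver or pick_solver())
    return dh
```

Calling `run_with_solver(pv_system_data)` would then use MOSEK when its module is present and fall back to the default CLARABEL otherwise; pass `solver="CLARABEL"` to force the open-source solver.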

Contributing

We welcome contributions of any form! Please see our Contribution Guidelines for more information.

Citing Solar Data Tools

If you use Solar Data Tools in your research, please cite:

Recommended citation

Sara A. Miskovich, Bennet E. Meyers, et al., "Solar Data Tools: a Python library for automated analysis of unlabeled PV data," Journal of Open Source Software, 10(110), 8478, 2025, doi: 10.21105/joss.08478

Citing technical details (e.g., SDT algorithms)

Bennet E. Meyers, PVInsight (Final Technical Report), SLAC Report SLAC-R-1155, 2021, doi: 10.2172/1897181

Bennet E. Meyers, Elpiniki Apostolaki-Iosifidou and Laura Schelhas, "Solar Data Tools: Automatic Solar Data Processing Pipeline," 2020 47th IEEE Photovoltaic Specialists Conference (PVSC), Calgary, AB, Canada, 2020, pp. 0655-0656, doi: 10.1109/PVSC45281.2020.9300847.

Citing a specific version

You can also cite the DOI corresponding to the specific version of Solar Data Tools that you used. Solar Data Tools DOIs are listed on Zenodo.

Versioning

We use Semantic Versioning for versioning. For the versions available, see the tags on this repository.

Authors

See also the list of contributors who participated in this project.

Owner

  • Name: SLAC GISMo
  • Login: slacgismo
  • Kind: organization
  • Email: slacgismo@gmail.com
  • Location: SLAC National Accelerator Laboratory, Menlo Park, CA 94025

100% Clean Energy for All

JOSS Publication

Solar Data Tools: a Python library for automated analysis of unlabeled PV data
Published
June 27, 2025
Volume 10, Issue 110, Page 8478
Authors
Sara A. Miskovich ORCID
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Bennet E. Meyers ORCID
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Elpiniki Apostolaki-Iosifidou
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Claire Berschauer
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Chengcheng Ding
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Aramis Dufour
Stanford University, Stanford, CA, 94305, USA
David Jose Florez Rodriguez
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Jonathan Goncalves
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Alejandro Londono-Hurtado
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Victor-Haoyang Lian
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Tristan Lin
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Junlin Luo
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Xiao Ming
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Duncan Ragsdale
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Derin Serbetcioglu
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Shixian Sheng
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Jose St Louis
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Tadatoshi Takahashi
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Nimish Telang
Independent Researcher, USA
Mitchell Victoriano
SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
Haoxi Zhang
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Nimish Yadav
Carnegie Mellon University, Pittsburgh, PA 15213, USA
Editor
Kyle Niemeyer ORCID
Tags
photovoltaics solar power signal decomposition convex optimization

GitHub Events

Total
  • Create event: 23
  • Release event: 8
  • Issues event: 22
  • Watch event: 18
  • Delete event: 14
  • Issue comment event: 19
  • Push event: 86
  • Pull request review comment event: 3
  • Pull request review event: 12
  • Pull request event: 39
  • Fork event: 3
Last Year
  • Create event: 23
  • Release event: 8
  • Issues event: 22
  • Watch event: 18
  • Delete event: 14
  • Issue comment event: 19
  • Push event: 86
  • Pull request review comment event: 3
  • Pull request review event: 12
  • Pull request event: 39
  • Fork event: 3

Dependencies

requirements.txt pypi
  • Mosek *
  • cvxpy >=1.1.0
  • jupyter *
  • matplotlib *
  • numpy >=1.22.0
  • pandas *
  • pv-system-profiler *
  • pvlib *
  • requests *
  • scikit-learn *
  • scipy *
  • seaborn *
  • statistical-clear-sky *
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • aws-actions/configure-aws-credentials v1 composite
  • conda-incubator/setup-miniconda v2 composite