fpca-load-tools

Functional Principal Component Analysis and Functional Regression Tools for Electricity Load Curves

https://github.com/berrieslab/fpca-load-tools

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: sciencedirect.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Repository

Functional Principal Component Analysis and Functional Regression Tools for Electricity Load Curves

Basic Info
  • Host: GitHub
  • Owner: BerriesLab
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.67 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed over 1 year ago

README.md

Functional Principal Component Analysis and Functional Regression Tools for Electricity Load Curves


Based on the work by D. Beretta et al., Sustainable Energy, Grids and Networks, Volume 21, March 2020, 100308. Readers are encouraged to consult the manuscript to master the methodology.

These tools are designed for scientists, researchers, and engineers working in all fields of electrical grid optimization. They provide capabilities to apply FPCA to daily electricity load curves and predict future consumption patterns based on historical data.

Installation

The fpca-load-tools can be installed either via pip or by cloning this repository.

To install the tools via pip, use the following commands based on your operating system:

  • On Unix/Mac: Open a terminal and run:

    ```bash
    pip install fpca-load-tools
    ```

  • On Windows: Open Command Prompt or PowerShell and run:

    ```bash
    pip install fpca-load-tools
    ```

To install the tools by cloning the repository to your local machine, follow these steps:

  • Clone the repository:

    ```bash
    git clone https://github.com/BerriesLab/fpca-load-tools.git
    ```

  • Navigate to the project directory:

    ```bash
    cd fpca-load-tools
    ```

  • Install the required dependencies:

    ```bash
    pip install -r requirements.txt
    ```

Overview

fpca-load-tools is designed around three main classes:

  • ElectricityLoadTimeSeries: Manages time series data within a Pandas DataFrame, including pre-processing operations such as filtering for complete time series and augmenting time series with calendar information.
  • ElectricityLoadFPCA: Applies Functional Principal Component Analysis (FPCA) to daily electricity load curves. This class requires an ElectricityLoadTimeSeries object as an attribute.
    • Note 1: The ElectricityLoadTimeSeries object is stored as a reference in ElectricityLoadFPCA. Therefore, any changes made to the ElectricityLoadTimeSeries object outside ElectricityLoadFPCA will also affect the data being processed by the FPCA.
    • Note 2: The results of the FPCA, including the scores, are stored in the results attribute of the class.
  • ElectricityLoadRegression: Trains a model and predicts daily electricity load curves using FPCA data. It requires an ElectricityLoadFPCA object as an attribute.
    • Note 1: The ElectricityLoadFPCA object is stored as a reference in ElectricityLoadRegression. Therefore, any modifications to the ElectricityLoadFPCA object outside ElectricityLoadRegression will affect the data being processed by the regressor.
    • Note 2: The results of the training are stored in the class attributes 'model' and 'scaler'.
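
The reference semantics described in the notes above can be illustrated with a minimal sketch. `TimeSeriesHolder` and `AnalysisHolder` are hypothetical stand-ins, not the package's classes; they only show how an object stored by reference differs from an independent copy.

```python
import copy


class TimeSeriesHolder:
    """Hypothetical stand-in for a class that owns the data."""

    def __init__(self, values):
        self.values = list(values)


class AnalysisHolder:
    """Hypothetical stand-in for a class that keeps a reference to the data."""

    def __init__(self, ts):
        self.ts = ts  # stored by reference, not copied


ts = TimeSeriesHolder([1.0, 2.0, 3.0])
shared = AnalysisHolder(ts)                   # sees later changes to ts
isolated = AnalysisHolder(copy.deepcopy(ts))  # works on an independent snapshot

ts.values.append(4.0)  # modify the original object after construction

shared_len = len(shared.ts.values)      # reflects the appended value
isolated_len = len(isolated.ts.values)  # unaffected by the append
```

If an analysis must not be affected by later pre-processing, passing a deep copy, as in the `isolated` case, is one way to decouple the two objects.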

The following figure is a graphical representation of the classes with their own attributes and methods.

```mermaid
classDiagram
    class ElectricityLoadTimeSeries {
        +ts: pd.DataFrame
        +filter_complete_data()
        +filter_complete_days()
        +filter_complete_months()
        +filter_complete_years()
        +filter_non_null_entries()
        +resample_days()
        +augment_time_series_with_day_of_the_week()
        +augment_time_series_with_year_month_day()
        +drop_year_month_day()
        +convert_utc_to_local_timestamp()
        +sort()
        +save_time_series()
        +load_time_series()
        +load_example_entsoe_transparency()
    }
    class ElectricityLoadFPCA {
        +ts: ElectricityLoadTimeSeries
        +results: ElectricityLoadFPCAResults
        +apply_fpca_to_all_days_grouped_by_date()
        +apply_fpca_to_all_days_grouped_by_weekday()
        +apply_fpca_to_all_days_grouped_by_month()
        +plot_scores_vs_day_of_the_week()
        +plot_scores_vs_month_of_the_year()
        +plot_cdf_of_explained_variability()
        +plot_fpc()
        +plot_functional_boxplot()
        +save_fpca_results()
        +load_fpca_results()
    }
    class ElectricityLoadFPCAResults {
        +day = None
        +day_of_the_week = None
        +month_of_the_year = None
    }
    class ElectricityLoadRegression {
        +fpca: ElectricityLoadFPCA
        +model: dict[LinearRegression]
        +scaler: StandardScaler
        +train_linear_model()
        +predict_daily_eletricity_load_curve()
        +save_model()
        +load_model()
    }
    ElectricityLoadFPCAResults --|> ElectricityLoadFPCA
    ElectricityLoadFPCA --|> ElectricityLoadRegression
    ElectricityLoadTimeSeries --|> ElectricityLoadFPCA
```

Time Series

Loading time series

Users can load time series from CSV files using the load_time_series() method of the ElectricityLoadTimeSeries class. The expected data structure in the CSV file is:

| utc_timestamp | load | feature_1 | feature_2 | ... | feature_n |
|---------------|------|-----------|-----------|-----|-----------|
| ...           | ...  | ...       | ...       | ... | ...       |

  • utc_timestamp: The timestamp in Coordinated Universal Time (UTC) or Greenwich Mean Time (GMT).
  • load: The electricity load measurement.
  • feature_1 to feature_n: Additional features for analysis and/or prediction.

Upon loading, the CSV file is converted into a Pandas DataFrame with a DateTimeIndex based on the utc_timestamp.

Users can load multiple files and features as needed. The method automatically merges each new CSV file with the DataFrame already in memory, joining on the DateTimeIndex. Users should ensure that only one column named load is present in memory. To help with this, load_time_series() lets the user select which columns to load from the CSV file and choose the names of these columns in the destination DataFrame. If multiple columns with the same name are loaded, Pandas renames the new columns with a suffix (e.g., column_name_r).

An example of meteorological time series data that could be merged with the electricity load time series is:

| utc_timestamp | temperature | radiation | relative_humidity |
|---------------|-------------|-----------|-------------------|
| ...           | ...         | ...       | ...               |
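
The loading and merging behaviour described above can be sketched with plain Pandas. This is not the signature of load_time_series(); the helper `read_utc_csv`, the sample values, the outer join, and the `_r` suffix are all assumptions made for illustration.

```python
import io

import pandas as pd

# Two small in-memory "files" standing in for CSVs on disk (made-up values).
load_csv = io.StringIO(
    "utc_timestamp,load\n"
    "2021-01-01 00:00:00,100.0\n"
    "2021-01-01 01:00:00,95.0\n"
    "2021-01-01 02:00:00,90.0\n"
)
meteo_csv = io.StringIO(
    "utc_timestamp,temperature,radiation\n"
    "2021-01-01 00:00:00,2.5,0.0\n"
    "2021-01-01 01:00:00,2.1,0.0\n"
)


def read_utc_csv(source, columns):
    """Hypothetical helper: read selected columns, indexed by UTC timestamps."""
    df = pd.read_csv(
        source,
        usecols=["utc_timestamp", *columns],
        parse_dates=["utc_timestamp"],
        index_col="utc_timestamp",
    )
    df.index = df.index.tz_localize("UTC")  # mark the naive timestamps as UTC
    return df


load = read_utc_csv(load_csv, ["load"])
meteo = read_utc_csv(meteo_csv, ["temperature"])  # the radiation column is skipped

# Merge on the index. The outer join and the "_r" suffix are assumptions.
ts = load.merge(
    meteo, left_index=True, right_index=True, how="outer", suffixes=("", "_r")
)
```

With an outer join, timestamps present in only one file are kept and the missing columns are filled with NaN, which is why a completeness filter is usually applied afterwards.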

To save time series data to CSV files, users can use the save_time_series() method.

Converting between UTC and Local time

When studying electricity load time series, the choice between using UTC (Coordinated Universal Time) and local time depends on the objectives of the analysis and the nature of the data. For standardization purposes, such as comparing electricity load across different time zones, using UTC provides a uniform time reference and simplifies time zone conversions. However, if the goal is to investigate consumer behavior, local time may be more relevant since electricity load often correlates with human activities and routines, which follow local time patterns (e.g., peak load times during mornings and evenings). Similarly, for operational planning, such as scheduling generation or demand response activities, local time aligns better with the actual timing of events and conditions experienced by consumers and grid operators.

To facilitate this, the ElectricityLoadTimeSeries class includes the method convert_utc_to_local_timestamp, which converts the UTC DateTimeIndex to the corresponding local timestamp. This method requires the user to specify the geographical area in Olson Timezone format.

Note that converting from UTC to local time, which accounts for daylight saving time, can produce days with duplicate entries or missing values. To address this, either resample the affected days or remove the days that do not meet the completeness and integrity requirements. For detailed guidance, see the section "Filtering complete time-series".
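
The effect of daylight saving time on the number of entries per local day can be reproduced with plain Pandas. This sketch does not call convert_utc_to_local_timestamp; it uses `tz_convert` directly, and "Europe/Rome" is an arbitrary Olson time zone name chosen for illustration.

```python
import datetime

import pandas as pd

# Hourly UTC index spanning both 2021 clock changes in central Europe.
utc_index = pd.date_range("2021-03-27", "2021-11-02", freq="60min", tz="UTC")
ts = pd.DataFrame({"load": 1.0}, index=utc_index)

# Convert the index from UTC to local time (time zone chosen for illustration).
local = ts.tz_convert("Europe/Rome")

# Count the entries that fall on each local calendar day.
entries_per_day = local.groupby(local.index.date).size()

spring_forward = entries_per_day.loc[datetime.date(2021, 3, 28)]  # clocks move forward
fall_back = entries_per_day.loc[datetime.date(2021, 10, 31)]      # clocks move back
regular_day = entries_per_day.loc[datetime.date(2021, 6, 15)]
```

The day on which clocks move forward holds 23 hourly entries and the day on which they move back holds 25, while a regular day holds 24. These are the days that need resampling or removal before FPCA.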

Filtering complete time-series

To execute FPCA and predict future electricity load curves, it is essential that the dataset is complete. The completeness criteria are as follows:

  • Year Completeness: A year is considered complete if the number of months with non-null entries meets or exceeds a tolerance percentage of the expected number of months, which is 12. By default, this tolerance level is set to 11/12.
  • Month Completeness: A month is considered complete if the number of days with non-null entries meets or exceeds a tolerance percentage of the expected number of calendar days. By default, this tolerance level is set to 95% of the month's calendar days.
  • Day Completeness: A day is considered complete if the number of non-null entries meets or exceeds a tolerance percentage of the expected number of entries. By default, this tolerance level is set to 100% of the expected entries.
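
As an illustration of the month criterion, the following sketch keeps only the months whose share of days with data reaches the tolerance. The function `filter_complete_months_sketch` is hypothetical and written with plain Pandas; it is not the package's implementation and assumes null entries have already been dropped.

```python
import numpy as np
import pandas as pd


def filter_complete_months_sketch(ts, tolerance=0.95):
    """Keep months whose fraction of days with data is at least `tolerance`."""
    # Distinct calendar days that have at least one entry.
    days = pd.Series(ts.index.normalize().unique())
    observed = days.groupby(days.dt.to_period("M")).size()
    months = pd.PeriodIndex(observed.index)
    expected = np.asarray(months.days_in_month)  # calendar days in each month
    complete = months[observed.to_numpy() >= tolerance * expected]
    return ts[ts.index.to_period("M").isin(complete)]


# January 2021 misses one day (30/31 days), February 2021 stops on the 20th.
index = pd.date_range("2021-01-01", "2021-02-20 23:00", freq="60min")
ts = pd.DataFrame({"load": 1.0}, index=index)
ts = ts[ts.index.normalize() != pd.Timestamp("2021-01-15")]

filtered = filter_complete_months_sketch(ts)
remaining_months = sorted(set(filtered.index.month))
```

With the default 95% tolerance, January (30 of 31 days) is kept and February (20 of 28 days) is removed.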

To filter a complete dataset, the user can use the filter_complete_data() method from the ElectricityLoadTimeSeries class. This method utilizes four sequentially executed methods:

  1. filter_non_null_entries(): Delete all rows with at least one None value.
  2. filter_complete_years(): Remove incomplete years. The default tolerance is set to 11/12.
  3. filter_complete_months(): Remove incomplete months. The default tolerance is set to 95% of the month’s calendar days.
  4. filter_complete_days(): Remove incomplete days. The default tolerance is set to 100% of the mode of the time series grouped by date.

Note 1: As the first step, entries with null data are dropped by the filter_non_null_entries() method. This is essential because the subsequent methods only evaluate the DateTimeIndex values, regardless of the columns' actual values.
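
The day-level step can be sketched as follows, under the assumption that the expected number of entries per day is the mode of the daily counts, as stated above. `filter_complete_days_sketch` is a hypothetical function written with plain Pandas, not the package's code.

```python
import numpy as np
import pandas as pd


def filter_complete_days_sketch(ts, tolerance=1.0):
    """Drop null rows, then keep days with enough entries relative to the mode."""
    ts = ts.dropna()                        # rows with nulls must go first
    day_of_entry = ts.index.normalize()     # calendar day of each row
    counts = ts.groupby(day_of_entry).size()
    expected = counts.mode().iloc[0]        # most common number of entries per day
    complete_days = counts.index[counts.to_numpy() >= tolerance * expected]
    return ts[day_of_entry.isin(complete_days)]


# Five days of hourly data: day 2 misses two rows, day 3 contains one null value.
index = pd.date_range("2021-01-01", periods=120, freq="60min")
ts = pd.DataFrame({"load": np.arange(120.0)}, index=index)
ts = ts.drop(index[[30, 31]])
ts.loc[index[50], "load"] = np.nan

strict = filter_complete_days_sketch(ts)                  # tolerance of 100%
relaxed = filter_complete_days_sketch(ts, tolerance=0.9)  # tolerance of 90%
```

At 100% tolerance only the three untouched days survive. At 90% the two damaged days are kept with gaps, which is the situation that resampling is meant to repair.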

Note 2: When filtering days with a tolerance level less than 100% or converting timestamps from UTC to local time, the resulting time series may include missing values. To address this, the user can use the resample_days() method. This method resamples the time series daily with a user-defined frequency (defaulting to one hour). Missing values are linearly interpolated between their nearest neighbors, and any remaining None values at the beginning or end of an interpolated period are filled with the nearest neighbor value.
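
The interpolation described in Note 2 can be sketched for a single day as shown below. `resample_day_sketch` is a hypothetical helper built on Pandas `reindex` and `interpolate`; it assumes the day has no duplicate timestamps and is not the implementation of resample_days().

```python
import numpy as np
import pandas as pd


def resample_day_sketch(day, freq="60min"):
    """Put one day on a regular grid, interpolate gaps, and fill the edges."""
    start = day.index.min().normalize()
    step = pd.Timedelta(freq)
    full_index = pd.date_range(start, start + pd.Timedelta(days=1) - step, freq=freq)
    out = day.reindex(full_index)
    # Linear interpolation strictly between valid neighbours.
    out = out.interpolate(method="linear", limit_area="inside")
    # Leading and trailing gaps take the nearest valid value.
    return out.ffill().bfill()


# One day with hours 0, 5 and 23 missing; each value equals its hour.
hours = [h for h in range(1, 23) if h != 5]
index = pd.Timestamp("2021-01-01") + pd.to_timedelta(hours, unit="h")
day = pd.Series(np.array(hours, dtype=float), index=index)

resampled = resample_day_sketch(day)
```

The interior gap at hour 5 is filled by the average of its neighbours, while the missing first and last hours copy the nearest available value.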

FPCA

The standard PCA in scikit-learn expects a 2D data matrix where each row represents a sample and each column represents a feature. In the context of FPCA, the "features" are values of the functions at discretized points.
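
A minimal sketch of this idea, assuming synthetic hourly curves and three components (both arbitrary choices), is shown below. It calls scikit-learn's `PCA` directly and does not reproduce the ElectricityLoadFPCA class.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
hours = np.arange(24)
n_days = 60

# Synthetic daily curves: a baseline shape, a random daily amplitude, and noise.
baseline = 100.0 + 20.0 * np.sin(2 * np.pi * (hours - 6) / 24)
amplitude = rng.normal(0.0, 5.0, size=(n_days, 1))
noise = rng.normal(0.0, 0.5, size=(n_days, 24))
curves = baseline + amplitude * np.cos(2 * np.pi * hours / 24) + noise

# Rows are days (samples); columns are the 24 discretization points (features).
pca = PCA(n_components=3)
scores = pca.fit_transform(curves)  # one row of scores per day
fpcs = pca.components_              # one discretized FPC per row

# Each day is approximated by the mean curve plus a weighted sum of FPCs.
reconstructed = pca.mean_ + scores @ fpcs
```

Note that scikit-learn centers the data before the decomposition, so the mean curve has to be added back when reconstructing a day from its scores.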

Applying FPCA

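The class ElectricityLoadFPCA offers three methods, each applying a different type of FPCA:

  • apply_fpca_to_all_days_grouped_by_date()
  • apply_fpca_to_all_days_grouped_by_weekday()
  • apply_fpca_to_all_days_grouped_by_month()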

The results from each FPCA are stored in an instance of the ElectricityLoadFPCAResults class, which is the 'results' attribute of ElectricityLoadFPCA. Note that only one result per FPCA type can be stored at a time: performing an analysis again overwrites the previous results. For example, running apply_fpca_to_all_days_grouped_by_date() a second time replaces the results from the first analysis.

Saving and loading FPCA results

FPCA results can be saved to and loaded from a pickle file on disk using the following methods:

  • save_fpca_results()
  • load_fpca_results()

Displaying FPCA results

The ElectricityLoadFPCA class provides several plotting methods for visualizing FPCA results, similar to the visualizations reported in D. Beretta et al., Sustainable Energy, Grids and Networks, Volume 21, March 2020, 100308. These methods include:

  • plot_functional_boxplot(): Plots a functional boxplot that overlays all daily load curves with median and interquartile bands.

Figure 1: Functional boxplot for a representative dataset.

  • plot_fpc(): Plots the Functional Principal Components (FPCs), rescaled according to their explained variance ratio.

Figure 2: FPCs of a representative dataset rescaled by their explained variance ratio.

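  • plot_cdf_of_explained_variability(): Plots the cumulative distribution function (CDF) of the explained variability.

Figure 3: CDF of the explained variability for a representative dataset.

  • plot_scores_vs_day_of_the_week(): Plots a boxplot of the scores against the day of the week.

Figure 4: Scores boxplot of a representative dataset vs day of the week.

  • plot_scores_vs_month_of_the_year(): Plots a boxplot of the scores against the month of the year.

Figure 5: Scores boxplot of a representative dataset vs month of the year.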

Note: All of the above methods take the data to plot from the attributes of the ElectricityLoadFPCA class.

Functional Regression

FPCA can be integrated into any time-series predictive model to predict daily electricity load curves. Unlike traditional time-series models that predict actual data, FPCA-based models predict the scores of a selected number of Functional Principal Components (FPCs). This approach balances model complexity and explained variability. For more details on this methodology, refer to D. Beretta et al., Sustainable Energy, Grids and Networks, Volume 21, March 2020, 100308.
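
The trade-off between model complexity and explained variability can be made concrete by choosing the smallest number of FPCs whose cumulative explained variance reaches a threshold. The sketch below uses NumPy only; the synthetic curves and the 95% threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(24)
n_days = 90

# Synthetic curves built from two dominant modes of variation plus small noise.
curves = (
    100.0
    + rng.normal(0.0, 10.0, (n_days, 1)) * np.sin(2 * np.pi * hours / 24)
    + rng.normal(0.0, 4.0, (n_days, 1)) * np.cos(4 * np.pi * hours / 24)
    + rng.normal(0.0, 0.5, (n_days, 24))
)

# Squared singular values of the centered data are proportional to the
# variance explained by each principal component.
centered = curves - curves.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

threshold = 0.95  # assumed target share of explained variability
n_components = int(np.searchsorted(cumulative, threshold) + 1)
```

Only the scores of the first `n_components` FPCs then need to be predicted, rather than every point of the daily curve.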

The model

The functional decomposition allows the electricity load curve of a given day to be written in the form:

$$ f^{(i)}(t) = \sum_{k} c_k^{(i)} \phi_k(t) $$

where $f^{(i)}(t)$ is the electricity load curve of the i-th day, $c_k^{(i)}$ is the score of the k-th FPC for the i-th day, and $\phi_k(t)$ is the k-th FPC of the time series grouped by date. The $c_k^{(i)}$ can be estimated with the linear model:

$$ c_k^{(i)} = w_k^{(i)} + w_{k,1}^{(i)} x_1^{(i)} + w_{k,2}^{(i)} x_2^{(i)} + \dots + w_{k,m}^{(i)} x_m^{(i)} $$

where $c_k^{(i)}$ is the score of the k-th FPC for the i-th day, $x_l^{(i)}$ is the l-th feature for the i-th day, $w_{k,l}^{(i)}$ is the l-th feature weight for the k-th FPC of the i-th day, and $w_k^{(i)}$ is the intercept.
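
The two-step model can be sketched with scikit-learn as follows: one linear regressor per FPC predicts the score from scaled daily features, and the predicted scores are mapped back to a curve. The synthetic data, the single temperature feature, and the train/test split are assumptions; the sketch fits one set of weights per FPC over all training days and is not the package's train_linear_model().

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
hours = np.arange(24)
n_days = 200

# Hypothetical daily feature (mean temperature) driving the curve amplitude.
temperature = rng.normal(15.0, 8.0, size=n_days)
shape = np.sin(2 * np.pi * (hours - 6) / 24)
curves = (
    100.0
    + (20.0 - 0.8 * temperature)[:, None] * shape
    + rng.normal(0.0, 0.5, (n_days, 24))
)

train, test = slice(0, 150), slice(150, n_days)

# Step 1: decompose the training curves and keep the scores of two FPCs.
pca = PCA(n_components=2).fit(curves[train])
scores = pca.transform(curves[train])

# Step 2: fit one linear model per FPC on the scaled daily features.
scaler = StandardScaler().fit(temperature[train, None])
X_train = scaler.transform(temperature[train, None])
models = {
    k: LinearRegression().fit(X_train, scores[:, k]) for k in range(scores.shape[1])
}

# Prediction: features -> scores -> load curves.
X_test = scaler.transform(temperature[test, None])
predicted_scores = np.column_stack([models[k].predict(X_test) for k in models])
predicted_curves = pca.mean_ + predicted_scores @ pca.components_

rmse = float(np.sqrt(np.mean((predicted_curves - curves[test]) ** 2)))
```

Storing the regressors in a dictionary keyed by FPC index mirrors the `model: dict[LinearRegression]` attribute shown in the class diagram, although the actual keys used by the package are not documented here.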

Note: Since the model predicts the FPC scores, and since the FPCs are daily time series, the features must be aggregated to one value per day, e.g. the average temperature of the day.
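
A minimal sketch of this aggregation, assuming hourly temperature readings with made-up values, groups the rows by calendar day and takes the mean:

```python
import numpy as np
import pandas as pd

# Three days of hourly temperatures; each day is shifted up by one degree.
index = pd.date_range("2021-01-01", periods=72, freq="60min")
values = np.tile(np.arange(24.0), 3) + np.repeat([0.0, 1.0, 2.0], 24)
hourly = pd.DataFrame({"temperature": values}, index=index)

# One row per calendar day, holding the average of that day's readings.
daily_features = hourly.groupby(hourly.index.normalize()).mean()
```

The resulting frame has one row per day, which matches the one-score-per-day structure of the regression target.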

Prediction

The class ElectricityLoadRegression handles the prediction process. It can be instantiated with or without passing an instance of ElectricityLoadFPCA.

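The ElectricityLoadRegression class provides a method for training a linear model and a method for predicting the electricity load curves. Specifically:

  • train_linear_model(): Trains the linear model that predicts the FPC scores.
  • predict_daily_eletricity_load_curve(): Predicts the daily electricity load curve.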

Figure 6: Actual vs predicted scores of FPC1 for a representative dataset.

Figure 7: Actual vs predicted electricity load curve.

Loading and Saving

The ElectricityLoadRegression class includes methods for saving and loading the model parameters and the feature scaler:

  • save_model(): Saves the model and feature scaler to a pickle file.
  • load_model(): Loads a previously saved model and feature scaler from a pickle file.
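
The save and load round trip can be sketched with the standard `pickle` module. The dictionary layout, the file name, and the toy data are assumptions; the on-disk format used by save_model() may differ.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy training data following y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

scaler = StandardScaler().fit(X)
model = {0: LinearRegression().fit(scaler.transform(X), y)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "regression.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump({"model": model, "scaler": scaler}, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored objects predict exactly like the originals.
prediction = restored["model"][0].predict(restored["scaler"].transform([[4.0]]))
```

Saving the scaler together with the model matters because new features must be scaled with the same parameters that were used during training. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.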

Tutorial

Please follow the tutorial to learn how to use fpca-load-tools in practice.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests.

Credits

This app has been developed by D. Beretta, building on the work by D. Beretta et al., Sustainable Energy, Grids and Networks, Volume 21, March 2020, 100308. Please refer to CREDITS.md and CITATION.md for more details.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.

Owner

  • Name: Davide
  • Login: BerriesLab
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Beretta"
    given-names: "Davide"
    orcid: ""
title: "fpca-load-tools"
version: 1.0.0
doi:
date-released: 2024-08-08
url: "https://github.com/BerriesLab/fpca-load-tools"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

pyproject.toml pypi
requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • pytz *
  • scikit-learn *
  • setuptools *
setup.py pypi