smana

Repairing tool for time series with weekly seasonality

https://github.com/tobe-analytics/smana

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ToBe-Analytics
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 245 KB
Statistics
  • Stars: 2
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md



smana: repairing tool for time series with weekly seasonality

What is it?

smana is a Python package for restoring missing values in time series with a weekly pattern.

Main Features

  • Restoration of missing values in time series with a weekly seasonal pattern
  • Any time series with sub-daily resolution is supported
  • Handling of calendar information on public holidays (if provided by the user)

Dependencies

How it works

This package arose from the need to restore energy time series data, which usually show weekly seasonality and often a correlation with public holidays as well. The implementation, however, assumes only that the time series shows a weekly pattern, so this tool can be used to repair any kind of data with this seasonal characteristic.

The core of the algorithm is based on STL decomposition ("Seasonal and Trend decomposition using Loess"), a robust method for decomposing a time series into trend, seasonal and remainder components, implemented in the statsmodels package.

The main method of this package, smana.repair(), restores sequences of missing data (represented as numpy.NaN) by locally approximating the trend and seasonal components of the time series. To estimate the seasonality, the algorithm looks for a sequence of at least 14 consecutive days of valid data; if none exists, linear interpolation or lookup-table strategies are applied iteratively (using a ranking criterion on the missing-value sequences) until a 14-day sequence appears.
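The search for 14 consecutive valid days can be pictured with the sketch below. It is a simplified illustration, not smana's actual code: `find_valid_day_run` is a hypothetical helper, and it assumes a regularly sampled series that starts at midnight, so that each day is a fixed-size block of samples.

```python
import math

def find_valid_day_run(values, samples_per_day, min_days=14):
    """Return (first_day, last_day) of the first window of `min_days`
    consecutive days without NaN, or None if no such window exists."""
    n_days = len(values) // samples_per_day
    run_start = None
    for day in range(n_days):
        chunk = values[day * samples_per_day:(day + 1) * samples_per_day]
        if any(math.isnan(v) for v in chunk):
            run_start = None  # a missing sample breaks the current run
            continue
        if run_start is None:
            run_start = day
        if day - run_start + 1 >= min_days:
            return run_start, day
    return None

# 20 days of hourly data with a single missing sample on day 3.
hourly = [1.0] * (24 * 20)
hourly[3 * 24 + 5] = float("nan")
print(find_valid_day_run(hourly, samples_per_day=24))  # (4, 17)
```

When the function returns None, a procedure like the one described above would first have to fill some gaps by other means before a seasonal estimate becomes possible.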

In addition, this tool can handle calendar information on public holidays. This feature is useful only if the time series is correlated with these specific days, in particular if its daily pattern on public holidays resembles that of the standard weekly holiday; it is therefore recommended to use this feature only when this assumption holds.

How to get it

The source code is currently hosted on GitHub at: https://github.com/ToBe-Analytics/smana

Binary installers for the latest released version are available at the Python Package Index (PyPI).

```sh
# PyPI
pip install smana
```

The list of changes to smana between each release can be found here. For full details, see the commit logs at https://github.com/ToBe-Analytics/smana.

Documentation

The package provides the following main method, which implements the whole procedure described:

smana.repair(input_df, scan_column, datetime_column=None, trendapproxdays=7, nonnegative_constraint=False, holidays_stl=False, weekholidayint=6, holidays_column=None, inplace=False)

This function restores missing values (numpy.NaN) of the time series scan_column in the input_df dataframe, using datetime_column as the timestamp column, through a process based on STL decomposition. Optionally, by setting holidays_stl to True, a similar strategy can be applied to repair missing data on public holidays (this procedure relies on the data of the weekly holiday).

Parameters

  • input_df: pandas.DataFrame
    Input dataframe which collects the time series to be repaired, the datetime series and optionally the column with public holidays information.
  • scan_column: str
    Label of the numeric column of input_df to be restored. Missing values must be represented as numpy.NaN.
  • datetime_column: str, default None
    Label of the datetime column of input_df; both timezone-aware and naive datetimes are supported. If unspecified, input_df.index is used.
  • trendapproxdays: int, default 7
    Number of days to consider for trend estimation; higher values lead to approximations over longer periods. Integers less than 7 are replaced by the default value. It is usually not necessary to modify this parameter.
  • nonnegative_constraint: bool, default False
    Set to True to check and repair negative restored values.
  • holidays_stl: bool, default False
    Apply a specific strategy for the restoring of missing values related to public holidays.
  • weekholidayint: int, default 6
    Index corresponding to the week holiday, from 0 (Monday) to 6 (Sunday). This argument is considered only if holidays_stl is set to True.
  • holidays_column: str, default None
    Label of the column which collects holidays information; for each row in input_df, the allowed values are only 0 (working day) or 1 (holiday, including standard week holiday). This argument is considered only if holidays_stl is set to True.
  • inplace: bool, default False
    If False, return a copy. Otherwise, perform the operation in place and return None.
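The 0/1 encoding expected in holidays_column can be prepared in many ways. The sketch below is one possibility using only the standard library: `holiday_flags` is a hypothetical helper (not part of smana), and the list of public holidays is supplied by the caller. The weekday numbering follows the same convention as above, from 0 (Monday) to 6 (Sunday).

```python
from datetime import date, datetime, timedelta

def holiday_flags(timestamps, public_holidays, week_holiday=6):
    """Return 1 for timestamps on a public holiday or on the weekly holiday, else 0."""
    public_holidays = set(public_holidays)
    return [
        1 if ts.weekday() == week_holiday or ts.date() in public_holidays else 0
        for ts in timestamps
    ]

# Two samples per day from Wednesday 2024-04-24 to Sunday 2024-04-28.
start = datetime(2024, 4, 24)
timestamps = [start + timedelta(hours=12 * i) for i in range(10)]

# 2024-04-25 (a Thursday) is marked as a public holiday by the caller.
flags = holiday_flags(timestamps, public_holidays=[date(2024, 4, 25)])
print(flags)  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
```

The resulting list can be stored as a dataframe column and its label passed as holidays_column.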

Returns

  • pandas.DataFrame or None
    DataFrame restored or None if inplace is set to True.
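A call to the method might look like the following sketch. The synthetic data, the column labels "timestamp" and "load", and the helper names are assumptions made for this example; only the smana.repair() arguments come from the signature documented above. The import of smana is kept inside the function so that the data preparation can be run without the package installed.

```python
import numpy as np
import pandas as pd

def build_example_frame():
    """Five weeks of hourly data with a weekly pattern and a 6-hour gap on day 20."""
    timestamps = pd.date_range("2024-01-01", periods=24 * 35, freq=pd.Timedelta(hours=1))
    weekday = timestamps.dayofweek.to_numpy()
    hour = timestamps.hour.to_numpy()
    # Higher level on working days, lower at weekends, plus a daily cycle.
    values = np.where(weekday < 5, 100.0, 60.0) + 10.0 * np.sin(2 * np.pi * hour / 24)
    values[20 * 24 + 8:20 * 24 + 14] = np.nan  # missing values to be restored
    return pd.DataFrame({"timestamp": timestamps, "load": values})

def repair_example(df):
    """Repair the 'load' column; requires smana to be installed."""
    import smana
    # inplace defaults to False, so a repaired copy is returned.
    return smana.repair(df, "load", datetime_column="timestamp")

df = build_example_frame()
print(df["load"].isna().sum())  # 6
```

Calling `repair_example(df)` is then expected to return a dataframe of the same shape, with the gap filled and `df` left unchanged.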

Check out some examples of usage of smana here.

License

BSD 3

Contributing to smana

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. A detailed overview on how to contribute can be found in the contributing guide. As contributors and maintainers to this project, you are expected to abide by our code of conduct. More information can be found at: Contributor Code of Conduct

Owner

  • Name: ToBe Analytics
  • Login: ToBe-Analytics
  • Kind: organization
  • Location: Italy

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: ToBe Analytics
title: "smana: repairing tool for time series with weekly seasonality"
version: 0.1.2
date-released: 2024-06-30

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 8 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 3
pypi.org: smana

  • Documentation: https://smana.readthedocs.io/
  • License: BSD 3-Clause License Copyright (c) 2024, ToBe Analytics. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Latest release: 0.1.2
    published over 2 years ago
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 8 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.7%
Dependent repos count: 65.6%
Last synced: 10 months ago