https://github.com/nixtla/mlforecast

Scalable machine 🤖 learning for time series forecasting.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○ CITATION.cff file
  • ✓ codemeta.json file
    Found codemeta.json file
  • ✓ .zenodo.json file
    Found .zenodo.json file
  • ○ DOI references
  • ○ Academic publication links
  • ○ Committers with academic emails
  • ○ Institutional organization owner
  • ○ JOSS paper metadata
  • ○ Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

dask forecast forecasting lightgbm machine-learning python time-series xgboost

Keywords from Contributors

econometrics data-mining distributed arima automl baselines ets exponential-smoothing fbprophet mstl
Last synced: 5 months ago

Repository

Scalable machine 🤖 learning for time series forecasting.

Basic Info
Statistics
  • Stars: 1,057
  • Watchers: 10
  • Forks: 102
  • Open Issues: 33
  • Releases: 41
Topics
dask forecast forecasting lightgbm machine-learning python time-series xgboost
Created almost 5 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

mlforecast

Machine Learning 🤖 Forecast

Scalable machine learning for time series forecasting

[![CI](https://github.com/Nixtla/mlforecast/actions/workflows/ci.yaml/badge.svg)](https://github.com/Nixtla/mlforecast/actions/workflows/ci.yaml) [![Python](https://img.shields.io/pypi/pyversions/mlforecast.png)](https://pypi.org/project/mlforecast/) [![PyPi](https://img.shields.io/pypi/v/mlforecast?color=blue.png)](https://pypi.org/project/mlforecast/) [![conda-forge](https://img.shields.io/conda/vn/conda-forge/mlforecast?color=blue.png)](https://anaconda.org/conda-forge/mlforecast) [![License](https://img.shields.io/github/license/Nixtla/mlforecast.png)](https://github.com/Nixtla/mlforecast/blob/main/LICENSE)

**mlforecast** is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.

Install

PyPI

pip install mlforecast

conda-forge

conda install -c conda-forge mlforecast

For more detailed instructions you can refer to the installation page.

Quick Start

Get Started with this quick guide.

Follow this end-to-end walkthrough for best practices.

Videos

Sample notebooks

Why?

Current Python alternatives for machine learning models are slow, inaccurate and don't scale well. So we created a library that can be used to forecast in production environments. MLForecast includes efficient feature engineering to train any machine learning model that exposes fit and predict methods (such as scikit-learn estimators) on millions of time series.

Features

  • Fastest implementations of feature engineering for time series forecasting in Python.
  • Out-of-the-box compatibility with pandas, polars, spark, dask, and ray.
  • Probabilistic Forecasting with Conformal Prediction.
  • Support for exogenous variables and static covariates.
  • Familiar sklearn syntax: .fit and .predict.

Missing something? Please open an issue or write to us on Slack.

Examples and Guides

📚 End to End Walkthrough: model training, evaluation and selection for multiple time series.

🔎 Probabilistic Forecasting: use Conformal Prediction to produce prediction intervals.

👩‍🔬 Cross Validation: robust evaluation of model performance.

🔌 Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.

📈 Transfer Learning: pretrain a model using a set of time series and then predict another one using that pretrained model.

🌑️ Distributed Training: use a Dask, Ray or Spark cluster to train models at scale.
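To make the Probabilistic Forecasting entry above concrete, here is a minimal, generic sketch of split conformal prediction using only the standard library. It is not mlforecast's implementation; the function name `conformal_interval`, its parameters, and the sample numbers are all hypothetical. The idea is to take absolute errors from held-out predictions, pick a quantile of them, and widen a new point forecast by that amount.

```python
import math


def conformal_interval(calib_actuals, calib_preds, new_pred, level=0.8):
    """Return (lower, upper) around new_pred using held-out absolute errors.

    Hypothetical helper: a generic split-conformal sketch, not the library's code.
    """
    # Conformity scores: absolute errors on the calibration set, ascending.
    scores = sorted(abs(a - p) for a, p in zip(calib_actuals, calib_preds))
    n = len(scores)
    # Finite-sample corrected rank of the score used as interval half-width.
    rank = math.ceil((n + 1) * level)
    if rank > n:
        raise ValueError("not enough calibration points for this level")
    half_width = scores[rank - 1]
    return new_pred - half_width, new_pred + half_width


# Made-up calibration data: nine actual values and a constant forecast of 11.
calib_actuals = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0, 9.0]
calib_preds = [11.0] * 9

interval = conformal_interval(calib_actuals, calib_preds, new_pred=11.0, level=0.8)
print(interval)  # (9.0, 13.0)
```

The interval width depends only on past errors, so this approach works with any point forecaster, which is why it pairs naturally with arbitrary regressors.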

How to use

The following provides a very basic overview. For a more detailed description, see the documentation.

Data setup

Store your time series in a pandas DataFrame in long format, that is, each row represents an observation for a specific series and timestamp.

```python
from mlforecast.utils import generate_daily_series

# Generate 20 synthetic daily series with one static feature and a trend
series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True,
)
series.head()
```

|     | unique_id | ds         | y          | static_0 |
|-----|-----------|------------|------------|----------|
| 0   | id_00     | 2000-01-01 | 17.519167  | 72       |
| 1   | id_00     | 2000-01-02 | 87.799695  | 72       |
| 2   | id_00     | 2000-01-03 | 177.442975 | 72       |
| 3   | id_00     | 2000-01-04 | 232.704110 | 72       |
| 4   | id_00     | 2000-01-05 | 317.510474 | 72       |

Note: The unique_id column identifies each distinct time series in your dataset. If your dataset contains only a single time series, set this column to a constant value.
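As an illustration of the long format described above, the following standard-library sketch flattens a mapping of series into one row per observation. The helper `to_long`, the series names, and the values are hypothetical; only the column names `unique_id`, `ds`, and `y` come from the text.

```python
from datetime import date, timedelta


def to_long(wide, start):
    """Flatten {series_id: [values, ...]} into long-format rows.

    Each row is one observation: series identifier (unique_id),
    timestamp (ds), and observed value (y). Daily frequency is assumed.
    """
    rows = []
    for series_id, values in wide.items():
        for offset, value in enumerate(values):
            rows.append(
                {
                    "unique_id": series_id,
                    "ds": start + timedelta(days=offset),
                    "y": value,
                }
            )
    return rows


# Two made-up daily series of different lengths.
wide = {"store_a": [10.0, 12.0, 11.0], "store_b": [5.0, 7.0]}
rows = to_long(wide, start=date(2000, 1, 1))

# A single series still needs an identifier: use one constant value.
single = to_long({"only_series": [1.0, 2.0, 3.0]}, start=date(2000, 1, 1))

for row in rows:
    print(row["unique_id"], row["ds"], row["y"])
```

Passing `rows` to `pandas.DataFrame(rows)` should give a frame with those three columns; you would likely also want to convert `ds` with `pandas.to_datetime` before fitting.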

Models

Next, define your models. Each one will be trained on all series. These can be any regressor that follows the scikit-learn API.

```python
import lightgbm as lgb
from sklearn.linear_model import LinearRegression

models = [
    lgb.LGBMRegressor(random_state=0, verbosity=-1),
    LinearRegression(),
]
```

Forecast object

Now instantiate an MLForecast object with the models and the features that you want to use. The features can be lags, transformations on the lags and date features. You can also define transformations to apply to the target before fitting, which will be restored when predicting.

```python
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.target_transforms import Differences

fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [ExpandingMean()],
        7: [RollingMean(window_size=28)]
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
```
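To show what lags, lag transforms, and a differencing target transform compute, here is a small plain-Python sketch on a toy series. The helper names (`lag`, `rolling_mean`, `expanding_mean`, `difference`) are hypothetical and are not the library's functions; the sketch assumes that a transform attached to lag k is applied to the series shifted by k steps.

```python
def lag(values, k):
    """Shift a series k steps forward; the first k entries have no history."""
    return ([None] * k + list(values))[: len(values)]


def rolling_mean(values, window_size):
    """Mean over the last window_size entries; None until the window is full."""
    out = []
    for i in range(len(values)):
        window = values[max(i - window_size + 1, 0) : i + 1]
        if len(window) < window_size or None in window:
            out.append(None)
        else:
            out.append(sum(window) / window_size)
    return out


def expanding_mean(values):
    """Mean of every non-missing entry seen so far."""
    out, total, count = [], 0.0, 0
    for v in values:
        if v is not None:
            total += v
            count += 1
        out.append(total / count if count else None)
    return out


def difference(values, d=1):
    """Subtract the value d steps earlier (first d entries are undefined)."""
    diffs = [values[i] - values[i - d] for i in range(d, len(values))]
    return ([None] * d + diffs)[: len(values)]


y = [1.0, 3.0, 6.0, 10.0, 15.0]

print(difference(y))               # [None, 2.0, 3.0, 4.0, 5.0]
print(lag(y, 2))                   # [None, None, 1.0, 3.0, 6.0]
print(rolling_mean(lag(y, 1), 2))  # [None, None, 2.0, 4.5, 8.0]
print(expanding_mean(lag(y, 1))[-1])  # 5.0
```

Because every feature is built from shifted values, each row only uses information available before its own timestamp. This sketch assumes the target transform runs before the features are computed, so in the configuration above the lags would be taken on the differenced series.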

Training

To compute the features and train the models call fit on your Forecast object.

```python
fcst.fit(series)
```

MLForecast(models=[LGBMRegressor, LinearRegression], freq=D, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_lag7_window_size28'], date_features=['dayofweek'], num_threads=1)

Predicting

To get the forecasts for the next n days call predict(n) on the forecast object. This will automatically handle the updates required by the features using a recursive strategy.

```python
predictions = fcst.predict(14)
predictions
```

|     | unique_id | ds         | LGBMRegressor | LinearRegression |
|-----|-----------|------------|---------------|------------------|
| 0   | id_00     | 2000-04-04 | 299.923771    | 311.432371       |
| 1   | id_00     | 2000-04-05 | 365.424147    | 379.466214       |
| 2   | id_00     | 2000-04-06 | 432.562441    | 460.234028       |
| 3   | id_00     | 2000-04-07 | 495.628000    | 524.278924       |
| 4   | id_00     | 2000-04-08 | 60.786223     | 79.828767        |
| ... | ...       | ...        | ...           | ...              |
| 275 | id_19     | 2000-03-23 | 36.266780     | 28.333215        |
| 276 | id_19     | 2000-03-24 | 44.370984     | 33.368228        |
| 277 | id_19     | 2000-03-25 | 50.746222     | 38.613001        |
| 278 | id_19     | 2000-03-26 | 58.906524     | 43.447398        |
| 279 | id_19     | 2000-03-27 | 63.073949     | 48.666783        |

280 rows × 4 columns
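The recursive strategy mentioned in this section can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's code: `recursive_forecast` and `toy_model` are hypothetical names, and the "model" is a hand-written rule standing in for a trained regressor.

```python
def recursive_forecast(history, predict_one, lags, horizon):
    """Forecast horizon steps, feeding each prediction back in as history."""
    extended = list(history)  # copy so the caller's data is left untouched
    forecasts = []
    for _ in range(horizon):
        # Build lag features from the most recent values, real or predicted.
        features = [extended[-k] for k in lags]
        y_hat = predict_one(features)
        forecasts.append(y_hat)
        extended.append(y_hat)  # the next step sees this prediction as lag 1
    return forecasts


def toy_model(features):
    """Stand-in for a trained model: continue the last observed slope."""
    lag1, lag2 = features
    return lag1 + (lag1 - lag2)


history = [1.0, 2.0, 4.0]
forecasts = recursive_forecast(history, toy_model, lags=[1, 2], horizon=3)
print(forecasts)  # [6.0, 8.0, 10.0]
```

From the second step onward the lag features contain earlier predictions rather than observations, which is why errors can accumulate over long horizons with this strategy.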

Visualize results

```python
from utilsforecast.plotting import plot_series

fig = plot_series(series, predictions, max_ids=4, plot_random=False)
```

How to contribute

See CONTRIBUTING.md.

Owner

  • Name: Nixtla
  • Login: Nixtla
  • Kind: organization
  • Email: ops@nixtla.io
  • Location: United States of America

Open Source Time Series Ecosystem

GitHub Events

Total
  • Create event: 35
  • Release event: 7
  • Issues event: 51
  • Watch event: 168
  • Delete event: 26
  • Member event: 1
  • Issue comment event: 139
  • Push event: 114
  • Pull request review event: 7
  • Pull request review comment event: 4
  • Pull request event: 55
  • Fork event: 15
Last Year
  • Create event: 35
  • Release event: 7
  • Issues event: 51
  • Watch event: 168
  • Delete event: 26
  • Member event: 1
  • Issue comment event: 139
  • Push event: 114
  • Pull request review event: 7
  • Pull request review comment event: 4
  • Pull request event: 55
  • Fork event: 15

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 146
  • Total Committers: 6
  • Avg Commits per committer: 24.333
  • Development Distribution Score (DDS): 0.425
Past Year
  • Commits: 100
  • Committers: 3
  • Avg Commits per committer: 33.333
  • Development Distribution Score (DDS): 0.22
Top Committers
Name Email Commits
José Morales j****2@g****m 84
José Morales j****s@g****m 26
fede f****z@g****m 21
capybara 6****z@u****m 11
dependabot[bot] 4****]@u****m 3
Max Mergenthaler m****m@g****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 170
  • Total pull requests: 251
  • Average time to close issues: 19 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 114
  • Total pull request authors: 17
  • Average comments per issue: 2.6
  • Average comments per pull request: 1.42
  • Merged pull requests: 221
  • Bot issues: 0
  • Bot pull requests: 27
Past Year
  • Issues: 42
  • Pull requests: 50
  • Average time to close issues: 14 days
  • Average time to close pull requests: 3 days
  • Issue authors: 36
  • Pull request authors: 6
  • Average comments per issue: 1.88
  • Average comments per pull request: 1.36
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 24
Top Authors
Issue Authors
  • jmoralez (14)
  • iamyihwa (8)
  • kkckk1110 (8)
  • pst2154 (4)
  • FedericoGarza (4)
  • matsuobasho (3)
  • ncooder (3)
  • adriaanvh1 (3)
  • braaannigan (2)
  • MrTangsai (2)
  • NudnikShpilkis (2)
  • DsDev1 (2)
  • Sandy4321 (2)
  • tblume1992 (2)
  • SyedKumailHussainNaqvi (2)
Pull Request Authors
  • jmoralez (202)
  • dependabot[bot] (39)
  • FedericoGarza (32)
  • Naren8520 (12)
  • adriaanvh1 (4)
  • deven367 (3)
  • Ammar-Azman (2)
  • tblume1992 (2)
  • rpmccarter (2)
  • tracykteal (2)
  • mergenthaler (1)
  • MarcoGorelli (1)
  • hahnbeelee (1)
  • christian-adam (1)
  • PierD86 (1)
Top Labels
Issue Labels
bug (64) enhancement (50) feature (50) awaiting response (32) documentation (11) discussion (1) good first issue (1) dependencies (1)
Pull Request Labels
feature (47) enhancement (46) dependencies (44) fix (36) documentation (23) breaking change (9) breaking (6) maintenance (4) github_actions (4) bug (1)

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • mamba-org/provision-with-micromamba main composite
.github/workflows/deploy.yaml actions
  • fastai/workflows/quarto-ghp master composite
.github/workflows/lint.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/release.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish master composite
environment.yml conda
  • dask <2023.1.1
  • holidays <0.21
  • lightgbm
  • matplotlib
  • nbformat
  • numba
  • pandas
  • pip
  • prophet
  • pyspark >=3.3
  • scikit-learn
  • shap
  • statsmodels
  • window-ops
  • xgboost
.github/workflows/no-response.yaml actions
  • lee-dohm/no-response v0.5.0 composite
.github/workflows/release-drafter.yml actions
  • release-drafter/release-drafter v5 composite
pyproject.toml pypi
setup.py pypi